Capabilities
Automatic Overlap
Because parallelism in Charm++ is expressed via interacting parallel objects instead of processors, the runtime system can seamlessly provide overlap of communication and computation as an application runs.
Automatic Load Balancing
Charm++ ships with an entire suite of load balancers, which can be selected at runtime. All the application must do is indicate when it is a good time to synchronize for load balancing.
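To make the idea of a load balancing strategy concrete, here is a minimal sketch in plain C++ of one simple greedy approach: take the objects from heaviest to lightest and hand each one to the processor that currently carries the least load. It is an illustration only. The names `greedy_assign` and `max_proc_load` are hypothetical, the per-object loads are assumed to have been measured already, and nothing here is the Charm++ API or the implementation of any balancer that ships with Charm++.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: returns, for each object index, the processor
// it is assigned to under a greedy heaviest-first strategy.
std::vector<int> greedy_assign(const std::vector<double>& object_loads,
                               int num_procs) {
    assert(num_procs > 0);

    // Object indices sorted by decreasing load (stable, so ties keep order).
    std::vector<std::size_t> order(object_loads.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return object_loads[a] > object_loads[b];
                     });

    // Min-heap of (current load, processor id): top is the least loaded.
    using Entry = std::pair<double, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> procs;
    for (int p = 0; p < num_procs; ++p) procs.push({0.0, p});

    std::vector<int> assignment(object_loads.size(), -1);
    for (std::size_t obj : order) {
        Entry least = procs.top();
        procs.pop();
        assignment[obj] = least.second;
        least.first += object_loads[obj];
        procs.push(least);
    }
    return assignment;
}

// Hypothetical helper: the load on the busiest processor for an assignment.
double max_proc_load(const std::vector<double>& object_loads,
                     const std::vector<int>& assignment, int num_procs) {
    assert(num_procs > 0);
    assert(assignment.size() == object_loads.size());
    std::vector<double> totals(static_cast<std::size_t>(num_procs), 0.0);
    for (std::size_t i = 0; i < object_loads.size(); ++i) {
        assert(assignment[i] >= 0 && assignment[i] < num_procs);
        totals[static_cast<std::size_t>(assignment[i])] += object_loads[i];
    }
    return *std::max_element(totals.begin(), totals.end());
}
```

Calling `greedy_assign({5, 3, 2, 2, 4, 4}, 2)` and passing the result to `max_proc_load` shows how evenly the work was spread. A real balancer would also weigh communication patterns and migration cost, which this sketch ignores.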
Automatic Checkpointing and Fault Tolerance
Charm++ can easily checkpoint an application's data to disk or to the memory of a buddy node. If a hard node failure occurs and the job persists, Charm++ will detect the failure and automatically continue execution from the most recent in-memory checkpoint. The programmer simply specifies the data to checkpoint using the same clean interface that is used to serialize data for load balancing.
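The sketch below illustrates the idea of describing an object's data once and reusing that description for both saving and restoring. In Charm++ this role is played by the PUP (pack/unpack) framework; the code here does not use the Charm++ headers. `MiniPupper`, `SimState`, `checkpoint`, and `restore` are hypothetical stand-ins written in plain C++, and the byte format is an assumption of this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pack/unpack stream; not the Charm++ PUP::er class.
class MiniPupper {
public:
    // Packing mode: starts with an empty buffer.
    MiniPupper() : unpacking_(false), pos_(0) {}
    // Unpacking mode: reads back from a previously packed buffer.
    explicit MiniPupper(std::vector<char> data)
        : unpacking_(true), buffer_(std::move(data)), pos_(0) {}

    bool is_unpacking() const { return unpacking_; }
    const std::vector<char>& buffer() const { return buffer_; }

    // One function handles both directions for trivially copyable values.
    template <typename T>
    void value(T& v) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "raw byte copy needs a trivially copyable type");
        if (unpacking_) {
            assert(pos_ + sizeof(T) <= buffer_.size());
            std::memcpy(&v, buffer_.data() + pos_, sizeof(T));
            pos_ += sizeof(T);
        } else {
            const char* raw = reinterpret_cast<const char*>(&v);
            buffer_.insert(buffer_.end(), raw, raw + sizeof(T));
        }
    }

    // Vectors store their length first, then each element in turn.
    template <typename T>
    void value(std::vector<T>& v) {
        std::size_t n = v.size();
        value(n);
        if (unpacking_) v.resize(n);
        for (T& x : v) value(x);
    }

private:
    bool unpacking_;
    std::vector<char> buffer_;
    std::size_t pos_;
};

// Hypothetical application state.
struct SimState {
    int step = 0;
    double time = 0.0;
    std::vector<double> positions;

    // Single routine listing the data; used for both saving and restoring.
    void pup(MiniPupper& p) {
        p.value(step);
        p.value(time);
        p.value(positions);
    }
};

// Serialize the state into a byte buffer (an in-memory checkpoint).
std::vector<char> checkpoint(SimState& s) {
    MiniPupper p;
    s.pup(p);
    return p.buffer();
}

// Rebuild a state object from a byte buffer.
SimState restore(const std::vector<char>& data) {
    MiniPupper p(data);
    SimState s;
    s.pup(p);
    return s;
}
```

Because the same `pup` routine drives both directions, the saved and restored layouts cannot drift apart. The same serialized bytes could equally be written to disk or sent to another node, which is why one description can serve checkpointing and object migration alike.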
Power/Energy Optimization
The Charm++ runtime can reduce total energy consumption, that is, both machine and cooling energy, by combining control over the processor operating frequency and voltage with object migration. It can also constrain core temperatures to save cooling energy.
Portable Code
Charm++ comes pre-packaged with many machine layers that are tuned to the latest supercomputer architectures, ranging from Blue Gene/Q to Cray XK6.
Independent Modules, Interleaved Execution
Because Charm++ programs are written in terms of a set of modules that define parallel objects, multiple modules can execute concurrently. When one module has little work to do or is idle, another module can fill the gap. Because work can be prioritized in Charm++, the user can specify which objects have priority, and the Charm++ scheduler will treat them accordingly.
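As a rough illustration of interleaved, prioritized execution, the following plain C++ sketch keeps pending work from two hypothetical modules ("solver" and "io") in a single priority queue and always runs the most urgent item next. `MiniScheduler`, `Message`, and `run_two_modules` are invented names, and the convention that a smaller number runs first is an assumption of this sketch; it is not the Charm++ scheduler or its API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical unit of pending work.
struct Message {
    int priority;                // assumption of this sketch: smaller runs first
    std::size_t seq;             // arrival order, breaks ties first-in first-out
    std::function<void()> work;  // the work to perform
};

// Orders the queue so the smallest (priority, seq) pair ends up on top.
struct RunsLater {
    bool operator()(const Message& a, const Message& b) const {
        if (a.priority != b.priority) return a.priority > b.priority;
        return a.seq > b.seq;
    }
};

// Hypothetical single-threaded scheduler shared by all modules.
class MiniScheduler {
public:
    void enqueue(int priority, std::function<void()> work) {
        queue_.push(Message{priority, next_seq_++, std::move(work)});
    }

    // Runs queued work until nothing is left, most urgent first.
    void run() {
        while (!queue_.empty()) {
            Message m = queue_.top();
            queue_.pop();
            m.work();
        }
    }

private:
    std::priority_queue<Message, std::vector<Message>, RunsLater> queue_;
    std::size_t next_seq_ = 0;
};

// Two modules enqueue work independently; the scheduler interleaves it.
std::vector<std::string> run_two_modules() {
    std::vector<std::string> log;
    MiniScheduler sched;
    sched.enqueue(5, [&log] { log.push_back("io:write"); });
    sched.enqueue(1, [&log] { log.push_back("solver:step"); });
    sched.enqueue(5, [&log] { log.push_back("io:flush"); });
    sched.enqueue(1, [&log] { log.push_back("solver:reduce"); });
    sched.run();
    return log;
}
```

In this sketch the solver items run before the I/O items regardless of arrival order, and the I/O module fills in once the solver has nothing queued. Neither module needs to know the other exists, which is the property the paragraph above describes.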
Interoperable with MPI, OpenMP and CUDA
Charm++ supports time and space sharing with MPI, allowing MPI to execute with Charm++ code either in phases or partitioned by processors. Charm++ comes with its own OpenMP runtime, which can be used with the Clang, GCC, and ICC compilers and allows co-scheduling of OpenMP tasks and Charm++ entry methods. Charm++ also provides support for asynchronously executing CUDA kernels on the GPU and for orchestrating data movement between hosts and devices.
Ecosystem of Tools
Charm++ is not just a programming language or runtime system; it also comes with a full suite of tools, ranging from a parallel debugger to performance visualization. Using the CCS tool, you can even inject Python code on the fly as your application runs.