Over 900 commits (bugfixes + improvements + cleanups) have been applied across the entire system. Major changes are described below:
- Charm++ Features
  - Calls to entry methods taking a single fixed-size parameter can now be automatically aggregated and routed through the TRAM library by marking them with the [aggregate] attribute.
  - Calls to parameter-marshalled entry methods with large array arguments can request asynchronous zero-copy send behavior with a 'nocopy' tag in the parameter's declaration.
  - The runtime system now integrates an OpenMP runtime library, so that code using OpenMP parallelism will dispatch work to idle worker threads within the Charm++ process.
  - Applications can ask the runtime system to perform automatic high-level end-of-run performance analysis by linking with the '-tracemode perfReport' option.
  - Added a new dynamic remapping/load-balancing strategy, GreedyRefineLB, that offers high result quality and well-bounded execution time.
  - Improved and expanded topology-aware spanning tree generation strategies, including support for runs on a torus with holes, such as Blue Waters and other Cray XE/XK systems.
  - Charm++ programs can now define their own main() function, rather than using a generated implementation from a mainmodule/mainchare combination. This extends the existing Charm++/MPI interoperation feature.
  - Improvements to sections:
    - The array sections API has been simplified, with array sections now automatically delegated to CkMulticastMgr (the most efficient implementation in Charm++). The changes are reflected in Chapter 14 of the manual.
    - Group sections can now be delegated to CkMulticastMgr (improved performance compared to the default implementation). Note that they have to be manually delegated. Documentation is in Chapter 14 of the Charm++ manual.
    - Group section reductions are now supported for delegated sections via CkMulticastMgr.
    - Improved performance of section creation in CkMulticastMgr.
    - CkMulticastMgr uses the improved spanning tree strategies. See above.
  - GPU manager now creates one instance per OS process and scales the pre-allocated memory pool size according to the GPU memory size and the number of GPU manager instances on a physical node.
  - Several GPU Manager API changes, including:
    - Replaced references to global variables in the GPU manager API with calls to functions.
    - The user is no longer required to specify a bufferID in the dataInfo struct.
    - Replaced calls to kernelSelect with direct invocation of functions passed via the work request object (allows CUDA to be built with all programs).
  - Added support for malleable jobs that can dynamically shrink and expand the set of compute nodes hosting Charm++ processes.
  - Greatly expanded and improved reduction operations:
    - Added built-in reductions for all logical and bitwise operations on integer and boolean input.
    - Reductions over groups and chare arrays that apply commutative, associative operations (e.g. MIN, MAX, SUM, AND, OR, XOR) are now processed in a streaming fashion. This reduces the memory footprint of reductions. User-defined reductions can opt into this mode as well.
    - Added a new 'Tuple' reducer that allows combining multiple reductions of different input data and operations from a common set of source objects to a single target callback.
    - Added a new 'Summary Statistics' reducer that provides count, mean, and standard deviation using a numerically stable streaming algorithm.
  - Added a '++quiet' option to suppress charmrun and charm++ non-error messages at startup.
  - Calls to chare array element entry methods with the [inline] tag now avoid copying their arguments when the called method takes its parameters by const&, offering a substantial reduction in overhead in those cases.
  - Synchronous entry methods that block until completion (marked with the [sync] attribute) can now return any type that defines a PUP method, rather than only message types.
- AMPI Features
  - More efficient implementations of message matching infrastructure, multiple completion routines, and all varieties of reductions and gathers.
  - Support for user-defined non-commutative reductions, MPI_BOTTOM, cancelling receive requests, MPI_THREAD_FUNNELED, PSCW synchronization for RMA, and more.
  - Fixes to AMPI's extensions for load balancing and to Isomalloc on SMP builds.
  - More robust derived datatype support, and optimizations for truly contiguous types.
  - ROMIO is now built on AMPI and linked in by ampicc by default.
  - A version of HDF5 v1.10.1 that builds and runs on AMPI with virtualization is now available at https://charm.cs.illinois.edu/gerrit/#/admin/projects/hdf5-ampi
  - Improved support for performance analysis and visualization with Projections.
- Platforms and Portability
  - The runtime system code now requires compiler support for C++11 R-value references and move constructors. All currently supported compilers are expected to provide this.
  - The next feature release (anticipated to be 6.9.0 or 7.0) will require full C++11 support from the compiler and standard library.
  - Added support for IBM POWER8 systems with the PAMI communication API, such as development/test platforms for the upcoming Sierra and Summit supercomputers at LLNL and ORNL. Contributed by Sameer Kumar of IBM.
  - Mac OS (darwin) builds now default to the modern libc++ standard library instead of the older libstdc++.
  - Blue Gene/Q build targets have been added for the 'bgclang' compiler.
  - Charm++ can now be built on Cray's CCE 8.5.4+.
  - Charm++ will now build without custom configuration on Arch Linux.
  - Charmrun can automatically detect rank and node count from Slurm/srun environment variables.
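As an illustration of the entry-method annotations described above, a hypothetical .ci interface file might declare them as follows (the module, chare, and method names here are invented for this sketch, not taken from the release):

```
// Hypothetical .ci interface sketch.
module demo {
  array [1D] Worker {
    entry Worker();
    // Single fixed-size parameter: eligible for TRAM aggregation.
    entry [aggregate] void deposit(int value);
    // Large marshalled array parameter: request asynchronous
    // zero-copy send behavior.
    entry void receiveRow(int n, nocopy double row[n]);
  };
};
```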
The Parallel Programming Laboratory is hosting the 15th Annual Workshop on Charm++ and its Applications April 17-19, 2017. The workshop will be streamed live. You can find the list of talks and the webcast here.
There are multiple Charm++ related talks and events at Supercomputing 2016 this week, including a Birds of a Feather session titled Adaptive and Asynchronous Parallel Programming with Charm++ and AMPI. You can find a list of PPL and Charm++ talks here.
The Parallel Programming Laboratory is pleased to announce the stable release of Charm++ version 6.7.1. In-depth release notes as well as full version control change logs can be found here. The source code for this release can be downloaded here.
The Parallel Programming Laboratory hosted the 14th Annual Workshop on Charm++ and its Applications. You can find the list of talks and their slides here.
The Parallel Programming Laboratory is pleased to announce the stable release of Charm++ version 6.7.0. In-depth release notes as well as full version control change logs can be found here. The source code for this release can be downloaded here.
The Parallel Programming Laboratory is pleased to announce the stable release of Charm++ version 6.6.0. In-depth release notes as well as full version control change logs can be found here. The source code for this release can be downloaded here.
The Parallel Programming Laboratory is pleased to announce the stable release of Charm++ version 6.5.1. This release offers several bug fixes, especially on the Cray Gemini and IBM Blue Gene Q architectures, and a new port to the Cray Cascade (XC-30) systems. The list of bugs fixed by this release can be found in Redmine. Release notes and full version control change logs can be found in our Git repository. The source code for this release can be downloaded here. This stable release will be precompiled and offered for use on various major supercomputer installations, including systems at Argonne National Lab, NERSC, NCSA, NICS, Oak Ridge National Lab, SDSC, and TACC.
NAMD, an application developed using Charm++, was recently used in an all-atom molecular dynamics simulation to determine the chemical structure of the HIV capsid, as reported in a Nature research article. The simulation, involving about 64 million atoms, was carried out on the Blue Waters system at the University of Illinois and benefited from many features and performance optimizations implemented in Charm++. Results from the simulation were also featured on the journal's cover. More information about the research and results can be found in the university news release featuring this study of the HIV capsid.
NAMD is a biomolecular dynamics simulation package developed using Charm++. It is the result of a long-standing, ongoing collaboration between two research groups at the University of Illinois: the Parallel Programming Lab led by Prof. Kale and the Theoretical and Computational Biophysics Group led by Prof. Klaus Schulten.
The Charm++ issue tracker is now publicly accessible.
The Parallel Programming Laboratory is pleased to announce the stable release of Charm++ version 6.5.0. This release offers substantially increased performance on the Cray Gemini and IBM Blue Gene/Q architectures, revamped developer and user documentation, and numerous performance and usability improvements across the runtime. In-depth release notes as well as full version control change logs can be found here. The source code for this release can be downloaded here. This stable release will be precompiled and offered for use on various major supercomputer installations, including systems at Argonne National Lab, NERSC, NCSA, NICS, Oak Ridge National Lab, SDSC, and TACC.
Prof. Kale was named one of the winners of the Sidney Fernbach Award, to be presented at Supercomputing 2012.
On Tuesday, May 1, 2012, the UIUC chapter of SIAM will host a Charm++ tutorial given by PPL member Phil Miller. The tutorial will start at 4:00 p.m. in 4403 Siebel Center. Registration is required. For more information, please visit siam.cs.illinois.edu or register here.

The tutorial will present Charm++, a portable parallel programming system designed with programmer productivity as a major goal. Attendees will become familiar with the asynchronous, object-based programming model of Charm++ and the capabilities its adaptive runtime system offers. Developed by the Parallel Programming Laboratory over the last 20 years, Charm++ is a portable, mature environment that provides the foundation for several highly scalable and widely used applications in science and engineering, including NAMD, ChaNGa, and OpenAtom. Charm++ runs the same application code on multicore desktops with shared memory, clusters of all sizes, and IBM and Cray supercomputers (such as the upcoming NSF-sponsored Blue Waters), and efficiently supports GPU accelerators where available. The target audience for this tutorial is programmers and researchers with any sort of parallel programming experience and basic knowledge of C or C++.

The following week, the Parallel Programming Lab will host its 10th Annual Workshop on Charm++ and its Applications May 7-9 at the Siebel Center, bringing together the Charm++ community and showcasing leading-edge developments in parallel computing.
The Parallel Programming Laboratory is pleased to announce the release of a first beta for Charm++ version 6.4.0. A list of advances in this release can be found in gitweb. Please test your applications for bugs and performance regressions, and post your results on the mailing list. A tarball of the source can be found here, and compiled binaries for our autobuild platforms can be found here.