Introduction to the Charm++ Runtime System
Figure 1: User's View of a Charm++ Application |
Figure 2: System's View of a Charm++ Application |
What is the Charm++ Runtime System?
When a programmer writes a Charm++ application, they write it in terms of chare objects and how the chare objects communicate with one another through method invocation (or message passing). Details such as the number of processors, types of processors, type of interconnect, and so on are not considered. The programmer simply writes the application as a collection of interacting objects. This view of a Charm++ application is referred to as the user's view of a Charm++ application (see Figure 1).
Alternatively, when the application is actually compiled there is a specific target platform including type of processor, type of interconnect, and so on. Additionally, when the application is executed, there is a specific set of physical resources that are made available to the application such as the number of physical processors, and so on. The purpose of the Charm++ Runtime System is to manage as many of the details of the physical resources on behalf of the application, and thus, on behalf of the programmer. This view of the Charm++ application is referred to as the system's view of a Charm++ application (see Figure 2). Management decisions that the Charm++ Runtime System can make on behalf of the application include (but are not limited to):
- Mapping Chare Objects to Physical Processors: Through various methods, the Charm++ Runtime System can assign the chare objects to physical processors.
- Load-Balancing Chare Objects: The Charm++ Runtime System can dynamically migrate chare objects between physical processors as the application executes allowing the application to utilize the physical processors more efficiently.
- Routing of Messages: As chare objects are assigned to physical processors and migrated between physical processors, the Charm++ Runtime System keeps track of where the chare objects live. Messages being sent to a chare object are dynamically routed to the physical processor containing the chare object. The code sending the message is not aware in any way (even if the target chare object migrates to a different physical processor).
- Checkpointing: Because the Charm++ Runtime System, through the use of PUP Routines, can migrate a chare object's state between physical processors, checkpointing is fairly trivial. The Charm++ Runtime System can simply migrate all of the chare objects' states to disk.
- Fault-Tolerance: If a physical processor is experiencing problems or has already crashed, the Charm++ Runtime System can dynamically recreate the chare objects on the failed physical processor on the remaining physical processors.
- Dynamic Re-Allocation of Physical Resources: A cluster being used to execute a Charm++ application may suddenly receives more jobs (or have several jobs finish). The Charm++ Runtime System can dynamically migrate chare objects from (or to) physical processors allowing the application to dynamically shrink to use fewer physical processors (or expand to use more physical processors) based on the cluster's overall load.
Each processing element has its own Charm++ Runtime System running on it. They various instances of the Charm++ Runtime Systems are responsible for their local processing element. They may also communicate with one another for collective operations (such as checkpointing, fault-recovery, load-balancing, and so on).
In actuality, Charm++ Runtime System is a collective term used to describe a collection of components (all of which provide certain functionality). The components of the Charm++ Runtime System are discussed below (see Figure 2 above and Figure 3 below).
The Machine Layer
At the lowest layer of the Charm++ Runtime System is the machine layer. The machine layer contains system specific code that the higher levels of the Charm++ Runtime System use to perform basic tasks such as sending messages, and so on. When the Charm++ Runtime System is built, the user needs to specify which machine layer is to be used (more information on building the Charm++ Runtime System can be found here).
Converse
The layer above the machine layer is called Converse. Converse abstracts the capabilities of the machine layer from the higher layers of the Charm++ Runtime System. It also provides any general functionality not supported by a particular machine layer. For example, if one machine layer has direct support for broadcasts then the machine layer's version of broadcast will be used. However, if the machine layer does not directly support broadcasts, then the general implementation provided by Converse will be used.
In addition to abstracting the machine layers, Converse also contains the Charm++ scheduler and message queue (as shown in Figure 2).
Message Queue
The message queue acts as the collection point for messages arriving to a particular processing element. The message queue is shared for all chares that are located on the processing element. For basic messages, the message queue can be thought of as a simple queue (incoming messages are place at the end of the queue; eventually messages work either way to the head of the queue and are processed one-by-one).
There are other types of messages that are supported by Charm++: expedited messages, immediate messages, and messages with user defined priorities. When theses types of messages are being used by a Charm++ program, the message queue does not act as a simple queue. For more information on these types of messages, please see the Charm++ documentation.
Scheduler
The scheduler is the component in Converse that takes care of selecting a message from the message queue and executing it. The message entry in the message queue represents a triplet (message data, chare object to operate on, and entry method to execute). Once the scheduler has chosen a message to process, it begins execution of the associated entry method. This entry method continues to execute until either it completes or releases control back to the Charm++ Runtime System (i.e. a threaded entry method that calls CkSuspend()). Because of this, the user application code is guaranteed that only a single entry method is executing at any given moment on a particular processing element. One advantage of this approach is that the user's application code to avoid having to use locks to protect variable accessed by multiple entry methods.
Figure 3: A Simplified View of the Charm++ Universe |
How the Charm++ Runtime System Fits In
To the right is a simplified figure of various pieces of software related to Charm++ in some way. At the bottom of the figure is the Charm++ Runtime System and the various software components that it is comprised of. Not mentioned above are the various modules that provide support for activities such as load balancing, fault tolerance, communication optimization, projections, and so on.
Support for Other Languages / Models
In addition to being able to write Charm++ applications using the Charm++ Runtime System, programs that are written using other supported programming models can also take advantage of the features provided by the Charm++ Runtime System.
- Adaptive MPI: MPI programs can take advantage of the virtualization, load balancing, fault tolerance, etc. capabilities of the Charm++ Runtime System.
- Charisma: Charisma is a high-level language that allows a programmer to clearly specify the control flow of a Charm++ application.
- Structured Dagger (SDag): Like Charisma, Structured Dagger allows a programmer to specify the control flow in a Charm++ program at a high-level. However, SDag code is specified directly in a chare object's interface file and has a scope local to the object (i.e. it can be used to specify the control flow for a particular chare class).
- Multiphase Shared Arrays (MSA): MSAs are data arrays that can be placed into one of many access modes (read-only, write-once, accumulate). The idea is to reduce pressure on the memory system with programmer provided knowledge of the data access patterns.
- Threaded Charm (TCharm): ... fill me in ...
Frameworks
- ParFUM: ParFUM is an adaptive unstructured mesh framework.
- POSE: POSE is a PDES (Parallel Discrete Event Simulation) framework.
Tools
- Projections: Performance analysis tool.
- Faucets: Job submission tool.
- CharmDebug: Debugging tool for Charm++ programs.