Chare Arrays: Design Exercise

Data Balancing:

Assume you have a 1D chare array A. Each chare (say A[i]) in it holds a vector of numbers. The size of this vector is different on different chares (say sizei on A[i]). Your task is to equalize the load on all processors by exchanging the numbers. It is not necessary to do minimal data movement, but it is desirable. The balance at the end needs to be almost exact. If there are a total of N numbers, and v chares, there should be between floor (N/v): ceil(N/v) items on each chare. Note that the only way to send information to another chare is by sending an (entry) method invocation to it.

There are many distinct algorithms possible. Sketch the alternatives without coding them, and write cost estimates for them. Keep in mind that the simplest (i.e. approximate) cost model in Charm++: entry methods invocation's cost \alpha + n . {\beta}, where α is a fixed cost, and β is a per-byte cost. For the sake of intuition, you may assume α is about a thousand times larger than β, say a microsecond vs a nanosecond. Reductions and broadcasts of size N data on P processors cost \alpha log (P) + N \beta. Keep in mind that many (but not all) of the algorithms for this problem have two phases: first phase to identify who should send how many numbers to whom, and second to actually do the data exchange. Make sure to write your time estimates for both phases. Compare two of the interesting algorithms in terms of cost, performance tradeoffs if any (e.g. is each algorithm better in different scenarios), scalability and coding complexity. By scalability, here, we mean how well the algorithm behaves with a large number of chares and/or a large number of physical processors.