Global Atomic Operations on Multi-word Data #15

Open
@LouisJenkinsCS

@e-kayrakli
@mppf

This is of importance to both the project and to Chapel as a whole (or so I believe).
I believe that this early prototype of the Global Descriptor Table (GDT), formerly
referred to as GlobalAtomicObject, demonstrates the potential power of global atomics.
A few things I had assumed, and had been told, since the beginning turned out to be
misconceptions...

Communication Is NOT Bad

Communication, even if it is per-operation, is not bad, but it's not good either.
The true bottleneck is contention. I've tested time and time again, and each time
an algorithm contained a contention-causing operation that can fail an unbounded number
of times, and so trigger an unbounded number of communications, it performed horribly.
For example, operations like this are bad...

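while true {
  // Each read of x and y may be a remote operation.
  var _x = x.read();
  var _y = y.read();
  // The CAS must target the atomic x, not the local copy _x.
  if _x < _y && x.compareExchangeStrong(_x, _x + 1) {
    break;
  }
}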

This loop can cause an unbounded number of communications. Reading x and y is relatively
expensive; a failed CAS not only adds remote contention but also forces both x and y
to be read again, so performance drops.

Algorithms in which every operation is guaranteed to succeed, on the other hand, scale
very well; that is, in my tests only wait-free algorithms scale. Code that uses
fetchAdd and exchange works wonderfully, which gives me hope that a global data
structure with very loose guarantees is possible and useful.
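
To make the contrast concrete, here is a minimal sketch of the same increment written both ways. The names `counter`, `incrementWithCAS`, `incrementWithFetchAdd`, `tasksPerLocale` and `opsPerTask` are made up for illustration, and it assumes a recent Chapel release where the boolean-returning CAS is spelled `compareAndSwap` (the snippet above uses the older `compareExchangeStrong` spelling).

```chapel
// Illustrative only: all names here are hypothetical.
var counter: atomic int;

// CAS retry loop: every failed attempt costs another round of communication,
// and nothing bounds how many attempts a task may need under contention.
proc incrementWithCAS() {
  while true {
    const old = counter.read();
    if counter.compareAndSwap(old, old + 1) then break;
  }
}

// Single fetchAdd: exactly one atomic operation per call, whatever the contention.
proc incrementWithFetchAdd() {
  counter.fetchAdd(1);
}

config const tasksPerLocale = 4, opsPerTask = 1000;

// Every locale hammers the same counter, which lives on locale 0.
coforall loc in Locales do on loc {
  coforall tid in 1..tasksPerLocale do
    for i in 1..opsPerTask do
      incrementWithFetchAdd();
}

writeln(counter.read() == numLocales * tasksPerLocale * opsPerTask);
```

Swapping `incrementWithFetchAdd` for `incrementWithCAS` in the loop gives the same final count, but the number of communications per increment is no longer fixed.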

Global Descriptor Table

Next is the new and novel idea of the GDT, the Global Descriptor Table. A 64-bit word
is used instead of a 128-bit wide pointer, allowing us to take advantage of the benefits of
network atomics. In essence, we encode the locale number in the upper 32 bits and
the actual index in the lower 32 bits. Currently, an array is used, but in reality (perhaps
with runtime support, if not already available) it's possible that a descriptor could be
used directly like a normal pointer. Currently there is a large amount of overhead in needing
to keep a bitmap of usable memory, and the table cannot be resized without synchronizing
all accesses to it (as Chapel domains and arrays cannot be resized in a lock-free way).
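
The encoding described above can be sketched as follows. This is not the prototype's code: the helper names (`makeDescriptor`, `descriptorLocale`, `descriptorIndex`) and the use of `uint(32)` for both fields are assumptions made for illustration.

```chapel
// Hypothetical helpers; names and exact types are assumptions.
param indexBits = 32;
const indexMask = 0xFFFFFFFF: uint(64);

// Pack a locale id (upper 32 bits) and a table index (lower 32 bits).
proc makeDescriptor(localeId: uint(32), idx: uint(32)): uint(64) {
  return ((localeId: uint(64)) << indexBits) | (idx: uint(64));
}

// Recover the locale id from the upper 32 bits.
proc descriptorLocale(desc: uint(64)): uint(32) {
  return (desc >> indexBits): uint(32);
}

// Recover the table index from the lower 32 bits.
proc descriptorIndex(desc: uint(64)): uint(32) {
  return (desc & indexMask): uint(32);
}

// A descriptor fits in an atomic uint(64), so it can be swapped with exchange.
var slot: atomic uint(64);
const desc = makeDescriptor(3: uint(32), 42: uint(32));
const previous = slot.exchange(desc);
writeln(descriptorLocale(desc), " ", descriptorIndex(desc), " ", previous);
```

Because the whole descriptor is a single 64-bit word, `exchange` and `fetchAdd` on it are ordinary 64-bit atomics, which is the point of avoiding the 128-bit wide pointer.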

Currently, it has been tested with simple exchange operations on class instances
remotely across all locales, versus needing to acquire a sync variable to do the same.
As stated above, a compareExchangeStrong kills performance, but that has nothing to do
with the GDT and everything to do with network atomic contention, so it's kept simple.
It works and it scales. The graph below shows the time to complete the same number of
operations (100,000 in this case). It shows the average overall runtime of 4 trials
(discarding the first warm-up trial), and while sync grows near-exponentially,
GDT remains linear.

[Image: time to complete 100,000 operations, sync vs. GDT]

Now, to get a better look at it, here are the same results in Operations per Second.

[Image: the same results in operations per second, sync vs. GDT]
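
For readers who want a feel for what is being compared, here is a hypothetical miniature of the two variants. It is not the benchmark code behind the graphs: it swaps plain 64-bit values rather than class instances, and `slot`, `lock`, `guarded`, `swapAtomic`, `swapSync` and `opsPerLocale` are all made-up names.

```chapel
// Hypothetical miniature of the two variants; not the benchmark itself.
var slot: atomic uint(64);   // atomic variant: holds a 64-bit descriptor

var lock: sync bool;         // sync variant: empty means "unlocked"
var guarded: uint(64);       // value protected by `lock`

// One atomic exchange, no lock to acquire.
proc swapAtomic(newDesc: uint(64)): uint(64) {
  return slot.exchange(newDesc);
}

// The same swap under a sync variable used as a lock.
proc swapSync(newDesc: uint(64)): uint(64) {
  lock.writeEF(true);        // acquire: blocks until empty, then fills
  const old = guarded;
  guarded = newDesc;
  lock.readFE();             // release: empties the variable again
  return old;
}

config const opsPerLocale = 1000;

// Every locale performs the same sequence of swaps against locale 0.
coforall loc in Locales do on loc {
  for i in 1..opsPerLocale {
    swapAtomic(i: uint(64));
    swapSync(i: uint(64));
  }
}

writeln(slot.read(), " ", guarded);
```

Timing each variant separately across a growing number of locales would be one way to reproduce the shape of the comparison, though the absolute numbers will depend on the network and on whether network atomics are available.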

Implications

I believe this is some very valuable information, which is why I include Michael as well.
Even though the implementation is very immature (there are so many optimizations that
can be made that there's no saying how much further this can scale), it already outperforms
sync variables, the only other way to perform atomics on global class instances. It also opens up
some very unconventional means of concurrency, the first being the DSMLock (WIP), which is
based on DSM-Synch (Distributed Shared Memory Combined Synchronization) from the same
publication as CC-Synch (Cache-Coherent Combined Synchronization). Since I can confirm
that this scales, I can possibly even make DSMLock scale in such a way that global
lock-based algorithms can scale (not just wait-free ones). Extremely exciting!

Edit:

If this is extended into the runtime, I can imagine having the entire global address space chunked up into 4GB (2^32-byte) zones, with 2^8 zones on each of 2^24 locales. With 256 zones of 4GB each, that's a 1TB address space per locale, with 16M+ locales.
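
The arithmetic behind that layout, as a small sketch. The field order (24-bit locale, 8-bit zone, 32-bit offset) and the names `makeAddress`, `zoneBytes`, `bytesPerLocale` and `maxLocales` are assumptions; the idea above only fixes the field widths.

```chapel
// Hypothetical layout: 24-bit locale | 8-bit zone | 32-bit offset = 64 bits.
param localeBits = 24, zoneBits = 8, offsetBits = 32;

// Combine the three fields into one 64-bit global address.
proc makeAddress(loc: uint(64), zone: uint(64), offset: uint(64)): uint(64) {
  return (loc << (zoneBits + offsetBits)) | (zone << offsetBits) | offset;
}

const one = 1: uint(64);
const zoneBytes      = one << offsetBits;               // 2^32 bytes = 4GB per zone
const bytesPerLocale = one << (zoneBits + offsetBits);  // 2^40 bytes = 1TB per locale
const maxLocales     = one << localeBits;               // 2^24 = 16,777,216 locales

const addr = makeAddress(5, 2, 4096);
writeln(zoneBytes, " ", bytesPerLocale, " ", maxLocales, " ", addr);
```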
