Description
Hi! In the section regarding removal of @fence discussing StoreLoad barriers, a suggestion is made that you can use a non-mutating RMW to effectively create a "release load" and/or an "acquire store":
- Making a store (A/C) Acquire and its matching load (D/B) Release. Semantically, this would mean upgrading them to read-modify-write operations, which could be such ordering. Loads can be replaced with a non-mutating RMW, i.e. fetchAdd(0) or fetchOr(0).
However, this is not correct according to the C/C++ memory model unless both RMW operations are marked as SeqCst, which would defeat the point of using an RMW. In fact, this distinction is the entire reason for the difference between SeqCst and AcqRel ordering for RMW operations.
To see why, consider an architecture that may use load-link/store-conditional. I'll use AArch64 here, but don't get too hung up on its particular memory model, as I'm not familiar enough with it to know if this particular reordering is possible. A "store acquire" RMW like an exchange followed by a "load release" RMW like a fetch-or with 0 will likely be lowered to something like the following:
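_loopX:
ldaxr roldval, [X];     // load-acquire exclusive: the "load half" of the exchange
stxr rres, rX, [X];     // plain store exclusive: no release ordering
cbnz rres, _loopX;      // retry if the exclusive store failed
_loopY:
ldxr rY, [Y];           // plain load exclusive: no acquire ordering
stlxr rres, rY, [Y];    // store-release exclusive: the "store half" of the fetch-or
cbnz rres, _loopY;      // retry if the exclusive store failed
In this example, nothing prevents the stxr rres, rX, [X] from being reordered after the ldxr rY, [Y]. In a sense, the "store half" of the acquire exchange and the "load half" of the release fetch-or are Relaxed. So each RMW is atomic with respect to the modification of X or Y individually, but the two can only be considered ordered with respect to each other if all involved operations are part of the SeqCst cross-thread total order.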
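The same pattern can be sketched in C++ as a store-buffering litmus test. This is only an illustrative sketch: `count_both_zero` and its parameters are hypothetical names, not part of any library. Each thread does an exchange on one variable and then a non-mutating `fetch_or(0)` on the other. When every operation is `seq_cst`, the outcome where both threads read 0 is forbidden by the memory model. With `acquire` on the exchange and `release` on the fetch-or, the model permits that outcome, although whether it is ever observed depends on the hardware and compiler.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the store-buffering test `iterations` times and
// counts how often both threads read 0 from the other thread's variable.
int count_both_zero(int iterations,
                    std::memory_order store_order,
                    std::memory_order load_order) {
    assert(iterations >= 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.exchange(1, store_order);      // the "store", done as an RMW
            r1 = y.fetch_or(0, load_order);  // the "load", done as a non-mutating RMW
        });
        std::thread t2([&] {
            y.exchange(1, store_order);
            r2 = x.fetch_or(0, load_order);
        });
        t1.join();
        t2.join();

        // r1 and r2 are safe to read here because join() synchronizes.
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

Calling `count_both_zero(n, std::memory_order_seq_cst, std::memory_order_seq_cst)` must return 0 on a conforming implementation. Calling it with `std::memory_order_acquire` and `std::memory_order_release` may return a non-zero count on weakly ordered hardware. A zero count there proves nothing, since x86, for example, lowers these RMWs to locked instructions that happen to be fully ordered.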
Some citations:
https://groups.google.com/g/lock-free/c/Nescdq-8qVM
https://stackoverflow.com/questions/52606524/what-exact-rules-in-the-c-memory-model-prevent-reordering-before-acquire-opera/52636008#52636008