-
Notifications
You must be signed in to change notification settings - Fork 83
Add memory orderings and use them for fences #2581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
d10deae to
7366b27
Compare
|
I dont define a |
|
The implementation for HIP comes from here |
a9d0f14 to
42b4208
Compare
ff7bdd1 to
3610bdc
Compare
3610bdc to
4557fb1
Compare
913aa75 to
10d4ea4
Compare
eae0eab to
613aca5
Compare
|
Inspected PTX and confirmed that this works for CUDA>=12.8. |
acfdd08 to
bf20f0d
Compare
fc4bc25 to
e731eda
Compare
31c4744 to
b32b0f5
Compare
b32b0f5 to
f519df8
Compare
| * The user requested memory order may be converted to a stronger memory order guarantee if the backend does | ||
| * not support the requested memory ordering |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* The user requested memory order may be converted to a stronger memory order guarantee if the backend does * not support the requested memory ordering
Though in practice this is never the case ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mhm, I see that it is done at the implementation level, not in the backend-specific tags ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is done at the implementation level as it depends on details of the backend.
|
|
||
| #include <concepts> | ||
|
|
||
| #ifdef ALPAKA_ACC_GPU_HIP_ENABLED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However looking at the HIP code, it was available much earlier, looks like it was simply undocumented.
| { | ||
| }; | ||
|
|
||
| struct AcqRel : MemoryOrderTag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea, what if we spelled the full AcquireRelease and SequentialConsistency ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to it with no particular preference. I followed the STL naming since it was concise. If you prefer the full names let me know and I'll update them.
|
I'm wondering if using an |
|
Now I'm wondering what happens if one tries to use an acquire-release or sequentially-consistent thread-level fence to order with a block-level fence ? I don't know why one would do it, but in principle they could. However this could silently fail in some backends where either thread-level or block-level fences are "skipped" ? |
| template<> | ||
| struct MemFence<MemFenceUniformCudaHipBuiltIn, memory_scope::Block> | ||
| template<alpaka::MemoryOrder TMemOrder> | ||
| [[maybe_unused]] static constexpr __device__ void cuda_ptx_fence_device([[maybe_unused]] TMemOrder order) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use cuda::atomic::atomic_thread_fence(...) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would have been ideal, but I was trying to avoid requiring libcu++
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't insist on using it, but if it's available, does the job, and does not create licensing or installation problems (all details to be discussed) maybe we should consider making use of it ?
|
@ikbuibui overall this looks very good ! Two questions:
|
Yes, that is also absolutely doable, and maybe simpler. I was comfortable doing it with tags so thats what I used :)
A combined answer to both was that I still want to look into the tests and haven't done so. In any case I'm not sure if there will be any reasonable way to test if the memory orderings are working as intended other than inspecting the generated code. If I don't come up with anything, I'll mark the PR as ready in the first week of next year.
This is a good question. I'll think about this a bit, but in any case this problem existed before this PR as well. |
5e56706 to
8ca1266
Compare
Adds memory ordering tags.
Defines a trait to get the default memory orders for fences for each backend.
Adds ability for the user to optionally specify a memory ordering for mem fences.