Add memory orderings and use them for fences #2581

ikbuibui · 2025-11-27T10:23:08Z

Adds memory ordering tags.
Defines a trait to get the default memory orders for fences for each backend.
Adds ability for the user to optionally specify a memory ordering for mem fences.
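As a rough illustration of the three points above, here is a minimal sketch of order tags, a per-backend default trait, and a fence entry point with an optional order argument. Only `MemoryOrderTag` and `AcqRel` are names visible in the PR diff; the other tags, `CpuBackend`, `GpuBackend`, `DefaultFenceOrder` and the `sketch` namespace are assumptions made for this example. The PR constrains the order with a concept; a plain trait is used here so the sketch also compiles as C++17.

```cpp
#include <type_traits>

namespace sketch
{
    // Tag types for memory orders (MemoryOrderTag and AcqRel appear in the diff,
    // the remaining names are assumed for illustration).
    struct MemoryOrderTag {};
    struct Relaxed : MemoryOrderTag {};
    struct Acquire : MemoryOrderTag {};
    struct Release : MemoryOrderTag {};
    struct AcqRel : MemoryOrderTag {};
    struct SeqCst : MemoryOrderTag {};

    // True for every tag derived from MemoryOrderTag, false for the base and for unrelated types.
    template<typename T>
    inline constexpr bool is_memory_order_v
        = std::is_base_of_v<MemoryOrderTag, T> && !std::is_same_v<MemoryOrderTag, T>;

    // Hypothetical backends, standing in for real accelerator types.
    struct CpuBackend {};
    struct GpuBackend {};

    // Trait giving the default fence order of a backend; sequential consistency unless specialized.
    template<typename TBackend>
    struct DefaultFenceOrder
    {
        using type = SeqCst;
    };

    // Arbitrary choice for illustration only, not the default chosen in the PR.
    template<>
    struct DefaultFenceOrder<GpuBackend>
    {
        using type = AcqRel;
    };

    // Fence entry point. It returns the order tag it would use instead of emitting
    // an instruction, so the selection logic can be checked at compile time.
    template<typename TBackend, typename TOrder = typename DefaultFenceOrder<TBackend>::type>
    constexpr TOrder mem_fence(TBackend const&, TOrder order = {})
    {
        static_assert(is_memory_order_v<TOrder>, "order must be a memory order tag");
        return order;
    }
} // namespace sketch
```

Calling `sketch::mem_fence(backend)` picks the backend default, while `sketch::mem_fence(backend, sketch::Acquire{})` overrides it.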

ikbuibui · 2025-11-27T10:42:31Z

I don't define a consume memory ordering at all: as far as I know it is not well defined (broken), it is deprecated in C++26 anyway, and implementations just treat it as acquire to be safe and correct.
https://en.cppreference.com/w/cpp/atomic/memory_order.html#Release-Consume_ordering
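A minimal sketch of what a tag-to-STL mapping without consume could look like on the host side. The tag names, `to_std`, `host_fence` and the `sketch_stl` namespace are assumptions for this example and are not taken from the PR.

```cpp
#include <atomic>

namespace sketch_stl
{
    // Hypothetical order tags; note that there is deliberately no Consume tag.
    struct Relaxed {};
    struct Acquire {};
    struct Release {};
    struct AcqRel {};
    struct SeqCst {};

    // One overload per tag. No overload ever yields std::memory_order_consume.
    constexpr std::memory_order to_std(Relaxed) { return std::memory_order_relaxed; }
    constexpr std::memory_order to_std(Acquire) { return std::memory_order_acquire; }
    constexpr std::memory_order to_std(Release) { return std::memory_order_release; }
    constexpr std::memory_order to_std(AcqRel) { return std::memory_order_acq_rel; }
    constexpr std::memory_order to_std(SeqCst) { return std::memory_order_seq_cst; }

    // Host-side fence using the standard library.
    template<typename TOrder>
    inline void host_fence(TOrder order)
    {
        std::atomic_thread_fence(to_std(order));
    }
} // namespace sketch_stl
```

With this shape, requesting consume is a compile error rather than a silent fallback, since no tag or overload exists for it.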

ikbuibui · 2025-11-27T10:52:18Z

The implementation for HIP comes from here

ikbuibui · 2025-11-27T13:49:33Z

CUDA memory orders are introduced with this PR, but they are available only from CUDA 12.8, so #2574 would be nice to have for testing this.

ikbuibui · 2025-12-11T14:15:02Z

Inspected PTX and confirmed that this works for CUDA>=12.8.
With alpaka::mem_fence(acc, alpaka::mem_order::acq_rel, alpaka::memory_scope::Block{}); we get a fence.acq_rel.cta; instead of the fence.sc.cta; we would otherwise have.
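To make the observed PTX concrete, here is a host-only sketch of how an (order, scope) pair could map to the text of a PTX fence instruction. The `Order` and `Scope` enums, `ptx_fence` and the `sketch_ptx` namespace are hypothetical. In real device code the instruction would be a literal inside an inline assembly statement, and this sketch only models the selection.

```cpp
#include <string_view>

namespace sketch_ptx
{
    // Hypothetical enums; only the two fence semantics mentioned above are modelled.
    enum class Order { AcqRel, SeqCst };
    enum class Scope { Block, Device, System };

    // Returns the PTX fence instruction for the given order and scope.
    // PTX scope suffixes: .cta (thread block), .gpu (device), .sys (system).
    constexpr std::string_view ptx_fence(Order order, Scope scope)
    {
        if(order == Order::AcqRel)
        {
            switch(scope)
            {
            case Scope::Block:
                return "fence.acq_rel.cta;";
            case Scope::Device:
                return "fence.acq_rel.gpu;";
            case Scope::System:
                return "fence.acq_rel.sys;";
            }
        }
        switch(scope)
        {
        case Scope::Block:
            return "fence.sc.cta;";
        case Scope::Device:
            return "fence.sc.gpu;";
        case Scope::System:
            return "fence.sc.sys;";
        }
        return "fence.sc.sys;"; // not reached for valid enum values; strongest fence as a safe fallback
    }
} // namespace sketch_ptx
```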

fwyzard · 2025-12-20T09:55:30Z

include/alpaka/mem/order/MemoryOrder.hpp

+         * The user requested memory order may be converted to a stronger memory order guarantee if the backend does
+         * not support the requested memory ordering



Though in practice this is never the case ?

Mhm, I see that it is done at the implementation level, not in the backend-specific tags ?

Yes, it is done at the implementation level as it depends on details of the backend.
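A minimal sketch of such an implementation-level conversion, assuming a hypothetical backend that can only emit acquire-release and sequentially consistent fences. The `Order` enum, `strengthen` and the `sketch_promote` namespace are invented for this example and do not reflect the PR's actual code.

```cpp
namespace sketch_promote
{
    enum class Order { Relaxed, Acquire, Release, AcqRel, SeqCst };

    // Maps the requested order to the order a hypothetical backend would actually use.
    // The result is never weaker than the request.
    constexpr Order strengthen(Order requested)
    {
        switch(requested)
        {
        case Order::Relaxed:
            return Order::Relaxed; // a relaxed fence orders nothing, so nothing needs to be emitted
        case Order::Acquire:
        case Order::Release:
            return Order::AcqRel; // acq_rel provides both acquire and release guarantees
        case Order::AcqRel:
            return Order::AcqRel;
        case Order::SeqCst:
            return Order::SeqCst;
        }
        return Order::SeqCst; // not reached for valid enum values; strongest order as a safe fallback
    }
} // namespace sketch_promote
```

Keeping this mapping inside each backend's fence implementation, rather than in the tags, lets every backend decide which orders it supports natively.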

fwyzard · 2025-12-20T10:21:29Z

include/alpaka/mem/order/MemoryOrderHip.hpp

+
+#include <concepts>
+
+#ifdef ALPAKA_ACC_GPU_HIP_ENABLED


~~Looking at the HIP documentation, I think this may require ROCm 6.4.0 (based on clang 19).~~
~~ROCm 6.4.0 mentions __builtin_amdgcn_fence in the docs, while ROCm 6.3.3 does not.~~

However looking at the HIP code, it was available much earlier, looks like it was simply undocumented.

fwyzard · 2025-12-20T10:22:48Z

include/alpaka/mem/order/MemoryOrder.hpp

+        {
+        };
+
+        struct AcqRel : MemoryOrderTag


Just an idea, what if we spelled the full AcquireRelease and SequentialConsistency ?

I'm open to it with no particular preference. I followed the STL naming since it was concise. If you prefer the full names let me know and I'll update them.

fwyzard · 2025-12-20T10:29:44Z

include/alpaka/mem/order/MemoryOrderStl.hpp

I'm wondering if using an enum for the memory spaces wouldn't be simpler than using tags and a concept ?
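For comparison, a sketch of how the enum alternative might look. Passing the enumerator as a non-type template parameter keeps the dispatch at compile time, and the parameter type itself restricts callers to valid orders, so no concept is needed. `MemOrder`, `fence_kind` and the `sketch_enum` namespace are hypothetical names, and the strings only stand in for whatever a backend would emit.

```cpp
#include <string_view>

namespace sketch_enum
{
    // Hypothetical enum replacing the tag types.
    enum class MemOrder { Relaxed, Acquire, Release, AcqRel, SeqCst };

    // Compile-time dispatch on the enumerator; sequential consistency is the assumed default.
    template<MemOrder TOrder = MemOrder::SeqCst>
    constexpr std::string_view fence_kind()
    {
        if constexpr(TOrder == MemOrder::Relaxed)
            return "no fence";
        else if constexpr(TOrder == MemOrder::SeqCst)
            return "seq_cst fence";
        else
            return "acq_rel fence"; // acquire and release are served by the stronger acq_rel here
    }
} // namespace sketch_enum
```

The tag approach makes each order a distinct type, which suits overloading and trait specialization, whereas the enum keeps a single type and a shorter declaration.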

fwyzard · 2025-12-20T10:40:53Z

include/alpaka/mem/fence/MemFenceOmp2Order.hpp

Now I'm wondering what happens if one tries to use an acquire-release or sequentially-consistent thread-level fence to order with a block-level fence ?

I don't know why one would do it, but in principle they could.

However this could silently fail in some backends where either thread-level or block-level fences are "skipped" ?

fwyzard · 2025-12-20T10:46:53Z

include/alpaka/mem/fence/MemFenceUniformCudaHipBuiltIn.hpp

-        template<>
-        struct MemFence<MemFenceUniformCudaHipBuiltIn, memory_scope::Block>
+        template<alpaka::MemoryOrder TMemOrder>
+        [[maybe_unused]] static constexpr __device__ void cuda_ptx_fence_device([[maybe_unused]] TMemOrder order)


Why not use cuda::atomic_thread_fence(...) ?

That would have been ideal, but I was trying to avoid requiring libcu++.

I wouldn't insist on using it, but if it's available, does the job, and does not create licensing or installation problems (all details to be discussed) maybe we should consider making use of it ?

fwyzard · 2025-12-20T10:49:12Z

@ikbuibui overall this looks very good !

Two questions:

* what do you think is missing (since it is in draft mode) ?
* I didn't check myself; do we have a reasonable set of tests for the fence operations ? if not, is it something you would be interested / willing to implement ?

ikbuibui · 2025-12-22T13:27:39Z

> I'm wondering if using an enum for the memory spaces wouldn't be simpler than using tags and a concept ?

Yes, that is also absolutely doable, and maybe simpler. I was comfortable doing it with tags so that's what I used :)

> * what do you think is missing (since it is in draft mode) ?
> * I didn't check myself; do we have a reasonable set of tests for the fence operations ? if not, is it something you would be interested / willing to implement ?

A combined answer to both: I still want to look into the tests and haven't done so yet. In any case, I'm not sure there is any reasonable way to test whether the memory orderings work as intended other than inspecting the generated code. If I don't come up with anything, I'll mark the PR as ready in the first week of next year.

> Now I'm wondering what happens if one tries to use an acquire-release or sequentially-consistent thread-level fence to order with a block-level fence ?
>
> However this could silently fail in some backends where either thread-level or block-level fences are "skipped" ?

This is a good question. I'll think about this a bit, but in any case this problem existed before this PR as well.

ikbuibui force-pushed the memory_ordered_fence branch from d10deae to 7366b27 on November 27, 2025 10:29

ikbuibui marked this pull request as draft on November 27, 2025 10:54

ikbuibui force-pushed the memory_ordered_fence branch 5 times, most recently from a9d0f14 to 42b4208 on November 27, 2025 13:45

ikbuibui force-pushed the memory_ordered_fence branch 10 times, most recently from ff7bdd1 to 3610bdc on December 2, 2025 09:34

ikbuibui marked this pull request as ready for review on December 2, 2025 14:33

ikbuibui force-pushed the memory_ordered_fence branch from 3610bdc to 4557fb1 on December 4, 2025 12:15

ikbuibui marked this pull request as draft on December 4, 2025 12:20

ikbuibui force-pushed the memory_ordered_fence branch 3 times, most recently from 913aa75 to 10d4ea4 on December 4, 2025 14:41

ikbuibui force-pushed the memory_ordered_fence branch 3 times, most recently from eae0eab to 613aca5 on December 11, 2025 13:17

ikbuibui force-pushed the memory_ordered_fence branch 5 times, most recently from acfdd08 to bf20f0d on December 12, 2025 16:55

Add memory orderings

73fc281

ikbuibui force-pushed the memory_ordered_fence branch 2 times, most recently from fc4bc25 to e731eda on December 12, 2025 17:27

fwyzard added this to the 2.2.0 milestone Dec 15, 2025

ikbuibui force-pushed the memory_ordered_fence branch 6 times, most recently from 31c4744 to b32b0f5 on December 19, 2025 12:29

use mem ordering for fence and define per backend defaults

f519df8

ikbuibui force-pushed the memory_ordered_fence branch from b32b0f5 to f519df8 on December 19, 2025 13:03

ikbuibui mentioned this pull request Dec 19, 2025

Memory ordered atomics #2602

Draft

2 tasks

fwyzard reviewed Dec 20, 2025

include/alpaka/mem/order/MemoryOrderStl.hpp (resolved)

fwyzard reviewed Dec 20, 2025

include/alpaka/mem/fence/MemFenceOmp2Order.hpp (outdated, resolved)

fwyzard reviewed Dec 20, 2025

renamings

8ca1266

ikbuibui force-pushed the memory_ordered_fence branch from 5e56706 to 8ca1266 on December 22, 2025 13:41
