AMD GPU Support via HIP/ROCm by ptheywood · Pull Request #1379 · FLAMEGPU/FLAMEGPU2

ptheywood · 2026-04-13T13:36:16Z

Adds AMD GPU support via HIP / ROCm

Caution

This is still a WIP: do not review or merge it, and do not expect CI to pass.

It will be rebased many times, and may become the base branch for other AMD/HIP/ROCm/clang related PRs so they can all be merged into master in a single go.

Warning

As of 2026-04-09, offline-C++ compilation-only workflows compile, but kernels do not execute correctly.
I am leaning towards undefined behaviour (UB) caused by taking the address of __global__ functions and using it with the occupancy API / to launch kernels on HIP.
This is explicitly documented as supported by CUDA, but no equivalent statement appears in the HIP docs.
However, it works in a toy problem using the same macro and templated approach (hence UB?). I have some ideas on how to narrow this down.

Todo: Edit in this PR text when no longer just placeholder text
Closes Initial AMD GPU Support (ROCm/HIP) #1367

Still lots of changes to make, but at least with hipclang as the host compiler it gets to the first include of <cuda_runtime.h>. It doesn't handle architectures properly yet, though just the function needs implementing. Lots of author warnings for bits I skipped through. Need to test / improve what happens when a project() has CUDA but FLAMEGPU_GPU=HIP is selected; due to the order of execution, it seems to be skipping my error condition for that?

WIP: More amd cmake


ptheywood and others added 30 commits April 2, 2026 12:00

C++20: Add and use gpuErrchk replacement using source_location

8675853

Breaking change: Remove GPU error checking macros which polluted the …

b98aae6

…global namespace

CMake: Update upper-limit CMakeLists.txt to 4.3.0

aea6987

CMake: Replace FLAMEGPU_PROJECT_IS_TOP_LEVEL with PROJECT_IS_TOP_LEVEL

504f560

Our minimum CMake updated for CUDA C++20 allows us to use this.

AMD: Initial rocm readme changes.

372a790

This section of the readme could do with improving now we have a 3D support matrix

CUDA/HIP compiler settings and warning flags

d7c519e

CMake: CMAKE_HIP_ARCHITECTURES thoughts. far from trivial, maybe just…

258522f

… native for now?

CMake: Disable HIP architectures author warning while it is being tho…

bf70ed6

…ught about for CI's benefit

fixup: find_package(hip) not find_package(HIP)

00594c9

Clang: fix -Wunused-const-variable warnings via inline in defines.h

ab95912
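The idea behind the defines.h change, sketched with hypothetical constants (the real header's names and values are not shown in this PR): a namespace-scope constexpr variable is implicitly const and so has internal linkage, whereas a C++17 inline variable has external linkage with a single shared definition.

```cpp
namespace example {

// Before: each translation unit including the header gets its own
// internal-linkage copy, which clang can report via -Wunused-const-variable
// wherever the constant goes unused.
//   constexpr unsigned int MAX_MESSAGE_TYPES = 32;

// After: inline variables have external linkage and one definition shared
// across all translation units.
inline constexpr unsigned int MAX_MESSAGE_TYPES = 32;  // hypothetical constant
inline constexpr double DEFAULT_TOLERANCE = 1e-6;      // hypothetical constant

}  // namespace example
```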

Readme: Explicitly mention hip/rocm host compiler issues.

9937735

The readme should be improved once all the changes for HIP/ROCm are known, as the existing structure is not ideal with all the new complexity

CMake: Emit a single warning if the CXX compiler does not understand …

ce0c8dd

…-x hip with HIP enabled. This is not a fatal error, in case rocm/hip change their behaviour, though probably could/should be.

Clang: Suppress -Wunused-but-set-variable warnings for variables only…

1295b0b

… used in debug-only macros (DTHROW)
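One common way to silence -Wunused-but-set-variable for a variable that is only read inside a debug-only macro is C++17's [[maybe_unused]]; the commit may use a different mechanism. EXAMPLE_DEBUG_THROW and sumNonNegative are hypothetical; the former stands in for DTHROW, whose real definition is not shown here.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical debug-only check that expands to nothing in release builds.
#ifndef NDEBUG
#define EXAMPLE_DEBUG_THROW(cond, msg) \
    do { if (cond) throw std::runtime_error(msg); } while (0)
#else
#define EXAMPLE_DEBUG_THROW(cond, msg) do { } while (0)
#endif

inline int sumNonNegative(const int* values, int count) {
    // In release builds sawNegative is written but never read, which clang
    // reports as -Wunused-but-set-variable; [[maybe_unused]] suppresses it
    // without changing debug behaviour.
    [[maybe_unused]] bool sawNegative = false;
    int total = 0;
    for (int i = 0; i < count; ++i) {
        if (values[i] < 0) {
            sawNegative = true;
        } else {
            total += values[i];
        }
    }
    EXAMPLE_DEBUG_THROW(sawNegative, "negative value encountered");
    return total;
}
```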

Clang: make flamegpu::nvtx::push/pop not static to address -Wunneeded…

eb0e37b

…-internal-declaration warnings

Clang: Address -Wpessimizing-move warnings (prevented copy elision)

97904f6
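A minimal illustration of the pattern -Wpessimizing-move flags, using a hypothetical function rather than code from the commit:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

inline std::vector<std::string> makeNames(int count) {
    std::vector<std::string> names;
    names.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        names.push_back("agent_" + std::to_string(i));
    }
    // Before: `return std::move(names);` makes the operand an xvalue, which
    // prevents named return value optimisation (copy elision).
    // After: return the local by name; it is either elided or implicitly
    // moved, never copied.
    return names;
}
```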

Clang: Address -Winconsistent-missing-override warnings

d182a87
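-Winconsistent-missing-override fires when a class marks some overriding member functions with override but omits it on others. A small sketch with hypothetical types:

```cpp
#include <cassert>
#include <string>

struct ExampleBase {  // hypothetical base class
    virtual ~ExampleBase() = default;
    virtual std::string name() const = 0;
    virtual int priority() const { return 0; }
};

struct ExampleDerived : ExampleBase {
    std::string name() const override { return "derived"; }
    // Before: `int priority() const { return 1; }` without override would
    // trigger the warning, because name() above is marked override.
    int priority() const override { return 1; }
};
```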

CMake: Mark nlohmann_json as a SYSTEM dependency to suppress clang wa…

01cc9be

…rnings As we require CMake >= 3.25 we can use the SYSTEM argument for FetchContent_Declare

CMake: Mark all external fetched libraries as SYSTEM

dc58768

TO REBASE/TESTWCU120: Use std::less/greater rather than deprecated th…

d7b2002

…rust::less/greater
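A host-only sketch of the std comparator in use. The commit's actual call sites (presumably thrust algorithm calls, given the deprecated thrust functors being replaced) are not shown in this PR, and whether a given std functor is usable in device code depends on the toolchain and flags, so device usage is not demonstrated. sortedDescending is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Sort in descending order using the standard comparator. std::greater<>
// (the C++14 transparent form) deduces the argument types.
inline std::vector<float> sortedDescending(std::vector<float> values) {
    std::sort(values.begin(), values.end(), std::greater<>());
    return values;
}
```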

Clang: Address -Wunused-but-set-variable warnings in the CXX test suite

575e781

Clang: Address -Wimplicit-const-int-float-conversion warning in test …

d0b9afe

…suite via int division/multiplication

Remove CUDA checks which are no longer relevant due to our minimum su…

a93d88e

…pported CUDA version

CMake: Do not check for CUDA >= 11.2 for FLAMEGPU_NVCC_THREADS

5447ab0

This condition will always be true for CUDA builds

CMake: Add FLAMEGPU_USE_CUDA/FLAMEGPU_USE_HIP public target definitions

82353bc

WIP: Guard cuda includes behind FLAMEGPU_USE_CUDA

88e6b13

WIP: initial macro-based abstraction port for HIP.

f0b148c

FLAMEGPUDeviceException.cu now compiles via hip
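A rough sketch of how the FLAMEGPU_USE_CUDA / FLAMEGPU_USE_HIP definitions can guard vendor headers and back a thin macro abstraction layer. Only the two FLAMEGPU_USE_* names come from this PR; every EXAMPLE_/example name is hypothetical, and the project's real abstraction headers may be structured quite differently. The mapped runtime functions share signatures across CUDA and HIP.

```cpp
#include <string_view>

#if defined(FLAMEGPU_USE_CUDA) && defined(FLAMEGPU_USE_HIP)
#error "Only one of FLAMEGPU_USE_CUDA or FLAMEGPU_USE_HIP may be defined"
#elif defined(FLAMEGPU_USE_CUDA)
#include <cuda_runtime.h>
#define EXAMPLE_GPU_API_NAME "CUDA"
#define exampleGpuMalloc cudaMalloc
#define exampleGpuFree cudaFree
#define exampleGpuDeviceSynchronize cudaDeviceSynchronize
#elif defined(FLAMEGPU_USE_HIP)
#include <hip/hip_runtime.h>
#define EXAMPLE_GPU_API_NAME "HIP"
#define exampleGpuMalloc hipMalloc
#define exampleGpuFree hipFree
#define exampleGpuDeviceSynchronize hipDeviceSynchronize
#else
// Neither backend selected: only backend-independent code is available.
#define EXAMPLE_GPU_API_NAME "none"
#endif

// Reports which GPU runtime this translation unit was compiled against.
constexpr std::string_view exampleGpuApiName() { return EXAMPLE_GPU_API_NAME; }
```

With neither definition present, as in a plain host compile, only exampleGpuApiName() is available and it returns "none".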

CMake: Hip warning setting fixup

e760a9d

CMake: Hip warning setting fixup

5172761

HIP: flamegpu static library now compiles under HIP

59f80a6

Some code is just hidden behind macro guards for now, which needs explicit hip versions adding later

DO NOT MERGE: Don't trigger DraftRelease on PullRequest to avoid CI s…

c64161d

…pam while working on this PR

ptheywood mentioned this pull request Apr 29, 2026

ROCm/HIP 6.x support #1388

Open

ptheywood added 2 commits April 29, 2026 16:28

fixup: cmake hip version check was clang ver not hip/rocm ver. Early …

afae2cb

…commit

DO NOT MERGE: Don't build beltsoff for AMD while working on CI

c1db3a0

ptheywood force-pushed the amdgpu branch from 4c406a8 to c1db3a0 Compare April 29, 2026 16:18

ptheywood and others added 16 commits April 29, 2026 17:19

WIP

7d147b6

To split: Fix RTC on cuda. <source_location> + a bad ifndef move + so…

55e4800

…me other things to tweak

Fixup changelog: Undo accidental changes to the changelog. Ideally re…

e5b9fea

…do this so it never happened

GPU: Improve use of gpu abstraction macro/type headers

dfb8a8d

lint fix: void** casting, though this is not new?

e5c09c6

lintfix brace fix in untouched code?

7a30312

lintfix in detail/gpus/macros.hpp whitespace

00e8a5b

lintfix comment space in CUDAErrorChecking __CUDACC_RTC__ endif, recen…

1011607

…t commit.

HIP: Implement roctx within the util::nvtx namespace

da31345

Closes #1377
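A sketch of a single push/pop API that forwards to NVTX on CUDA builds and roctx on HIP builds. nvtxRangePushA/nvtxRangePop and roctxRangePushA/roctxRangePop are the real marker functions, but EXAMPLE_USE_PROFILING_MARKERS, the Range helper and the header paths (which vary between CUDA and ROCm releases) are assumptions, not the project's code. The functions are non-static (here inline), in line with the earlier commit making push/pop not static.

```cpp
// Assumed opt-in switch; the project's real profiling option may differ.
#if defined(EXAMPLE_USE_PROFILING_MARKERS) && defined(FLAMEGPU_USE_CUDA)
#include <nvtx3/nvToolsExt.h>
#define EXAMPLE_MARKERS_NVTX 1
#elif defined(EXAMPLE_USE_PROFILING_MARKERS) && defined(FLAMEGPU_USE_HIP)
#include <roctracer/roctx.h>  // assumed path; varies across ROCm releases
#define EXAMPLE_MARKERS_ROCTX 1
#endif

namespace example::util::nvtx {

// Open a named profiling range.
inline void push(const char* label) {
#if defined(EXAMPLE_MARKERS_NVTX)
    nvtxRangePushA(label);
#elif defined(EXAMPLE_MARKERS_ROCTX)
    roctxRangePushA(label);
#else
    (void)label;  // markers disabled: no-op
#endif
}

// Close the most recently opened range.
inline void pop() {
#if defined(EXAMPLE_MARKERS_NVTX)
    nvtxRangePop();
#elif defined(EXAMPLE_MARKERS_ROCTX)
    roctxRangePop();
#endif
}

// Hypothetical RAII helper: opens a range on construction, closes it on scope exit.
class Range {
 public:
    explicit Range(const char* label) { push(label); }
    ~Range() { pop(); }
    Range(const Range&) = delete;
    Range& operator=(const Range&) = delete;
};

}  // namespace example::util::nvtx
```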

Fixup: Correctly mark RTC GLM tests as skipped

4f5c7c6

Fixup initial hip port GLM cmake to be blocked rather than a warning

b37e08a

HIP: Enable GLM support by updating GLM to v1.0.3

69ce550

Closes #1376

Remove unused typedef CUDARTCFuncMapPair

8528d87

Closes #1378

simulation/detail/CUDAErrorChecking.cuh -> detail/gpu/gpu_api_error_c…

b3466f7

…hecking.cuh

HIP: util::wddm implementation and test for hip (always not wddm)

1352b33
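A sketch of a util::wddm query covering both backends. On HIP it returns false unconditionally, matching the commit message. The CUDA branch queries cudaDevAttrTccDriver and counts any non-TCC Windows device as WDDM, which is a simplification and an assumption about the CUDA implementation; the namespace and function names are hypothetical.

```cpp
#include <cassert>
#if defined(FLAMEGPU_USE_CUDA)
#include <cuda_runtime.h>
#endif

namespace example::util::wddm {

// Returns true if the given device uses the Windows WDDM driver model.
inline bool deviceIsWDDM(int deviceIndex) {
#if defined(FLAMEGPU_USE_CUDA) && defined(_WIN32)
    int tccDriver = 0;
    if (cudaDeviceGetAttribute(&tccDriver, cudaDevAttrTccDriver, deviceIndex) != cudaSuccess) {
        return false;  // assumption: treat a failed query as "not WDDM"
    }
    return tccDriver == 0;  // simplification: any non-TCC device counted as WDDM
#else
    (void)deviceIndex;
    return false;  // HIP and non-Windows builds: never WDDM
#endif
}

}  // namespace example::util::wddm
```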

Tests: Mark RTC tests as skipped on hip with a different message

8b9461a

ptheywood mentioned this pull request May 1, 2026

AMD: device architecture checking at runtime #1389

Open

ptheywood added 2 commits May 1, 2026 14:59

Split/move getDeviceName[s] from detail/compute_capability.cuh to de…

69b6399

…tail/gpu/device_name.hpp and add tests

Refactor: detail/compute_capability -> detail/gpu/cuda/compute_capabi…

fb1ece9

…lity which is cuda-only. Usage is guarded out, and previously macro'd out tests are now enabled (but heterogeneous AMD systems may encounter test failures)

ptheywood force-pushed the amdgpu branch from ccc488f to fb1ece9 Compare May 1, 2026 16:22

ptheywood and others added 5 commits May 12, 2026 12:34

CUDA fixup: ensemble checkComputeCapability missing ;

31ab310

fixup: CMake enable_languages version_less quoted versions

36b1e4a

Fixup: CMake delete no longer required variable in enable_languages

06fb990

Fixup: define variables needed in the enable_languages macro with the…

52f7445

… correct scope

Fixup: use of moved compute_capability namespace in JitifyCache.cu

bd95f0a

ptheywood wants to merge 96 commits into master from amdgpu
