Our minimum CMake updated for CUDA C++20 allows us to use this.
This section of the readme could do with improving now we have a 3D support matrix
Still lots of changes to make, but at least with hipclang as the host compiler it gets to the first include <cuda_runtime.h>. Doesn't handle architectures properly, though just the func needs implementing. Lots of author warnings for bits I skipped through. Need to test / improve what happens when a project() has CUDA but FLAMEGPU_GPU=HIP is selected. Due to order of execution, it is skipping my error condition for that? WIP: More AMD CMake
…ught about for CI's benefit
The readme should be improved once all the changes for HIP/ROCm are known, as the existing structure is not ideal with all the new complexity
…-x hip with HIP enabled. This is not a fatal error, in case rocm/hip change their behaviour, though probably could/should be.
… used in debug-only macros (DTHROW)
…-internal-declaration warnings
…rnings As we require CMake >= 3.25 we can use the SYSTEM argument for FetchContent_Declare
…rust::less/greater
…suite via int division/multiplication
…pported CUDA version
This condition will always be true for CUDA builds
FLAMEGPUDeviceException.cu now compiles via hip
Some code is just hidden behind macro guards for now, which needs explicit hip versions adding later
…pam while working on this PR
…me other things to tweak
…do this so it never happened
…tail/gpu/device_name.hpp and add tests
…lity which is CUDA-only. Usage is guarded out, and previously macro'd out tests are now enabled (but heterogeneous AMD systems may encounter test failures)
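Several of the commit notes above mention code that is hidden behind macro guards, or guarded out because it is CUDA-only. As a rough illustration of that pattern (not the code from this PR), the sketch below selects a code path from the compiler-defined macros `__HIPCC__` and `__CUDACC__`. Using those two macros is an assumption, since the project may well use its own configuration macro instead, and the function names `gpu_backend_name`, `cuda_only_feature_available` and `check_backend_name` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper (not from the PR): reports which compile path is active.
// Assumption: __HIPCC__ is defined when building as HIP, __CUDACC__ when building
// as CUDA. HIP is checked first so a HIP build never falls into the CUDA branch.
constexpr const char* gpu_backend_name() {
#if defined(__HIPCC__)
    return "HIP";
#elif defined(__CUDACC__)
    return "CUDA";
#else
    return "host-only";
#endif
}

// Hypothetical CUDA-only feature: guarded out everywhere else until an
// explicit HIP version is written.
inline bool cuda_only_feature_available() {
#if defined(__CUDACC__) && !defined(__HIPCC__)
    return true;
#else
    return false;  // placeholder: no HIP (or host) implementation yet
#endif
}

// Sanity check that is valid on all three paths.
inline void check_backend_name() {
    const char* name = gpu_backend_name();
    assert(std::strcmp(name, "HIP") == 0 ||
           std::strcmp(name, "CUDA") == 0 ||
           std::strcmp(name, "host-only") == 0);
}
```

Keeping each guard inside a small function, rather than around whole call sites, means callers and tests can stay enabled on every backend and only the guarded body needs a HIP version later.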
Adds AMD GPU support via HIP / ROCm
Caution
This is still a WIP - do not review, merge or expect CI to be happy
It will be rebased many times, and may become the base branch for other AMD/HIP/ROCm/clang related PRs so they can all be merged into master in a single go.
Warning
As of 2026-04-09 offline-C++ compilation-only workflows compile but kernels do not correctly execute.
Leaning towards undefined behaviour (UB) for taking the address of __global__ functions and using it for the occupancy API / launching kernels on HIP.
This is explicitly documented as being supported by CUDA, but a similar statement does not appear in the HIP docs.
It works in a toy-problem using the same macro and templated approach, however (hence UB?). I have some ideas on how to narrow this down.