Under CUDA, we have `flamegpu::detail::compute_capability`, which checks at runtime whether the requested device(s) are compatible with the binary by comparing each device's compute capability against `__CUDA_ARCH_LIST__`. This is not perfect, but it is good enough for typical use, and it lets us give more helpful errors when a device is not compatible with the produced binary.
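For reference, the kind of check described above can be sketched roughly as follows. This is a minimal illustration, not the actual `flamegpu::detail::compute_capability` code: the helper names `toArchValue` and `deviceLooksSupported` are hypothetical, and the "device must be at least as new as the oldest compiled architecture" rule is an assumption about what "good enough for typical use" means here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: convert a (major, minor) compute capability into the
// integer form used by __CUDA_ARCH_LIST__ entries, e.g. (8, 6) -> 860.
inline int toArchValue(int ccMajor, int ccMinor) {
    assert(ccMajor >= 0 && ccMinor >= 0 && ccMinor < 10);
    return ccMajor * 100 + ccMinor * 10;
}

// Hypothetical check: accept a device if it is at least as new as the oldest
// architecture the binary was compiled for. This is deliberately imperfect:
// it does not verify that a matching SASS or PTX image is actually embedded.
inline bool deviceLooksSupported(int ccMajor, int ccMinor,
                                 const std::vector<int>& compiledArchs) {
    if (compiledArchs.empty()) {
        return true;  // Nothing known about the build, so nothing to reject.
    }
    const int oldest = *std::min_element(compiledArchs.begin(), compiledArchs.end());
    return toArchValue(ccMajor, ccMinor) >= oldest;
}
```

In a CUDA build, `compiledArchs` could be initialised from `{__CUDA_ARCH_LIST__}` (where the compiler defines it), with `ccMajor` and `ccMinor` taken from the `major` and `minor` fields of `cudaDeviceProp`.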
With HIP/ROCm, I can't find a good way to replicate this:
- There is no equivalent to `__CUDA_ARCH_LIST__`, which lists all architectures compiled into the binary.
  - `__HIP_ARCH_*__` macros are defined to `0` or `1` in device passes, but we need all of them in the host pass, so they are not appropriate.
- HIP architectures have `major`, `minor` and `subminor` versions, but `subminor` is not exposed numerically in the device properties at runtime.
  - The string form of the architecture is available at runtime, and it does include the subminor.
- `--offload-arch` / `CMAKE_HIP_ARCHITECTURES` may include family options such as `gfx9-generic`, which is roughly the same as targeting just a major compute capability that can then run on all minor architectures. However, `gfx9-generic` does not include all `gfx9XX` devices, as it does not support `gfx942` or `gfx950`.
  - Additionally, `gfx9-4-generic` supports `gfx942` and `gfx950`. There are non-trivial rules about which devices each generic target maps to. Resolving these ourselves, and keeping them in sync with the LLVM docs, feels very brittle.
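To illustrate the second point, the subminor can in principle be recovered from the runtime architecture string (for example the `gcnArchName` field of `hipDeviceProp_t`, which may look like `gfx942:sramecc+:xnack-`). The sketch below is plain C++ with hypothetical names (`GfxArch`, `parseGcnArchName`). It assumes the usual LLVM naming convention, where the last character is a hexadecimal stepping, the one before it is the minor version, and the remaining leading digits are the decimal major version. It does not attempt to solve the harder problem of mapping generic targets to devices.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical container for a parsed "gfxNNN" target name.
struct GfxArch {
    int majorVersion;
    int minorVersion;
    int stepping;  // The "subminor"; may be a hex digit such as 'a' in gfx90a.
};

// Hypothetical parser: returns false if the string does not look like a
// concrete gfx target. Feature flags after ':' (e.g. ":xnack-") are ignored.
inline bool parseGcnArchName(const std::string& gcnArchName, GfxArch& out) {
    const std::string target = gcnArchName.substr(0, gcnArchName.find(':'));
    const std::string prefix = "gfx";
    if (target.size() < prefix.size() + 3 ||
        target.compare(0, prefix.size(), prefix) != 0) {
        return false;
    }
    const std::string digits = target.substr(prefix.size());
    auto hexValue = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    };
    const int stepping = hexValue(digits[digits.size() - 1]);
    const int minorVersion = hexValue(digits[digits.size() - 2]);
    if (stepping < 0 || minorVersion < 0) {
        return false;
    }
    int majorVersion = 0;
    for (std::size_t i = 0; i + 2 < digits.size(); ++i) {
        if (digits[i] < '0' || digits[i] > '9') {
            return false;  // Rejects names such as "gfx9-generic".
        }
        majorVersion = majorVersion * 10 + (digits[i] - '0');
    }
    out = GfxArch{majorVersion, minorVersion, stepping};
    return true;
}
```

Even with the device side parsed like this, the host still lacks a list of the architectures that were compiled in, so this alone does not give an equivalent of the CUDA check.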
Due to this, I'm just not going to bother validating / checking GPU architecture support at runtime, i.e. I will not implement `flamegpu::detail::compute_capability` for AMD in #1379.
Ideally we would do this for better error messages (and for heterogeneous multi-GPU systems with partial compilation support), if we can find a reliable way to do it.
We could potentially have a no-op kernel which we launch immediately on device initialisation, catch the specific runtime error, and assume that it must be due to an unsupported architecture, but that's pretty grim.