Add support for AMD GPU #798

Open
pxl-th wants to merge 3 commits into nerfstudio-project:main from pxl-th:pxl-th/amd

@pxl-th pxl-th commented Sep 13, 2025

This PR introduces AMD GPU support for gsplat.

Tested on:

Tested and confirmed working:

  • 3DGS (simple & mcmc; packed & un-packed)
  • 2DGS (packed & un-packed)
  • 3DGUT

Relies on:

Closes #771.
Closes #434.

  • Move glm library out of the cuda/ directory to avoid hipifying it, which causes confusion during compilation.
  • AMD GPUs do not support cg::labeled_partition, so on that backend we skip the warp reductions that rely on it and write to global memory directly. To reduce code duplication, introduce a FOR_HIP variable that selects between the labeled partition and a placeholder that exists only to avoid compilation errors, e.g.:
#if FOR_HIP
auto warp_group_g = warp; // Not used, just here to not error in the if-statements.
#else
auto warp_group_g = cg::labeled_partition(warp, gid);
#endif

and, in places where only one thread per group would perform the global atomic add, FOR_HIP is added to the condition (which should eliminate the branch altogether on HIP builds):

if (FOR_HIP || warp_group_g.thread_rank() == 0) {
...
}
  • Replace std::array::at with [] indexing to avoid bounds checking, which references a host-only throwing helper from device code and causes errors such as:
/usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/array:220:9: error: reference to __host__ function '__throw_out_of_range_fmt' in __host__ __device__ function
  220 |           std::__throw_out_of_range_fmt(__N("array::at: __n (which is %zu) "
      |                ^
  • Since ROCm does not support cg::reduce, add equivalent warpSum reduction methods built on the shfl_down intrinsic and use them wherever we reduce over the whole warp (i.e. tiled_partition in our case).
  • Select the compiler flags according to the GPU backend. E.g. -munsafe-fp-atomics is required on AMD GPUs to replace the CAS loop with fast hardware floating-point atomics, which significantly improves performance.
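
To make the labeled-partition fallback described above concrete, here is a minimal host-side C++ model of the two accumulation strategies. It is not the kernel code from this PR: `Lane`, `accumulate`, and the runtime `for_hip` flag are hypothetical names standing in for the compile-time FOR_HIP switch, and the inner loop only imitates what a group reduction followed by a single leader write would compute. The sketch is meant to show that both paths produce the same totals and differ only in how many adds are issued.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one thread of a warp: the Gaussian id it
// contributes to and the gradient value it holds.
struct Lane {
    int gid;
    float grad;
};

// Accumulates the gradients of one modelled warp into `out` (indexed by gid)
// and returns how many adds were issued. Each `+=` models a global atomic add.
//   for_hip == false: lanes sharing a gid are reduced first and only the
//                     first lane of each group (the "leader") writes.
//   for_hip == true : no grouping; every lane writes its own value.
int accumulate(const std::vector<Lane> &warp, std::vector<float> &out, bool for_hip) {
    int adds = 0;
    for (std::size_t i = 0; i < warp.size(); ++i) {
        float value = warp[i].grad;
        bool is_leader = true;
        if (!for_hip) {
            // Imitates a group reduction over all lanes with the same gid.
            value = 0.0f;
            for (std::size_t j = 0; j < warp.size(); ++j) {
                if (warp[j].gid != warp[i].gid) continue;
                if (j < i) is_leader = false; // an earlier lane leads the group
                value += warp[j].grad;
            }
        }
        // Same shape as `if (FOR_HIP || warp_group_g.thread_rank() == 0)`.
        if (for_hip || is_leader) {
            out[static_cast<std::size_t>(warp[i].gid)] += value;
            ++adds;
        }
    }
    return adds;
}
```

Under these assumptions the HIP path trades fewer instructions per thread for more atomic traffic, which is presumably why the fast hardware floating-point atomics mentioned in the last bullet matter for it.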
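
The std::array change above can be illustrated with a small sketch. `HOST_DEVICE` and `dot3` are hypothetical names, not identifiers from gsplat. The macro expands to `__host__ __device__` only under a CUDA or HIP compiler, so the same snippet also builds as plain C++. Whether `operator[]` on std::array is callable from device code depends on the toolchain and its flags; this sketch simply assumes it is, as the PR description implies.

```cpp
#include <array>
#include <cstddef>

// Assumption: mark the function for both host and device only when a GPU
// compiler is in use; otherwise compile it as ordinary C++.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Dot product of two 3-vectors using unchecked [] indexing.
// Using a.at(i) here would pull in the host-only out-of-range throw helper
// shown in the error message above; a[i] performs no bounds check.
HOST_DEVICE inline float dot3(const std::array<float, 3> &a,
                              const std::array<float, 3> &b) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < 3; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

Because the loop bound is a compile-time constant that matches the array size, dropping the bounds check does not change behavior for valid code.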
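
The shuffle-based warp reduction mentioned above can be sketched as a host-side C++ model. This is an illustration of the technique, not the warpSum code from this PR: `Warp`, `shfl_down`, and `warp_sum` are hypothetical names, and the fixed width of 32 lanes is an assumption (AMD wavefronts can be wider on some hardware). On a device, the emulated step would correspond to an intrinsic such as `__shfl_down_sync` in CUDA or `__shfl_down` in HIP.

```cpp
#include <array>
#include <cstddef>

// Assumption: model a warp as 32 lanes, one float per lane.
constexpr std::size_t WARP_SIZE = 32;
using Warp = std::array<float, WARP_SIZE>;

// Emulates a shuffle-down: lane i receives the value held by lane i + offset.
// Lanes whose source would fall outside the warp keep their own value.
Warp shfl_down(const Warp &lanes, std::size_t offset) {
    Warp out{};
    for (std::size_t i = 0; i < WARP_SIZE; ++i) {
        out[i] = (i + offset < WARP_SIZE) ? lanes[i + offset] : lanes[i];
    }
    return out;
}

// Tree reduction: halve the offset each step and add the shuffled value.
// After log2(WARP_SIZE) steps lane 0 holds the sum of all lanes; the other
// lanes hold partial results that are not meaningful.
float warp_sum(Warp lanes) {
    for (std::size_t offset = WARP_SIZE / 2; offset > 0; offset /= 2) {
        const Warp shifted = shfl_down(lanes, offset);
        for (std::size_t i = 0; i < WARP_SIZE; ++i) {
            lanes[i] += shifted[i];
        }
    }
    return lanes[0];
}
```

Only lane 0 ends up with the full sum, which matches the usual pattern of letting a single thread perform the subsequent global write.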

@pxl-th pxl-th marked this pull request as ready for review September 13, 2025 21:07
@charyang-ai

How does the performance compare to an NVIDIA RTX card, such as the 4090?

Development

Successfully merging this pull request may close these issues.

  • Plans on making GSplat compatible with AMD GPU
  • Looking for an alternative with AMD GPU
