Properly handle cuda arch for unsupported function by yhmtsai · Pull Request #1853 · ginkgo-project/ginkgo

yhmtsai · 2025-05-28T12:03:10Z

For example, bfloat16 is only natively supported from compute capability (CC) 8.0 (`sm_80`) onwards, so on older architectures we need to throw an exception or avoid a failing configuration.
This is mostly handled by the CMake option when the user compiles for only one CUDA architecture.
However, CUDA also allows compiling the library for several architectures at once.
In that case, `__CUDA_ARCH__` is only available in device code, not in host code, so checking the macro on the host side actually has no effect.
There is a host-side macro, but it gives the entire list of compiled architectures, not the one the code is running on.
We can therefore only rely on runtime dispatch on the device's CC and throw an exception when the function is not available.
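A minimal host-side sketch of that runtime dispatch, written in plain C++ so it runs without a GPU. All names here (`bfloat16_tag`, `min_cc_for_native_atomic_add`, `unsupported_arch_error`, `launch_atomic_add_kernel`) are hypothetical and not Ginkgo's API. The compute capability is assumed to arrive as `10 * major + minor` (so 8.0 becomes 80), for example from a query like the `get_compute_capability` added to the CUDA executor in this PR, whose exact signature is not shown here.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical value-type tags standing in for the kernel's template types.
struct float32_tag {};
struct bfloat16_tag {};

// Minimum compute capability (encoded as 10 * major + minor) at which the
// value type is assumed to have a native atomic add. 0 means "always".
template <typename ValueType>
constexpr int min_cc_for_native_atomic_add()
{
    return 0;
}

template <>
constexpr int min_cc_for_native_atomic_add<bfloat16_tag>()
{
    return 80;  // bfloat16: CC 8.0 and newer
}

// Hypothetical exception type; a real library would use its own error class.
class unsupported_arch_error : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Host-side dispatch. __CUDA_ARCH__ is not defined while compiling host code,
// so the decision has to use the CC of the device at runtime.
template <typename ValueType>
std::string launch_atomic_add_kernel(int compute_capability)
{
    constexpr int required = min_cc_for_native_atomic_add<ValueType>();
    if (compute_capability < required) {
        throw unsupported_arch_error(
            "atomic add for this value type requires CC " +
            std::to_string(required) + ", but the device has CC " +
            std::to_string(compute_capability));
    }
    // A real implementation would launch the CUDA kernel here.
    return "launched";
}
```

Because the check reads the device's CC instead of `__CUDA_ARCH__`, it still works when a single library binary contains kernels for several architectures.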
To make this runtime dispatch possible, the kernels must still compile for every targeted architecture, so we need to provide a working version of atomic add just for compilation.
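The PR text does not say how the compile-only atomic add is written. Since the runtime dispatch keeps it from being reached on unsupported devices, it only has to be valid code; a compare-and-swap loop is one common way to write an atomic add for a 16-bit type that is also correct. The sketch below is a host-side C++ analog using `std::atomic` (on the device the same loop would be built on `atomicCAS`). It assumes bfloat16 is stored as the upper 16 bits of an IEEE-754 `float` and converts back by truncation; `to_float`, `from_float`, and `atomic_add` are illustrative names only.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Assumed bfloat16 layout: the upper 16 bits of an IEEE-754 float.
inline float to_float(std::uint16_t bits)
{
    std::uint32_t widened = static_cast<std::uint32_t>(bits) << 16;
    float result;
    std::memcpy(&result, &widened, sizeof result);
    return result;
}

// Converts by truncating the lower 16 mantissa bits (no rounding).
inline std::uint16_t from_float(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return static_cast<std::uint16_t>(bits >> 16);
}

// Emulated atomic add via a compare-and-swap loop. Returns the previous
// value, like CUDA's atomicAdd. compare_exchange_weak reloads `expected`
// on failure, so the sum is recomputed from the latest stored value.
inline float atomic_add(std::atomic<std::uint16_t>& target, float value)
{
    std::uint16_t expected = target.load();
    std::uint16_t desired;
    do {
        desired = from_float(to_float(expected) + value);
    } while (!target.compare_exchange_weak(expected, desired));
    return to_float(expected);
}
```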

Side note: there is still an issue. With CUDA versions after 12.2, compiling a bfloat16 kernel inside a templated lambda for an architecture that does not natively support bfloat16 leads to an unknown device kernel at runtime. Duplicating the kernel with a full specialization, however, works. This requires further investigation.

pratikvn

lgtm

Co-authored-by: Pratik Nayak <pratikvn@protonmail.com>

yhmtsai self-assigned this May 28, 2025

ginkgo-bot added mod:cuda This is related to the CUDA module. type:solver This is related to the solvers type:matrix-format This is related to the Matrix formats mod:hip This is related to the HIP module. labels May 28, 2025

yhmtsai requested a review from a team May 28, 2025 16:17

pratikvn requested changes Jul 22, 2025


Comment thread on common/cuda_hip/matrix/coo_kernels.cpp (outdated)

yhmtsai force-pushed the properly_handle_cuda_arch branch from 66748e1 to 432fe5a July 30, 2025 12:45

yhmtsai requested a review from pratikvn July 30, 2025 12:46

yhmtsai added the 1:ST:ready-for-review This PR is ready for review label Aug 7, 2025

pratikvn approved these changes Aug 11, 2025


yhmtsai force-pushed the properly_handle_cuda_arch branch from 432fe5a to 8f6a18c August 12, 2025 07:47

yhmtsai and others added 3 commits August 14, 2025 13:58

add unsupported atomic add

8bf34b1

proper dispatch the kernel with atomic_add but can only in runtime

75ed5b5

add get_compute_capability in cudaExecutor

cb2d8b6

Co-authored-by: Pratik Nayak <pratikvn@protonmail.com>

yhmtsai force-pushed the properly_handle_cuda_arch branch from 8f6a18c to dd5fc45 August 14, 2025 11:58

yhmtsai added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Aug 14, 2025

fix typo

0e83b80

yhmtsai force-pushed the properly_handle_cuda_arch branch from dd5fc45 to 0e83b80 August 14, 2025 13:57

yhmtsai merged commit e31cf5e into develop Aug 14, 2025
15 of 16 checks passed

yhmtsai deleted the properly_handle_cuda_arch branch August 14, 2025 20:31

yhmtsai merged 4 commits into develop from properly_handle_cuda_arch