
Properly handle cuda arch for unsupported function #1853

Merged
yhmtsai merged 4 commits into develop from properly_handle_cuda_arch on Aug 14, 2025

Conversation

yhmtsai commented May 28, 2025

Member

For example, bfloat16 is only natively supported from compute capability 8.0 (sm_80) onward, so on older architectures we need to either throw an exception or avoid a failing configuration.
This is mostly handled by a CMake option when the user compiles for only one CUDA architecture.
However, CUDA allows compiling a library for several architectures at once.
`__CUDA_ARCH__` is only defined in device code, not in host code, so guarding with the macro on the host side actually has no effect.
There is a host-side macro, but it gives the entire list of target architectures rather than the one the code will run on.
In this case, we can only rely on runtime dispatch on the compute capability and throw an exception when the feature is not available.
To achieve that, we need to provide a working version of atomic add just so that the code compiles for the architectures without native support.
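
The runtime-dispatch approach described above can be sketched in plain C++. This is a minimal, host-only illustration under stated assumptions, not Ginkgo's implementation: `compute_capability`, `has_native_bf16`, `run_bf16_kernel`, `bf16_to_float`, `float_to_bf16`, and `atomic_add_bf16` are hypothetical names; in practice the capability values would come from the CUDA runtime (for example `cudaGetDeviceProperties`); and `std::atomic<std::uint32_t>` stands in for device memory so the example runs without a GPU. The fallback atomic add shows one common way to emulate a 16-bit atomic: a compare-and-swap loop on the enclosing 32-bit word, which on the device would typically use `atomicCAS` on `unsigned int`.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the compute capability reported by the CUDA
// runtime (e.g. cudaDeviceProp::major / ::minor); not a Ginkgo type.
struct compute_capability {
    int major;
    int minor;
};

// Host side: __CUDA_ARCH__ is undefined here, so the decision has to use the
// compute capability of the device the code actually runs on.
inline bool has_native_bf16(compute_capability cc) { return cc.major >= 8; }

// Throws instead of launching when the device lacks native bfloat16 support.
template <typename Kernel>
void run_bf16_kernel(compute_capability cc, Kernel&& kernel)
{
    if (!has_native_bf16(cc)) {
        throw std::runtime_error(
            "bfloat16 kernels need compute capability >= 8.0, device has " +
            std::to_string(cc.major) + "." + std::to_string(cc.minor));
    }
    kernel();  // stands in for the actual kernel launch
}

// bfloat16 is the upper 16 bits of an IEEE-754 binary32 value.
inline float bf16_to_float(std::uint16_t h)
{
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Round-to-nearest-even conversion; NaN handling omitted for brevity.
inline std::uint16_t float_to_bf16(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>(bits >> 16);
}

// Fallback atomic add: updates one bfloat16 half of a 32-bit word using only
// a 32-bit compare-and-swap loop. Returns the previous bfloat16 bits.
inline std::uint16_t atomic_add_bf16(std::atomic<std::uint32_t>& word,
                                     bool high_half, float val)
{
    const unsigned shift = high_half ? 16u : 0u;
    const std::uint32_t mask = 0xFFFFu << shift;
    std::uint32_t old_word = word.load();
    std::uint32_t new_word;
    do {
        const auto old_h =
            static_cast<std::uint16_t>((old_word & mask) >> shift);
        const auto new_h = float_to_bf16(bf16_to_float(old_h) + val);
        new_word = (old_word & ~mask) |
                   (static_cast<std::uint32_t>(new_h) << shift);
        // On failure, old_word is refreshed with the current value and we retry.
    } while (!word.compare_exchange_weak(old_word, new_word));
    // On success, old_word still holds the value that was replaced.
    return static_cast<std::uint16_t>((old_word & mask) >> shift);
}
```

In this sketch the dispatch throws before a kernel would ever reach the fallback on an older device, which matches the point above that the fallback atomic add is needed mainly so that the code compiles for every target architecture.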

Side note: there is still an issue where, with CUDA versions after 12.2, compiling a bfloat16 kernel inside a templated lambda for an architecture that does not natively support bfloat16 leads to an unknown device kernel at runtime. Duplicating the kernel with a full specialization works, however. This requires further investigation.

yhmtsai self-assigned this May 28, 2025
ginkgo-bot added the mod:cuda, type:solver, type:matrix-format, and mod:hip labels May 28, 2025
yhmtsai requested a review from a team May 28, 2025 16:17
Review comment thread on common/cuda_hip/matrix/coo_kernels.cpp (outdated)
yhmtsai force-pushed the properly_handle_cuda_arch branch from 66748e1 to 432fe5a July 30, 2025 12:45
yhmtsai requested a review from pratikvn July 30, 2025 12:46
yhmtsai added the 1:ST:ready-for-review label Aug 7, 2025

pratikvn left a comment

Member

lgtm

yhmtsai force-pushed the properly_handle_cuda_arch branch from 432fe5a to 8f6a18c August 12, 2025 07:47
yhmtsai force-pushed the properly_handle_cuda_arch branch from 8f6a18c to dd5fc45 August 14, 2025 11:58
yhmtsai added the 1:ST:ready-to-merge label and removed the 1:ST:ready-for-review label Aug 14, 2025
yhmtsai force-pushed the properly_handle_cuda_arch branch from dd5fc45 to 0e83b80 August 14, 2025 13:57
yhmtsai merged commit e31cf5e into develop Aug 14, 2025
15 of 16 checks passed
yhmtsai deleted the properly_handle_cuda_arch branch August 14, 2025 20:31

Labels

1:ST:ready-to-merge This PR is ready to merge. mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. type:matrix-format This is related to the Matrix formats type:solver This is related to the solvers

3 participants