Commit 20b6496

kahyunnam authored and murphymatt committed
fix: arch 12.1 -> "sm120a" flag for Spark, CUDA 12.9 (#2839)
## 📌 Description

Bug found in nightly [Spark, 12.9] matrix https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/285092631, where Spark compiles to "120a" (see the "/tmp/.cache/flashinfer/0.6.6/120a/" path in the log below).

```
E RuntimeError: Check failed: (status == cudaSuccess) is false: SingleDecodeWithKVCache kernel launch failed, error: no kernel image is available for execution on the device
/tmp/.cache/flashinfer/0.6.6/120a/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_2_use_swa_False_use_logits_cap_False/single_decode.cu:100: RuntimeError: Check failed: (status == cudaSuccess) is false: SingleDecodeWithKVCache kernel launch failed, error: no kernel image is available for execution on the device
```

The root cause was flashinfer-ai/flashinfer#2725, where we added logic to compile both Spark and Thor to 120f, but only on the condition that the CUDA version is 13 or higher. Lower versions (12.9) defaulted to the 'a' suffix, 120a.

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Summary by CodeRabbit

* **Bug Fixes**
  * Strengthened CUDA validation for SM 12.x GPUs: now requires CUDA 12.9 or newer and emits a clear error if unmet, replacing the previous silent fallback behavior.
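
The failure mode is easy to reproduce in isolation. Below is a minimal sketch (not the actual FlashInfer code; the real check is `is_cuda_version_at_least` from `flashinfer.jit.cpp_ext`, stubbed here as a plain `(major, minor)` tuple comparison) of the pre-fix condition from #2725:

```python
def old_suffix_for_sm12(cuda_version: tuple[int, int]) -> str:
    """Pre-fix logic: emit the 'f' suffix only on CUDA >= 13.0."""
    if cuda_version >= (13, 0):
        return "0f"
    # CUDA 12.9 fell through to here, producing the ".../120a/..."
    # cache path and kernels with no image for the Spark device.
    return "0a"

print(old_suffix_for_sm12((12, 9)))  # '0a' -> wrong suffix for Spark
print(old_suffix_for_sm12((13, 0)))  # '0f'
```

On CUDA 12.9 this returns "0a", so SM 12.1 (Spark) kernels were compiled for sm_120a and failed to launch, even though CUDA 12.9 toolchains already accept the 'f' suffix.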
1 parent 34263f1 commit 20b6496

1 file changed: flashinfer/compilation_context.py (7 additions & 15 deletions)
```diff
@@ -36,28 +36,20 @@ def _normalize_cuda_arch(major: int, minor: int) -> tuple[int, str]:
     tuple with the correct architecture suffix for nvcc.
 
     SM 9.x -> 'a' suffix (e.g. compute_90a)
-    SM 12.x -> always normalized to SM 120 with 'f' suffix (e.g. compute_120f)
-    when the installed CUDA toolchain supports it (CUDA >= 13.0),
-    otherwise 'a'. This covers both SM 12.0 and SM 12.1 (DGX Spark).
+    SM 12.x -> always normalized to SM 120 with 'f' suffix (e.g. compute_120f).
+    This covers both SM 12.0 and SM 12.1 (DGX Spark) when the installed CUDA toolchain supports it (CUDA >= 12.9).
     SM 10+ -> 'a' suffix (e.g. compute_100a)
     SM < 9 -> no suffix
     """
     if major == 9:
         return (major, str(minor) + "a")
     elif major == 12:
-        try:
-            from flashinfer.jit.cpp_ext import is_cuda_version_at_least
+        from flashinfer.jit.cpp_ext import is_cuda_version_at_least
 
-            if is_cuda_version_at_least("13.0"):
-                return (major, "0f")
-        except (ImportError, RuntimeError, ValueError):
-            logger.debug(
-                "Could not determine CUDA version; "
-                "falling back to 'a' suffix for SM %d.%d",
-                major,
-                minor,
-            )
-        return (major, "0a")
+        if is_cuda_version_at_least("12.9"):
+            return (major, "0f")
+        else:
+            raise RuntimeError("SM 12.x requires CUDA >= 12.9")
     elif major >= 10:
         return (major, str(minor) + "a")
     return (major, str(minor))
```
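For reference, the post-fix normalization can be sketched as a standalone function. This is an illustration, not the shipped code: the CUDA-version probe is passed in as an assumed `(major, minor)` tuple instead of being imported from `flashinfer.jit.cpp_ext`:

```python
def normalize_cuda_arch(major: int, minor: int,
                        cuda_version: tuple[int, int]) -> tuple[int, str]:
    """Map an SM version to (major, nvcc suffix), mirroring the patched logic."""
    if major == 9:
        # SM 9.x -> 'a' suffix (e.g. compute_90a)
        return (major, str(minor) + "a")
    elif major == 12:
        # SM 12.0 and SM 12.1 (DGX Spark) both normalize to 120f;
        # the 'f' (family) suffix requires a CUDA 12.9+ toolchain,
        # so older toolchains now fail loudly instead of silently
        # emitting sm_120a kernels that cannot launch on Spark.
        if cuda_version >= (12, 9):
            return (major, "0f")
        raise RuntimeError("SM 12.x requires CUDA >= 12.9")
    elif major >= 10:
        # SM 10+ -> 'a' suffix (e.g. compute_100a)
        return (major, str(minor) + "a")
    # SM < 9 -> no suffix
    return (major, str(minor))

print(normalize_cuda_arch(12, 1, (12, 9)))  # Spark on CUDA 12.9 -> (12, '0f')
print(normalize_cuda_arch(12, 1, (13, 0)))  # Spark on CUDA 13   -> (12, '0f')
```

Note the design change relative to the old code: rather than catching import or version-probe errors and falling back to "0a", the patched path lets a missing or too-old toolchain surface as an immediate `RuntimeError` at compile time instead of a "no kernel image" failure at launch time.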
0 commit comments