Commit f040aac
Add support for CUDA architecture family codes (#27278)
This change extends CUDA architecture handling to support
family-specific codes (suffix 'f') introduced in CUDA 12.9, aligning
with updates made to Triton Inference Server repositories (backend and
onnxruntime_backend).
Changes:
1. Added CUDAARCHS environment variable support (standard CMake variable)
- Allows users to override the architecture list via an environment variable
- Used only when CMAKE_CUDA_ARCHITECTURES is not set, in which case it
takes precedence over the defaults
2. Extended regex patterns to recognize family code suffix 'f'
- Supports codes like 100f, 110f, 120f for CC 10.x, 11.x, 12.x families
- Preserves 'f' suffix during parsing phase
3. Updated normalization logic to handle family codes
- Family codes (ending with 'f') preserved without adding -real suffix
- Traditional codes continue to receive -real or -a-real suffixes
- Architecture-specific codes (with 'a') remain unchanged
4. Extended architecture support lists
- Added SM 110 to ARCHITECTURES_WITH_KERNELS
- Added SM 110 to ARCHITECTURES_WITH_ACCEL
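The normalization rules in item 3 can be sketched in Python as below. This is an illustration, not the CMake code from the change: the function names, the regex, and the contents of ARCHS_WITH_ACCEL are assumptions (only SM 110 is confirmed by this commit message), and the choice between the two suffixes for bare numbers is assumed to depend on membership in that list.

```python
import re

# Hypothetical stand-in for the ARCHITECTURES_WITH_ACCEL list named above.
# Only "110" is confirmed by the commit message; the other entries are assumed.
ARCHS_WITH_ACCEL = {"90", "100", "110", "120"}

# Digits, an optional 'a' or 'f' letter, and an optional -real/-virtual qualifier.
ARCH_PATTERN = re.compile(r"^([0-9]+)([af]?)(-real|-virtual)?$")


def normalize_arch(entry: str) -> str:
    """Normalize one architecture entry following the rules listed above."""
    entry = entry.strip()
    match = ARCH_PATTERN.match(entry)
    if match is None:
        raise ValueError(f"unrecognized CUDA architecture: {entry!r}")
    number, letter, qualifier = match.groups()
    if letter or qualifier:
        # Family codes (100f), architecture-specific codes (90a-real) and
        # already-qualified codes (75-real) pass through unchanged.
        return entry
    # Bare number: pick the suffix from the (assumed) accelerated list.
    if number in ARCHS_WITH_ACCEL:
        return f"{number}a-real"
    return f"{number}-real"


def normalize_arch_list(value: str) -> str:
    """Normalize a semicolon-separated architecture list."""
    return ";".join(normalize_arch(e) for e in value.split(";") if e.strip())
```

Under these assumptions, normalize_arch_list("75;110;100f") yields "75-real;110a-real;100f": the family code is left alone while bare numbers gain a suffix.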
Family-specific codes (introduced in CUDA 12.9/Blackwell) enable forward
compatibility within a GPU family. For example, 100f runs on CC 10.0,
10.3, and future 10.x devices, using features common across the family.
Usage examples:
- CUDAARCHS="75;80;90;100f;110f;120f" cmake ..
- cmake -DCMAKE_CUDA_ARCHITECTURES="75-real;80-real;90-real;100f;120f" ..
- python build.py --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="100f;110f"
The implementation supports mixed formats in the same list:
- Traditional: 75-real, 80-real, 90-real
- Architecture-specific: 90a-real (CC 9.0 only)
- Family-specific: 100f, 110f, 120f (entire family)
Note: The current defaults still use bare numbers (75;80;90;100;120), which
normalize to architecture-specific codes with the 'a' suffix. Users who want
family-specific behavior should explicitly use the 'f' suffix via the
CUDAARCHS environment variable or CMAKE_CUDA_ARCHITECTURES.
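The selection order implied by this message (explicit CMAKE_CUDA_ARCHITECTURES first, then CUDAARCHS, then the defaults) can be sketched in Python as follows. The function name and signature are hypothetical, and the actual CMake logic may differ in detail.

```python
from typing import Mapping, Optional

# Default list as quoted in the note above.
DEFAULT_ARCHS = "75;80;90;100;120"


def resolve_arch_list(cmake_value: Optional[str], environ: Mapping[str, str]) -> str:
    """Pick the architecture list to use.

    cmake_value models CMAKE_CUDA_ARCHITECTURES (None or "" when unset);
    environ models the process environment.
    """
    if cmake_value:
        # An explicit CMake setting always wins.
        return cmake_value
    env_value = environ.get("CUDAARCHS", "")
    if env_value:
        # CUDAARCHS is consulted only when the CMake variable is unset.
        return env_value
    return DEFAULT_ARCHS
```

For example, resolve_arch_list(None, {"CUDAARCHS": "100f;110f"}) returns "100f;110f", while an empty environment falls back to the bare-number defaults.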
References:
- NVIDIA Blackwell and CUDA 12.9 Family-Specific Architecture Features:
https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/
- Triton Inference Server backend updates (commit f5e901f)
1 parent a21298f commit f040aac
1 file changed, +15 -7 lines changed