Add support for CUDA architecture family codes #27278

Merged
tianleiwu merged 1 commit into microsoft:main from mc-nv:mc-nv/TRI-624/update-arch-codes
Feb 8, 2026

Conversation

@mc-nv
Contributor

@mc-nv mc-nv commented Feb 7, 2026

This change extends CUDA architecture handling to support family-specific codes (suffix 'f') introduced in CUDA 12.9, aligning with updates made to Triton Inference Server repositories (backend and onnxruntime_backend).

Changes:

  1. Added CUDAARCHS environment variable support (standard CMake variable)

    • Allows users to override architecture list via environment variable
    • Takes precedence over the built-in defaults when CMAKE_CUDA_ARCHITECTURES is not set
  2. Extended regex patterns to recognize family code suffix 'f'

    • Supports codes like 100f, 110f, 120f for CC 10.x, 11.x, 12.x families
    • Preserves 'f' suffix during parsing phase
  3. Updated normalization logic to handle family codes

    • Family codes (ending with 'f') preserved without adding -real suffix
    • Traditional codes continue to receive -real or -a-real suffixes
    • Architecture-specific codes (with 'a') remain unchanged
  4. Extended architecture support lists

    • Added SM 110 to ARCHITECTURES_WITH_KERNELS
    • Added SM 110 to ARCHITECTURES_WITH_ACCEL
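
The parsing and normalization rules in items 2 and 3 can be illustrated with a short Python sketch. This is not the CMake code from the PR; it only mirrors the rules as described. The names `ACCEL_ARCHS`, `normalize_arch`, and `normalize_arch_list` are hypothetical, and the contents of `ACCEL_ARCHS` (which bare numbers receive the `a-real` suffix) are an assumption, since the description only confirms that SM 110 was added to ARCHITECTURES_WITH_ACCEL.

```python
import re

# Hypothetical stand-in for ARCHITECTURES_WITH_ACCEL. Only 110 is confirmed
# by the PR description; the other members are assumed for illustration.
ACCEL_ARCHS = {"90", "100", "110", "120"}

# number, optional 'a' or 'f' suffix, optional explicit -real / -virtual
_ARCH_RE = re.compile(r"^(\d+)([af]?)(-real|-virtual)?$")


def normalize_arch(entry: str) -> str:
    """Normalize one architecture entry following the rules listed above."""
    entry = entry.strip()
    match = _ARCH_RE.match(entry)
    if match is None:
        raise ValueError(f"unrecognized CUDA architecture: {entry!r}")
    number, suffix, kind = match.groups()
    if suffix or kind:
        # Family codes (100f), architecture-specific codes (90a, 90a-real),
        # and entries with an explicit -real/-virtual are left unchanged.
        return entry
    if number in ACCEL_ARCHS:
        # Assumption: bare numbers in the accel list become <N>a-real.
        return f"{number}a-real"
    return f"{number}-real"


def normalize_arch_list(value: str) -> list[str]:
    """Split a semicolon-separated list and normalize each entry."""
    return [normalize_arch(e) for e in value.split(";") if e.strip()]
```

Under these assumptions, `normalize_arch_list("75;80;90;100f;120f")` leaves `100f` and `120f` untouched while the bare numbers gain a suffix.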

Family-specific codes (introduced in CUDA 12.9/Blackwell) enable forward compatibility within a GPU family. For example, 100f runs on CC 10.0, 10.3, and future 10.x devices, using features common across the family.
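
To make the compatibility rule concrete, here is a minimal Python sketch of which devices a suffixed target can run on. The function name `runs_on` is hypothetical. The rule for 'f' targets (same major compute capability, equal or higher minor) is an assumption drawn from the example above and NVIDIA's description of family-specific targets; the rule for 'a' targets (exact match only) follows the "CC 9.0 only" wording used in this PR.

```python
import re


def runs_on(target: str, device_cc: tuple[int, int]) -> bool:
    """Return True if code built for `target` can run on a device with
    compute capability `device_cc`, given as (major, minor).

    Only suffixed targets are handled: 'a' (architecture-specific) and
    'f' (family-specific). Bare targets are out of scope for this sketch.
    """
    match = re.fullmatch(r"(\d+)(\d)([af])", target)
    if match is None:
        raise ValueError(f"expected a target like '90a' or '100f', got {target!r}")
    major, minor, suffix = int(match.group(1)), int(match.group(2)), match.group(3)
    dev_major, dev_minor = device_cc
    if suffix == "a":
        # Architecture-specific: exact compute capability only.
        return (dev_major, dev_minor) == (major, minor)
    # Family-specific (assumed rule): same major, equal or higher minor.
    return dev_major == major and dev_minor >= minor
```

For example, `runs_on("100f", (10, 3))` is true under this rule, while `runs_on("90a", (10, 0))` is false.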

Usage examples:

  • CUDAARCHS="75;80;90;100f;110f;120f" cmake ..
  • cmake -DCMAKE_CUDA_ARCHITECTURES="75-real;80-real;90-real;100f;120f" ..
  • python build.py --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="100f;110f"
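
The precedence between these inputs can be sketched in Python as follows. This is an illustration of the behavior described in the change list, not the PR's CMake code; `resolve_arch_list` and `DEFAULT_ARCHS` are hypothetical names, and the default value is the bare list quoted in the note in this description.

```python
import os
from typing import Mapping, Optional

# Bare-number defaults quoted in the PR description.
DEFAULT_ARCHS = "75;80;90;100;120"


def resolve_arch_list(
    cmake_value: Optional[str],
    env: Mapping[str, str] = os.environ,
    default: str = DEFAULT_ARCHS,
) -> str:
    """Pick the architecture list to use.

    1. An explicit CMAKE_CUDA_ARCHITECTURES value wins.
    2. Otherwise the CUDAARCHS environment variable is used, if set.
    3. Otherwise the built-in defaults apply.
    """
    if cmake_value:
        return cmake_value
    env_value = env.get("CUDAARCHS", "")
    if env_value:
        return env_value
    return default
```

CMake 3.20 and later also use CUDAARCHS to initialize CMAKE_CUDA_ARCHITECTURES on the first configure, so the environment variable only matters when the cache variable has not been set explicitly.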

The implementation supports mixed formats in the same list:

  • Traditional: 75-real, 80-real, 90-real
  • Architecture-specific: 90a-real (CC 9.0 only)
  • Family-specific: 100f, 110f, 120f (entire family)
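
As a rough illustration of how a mixed list splits into these three formats, the following Python sketch groups entries by suffix. The helper names `classify` and `group_by_kind` are hypothetical and do not appear in the PR.

```python
import re

_KINDS = {"": "traditional", "a": "architecture-specific", "f": "family-specific"}


def classify(entry: str) -> str:
    """Classify one entry by its suffix, ignoring a trailing -real/-virtual."""
    match = re.fullmatch(r"(\d+)([af]?)(?:-(?:real|virtual))?", entry.strip())
    if match is None:
        raise ValueError(f"unrecognized CUDA architecture: {entry!r}")
    return _KINDS[match.group(2)]


def group_by_kind(value: str) -> dict[str, list[str]]:
    """Group a semicolon-separated architecture list by format."""
    groups: dict[str, list[str]] = {}
    for entry in value.split(";"):
        if entry.strip():
            groups.setdefault(classify(entry), []).append(entry.strip())
    return groups
```

Calling `group_by_kind("75-real;80-real;90a-real;100f;120f")` places `90a-real` in the architecture-specific group and `100f` and `120f` in the family-specific group.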

Note: Current defaults still use bare numbers (75;80;90;100;120), which normalize to architecture-specific codes with the 'a' suffix. Users who want family-specific behavior should explicitly use the 'f' suffix via the CUDAARCHS environment variable or CMAKE_CUDA_ARCHITECTURES.

References:

- NVIDIA Blackwell and CUDA 12.9 Family-Specific Architecture Features:
  https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/
- Triton Inference Server backend updates (commit f5e901f)
@mc-nv mc-nv force-pushed the mc-nv/TRI-624/update-arch-codes branch from 44f9e9f to 319e132 Compare February 7, 2026 02:14
@mc-nv
Contributor Author

mc-nv commented Feb 7, 2026

cc: @chilo-ms

@tianleiwu
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@tianleiwu tianleiwu merged commit 6625856 into microsoft:main Feb 8, 2026
88 checks passed
tianleiwu pushed a commit that referenced this pull request Feb 12, 2026
tianleiwu added a commit that referenced this pull request Feb 13, 2026
This cherry-picks the following commits for the 1.24.2 release:
- #27096
- #27077
- #26677
- #27238
- #27213
- #27256
- #27278
- #27275
- #27276
- #27216
- #27271
- #27299
- #27294
- #27266
- #27176
- #27126
- #27252

---------

Co-authored-by: Xiaofei Han <xiaofeihan@microsoft.com>
Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: qti-monumeen <monumeen@qti.qualcomm.com>
Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com>
Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: guschmue <22941064+guschmue@users.noreply.github.com>
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: angelser <32746004+angelser@users.noreply.github.com>
Co-authored-by: Angela Serrano Brummett <angelser@microsoft.com>
Co-authored-by: Misha Chornyi <99709299+mc-nv@users.noreply.github.com>
Co-authored-by: hariharans29 <9969784+hariharans29@users.noreply.github.com>
Co-authored-by: eserscor <erscor@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: Ti-Tai Wang <titaiwang@microsoft.com>
Co-authored-by: bmehta001 <bmehta001@users.noreply.github.com>