Add support for CUDA architecture family codes by mc-nv · Pull Request #27278 · microsoft/onnxruntime

mc-nv · 2026-02-07T01:12:38Z

This change extends CUDA architecture handling to support family-specific codes (suffix 'f') introduced in CUDA 12.9, aligning with updates made to Triton Inference Server repositories (backend and onnxruntime_backend).

Changes:

1. Added CUDAARCHS environment variable support (standard CMake variable)
   - Allows users to override the architecture list via an environment variable
   - Takes precedence when CMAKE_CUDA_ARCHITECTURES is not set
2. Extended regex patterns to recognize the family code suffix 'f'
   - Supports codes like 100f, 110f, 120f for the CC 10.x, 11.x, 12.x families
   - Preserves the 'f' suffix during the parsing phase
3. Updated normalization logic to handle family codes
   - Family codes (ending with 'f') are preserved without adding a -real suffix
   - Traditional codes continue to receive -real or -a-real suffixes
   - Architecture-specific codes (with 'a') remain unchanged
4. Extended architecture support lists
   - Added SM 110 to ARCHITECTURES_WITH_KERNELS
   - Added SM 110 to ARCHITECTURES_WITH_ACCEL
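
The normalization rules in the list above can be sketched in Python for illustration. This is not the CMake code from this PR: the function names are hypothetical, and the contents of `ARCHS_WITH_ACCEL` are an assumed stand-in for ARCHITECTURES_WITH_ACCEL, whose real contents may differ. The sketch also assumes that a bare number from that list becomes an `a-real` entry (as in `90a-real`) and that any other bare number becomes a `-real` entry.

```python
import re

# Hypothetical stand-in for ARCHITECTURES_WITH_ACCEL; the real list lives in
# the onnxruntime CMake files and may contain different values.
ARCHS_WITH_ACCEL = {"90", "100", "110", "120"}

# A "traditional" bare number such as 75 or 100, with no suffix at all.
BARE_NUMBER = re.compile(r"^[1-9][0-9]+$")


def normalize_arch(entry: str) -> str:
    """Normalize one architecture entry following the rules described above."""
    if BARE_NUMBER.match(entry):
        # Bare numbers receive an explicit suffix.
        if entry in ARCHS_WITH_ACCEL:
            return f"{entry}a-real"
        return f"{entry}-real"
    # Family codes (100f), architecture-specific codes (90a-real) and entries
    # that already carry -real or -virtual pass through unchanged.
    return entry


def normalize_arch_list(value: str) -> list[str]:
    """Split a semicolon-separated list and normalize every entry."""
    return [normalize_arch(e) for e in value.split(";") if e]


if __name__ == "__main__":
    print(normalize_arch_list("75;90;100f;90a-real"))
```

Under these assumptions, a family code such as `100f` is never rewritten, so users keep full control over whether family-specific code is generated.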

Family-specific codes (introduced in CUDA 12.9/Blackwell) enable forward compatibility within a GPU family. For example, 100f runs on CC 10.0, 10.3, and future 10.x devices, using features common across the family.
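
As a rough illustration of the compatibility statement above, the following Python sketch checks whether a device is covered by a family code. The helper name `family_code_covers` is hypothetical, and the rule it encodes (same major compute capability, equal or higher minor) is an assumption based on the description here and the NVIDIA blog post referenced below, not code from this PR.

```python
def family_code_covers(code: str, cc_major: int, cc_minor: int) -> bool:
    """Return True if a family code such as '100f' covers the given device.

    Assumed rule: the device must have the same major compute capability
    and a minor version equal to or higher than the one in the code.
    """
    if not code.endswith("f"):
        raise ValueError(f"not a family-specific code: {code!r}")
    digits = code[:-1]
    if not digits.isdigit() or len(digits) < 2:
        raise ValueError(f"malformed family code: {code!r}")
    # The last digit is the minor version; everything before it is the major.
    major, minor = int(digits[:-1]), int(digits[-1])
    return cc_major == major and cc_minor >= minor


if __name__ == "__main__":
    for device in [(10, 0), (10, 3), (12, 0)]:
        print(device, family_code_covers("100f", *device))
```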

Usage examples:

CUDAARCHS="75;80;90;100f;110f;120f" cmake ..
cmake -DCMAKE_CUDA_ARCHITECTURES="75-real;80-real;90-real;100f;120f" ..
python build.py --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="100f;110f"
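
The usage examples above rely on a selection order between the two inputs. A minimal Python sketch of that order follows, assuming an explicit CMAKE_CUDA_ARCHITECTURES value wins, CUDAARCHS is consulted next, and the defaults quoted in the note below apply last. The function name `select_architectures` is hypothetical and the real logic is CMake code.

```python
import os

# Defaults as quoted in the PR description's note.
DEFAULT_ARCHS = ["75", "80", "90", "100", "120"]


def select_architectures(cmake_value, environ=None):
    """Pick the architecture list in the assumed order of precedence."""
    env = os.environ if environ is None else environ
    if cmake_value:
        # An explicit -DCMAKE_CUDA_ARCHITECTURES=... is used as given.
        return [e for e in cmake_value.split(";") if e]
    from_env = env.get("CUDAARCHS", "")
    if from_env:
        # CUDAARCHS applies only when the CMake variable is not set.
        return [e for e in from_env.split(";") if e]
    return list(DEFAULT_ARCHS)


if __name__ == "__main__":
    print(select_architectures(None, {"CUDAARCHS": "75;100f"}))
```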

The implementation supports mixed formats in the same list:

Traditional: 75-real, 80-real, 90-real
Architecture-specific: 90a-real (CC 9.0 only)
Family-specific: 100f, 110f, 120f (entire family)
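
To make the three formats concrete, here is a small Python sketch that classifies each entry of a mixed list with a single regular expression. The pattern and the `classify` helper are illustrative assumptions and are not the regex patterns used in the PR's CMake code.

```python
import re

# Assumed shape of an entry: digits, optional 'a' or 'f', optional
# -real or -virtual suffix.
ARCH_PATTERN = re.compile(
    r"^(?P<cc>[1-9][0-9]+)(?P<suffix>[af]?)(?P<kind>-real|-virtual)?$"
)


def classify(entry: str) -> str:
    """Label an entry as traditional, architecture-specific or family-specific."""
    match = ARCH_PATTERN.match(entry)
    if match is None:
        raise ValueError(f"unrecognized architecture entry: {entry!r}")
    if match["suffix"] == "f":
        return "family-specific"
    if match["suffix"] == "a":
        return "architecture-specific"
    return "traditional"


if __name__ == "__main__":
    for entry in "75-real;90a-real;100f".split(";"):
        print(entry, "->", classify(entry))
```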

Note: The current defaults still use bare numbers (75;80;90;100;120), which normalize to architecture-specific codes with the 'a' suffix. Users who want family-specific behavior should explicitly use the 'f' suffix via the CUDAARCHS environment variable or CMAKE_CUDA_ARCHITECTURES.

References:

NVIDIA Blackwell and CUDA 12.9 Family-Specific Architecture Features: https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/
Triton Inference Server backend updates (commit f5e901f)

mc-nv · 2026-02-07T03:47:54Z

cc: @chilo-ms

tianleiwu · 2026-02-07T05:03:35Z

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

azure-pipelines · 2026-02-07T05:03:52Z

Azure Pipelines successfully started running 4 pipeline(s).

This cherry-picks the following commits for the 1.24.2 release:
- #27096
- #27077
- #26677
- #27238
- #27213
- #27256
- #27278
- #27275
- #27276
- #27216
- #27271
- #27299
- #27294
- #27266
- #27176
- #27126
- #27252

---------

Co-authored-by: Xiaofei Han <xiaofeihan@microsoft.com>
Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: qti-monumeen <monumeen@qti.qualcomm.com>
Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com>
Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: guschmue <22941064+guschmue@users.noreply.github.com>
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: angelser <32746004+angelser@users.noreply.github.com>
Co-authored-by: Angela Serrano Brummett <angelser@microsoft.com>
Co-authored-by: Misha Chornyi <99709299+mc-nv@users.noreply.github.com>
Co-authored-by: hariharans29 <9969784+hariharans29@users.noreply.github.com>
Co-authored-by: eserscor <erscor@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: Ti-Tai Wang <titaiwang@microsoft.com>
Co-authored-by: bmehta001 <bmehta001@users.noreply.github.com>

mc-nv force-pushed the mc-nv/TRI-624/update-arch-codes branch from 44f9e9f to 319e132 · February 7, 2026 02:14

mc-nv mentioned this pull request Feb 7, 2026

Address CUDA arch codes. Update gen_ort_dockerfil.py. triton-inference-server/onnxruntime_backend#334

Merged

dmitry-tokarev-nv approved these changes Feb 7, 2026

tianleiwu approved these changes Feb 8, 2026

tianleiwu merged commit 6625856 into microsoft:main Feb 8, 2026
88 checks passed

tianleiwu added the release:1.24.2 label Feb 12, 2026

tianleiwu mentioned this pull request Feb 12, 2026

ORT 1.24.2 release cherry pick round 1 #27330

Merged

tianleiwu removed the release:1.24.2 label Feb 12, 2026
