[Compile] accelerate compilation speed using NVRTC #18519

Kathryn-cat · 2025-11-27T19:45:47Z

This PR adds support for NVRTC as an alternative to NVCC for faster, device-side JIT compilation of CUDA kernels, in favor of the PR apache/tvm-ffi#283.

It enhances the CUDA compilation backend by:

- Adding Python NVRTC support using cuda-python bindings
- Removing the legacy C++ NVRTC fallback in favor of a Python-first approach
- Keeping nvcc as the default compiler with fatbin output (no behavior change for existing users)

Users can select the compilation backend by setting the environment variable TVM_CUDA_COMPILE_MODE to either "nvcc" or "nvrtc". For example:

TVM_CUDA_COMPILE_MODE=nvrtc python3 your_program.py
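
As a rough illustration of how such a switch can be read on the Python side, here is a minimal sketch. The helper name `get_compile_mode`, the lowercasing, and the validation error are assumptions made for this example and are not taken from the PR's code; only the variable name, the two accepted values, and the "nvcc" default come from the description above.

```python
import os

# The two values named in the PR description.
VALID_MODES = ("nvcc", "nvrtc")


def get_compile_mode(environ=None):
    """Return the requested CUDA compile backend (hypothetical helper).

    Falls back to "nvcc" when TVM_CUDA_COMPILE_MODE is unset, matching the
    default stated in the PR. Normalizing case and rejecting unknown values
    are assumptions of this sketch.
    """
    env = os.environ if environ is None else environ
    mode = env.get("TVM_CUDA_COMPILE_MODE", "nvcc").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(
            f"TVM_CUDA_COMPILE_MODE must be one of {VALID_MODES}, got {mode!r}"
        )
    return mode


if __name__ == "__main__":
    print(get_compile_mode())
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise without touching the real process environment.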

Here is a short benchmark of the compilation speed of kernels in test_target_codegen_cuda.py.

NVCC vs NVRTC Compilation Time Comparison (Python-side Call)

| Test Case | Code Size | NVCC Time (ms) | NVRTC Time (ms) | Speedup |
| --- | --- | --- | --- | --- |
| `test_crossthread_reduction1` | 1945 B | 241.27 | 51.23 | 4.7x |
| `test_cuda_bf16_vectorize_add` | 3760 B | 342.72 | 44.50 | 7.7x |
| `test_cuda_const_float_to_half` | 12394 B | 272.85 | 31.99 | 8.5x |
| `test_cuda_device_func_call` | 975 B | 215.58 | 21.47 | 10.0x |
| `test_cuda_float_const_hex_format` | 685 B | 217.39 | 20.52 | 10.6x |
| `test_cuda_floordiv_with_vectorization` | 1050 B | 213.88 | 23.32 | 9.2x |
| `test_cuda_inf_nan` | 673 B | 214.33 | 24.94 | 8.6x |
| `test_cuda_tensormap` | 755 B | 213.91 | 20.74 | 10.3x |
| `test_cuda_thread_sync_inside_condition` | 1007 B | 213.43 | 28.29 | 7.5x |
| `test_cuda_vectorize_add` | 908 B | 226.81 | 40.39 | 5.6x |
| `test_cuda_vectorize_load` | 734 B | 217.25 | 24.02 | 9.0x |
| `test_device_host_call_same_func` | 924 B | 216.03 | 21.21 | 10.2x |
| `test_vectorized_intrin1` | 847 B | 226.15 | 26.34 | 8.6x |
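
The Speedup column is the NVCC time divided by the NVRTC time. The short script below recomputes it from the timings listed above as a sanity check; the variable and function names are made up for this sketch and are not part of the PR.

```python
# (test name, NVCC time in ms, NVRTC time in ms), copied from the table above.
TIMINGS = [
    ("test_crossthread_reduction1", 241.27, 51.23),
    ("test_cuda_bf16_vectorize_add", 342.72, 44.50),
    ("test_cuda_const_float_to_half", 272.85, 31.99),
    ("test_cuda_device_func_call", 215.58, 21.47),
    ("test_cuda_float_const_hex_format", 217.39, 20.52),
    ("test_cuda_floordiv_with_vectorization", 213.88, 23.32),
    ("test_cuda_inf_nan", 214.33, 24.94),
    ("test_cuda_tensormap", 213.91, 20.74),
    ("test_cuda_thread_sync_inside_condition", 213.43, 28.29),
    ("test_cuda_vectorize_add", 226.81, 40.39),
    ("test_cuda_vectorize_load", 217.25, 24.02),
    ("test_device_host_call_same_func", 216.03, 21.21),
    ("test_vectorized_intrin1", 226.15, 26.34),
]


def speedup(nvcc_ms, nvrtc_ms):
    """Ratio of NVCC compile time to NVRTC compile time."""
    return nvcc_ms / nvrtc_ms


speedups = {name: speedup(nvcc, nvrtc) for name, nvcc, nvrtc in TIMINGS}

if __name__ == "__main__":
    for name, ratio in speedups.items():
        print(f"{name:40s} {ratio:.1f}x")
```

Rounded to one decimal place, the recomputed ratios agree with the table, ranging from 4.7x to 10.6x.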

NVSHMEM Support

Currently, NVSHMEM is not supported via NVRTC.

- Fallback Behavior: When NVSHMEM is required, the compilation pipeline will automatically fall back to NVCC, even if TVM_CUDA_COMPILE_MODE is set to nvrtc.
- Future Roadmap: Support for NVRTC with NVSHMEM is planned for follow-up PRs.
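
The fallback rule can be pictured with the small sketch below. The function `resolve_backend` and its `needs_nvshmem` flag are hypothetical names chosen for illustration; how the actual pipeline detects an NVSHMEM requirement is not shown in this description.

```python
def resolve_backend(requested_mode, needs_nvshmem):
    """Pick the compiler to use for one compilation (hypothetical helper).

    requested_mode: "nvcc" or "nvrtc", e.g. taken from TVM_CUDA_COMPILE_MODE.
    needs_nvshmem:  True when the code being compiled requires NVSHMEM.
    """
    if requested_mode == "nvrtc" and needs_nvshmem:
        # NVSHMEM is not supported via NVRTC yet, so NVCC is used instead.
        return "nvcc"
    return requested_mode


if __name__ == "__main__":
    for mode in ("nvcc", "nvrtc"):
        for nvshmem in (False, True):
            print(mode, nvshmem, "->", resolve_backend(mode, nvshmem))
```

Under this rule only the combination of nvrtc with NVSHMEM changes the outcome; every other case honors the requested mode.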

yzh119 · 2025-12-05T08:29:06Z

python/tvm/contrib/nvcc.py

+    Environment Variables
+    ---------------------
+    TVM_CUDA_COMPILE_MODE : str
+        Compiler backend: "nvcc" (default) or "nvrtc"


Why not default to nvrtc?

tqchen: I think we should cross-check the speed difference and, once confirmed, we can switch the default to nvrtc.

MasterJH5574 · 2025-12-12T16:20:41Z

python/tvm/contrib/nvcc.py

+        from cuda.bindings import nvrtc  # pylint: disable=import-outside-toplevel
+    except ImportError as e:
+        raise RuntimeError(
+            "cuda-python is not available. Install with: pip install cuda-python\n"


Maybe say "fail to compile CUDA with nvrtc, because ..." at the beginning, so that it's clear that the failure happens when compiling CUDA code?

Kathryn-cat: Thanks, updated.
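
One way the suggested wording could look is sketched below. This is not the text that was merged: the exact message, the helper name `load_nvrtc`, and the use of `importlib` (so the failure path can be exercised with a bogus module name) are assumptions of this example.

```python
import importlib


def load_nvrtc(module_name="cuda.bindings.nvrtc"):
    """Import the NVRTC bindings, or fail with a compile-oriented message.

    module_name is a parameter only so the error path is easy to trigger in
    a test; the default is the module imported in the diff above.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Lead with the compilation context so the user sees where it failed.
        raise RuntimeError(
            "Failed to compile CUDA with NVRTC because cuda-python is not "
            "available. Install with: pip install cuda-python"
        ) from err
```

Chaining with `from err` keeps the original import failure visible in the traceback while the first line of the message states what the user was trying to do.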

Kathryn-cat added 13 commits November 25, 2025 18:30

init

d63ef67

upd

882f7bd

upd

c4514c4

upd

7e76a59

upd

9df7435

upd

e8d1657

addressed segfault

5601ddb

remove host-side deps; unit test passed except CUtensorMap

9040d59

fixed int8 tests

568412e

CUtensorMap patch

d10e9fe

dual compilation problem fixed

a498b1f

TVM_CUDA_COMPILE_MODE

5f905e2

unit tests

925022f

Kathryn-cat changed the title ~~wip: nvrtc~~ [Compile] accelerate compilation speed using NVRTC Nov 29, 2025

Kathryn-cat marked this pull request as ready for review November 29, 2025 00:45

Kathryn-cat added 9 commits November 29, 2025 19:03

remove deps in cmake

073f51e

update call site

4008da0

gpu ci env

7a28348

lint

734bf71

skip test if cuda-python is not available

9c36b0b

robustify CUDA header files search

c598ba3

fix CI

c8969ec

fixed nvshmem

6158357

nvrtc nvshmem compile

fe4780e

yzh119 reviewed Dec 5, 2025


Kathryn-cat added 2 commits December 12, 2025 08:09

remove nvshmem tests

7856b5c

fall back to nvcc for nvshmem

143707e

MasterJH5574 reviewed Dec 12, 2025


Kathryn-cat added 2 commits December 12, 2025 08:35

update error message

2bcbc03

lint

b7fb6ca

Kathryn-cat added 5 commits December 12, 2025 09:30

lint

e5a9c0e

lint

92514c1

lint

94f1f56

lint

4b11de2

lint

4b14e38
