[Compile] accelerate compilation speed using NVRTC #18519
base: main
Environment Variables
---------------------
TVM_CUDA_COMPILE_MODE : str
    Compiler backend: "nvcc" (default) or "nvrtc"
why not default to nvrtc?
I think we should cross-check the speed difference first; once it is confirmed, we can switch the default to nvrtc.
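One way to cross-check the speed difference is a small timing harness that runs each backend on the same source and compares median wall-clock times. The sketch below is only an illustration: `time_compile` and `compare_backends` are hypothetical helpers, and the compile callables are placeholders that a real check would replace with the actual NVCC and NVRTC entry points.

```python
import statistics
import time


def time_compile(compile_fn, source, repeats=5):
    """Return the median wall-clock time (seconds) of compile_fn(source)."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        compile_fn(source)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare_backends(backends, source, repeats=5):
    """Time every backend on the same source.

    `backends` maps a label (e.g. "nvcc", "nvrtc") to a callable that
    compiles a CUDA source string. The callables are supplied by the
    caller; nothing here is part of TVM's API.
    """
    return {
        name: time_compile(fn, source, repeats)
        for name, fn in backends.items()
    }
```

Using the median rather than the mean keeps a single slow run (for example a cold cache on the first call) from dominating the comparison.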
python/tvm/contrib/nvcc.py
Outdated
    from cuda.bindings import nvrtc  # pylint: disable=import-outside-toplevel
except ImportError as e:
    raise RuntimeError(
        "cuda-python is not available. Install with: pip install cuda-python\n"
Maybe say "fail to compile CUDA with nvrtc, because ..." at the beginning, so that it's clear that the failure happens when compiling cuda code?
thanks, updated
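The suggestion in this thread, leading the error with the failing operation, can be sketched as follows. This is a minimal illustration and not the code merged in the PR: `import_nvrtc_bindings` is a hypothetical helper, the exact message wording is an assumption, and `importlib` is used here only so the module name can be passed in as a parameter.

```python
import importlib


def import_nvrtc_bindings(module_name="cuda.bindings.nvrtc"):
    """Import the NVRTC bindings, or raise an error that names the failing step.

    The default module name matches the import shown in the diff above;
    the message text is illustrative only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise RuntimeError(
            "Failed to compile CUDA with NVRTC because cuda-python is not "
            "available. Install with: pip install cuda-python"
        ) from e
```

Chaining with `from e` keeps the original `ImportError` in the traceback, so the underlying cause stays visible next to the higher-level message.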
This PR adds support for NVRTC as an alternative to NVCC for faster, device-side JIT compilation of CUDA kernels, in favor of the PR apache/tvm-ffi#283.
It enhances the CUDA compilation backend by:
Users can choose the compilation backend using the environment variable TVM_CUDA_COMPILE_MODE, choosing from "nvcc" and "nvrtc". For example,

TVM_CUDA_COMPILE_MODE=nvrtc python3 your_program.py

Here is a short benchmark of the compilation speed of kernels in test_target_codegen_cuda.py.

NVCC vs NVRTC Compilation Time Comparison (Python-side Call)

test_crossthread_reduction1
test_cuda_bf16_vectorize_add
test_cuda_const_float_to_half
test_cuda_device_func_call
test_cuda_float_const_hex_format
test_cuda_floordiv_with_vectorization
test_cuda_inf_nan
test_cuda_tensormap
test_cuda_thread_sync_inside_condition
test_cuda_vectorize_add
test_cuda_vectorize_load
test_device_host_call_same_func
test_vectorized_intrin1

NVSHMEM Support
Currently, NVSHMEM is not supported via NVRTC.
TVM_CUDA_COMPILE_MODE is set to nvrtc.
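The backend selection described in this PR can be pictured as a small lookup on the environment variable. The sketch below is an assumption about how such a lookup could be written, not TVM's implementation: `get_cuda_compile_mode` is a hypothetical helper, and rejecting unknown values with `ValueError` is a design choice made for this example. Only the variable name, the two accepted values, and the "nvcc" default come from the PR.

```python
import os

# The two backends named in the PR description.
VALID_MODES = ("nvcc", "nvrtc")


def get_cuda_compile_mode(environ=None):
    """Return the compile backend chosen via TVM_CUDA_COMPILE_MODE.

    Falls back to "nvcc", the default documented in the PR, when the
    variable is unset. `environ` exists only so the function can be
    exercised without touching the real process environment.
    """
    env = os.environ if environ is None else environ
    mode = env.get("TVM_CUDA_COMPILE_MODE", "nvcc")
    if mode not in VALID_MODES:
        raise ValueError(
            f"TVM_CUDA_COMPILE_MODE must be one of {VALID_MODES}, got {mode!r}"
        )
    return mode
```

Failing loudly on an unrecognised value avoids a silent fallback, where a typo such as "nvtrc" would quietly select the slower backend.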