Move the CUDA driver inspect script to a C binary #13116

Open
JamesWrigley wants to merge 2 commits into JuliaPackaging:master from JamesWrigley:cuda-driver-check

@JamesWrigley
Contributor

This reduces the load time for CUDA_Driver_jll from ~0.45s to ~0.08s on my system. When used with CUDA.jl:

# Before
$ hyperfine -w 2 -r 10 "julia --startup-file=no --project -e 'import CUDA'"
Benchmark 1: julia --startup-file=no --project -e 'import CUDA'
  Time (mean ± σ):      3.472 s ±  0.050 s    [User: 4.553 s, System: 0.346 s]
  Range (min … max):    3.402 s …  3.569 s    10 runs

# After
$ hyperfine -w 2 -r 10 "julia --startup-file=no --project -e 'import CUDA'"
Benchmark 1: julia --startup-file=no --project -e 'import CUDA'
  Time (mean ± σ):      3.037 s ±  0.019 s    [User: 3.310 s, System: 0.210 s]
  Range (min … max):    3.007 s …  3.066 s    10 runs

The C code was mostly written by Claude 🤖. For reviewers: I've kept the commits atomic, so I'd recommend reviewing them one by one.

Warning: unfortunately I don't have a GPU node that I can run BinaryBuilder on (because of permissions issues) so I've only tested this on machines without a GPU. Someone with a GPU should probably try it out.
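
For readers unfamiliar with the approach, here is a minimal sketch of what a driver check in C could look like. It is an illustration only and not the code in this PR: the function names `decode_cuda_version` and `query_driver_version` are hypothetical, `CUresult` is approximated as `int` to avoid needing `cuda.h`, and a real check may also need to call `cuInit` first and handle more failure modes. It loads the driver library at runtime with `dlopen`, looks up `cuDriverGetVersion`, and decodes the result (CUDA encodes versions as 1000 * major + 10 * minor).

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Assumed shape of cuDriverGetVersion from the CUDA driver API. CUresult is
 * an enum, so int stands in for it here; CUDA_SUCCESS is 0. */
typedef int (*driver_get_version_fn)(int *);

/* Split an encoded CUDA version (1000 * major + 10 * minor) into its parts,
 * e.g. 12040 becomes major 12, minor 4. */
void decode_cuda_version(int encoded, int *major, int *minor)
{
    assert(major != NULL && minor != NULL);
    *major = encoded / 1000;
    *minor = (encoded % 1000) / 10;
}

/* Hypothetical helper: returns the encoded driver version, or -1 if the
 * library cannot be loaded, the symbol is missing, or the call fails. */
int query_driver_version(const char *libname)
{
    void *handle = dlopen(libname, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return -1;

    int version = -1;
    driver_get_version_fn get_version;
    /* POSIX-recommended idiom for storing a dlsym result in a function pointer. */
    *(void **)(&get_version) = dlsym(handle, "cuDriverGetVersion");
    if (get_version == NULL || get_version(&version) != 0)
        version = -1;

    dlclose(handle);
    return version;
}
```

On Linux this would typically be called as `query_driver_version("libcuda.so.1")`, with a result of -1 meaning no usable driver was found. On older glibc versions the program needs to be linked with `-ldl`.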

@JamesWrigley
Contributor Author

Ok, I gotta fix the inspect recipe on Windows and musl. And the CUDA_Driver_jll builds are failing because the new dependency isn't registered, so that needs to go in another PR.

@giordano
Member

unfortunately I don't have a GPU node that I can run BinaryBuilder on (because of permissions issues) so I've only tested this on machines without a GPU. Someone with a GPU should probably try it out.

You can publish it to a GitHub repo, and then install it as a regular package: https://docs.binarybuilder.org/stable/reference/#Command-Line

This is much faster than calling the Julia script.
@JamesWrigley JamesWrigley marked this pull request as draft February 12, 2026 09:11
@JamesWrigley
Contributor Author

Marking as draft for now because I'm somehow not able to reproduce the TTFX improvements from the precompile statements after moving the C source into CUDA_Driver.

@JamesWrigley
Contributor Author

JamesWrigley commented Feb 12, 2026

Hmm, the issue is that the precompile statements have to be at the top level. Is there a way to configure BinaryBuilder to do that?

@JamesWrigley
Contributor Author

Hacked around it in eb9884a. I had to set julia_compat so that BinaryBuilder won't default to using LazyArtifacts, which adds another ~300ms. I think I could also set lazy_artifacts=false directly, but I don't know if it's safe.

@JamesWrigley JamesWrigley marked this pull request as ready for review February 12, 2026 19:36