Add GPU benchmarking infrastructure with DiffEqGPU proof-of-concept #1440
This adds GPU benchmarking support to SciMLBenchmarks.jl:

CI Configuration:
- Add run_gpu_benchmark.yml template for GPU jobs using juliagpu queue
- Modify launch_benchmarks.yml to separate CPU and GPU benchmark triggers
- Update build_benchmark.sh with GPU environment setup and CUDA verification

Proof-of-Concept Benchmark:
- Add benchmarks/GPU/ directory with DiffEqGPU Lorenz ensemble benchmark
- Compare CPU (EnsembleThreads) vs GPU (EnsembleGPUKernel) performance
- Benchmark across 100 to 100,000 trajectories

GPU benchmarks use:
- queue: "juliagpu" with cuda: "*" agent tags
- No sandbox plugin (GPU passthrough limitation)
- JULIA_CUDA_MEMORY_POOL='none' for accurate benchmarking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
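The GPU environment setup and CUDA verification described in this commit might look roughly like the sketch below. The real `build_benchmark.sh` is not shown in this thread, so the function names `setup_gpu_env` and `cuda_is_functional` are hypothetical; only the `JULIA_CUDA_MEMORY_POOL` setting and the `CUDA.functional()` call come from the PR itself.

```shell
# Sketch only: function names are hypothetical, not taken from build_benchmark.sh.

# Disable CUDA.jl's memory pool, the setting this PR uses for benchmarking.
setup_gpu_env() {
    export JULIA_CUDA_MEMORY_POOL=none
}

# Return 0 only when julia is on PATH and CUDA.functional() reports true.
# Any failure (no julia, CUDA.jl not installed, no usable GPU) returns non-zero.
cuda_is_functional() {
    command -v julia >/dev/null 2>&1 || return 1
    julia -e 'using CUDA; exit(CUDA.functional() ? 0 : 1)' >/dev/null 2>&1
}

setup_gpu_env
if cuda_is_functional; then
    echo "CUDA is functional; GPU benchmark can run"
else
    echo "CUDA is not functional; GPU benchmark would be skipped"
fi
```

Reporting the result instead of aborting keeps a missing GPU from failing the whole build, which is one possible design and not necessarily what the PR's script does.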
daac9d9 to 5cf4b53
@thazhemadam Unblock?

I unblocked

Thank you @ChrisRackauckas
The UUID was incorrect, causing package resolution to fail.
Correct UUID: 071ae1c0-96b5-11e9-1965-c90190d839ea

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Does each commit require unblocking? I thought unblocking applied to all workflows for a given PR! Never mind, could you unblock again?

Yes, each commit requires unblocking here, since this PR changes the CI infrastructure.
- Replace `@assert` with graceful CUDA_AVAILABLE check
- Skip GPU benchmarks if CUDA.functional() returns false
- Show CPU-only results when GPU unavailable
- Prevent cascading errors from failed GPU initialization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
unblock :)

I did

Does the JuliaGPU build queue on Buildkite sometimes run an agent without an available GPU? The job still runs on the CPU because CUDA is not present.

Yes, it needs to use a queue with GPUs.
The ignore pattern `benchmarks/GPU/**` only matches files inside the GPU directory, not the directory path itself. When the path processor outputs `benchmarks/GPU` (the project directory), it wasn't being matched by the ignore pattern, causing GPU benchmarks to be picked up by the CPU launcher.

Added `benchmarks/GPU` to the ignore list so both the directory name and its contents are properly ignored by the CPU launcher, ensuring GPU benchmarks only run via run_gpu_benchmark.yml on the juliagpu queue.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
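The mismatch described in this commit can be reproduced with ordinary shell pattern matching, used here only as an analogy: the pipeline's real path matcher is not shown in this thread and may differ in detail. The helper `path_is_ignored` is hypothetical.

```shell
# Analogy only: shell `case` globbing stands in for the pipeline's matcher.

# Return 0 when the path matches at least one of the given ignore patterns.
path_is_ignored() {
    path=$1
    shift
    for pattern in "$@"; do
        # An unquoted expansion in a `case` pattern is matched as a glob.
        case "$path" in
            $pattern) return 0 ;;
        esac
    done
    return 1
}

report() {
    if path_is_ignored "$@"; then
        echo "ignored by CPU launcher: $1"
    else
        echo "picked up by CPU launcher: $1"
    fi
}

# Before the fix: only the contents pattern is listed.
report "benchmarks/GPU" "benchmarks/GPU/**"
# After the fix: the bare directory name is listed as well.
report "benchmarks/GPU" "benchmarks/GPU/**" "benchmarks/GPU"
```

With shell matching, the first call reports the directory as picked up, because the pattern requires a trailing `/` that the bare directory path lacks; the second call reports it as ignored.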
unblocking required :)
… queue

The juliagpu queue agents (managed by JuliaGPU/buildkite) have these tags:
- queue=juliagpu
- cuda (with version or *)
- cap (compute capability)
- exclusive=true
- gpu (GPU model)

They do NOT have an `arch` tag. Requiring `arch: "x86_64"` caused jobs to wait indefinitely (8+ hours) because no agent matched all requirements.

Verified against JuliaGPU/buildkite agent configurations (e.g., gpuci.17).

Co-Authored-By: Claude Opus 4.5 <[email protected]>
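Why one extra requirement stalls a job can be sketched with a small matcher: a job is only dispatched to an agent that satisfies every required tag, and a value of `*` accepts any value as long as the key exists. This is a simplified model of that rule, not Buildkite's implementation; `agent_matches` is a hypothetical helper and the tag values for `cuda`, `cap`, and `gpu` are made up for illustration.

```shell
# Simplified model of agent targeting; not Buildkite's actual code.
# Tag keys follow the list above; the cuda, cap and gpu values are invented.
AGENT_TAGS="queue=juliagpu cuda=12.0 cap=sm_80 exclusive=true gpu=A100"

# agent_matches "<agent tags>" key=value [key=value ...]
# Return 0 when every required key=value pair is satisfied by the agent tags.
agent_matches() {
    agent_tags=$1
    shift
    for req in "$@"; do
        key=${req%%=*}
        want=${req#*=}
        matched=no
        for tag in $agent_tags; do
            if [ "${tag%%=*}" = "$key" ]; then
                have=${tag#*=}
                # "*" accepts any value, but the key must still be present.
                if [ "$want" = "*" ] || [ "$want" = "$have" ]; then
                    matched=yes
                fi
            fi
        done
        [ "$matched" = yes ] || return 1
    done
    return 0
}

if agent_matches "$AGENT_TAGS" "queue=juliagpu" "cuda=*"; then
    echo "queue + cuda: an agent matches"
fi
if ! agent_matches "$AGENT_TAGS" "queue=juliagpu" "cuda=*" "arch=x86_64"; then
    echo "queue + cuda + arch: no agent matches, so the job would wait"
fi
```

Because the modelled agent carries no `arch` key at all, the third requirement can never be satisfied, which mirrors the indefinite wait described in the commit message.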
Hello, can someone unblock :)

Unblocked
…g config

The `exclusive: true` tag was causing GPU benchmark jobs to wait indefinitely
(19+ hours) for an agent. Analysis of working SciML GPU CI configurations
(DiffEqGPU.jl) shows they do NOT use the exclusive tag.

The JuliaGPU buildkite agents already provide de-facto exclusivity via the
`--disconnect-after-job` flag: each agent runs one job, then disconnects.
The explicit `exclusive: true` requirement is redundant and causes matching
issues.

New configuration matches DiffEqGPU.jl:

    agents:
      queue: "juliagpu"
      cuda: "*"

Sources:
- DiffEqGPU.jl: github.com/SciML/DiffEqGPU.jl/blob/master/.buildkite/runtests.yml
- JuliaGPU agents: github.com/JuliaGPU/buildkite/blob/main/agents/

Co-Authored-By: Claude Opus 4.5 <[email protected]>
> [!NOTE]
> Claude Code (Opus 4.5) with the GitHub MCP was used to refine and open this PR.
Summary
This PR adds GPU benchmarking support to SciMLBenchmarks.jl:
- Add `run_gpu_benchmark.yml` template using `juliagpu` queue with CUDA agents
- Modify `launch_benchmarks.yml` to route GPU benchmarks to GPU-specific pipeline
- Update `build_benchmark.sh` with GPU environment setup and CUDA verification
- Add `benchmarks/GPU/` directory with DiffEqGPU Lorenz ensemble benchmark

Changes
CI Configuration
- `.buildkite/run_gpu_benchmark.yml`: New GPU benchmark template using `queue: "juliagpu"` with `cuda: "*"` agents (no sandbox plugin due to GPU passthrough complexity)
- `.buildkite/launch_benchmarks.yml`: Added GPU watch block for `benchmarks/GPU/**` paths, excluded GPU from CPU launcher
- `.buildkite/build_benchmark.sh`: Added GPU setup (`JULIA_CUDA_MEMORY_POOL='none'`) and CUDA verification step

Proof-of-Concept Benchmark
- `benchmarks/GPU/Project.toml`: Dependencies (CUDA.jl, DiffEqGPU.jl, OrdinaryDiffEq.jl, StaticArrays.jl)
- `benchmarks/GPU/EnsembleGPU_Lorenz.jmd`: Lorenz system ensemble comparing CPU (`EnsembleThreads`) vs GPU (`EnsembleGPUKernel`) across 100 to 100,000 trajectories

Test plan
- `juliagpu` queue agents
- `juliaecosystem` queue