
Conversation

@divital-coder commented Jan 1, 2026

> [!NOTE]
> Claude Code Opus 4.5 with GitHub MCP was used to refine and open this PR.

Summary

This PR adds GPU benchmarking support to SciMLBenchmarks.jl:

  • Add run_gpu_benchmark.yml template using juliagpu queue with CUDA agents
  • Modify launch_benchmarks.yml to route GPU benchmarks to GPU-specific pipeline
  • Update build_benchmark.sh with GPU environment setup and CUDA verification
  • Create benchmarks/GPU/ directory with DiffEqGPU Lorenz ensemble benchmark

Changes

CI Configuration

  • .buildkite/run_gpu_benchmark.yml: New GPU benchmark template using queue: "juliagpu" with cuda: "*" agents (no sandbox plugin due to GPU passthrough complexity)
  • .buildkite/launch_benchmarks.yml: Added GPU watch block for benchmarks/GPU/** paths, excluded GPU from CPU launcher
  • .buildkite/build_benchmark.sh: Added GPU environment setup (JULIA_CUDA_MEMORY_POOL='none') and a CUDA verification step (see the sketch below)
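For reference, a hedged sketch of what such a CUDA verification step can look like when driven from the build script; the exact commands in build_benchmark.sh may differ:

```julia
# Hypothetical pre-benchmark check run on the GPU agent: print CUDA toolkit and
# driver details to the build log, and fail fast if no usable device is visible.
using CUDA

CUDA.versioninfo()

CUDA.functional() || error("No functional CUDA device on this Buildkite agent")

@info "Detected GPU" device = CUDA.name(CUDA.device())
```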

Proof-of-Concept Benchmark

  • benchmarks/GPU/Project.toml: Dependencies (CUDA.jl, DiffEqGPU.jl, OrdinaryDiffEq.jl, StaticArrays.jl)
  • benchmarks/GPU/EnsembleGPU_Lorenz.jmd: Lorenz system ensemble comparing CPU (EnsembleThreads) vs. GPU (EnsembleGPUKernel) across 100 to 100,000 trajectories (see the sketch below)
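As an illustration of the CPU-vs-GPU comparison described above, here is a minimal sketch following the documented DiffEqGPU.jl ensemble workflow; the parameter sampling, trajectory sweep, and timing harness in EnsembleGPU_Lorenz.jmd may differ:

```julia
using OrdinaryDiffEq, DiffEqGPU, CUDA, StaticArrays

# Out-of-place Lorenz RHS with static arrays, as required by EnsembleGPUKernel
function lorenz(u, p, t)
    σ, ρ, β = p
    du1 = σ * (u[2] - u[1])
    du2 = u[1] * (ρ - u[3]) - u[2]
    du3 = u[1] * u[2] - β * u[3]
    return SVector{3}(du1, du2, du3)
end

u0 = @SVector [1.0f0, 0.0f0, 0.0f0]
p = @SVector [10.0f0, 28.0f0, 8.0f0 / 3.0f0]
prob = ODEProblem{false}(lorenz, u0, (0.0f0, 10.0f0), p)

# Randomly perturb the parameters for each trajectory
prob_func = (prob, i, repeat) -> remake(prob, p = p .* (@SVector rand(Float32, 3)))
ensemble = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)

trajectories = 10_000  # the benchmark sweeps 100 to 100,000

# CPU baseline: multithreaded ensemble with Tsit5
sol_cpu = solve(ensemble, Tsit5(), EnsembleThreads(); trajectories, saveat = 1.0f0)

# GPU: kernel-generated ensemble with GPUTsit5 on the CUDA backend
sol_gpu = solve(ensemble, GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend());
                trajectories, saveat = 1.0f0)
```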

Test plan

  • Verify GPU benchmark triggers on juliagpu queue agents
  • Verify CPU benchmarks still use juliaecosystem queue
  • Verify CUDA verification step runs before GPU benchmarks
  • Verify Lorenz ensemble benchmark produces speedup results

This adds GPU benchmarking support to SciMLBenchmarks.jl:

CI Configuration:
- Add run_gpu_benchmark.yml template for GPU jobs using juliagpu queue
- Modify launch_benchmarks.yml to separate CPU and GPU benchmark triggers
- Update build_benchmark.sh with GPU environment setup and CUDA verification

Proof-of-Concept Benchmark:
- Add benchmarks/GPU/ directory with DiffEqGPU Lorenz ensemble benchmark
- Compare CPU (EnsembleThreads) vs GPU (EnsembleGPUKernel) performance
- Benchmark across 100 to 100,000 trajectories

GPU benchmarks use:
- queue: "juliagpu" with cuda: "*" agent tags
- No sandbox plugin (GPU passthrough limitation)
- JULIA_CUDA_MEMORY_POOL='none' for accurate benchmarking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@divital-coder force-pushed the scimlbenchmarks-gpu-configuration branch from daac9d9 to 5cf4b53 on January 1, 2026 at 03:14
@divital-coder
Author

@thazhemadam Unblock?

@ChrisRackauckas
Member

I unblocked

@divital-coder
Author

Thank you @ChrisRackauckas

The UUID was incorrect, causing package resolution to fail.
Correct UUID: 071ae1c0-96b5-11e9-1965-c90190d839ea

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@divital-coder
Author

Does each of the commits require unblocking? I thought unblocking would unblock workflows for a specific PR!

nvm, unblock again perhaps?

@ChrisRackauckas
Member

yes, each requires unblocking for this since it's changing the CI infrastructure

- Replace @assert with a graceful CUDA_AVAILABLE check (see the sketch below)
- Skip GPU benchmarks if CUDA.functional() returns false
- Show CPU-only results when GPU unavailable
- Prevent cascading errors from failed GPU initialization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
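A minimal sketch of the guard pattern described in that commit, assuming the benchmark gates its GPU sections on a CUDA_AVAILABLE flag; the run_cpu_benchmarks/run_gpu_benchmarks helpers are hypothetical stand-ins for the .jmd code:

```julia
using CUDA

# Probe GPU availability once instead of asserting; CUDA.functional() is false
# when no usable device or driver is present on the agent.
const CUDA_AVAILABLE = CUDA.functional()

if !CUDA_AVAILABLE
    @warn "CUDA not functional on this agent; skipping GPU benchmarks and reporting CPU-only results."
end

# The CPU baseline always runs (hypothetical helper standing in for the .jmd code)
cpu_results = run_cpu_benchmarks()

# GPU sections run only when a device is usable, so a missing GPU degrades
# gracefully instead of cascading into solver errors.
gpu_results = CUDA_AVAILABLE ? run_gpu_benchmarks() : nothing
```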
@divital-coder
Author

unblock :)

@ChrisRackauckas
Member

I did

@divital-coder
Author

Does the JuliaGPU build queue on Buildkite also sometimes run an agent without an available GPU? It keeps running on CPU because CUDA is not present.

@ChrisRackauckas
Member

yes, it needs to use a queue with GPUs

The ignore pattern `benchmarks/GPU/**` only matches files inside the GPU
directory, not the directory path itself. When the path processor outputs
`benchmarks/GPU` (the project directory), it wasn't being matched by the
ignore pattern, causing GPU benchmarks to be picked up by the CPU launcher.

Added `benchmarks/GPU` to the ignore list so both the directory name and
its contents are properly ignored by the CPU launcher, ensuring GPU
benchmarks only run via run_gpu_benchmark.yml on the juliagpu queue.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@divital-coder
Author

unblocking required :)

… queue

The juliagpu queue agents (managed by JuliaGPU/buildkite) have these tags:
- queue=juliagpu
- cuda (with version or *)
- cap (compute capability)
- exclusive=true
- gpu (GPU model)

They do NOT have an `arch` tag. Requiring `arch: "x86_64"` caused jobs to wait
indefinitely (8+ hours) because no agent matched all requirements.

Verified against JuliaGPU/buildkite agent configurations (e.g., gpuci.17).

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@divital-coder
Author

Hello, can someone unblock :)

@ChrisRackauckas
Member

Unblocked

…g config

The `exclusive: true` tag was causing GPU benchmark jobs to wait indefinitely
(19+ hours) for an agent. Analysis of working SciML GPU CI configurations
(DiffEqGPU.jl) shows they do NOT use the exclusive tag.

The JuliaGPU Buildkite agents already provide de facto exclusivity via the
`--disconnect-after-job` flag: each agent runs one job and then disconnects.
The explicit `exclusive: true` requirement is redundant and causes matching
issues.

New configuration matches DiffEqGPU.jl:
  agents:
    queue: "juliagpu"
    cuda: "*"

Sources:
- DiffEqGPU.jl: github.com/SciML/DiffEqGPU.jl/blob/master/.buildkite/runtests.yml
- JuliaGPU agents: github.com/JuliaGPU/buildkite/blob/main/agents/

Co-Authored-By: Claude Opus 4.5 <[email protected]>