Add PrecompileTools workload for improved startup time #389
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
src/precompilation.jlwith@compile_workloadblockdiffeqgpunormutility function for Float32, Float64, and ForwardDiff.Dual typesmake_prob_compatiblefunctionBenchmark Results
Load Time
TTFX (Time to First Execution)
After precompilation, first calls to algorithm constructors and utility functions are in microseconds:
GPUTsit5(): ~33μsGPUVern7(): ~26μsEnsembleCPUArray(): ~44μsdiffeqgpunorm(x, t): ~62μsInvalidation Analysis
Checked for invalidations using SnoopCompile. Found 100 invalidation trees, but none originating from DiffEqGPU itself - all are from dependencies (ChainRulesCore, StaticArrays, SpecialFunctions, etc.). No action needed from DiffEqGPU's side.
Test Plan
cc @ChrisRackauckas
🤖 Generated with Claude Code