fix: eliminate initialization_alg invalidations via invokelatest #855
Closed
ChrisRackauckas-Claude wants to merge 17 commits into SciML:master
…Base
Port the FunctionWrappersWrappers-based norecompile pattern from DiffEqBase
to NonlinearSolveBase. For standard problem types (Vector{Float64} state,
Vector{Float64} or NullParameters parameters), the problem function is
wrapped in a FunctionWrappersWrapper with precompiled type signatures for
both Float64 and ForwardDiff.Dual arguments, avoiding recompilation for
each unique user function type.
Key components:
- src/autospecialize.jl: NonlinearSolveTag, wrapfun_iip/oop base methods,
maybe_wrap_nonlinear_f, standardize_forwarddiff_tag fallback
- ForwardDiff extension: dual-aware wrapfun dispatches with 6 type
combinations (Float64, Dual, NullParameters), tag standardization that
stamps NonlinearSolveTag on AutoForwardDiff and forces chunksize=1 when
the function is wrapped
- solve.jl: maybe_wrap_f wired into get_concrete_problem for all problem
types (NonlinearProblem, NonlinearLeastSquaresProblem,
ImmutableNonlinearProblem), using EvalFunc wrapper for invokelatest
- jacobian.jl: standardize_forwarddiff_tag called in
construct_jacobian_cache so DI produces correctly-tagged duals
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
ImmutableNonlinearProblem (used by SimpleNonlinearSolve) doesn't support Setfield reconstruction with wrapped function types. Skip wrapping since SimpleNonlinearSolve's lighter solvers don't benefit from the norecompile pathway.
Fixes CI adjoint test failure in SimpleNonlinearSolve.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The FunctionWrapper wrapping cannot be automatically applied at solve time because multiple code paths (∂f/∂p, ∂f/∂u, bounds transform) call ForwardDiff directly with default chunk sizes, bypassing the standardized chunksize=1 path. This caused "No matching function wrapper found!" errors whenever ForwardDiff used chunksize > 1.
The infrastructure (autospecialize.jl, extension wrappers, tag standardization) remains available for targeted use. Automatic wrapping requires coordinating ALL ForwardDiff call sites to use chunksize=1.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The standardize_forwarddiff_tag calls in autodiff.jl and jacobian.jl cause dual tag ordering errors when nested ForwardDiff is used (e.g., NLLS sensitivity + inner VJP). Remove these call sites and the unused maybe_wrap_f function since automatic wrapping is not yet active.
The autospecialize infrastructure (NonlinearSolveTag, wrapfun_iip/oop, ForwardDiff extension wrappers) remains available for future activation when all direct ForwardDiff call sites are standardized.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Wire `maybe_wrap_f` into `get_concrete_problem` for NonlinearProblem and
NonlinearLeastSquaresProblem (IIP). Functions are wrapped in
`AutoSpecializeCallable{FW}` which holds a `FunctionWrappersWrapper` for
precompiled dispatch and the original function (type-erased as `Any`) for
try-catch fallback when dual tags mismatch (JVP paths, external packages).
Key changes:
- AutoSpecializeCallable uses `orig::Any` for type erasure (no EvalFunc)
- Skip OOP NLLS wrapping (return type may differ from u0)
- Standardize JVP/VJP autodiff tags in construct_jacobian_cache
- Replace AutoPolyesterForwardDiff with AutoForwardDiff{1,tag} when wrapped
- Use get_raw_f for nested ForwardDiff in NLLS VJP generation
- ForwardDiff sensitivity functions use chunksize=1 + tag when wrapped
Tests: core 727/0/0, wrapper 195/0/0, ForwardDiff 135636/0/0
OOP @inferred regresses (expected, same trade-off as DiffEqBase)
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Reverse-mode AD backends (Zygote, Mooncake, Enzyme) cannot differentiate through FunctionWrapper internals (llvmcall). This adds:
- ChainRulesCore rrule for AutoSpecializeCallable that redirects reverse-mode AD through the original unwrapped callable
- _DISABLE_AUTOSPECIALIZE flag set in the solve_up rrule to prevent wrapping entirely during the adjoint code path
- @test_broken for IIP @inferred (same wrapping-induced regression as OOP case)
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…type
- Remove task-local `_DISABLE_AUTOSPECIALIZE` flag entirely
- Replace with `@set prob.f.f = get_raw_f(prob.f.f)` unwrapping in rrule
- Remove parameter type restriction (any p works, mismatches fall back)
- Add idempotency check to prevent double-wrapping
- Remove `_DISABLE_AUTOSPECIALIZE` from public API
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
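The unwrapping and idempotency behaviour described in this commit can be pictured with a minimal sketch. The names `WrappedSketch`, `raw_f_sketch`, and `wrap_sketch` are hypothetical stand-ins, not the PR's actual `AutoSpecializeCallable`, `get_raw_f`, or `maybe_wrap_f`, and the "fast path" field here simply holds the function itself rather than a real `FunctionWrappersWrapper`.

```julia
# Hypothetical stand-in for a wrapped callable: a "fast path" field plus the
# original function stored as `Any` so its concrete type is erased.
struct WrappedSketch{FW}
    fw::FW
    orig::Any
end

# Unwrapping: wrapped callables return the original, anything else is returned as-is.
raw_f_sketch(f::WrappedSketch) = f.orig
raw_f_sketch(f) = f

# Idempotent wrapping: an already-wrapped callable is returned unchanged,
# so wrapping twice never nests wrappers.
wrap_sketch(f::WrappedSketch) = f
wrap_sketch(f) = WrappedSketch(f, f)  # assumption: real code would build a function wrapper here

# Example in-place residual function (returns nothing, like an IIP problem function).
function residual!(du, u, p)
    du .= u .^ 2 .- p
    return nothing
end

w1 = wrap_sketch(residual!)
w2 = wrap_sketch(w1)          # second wrap is a no-op
unwrapped = raw_f_sketch(w2)  # recovers the user function
```

Under these assumptions, an rrule that needs the raw function can call the unwrap helper unconditionally, since it is the identity on functions that were never wrapped.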
These are internal implementation details, not public API.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
OOP wrapping requires guessing return types which doesn't always work. Only wrap IIP functions where the return type is always Nothing.
IIP TTFX improvement (2nd/3rd function, same types):
- NewtonRaphson: 2.2-2.5x faster
- TrustRegion: 18x faster
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The existing workload used scalar p=2.0, which produces different
FunctionWrapper types than the common user case of Vector{Float64}
parameters. This caused the precompiled wrappers to miss the user path.
TTFX for IIP Vector{Float64} first solve: 2.7s → 1.0s
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The try-catch in AutoSpecializeCallable prevented inlining and added
~32 bytes per call, exceeding the 64-byte @ballocated budget in
NonlinearSolveFirstOrder, QuasiNewton, and SpectralMethods tests.
Replace with explicit dispatch methods for known argument types
(Vector{Float64}, Float64, NullParameters, and ForwardDiff duals),
routing to f.fw for zero-allocation calls. Unsupported types fall
back to f.orig via vararg dispatch. Also fix @test_broken -> @test
for @inferred solve(prob) which now passes.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
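The dispatch-based routing described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `DispatchSketch`, `fast_path`, and `fallback_path` are hypothetical, and plain Julia functions stand in for the `FunctionWrappersWrapper` so the example needs no packages. A call counter is used only to make the routing visible.

```julia
# Hypothetical callable with a precompiled "fast path" and a type-erased original.
struct DispatchSketch{FW}
    fw::FW      # stand-in for the function wrapper with known signatures
    orig::Any   # original user function
end

# Known argument types are routed to the fast path by ordinary method dispatch...
(f::DispatchSketch)(du::Vector{Float64}, u::Vector{Float64}, p::Vector{Float64}) = f.fw(du, u, p)
# ...and every other argument combination falls back to the original via varargs.
(f::DispatchSketch)(args...) = f.orig(args...)

calls = Dict(:fast => 0, :fallback => 0)

function residual!(du, u, p)
    du .= u .^ 2 .- p
    return nothing
end

function fast_path(du, u, p)
    calls[:fast] += 1
    return residual!(du, u, p)
end

function fallback_path(args...)
    calls[:fallback] += 1
    return residual!(args...)
end

f = DispatchSketch(fast_path, fallback_path)

du = zeros(2)
f(du, [1.0, 2.0], [1.0, 1.0])             # Vector{Float64} arguments: fast path

du32 = zeros(Float32, 2)
f(du32, Float32[1, 2], Float32[1, 1])     # Float32 arguments: vararg fallback
```

Because the choice between the two paths is made by method dispatch rather than by catching an exception, the fast path contains no try-catch block, which is the property the commit relies on for inlining and allocation-free calls.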
Enzyme cannot differentiate through FunctionWrappers' llvmcall, causing
EnzymeMutabilityException in all IIP Vector{Float64} tests with AutoEnzyme.
Unwrap the function in construct_jacobian_cache when the AD backend is
Enzyme-based (including AutoSparse(AutoEnzyme(...))), so DI sees the raw
user function. Also apply Runic formatting to SCCNonlinearSolve files.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous commit only unwrapped for the concrete Jacobian path (DI.prepare_jacobian/DI.jacobian). This extends the fix to the JacobianOperator path used by Krylov solvers (GMRES, etc.) and backslash with concrete_jac=false.
When Enzyme is used for JVP/VJP autodiff, create a modified problem with the raw user function so SciMLJacobianOperators' DI.pushforward!/DI.pullback! calls don't go through FunctionWrappers' llvmcall.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… operators
The TrustRegion scheme creates VecJacOperator and JacVecOperator directly from the problem, bypassing construct_jacobian_cache. When Enzyme is the AD backend, these operators need the unwrapped function (without FunctionWrappers) to avoid EnzymeMutabilityException.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of fixing individual call sites (trust_region.jl VecJac/JacVec, jacobian.jl construct_jacobian_cache), create _ad_prob with unwrapped function early in __init for both FirstOrder and QuasiNewton. This ensures ALL downstream AD consumers (Jacobian cache, trust region, linesearch, forcing) receive the unwrapped problem when Enzyme is used.
- Add maybe_unwrap_prob_for_enzyme helper in NonlinearSolveBase
- FirstOrder: create _ad_prob from alg.autodiff/jvp_autodiff/vjp_autodiff
- QuasiNewton: detect Enzyme from kwargs and alg.linesearch/trustregion
- Revert trust_region.jl inline fix (now handled upstream in solve.jl)
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The generic fallback `initialization_alg(initprob, autodiff) = nothing` in
NonlinearSolveBase gets compiled during precompilation with inferred types
like `(::Any, ::AutoForwardDiff{nothing, Nothing})`. When downstream packages
(NonlinearSolve, NonlinearSolveFirstOrder) add more specific methods, they
invalidate all precompiled MethodInstances that inferred through the fallback,
causing ~218 cascading invalidations across QuasiNewton, FirstOrder, and
SpectralMethods solver caches.
Using `Base.invokelatest` at the callsite prevents the compiler from caching
the method dispatch, eliminating these invalidations entirely. The overhead
is negligible since `_run_initialization!` with `OverrideInit` is only called
once during initialization.
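A minimal sketch of the call-site pattern described above, assuming hypothetical names (`initialization_alg_sketch`, `run_initialization_sketch`, `FakeProblem`) rather than the actual NonlinearSolveBase internals. `Base.invokelatest` performs the dispatch at run time in the latest world age, so the caller is not compiled against whichever method happened to exist when it was first inferred.

```julia
# Generic fallback, mirroring the shape of the one described above.
initialization_alg_sketch(initprob, autodiff) = nothing

# Hypothetical call site. The compiler does not infer through invokelatest,
# so this function's compiled code does not depend on the fallback method.
function run_initialization_sketch(initprob, autodiff)
    alg = Base.invokelatest(initialization_alg_sketch, initprob, autodiff)
    return alg === nothing ? :no_default_alg : alg
end

struct FakeProblem end  # hypothetical problem type

before = run_initialization_sketch(FakeProblem(), :forwarddiff)

# A "downstream package" later adds a more specific method.
initialization_alg_sketch(::FakeProblem, autodiff) = :newton

after = run_initialization_sketch(FakeProblem(), :forwarddiff)
```

The trade-off is a dynamic dispatch and an untyped return value at this one call, which matches the commit's argument that the cost is acceptable for code that runs once per initialization.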
Results (Julia 1.10, @snoop_invalidations):
- Before: 543 unique invalidated MIs, 708 total nodes
- After: 347 unique invalidated MIs, 490 total nodes (36% reduction)
- initialization_alg trees: 218 nodes → 0 nodes (100% eliminated)
TTFX improvement (IIP NonlinearProblem + NewtonRaphson):
- Master: 3.05s
- AutoSpecialize branch: 1.19s
- AutoSpecialize + this fix: 0.93s (3.3x faster than master)
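The quoted percentages and speedup can be re-derived from the raw counts above. The short check below uses only the numbers stated in this commit message; the variable names are illustrative.

```julia
# Counts and timings copied from the results above.
mis_before, mis_after = 543, 347        # unique invalidated MethodInstances
nodes_before, nodes_after = 708, 490    # total invalidation tree nodes
t_master, t_autospec, t_fix = 3.05, 1.19, 0.93  # TTFX in seconds

mi_reduction = (mis_before - mis_after) / mis_before          # fraction of unique MIs removed
node_reduction = (nodes_before - nodes_after) / nodes_before  # fraction of total nodes removed
removed_nodes = nodes_before - nodes_after                    # nodes removed by this fix
speedup_vs_master = t_master / t_fix
speedup_vs_autospec = t_autospec / t_fix

println("unique MI reduction: ", round(100 * mi_reduction; digits = 1), "%")
println("total node reduction: ", round(100 * node_reduction; digits = 1), "%")
println("nodes removed: ", removed_nodes)
println("speedup vs master: ", round(speedup_vs_master; digits = 2), "x")
```

The 36% figure corresponds to unique invalidated MethodInstances; measured in total nodes the reduction is closer to 31%, and the 218 removed nodes equal the size of the `initialization_alg` trees.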
Test results:
- Core: 732 pass, 87 broken, 0 fail, 0 error
- Wrappers: 195 pass, 7 broken, 0 fail, 0 error
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Eliminates the ~218 cascading invalidations caused by `initialization_alg` dispatch during package loading
- Uses `Base.invokelatest` at the callsite in `_run_initialization!` to prevent the compiler from caching dispatch of the generic fallback
Note: This PR is based on the `autospecialize` branch from #838. It should be merged after #838 lands, or can be squashed into #838.
Root Cause
The generic fallback `initialization_alg(initprob, autodiff) = nothing` in `NonlinearSolveBase` gets compiled during precompilation with inferred types like `(::Any, ::AutoForwardDiff{nothing, Nothing})`. When downstream packages add more specific methods:
- `NonlinearSolve` adds `initialization_alg(::AbstractNonlinearProblem, autodiff)` → 104 invalidated nodes
- `NonlinearSolveFirstOrder` adds `initialization_alg(::NonlinearLeastSquaresProblem, autodiff)` → 114 invalidated nodes
These invalidations cascade up to all precompiled `_run_initialization!` instances across QuasiNewton, FirstOrder, and SpectralMethods solver caches.
Methodology
Following the SciML compile time blog post:
- `@time_imports` to identify slow package loads
- `@snoop_inference` to profile inference bottlenecks (biggest: `ConstructionBase.tuple_or_ntuple` at 0.21s, `remake` at 0.17s)
- `@snoop_invalidations` to identify the top invalidation sources
- Identified `initialization_alg` as the #1 and #2 biggest invalidation trees (218 combined nodes out of 708 total)
Results
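Invalidation reduction (Julia 1.10, `@snoop_invalidations`)
- Before: 543 unique invalidated MIs, 708 total nodes
- After: 347 unique invalidated MIs, 490 total nodes (36% reduction)
- `initialization_alg` trees: 218 nodes → 0 nodes (100% eliminated)
TTFX improvement (IIP NonlinearProblem + NewtonRaphson)
- Master: 3.05s
- AutoSpecialize branch: 1.19s
- AutoSpecialize + this fix: 0.93s (3.3x faster than master)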
Remaining invalidators (all in external packages)
The remaining 490 invalidated nodes all come from external packages and cannot be fixed from NonlinearSolve:
- `convert(::Type{D}, d::D) where D<:Dual`: 54 nodes
- `_all`/`_any`: 170 nodes (4 trees)
- `all`/`any`: 58 nodes
- `fill`, `promote_rule`, `+`: 53 nodes
Test plan
Note on Enzyme.jl PR
This PR builds on the `autospecialize` branch (#838). The Enzyme.jl companion PR (EnzymeAD/Enzyme.jl#2980) must still be merged before the Enzyme-specific workaround code can be removed.
🤖 Generated with Claude Code