
fix: eliminate initialization_alg invalidations via invokelatest #855

Closed
ChrisRackauckas-Claude wants to merge 17 commits into SciML:master from ChrisRackauckas-Claude:fix/invalidations-initialization-alg


@ChrisRackauckas-Claude
Contributor

Summary

  • Eliminates 218 cascading invalidation nodes caused by initialization_alg dispatch during package loading
  • Reduces total invalidated MethodInstances from 543 → 347 (36% reduction) when loading NonlinearSolve
  • Uses Base.invokelatest at the callsite in _run_initialization! to prevent the compiler from caching dispatch of the generic fallback

Note: This PR is based on the autospecialize branch from #838. It should be merged after #838 lands, or can be squashed into #838.

Root Cause

The generic fallback initialization_alg(initprob, autodiff) = nothing in NonlinearSolveBase gets compiled during precompilation with inferred types like (::Any, ::AutoForwardDiff{nothing, Nothing}). When downstream packages add more specific methods:

  1. NonlinearSolve adds initialization_alg(::AbstractNonlinearProblem, autodiff) → 104 invalidated nodes
  2. NonlinearSolveFirstOrder adds initialization_alg(::NonlinearLeastSquaresProblem, autodiff) → 114 invalidated nodes

These invalidations cascade up to all precompiled _run_initialization! instances across QuasiNewton, FirstOrder, and SpectralMethods solver caches.
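
The mechanism can be illustrated with a minimal, dependency-free sketch. The types `AbstractToyProblem` and `ToyLeastSquaresProblem` and the functions `run_init_direct` and `run_init_latest` are hypothetical stand-ins, not the NonlinearSolveBase definitions; only the shape of the fallback and the use of `Base.invokelatest` follow the description above.

```julia
# Hypothetical stand-ins for the real problem types (assumption, not the SciML definitions).
abstract type AbstractToyProblem end
struct ToyLeastSquaresProblem <: AbstractToyProblem end

# Generic fallback, same shape as `initialization_alg(initprob, autodiff) = nothing`.
initialization_alg(initprob, autodiff) = nothing

# Direct call: the compiler may infer through the fallback, so compiled callers
# depend on it and must be invalidated when a more specific method appears.
run_init_direct(initprob, autodiff) = initialization_alg(initprob, autodiff)

# `Base.invokelatest` defers method lookup to run time, so callers do not
# carry a compile-time dependency on whichever method currently applies.
run_init_latest(initprob, autodiff) = Base.invokelatest(initialization_alg, initprob, autodiff)

prob = ToyLeastSquaresProblem()
before = run_init_latest(prob, :autodiff_placeholder)

# A "downstream package" later adds a more specific method.
initialization_alg(::AbstractToyProblem, autodiff) = :specialized_alg

after = run_init_latest(prob, :autodiff_placeholder)
println((before, after))
```

In this sketch `before` is `nothing` and `after` is `:specialized_alg`: both call styles stay correct, and the difference lies only in whether compiled callers have to be thrown away when the new method is defined.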

Methodology

Following the SciML compile time blog post:

  1. Used @time_imports to identify slow package loads
  2. Used @snoop_inference to profile inference bottlenecks (biggest: ConstructionBase.tuple_or_ntuple at 0.21s, remake at 0.17s)
  3. Used @snoop_invalidations to identify the top invalidation sources
  4. Found initialization_alg to be the first and second largest invalidation trees (218 combined nodes out of 708 total)

Results

Invalidation reduction (Julia 1.10, @snoop_invalidations)

| Metric | Before | After | Change |
|---|---|---|---|
| Unique invalidated MIs | 543 | 347 | -36% |
| Total invalidated nodes | 708 | 490 | -31% |
| initialization_alg nodes | 218 | 0 | -100% |
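
As a quick cross-check of the percentages, the arithmetic can be reproduced directly from the counts reported here. The helper name `percent_reduction` is introduced only for this illustration.

```julia
# Percentage reduction between a "before" and "after" count, rounded to the nearest integer.
percent_reduction(before, after) = round(Int, 100 * (before - after) / before)

mi_reduction   = percent_reduction(543, 347)  # unique invalidated MethodInstances
node_reduction = percent_reduction(708, 490)  # total invalidated nodes
alg_reduction  = percent_reduction(218, 0)    # initialization_alg nodes
removed_nodes  = 708 - 490                    # equals the 218 initialization_alg nodes

println((mi_reduction, node_reduction, alg_reduction, removed_nodes))
```

The drop in total nodes (708 to 490) equals the 218 nodes attributed to initialization_alg, which is consistent with that tree being removed entirely.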

TTFX improvement (IIP NonlinearProblem + NewtonRaphson)

| Scenario | Master | AutoSpecialize | + This Fix |
|---|---|---|---|
| First solve (TTFX) | 3.05s | 1.19s | 0.93s |
| Different function | ~1.0s | 0.29s | 0.46s |
| Runtime (2nd call) | 0.36ms | 0.08ms | 0.11ms |

Remaining invalidators (all in external packages)

The remaining 490 invalidated nodes all originate in external packages and cannot be fixed from within NonlinearSolve. The trees listed below account for 335 of them:

  • ForwardDiff convert(::Type{D}, d::D) where D<:Dual — 54 nodes
  • SparseArrays _all/_any — 170 nodes (4 trees)
  • RecursiveArrayTools all/any — 58 nodes
  • ChainRulesCore fill, promote_rule, + — 53 nodes

Test plan

  • NonlinearSolveBase tests: 16/16 pass
  • Core tests (GROUP=core): 732 pass, 87 broken, 0 fail, 0 error
  • Wrapper tests (GROUP=wrappers): 195 pass, 7 broken, 0 fail, 0 error
  • Manual TTFX verification with IIP/OOP functions and multiple solvers

Note on Enzyme.jl PR

This PR builds on the autospecialize branch (#838). The Enzyme-specific workaround code can only be removed once the Enzyme.jl companion PR (EnzymeAD/Enzyme.jl#2980) is merged.

🤖 Generated with Claude Code

ChrisRackauckas and others added 17 commits February 22, 2026 20:18
…Base

Port the FunctionWrappersWrappers-based norecompile pattern from DiffEqBase
to NonlinearSolveBase. For standard problem types (Vector{Float64} state,
Vector{Float64} or NullParameters parameters), the problem function is
wrapped in a FunctionWrappersWrapper with precompiled type signatures for
both Float64 and ForwardDiff.Dual arguments, avoiding recompilation for
each unique user function type.

Key components:
- src/autospecialize.jl: NonlinearSolveTag, wrapfun_iip/oop base methods,
  maybe_wrap_nonlinear_f, standardize_forwarddiff_tag fallback
- ForwardDiff extension: dual-aware wrapfun dispatches with 6 type
  combinations (Float64, Dual, NullParameters), tag standardization that
  stamps NonlinearSolveTag on AutoForwardDiff and forces chunksize=1 when
  the function is wrapped
- solve.jl: maybe_wrap_f wired into get_concrete_problem for all problem
  types (NonlinearProblem, NonlinearLeastSquaresProblem,
  ImmutableNonlinearProblem), using EvalFunc wrapper for invokelatest
- jacobian.jl: standardize_forwarddiff_tag called in
  construct_jacobian_cache so DI produces correctly-tagged duals

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
ImmutableNonlinearProblem (used by SimpleNonlinearSolve) doesn't support
Setfield reconstruction with wrapped function types. Skip wrapping since
SimpleNonlinearSolve's lighter solvers don't benefit from the norecompile
pathway.

Fixes CI adjoint test failure in SimpleNonlinearSolve.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The FunctionWrapper wrapping cannot be automatically applied at solve time
because multiple code paths (∂f/∂p, ∂f/∂u, bounds transform) call
ForwardDiff directly with default chunk sizes, bypassing the standardized
chunksize=1 path. This caused "No matching function wrapper found!" errors
whenever ForwardDiff used chunksize > 1.

The infrastructure (autospecialize.jl, extension wrappers, tag
standardization) remains available for targeted use. Automatic wrapping
requires coordinating ALL ForwardDiff call sites to use chunksize=1.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The standardize_forwarddiff_tag calls in autodiff.jl and jacobian.jl
cause dual tag ordering errors when nested ForwardDiff is used (e.g.,
NLLS sensitivity + inner VJP). Remove these call sites and the unused
maybe_wrap_f function since automatic wrapping is not yet active.

The autospecialize infrastructure (NonlinearSolveTag, wrapfun_iip/oop,
ForwardDiff extension wrappers) remains available for future activation
when all direct ForwardDiff call sites are standardized.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Wire `maybe_wrap_f` into `get_concrete_problem` for NonlinearProblem and
NonlinearLeastSquaresProblem (IIP). Functions are wrapped in
`AutoSpecializeCallable{FW}` which holds a `FunctionWrappersWrapper` for
precompiled dispatch and the original function (type-erased as `Any`) for
try-catch fallback when dual tags mismatch (JVP paths, external packages).

Key changes:
- AutoSpecializeCallable uses `orig::Any` for type erasure (no EvalFunc)
- Skip OOP NLLS wrapping (return type may differ from u0)
- Standardize JVP/VJP autodiff tags in construct_jacobian_cache
- Replace AutoPolyesterForwardDiff with AutoForwardDiff{1,tag} when wrapped
- Use get_raw_f for nested ForwardDiff in NLLS VJP generation
- ForwardDiff sensitivity functions use chunksize=1 + tag when wrapped

Tests: core 727/0/0, wrapper 195/0/0, ForwardDiff 135636/0/0
OOP @inferred regresses (expected, same trade-off as DiffEqBase)

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Reverse-mode AD backends (Zygote, Mooncake, Enzyme) cannot
differentiate through FunctionWrapper internals (llvmcall). This adds:

- ChainRulesCore rrule for AutoSpecializeCallable that redirects
  reverse-mode AD through the original unwrapped callable
- _DISABLE_AUTOSPECIALIZE flag set in the solve_up rrule to prevent
  wrapping entirely during the adjoint code path
- @test_broken for IIP @inferred (same wrapping-induced regression
  as OOP case)

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…type

- Remove task-local `_DISABLE_AUTOSPECIALIZE` flag entirely
- Replace with `@set prob.f.f = get_raw_f(prob.f.f)` unwrapping in rrule
- Remove parameter type restriction (any p works, mismatches fall back)
- Add idempotency check to prevent double-wrapping
- Remove `_DISABLE_AUTOSPECIALIZE` from public API
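
A minimal sketch of the unwrap-and-idempotency idea, under stated assumptions: `ToyWrapped` is a hypothetical stand-in for `AutoSpecializeCallable`, and the `get_raw_f` / `maybe_wrap_f` definitions below only borrow the names from this PR; their real signatures and bodies may differ.

```julia
# Hypothetical wrapper type standing in for `AutoSpecializeCallable` (assumption:
# the real type also holds a FunctionWrappersWrapper, which is omitted here).
struct ToyWrapped
    orig::Any   # original user function, type-erased
end

# Unwrapping returns the original callable; unwrapped functions pass through.
get_raw_f(f::ToyWrapped) = f.orig
get_raw_f(f) = f

# Idempotent wrapping: an already-wrapped function is returned unchanged.
maybe_wrap_f(f::ToyWrapped) = f
maybe_wrap_f(f) = ToyWrapped(f)

residual!(du, u, p) = (du .= u .^ 2 .- p; nothing)

once  = maybe_wrap_f(residual!)
twice = maybe_wrap_f(once)   # no second layer of wrapping
```

Because unwrapping is a plain function of the callable, an rrule can apply it to the problem's function directly instead of consulting a task-local flag.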

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
These are internal implementation details, not public API.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
OOP wrapping requires guessing return types which doesn't always work.
Only wrap IIP functions where the return type is always Nothing.
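
The difference can be seen in a small, self-contained sketch. The residual functions below are made up for illustration and are not taken from the test suite.

```julia
# In-place (IIP) residual: writes into `du` and returns `nothing` for any input type.
residual_iip!(du, u, p) = (du .= u .^ 2 .- p; nothing)

# Out-of-place (OOP) residual: the return type follows the inputs.
residual_oop(u, p) = u .^ 2 .- p

iip_f64 = residual_iip!(zeros(2), [1.0, 2.0], 2.0)
iip_f32 = residual_iip!(zeros(Float32, 2), Float32[1, 2], 2.0f0)

oop_f64    = residual_oop([1.0, 2.0], 2.0)
oop_int    = residual_oop([1, 2], 2.0)   # Int state, Float64 parameter
oop_scalar = residual_oop(1.0, 2.0)      # scalar state

println((typeof(oop_f64), typeof(oop_int), typeof(oop_scalar)))
```

Here `oop_int` has a different element type than its `u0`, so a wrapper that assumed "returns the same type as `u0`" would be wrong, while both IIP calls return `nothing`.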

IIP TTFX improvement (2nd/3rd function, same types):
- NewtonRaphson: 2.2-2.5x faster
- TrustRegion: 18x faster

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The existing workload used scalar p=2.0, which produces different
FunctionWrapper types than the common user case of Vector{Float64}
parameters. This caused the precompiled wrappers to miss the user path.
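
To make the mismatch concrete, here is a rough sketch. `wrapper_signature` is a hypothetical helper that only mimics the idea that a wrapper is compiled for one specific tuple of argument types; it is not part of FunctionWrappers or NonlinearSolveBase.

```julia
# Hypothetical helper: the argument-type tuple a wrapper would be compiled for.
wrapper_signature(du, u, p) = Tuple{typeof(du), typeof(u), typeof(p)}

du, u = zeros(2), [1.0, 2.0]
workload_sig = wrapper_signature(du, u, 2.0)         # scalar parameter, as in the old workload
user_sig     = wrapper_signature(du, u, [2.0, 3.0])  # Vector{Float64} parameters, the common user case

println(workload_sig == user_sig)
```

The two signatures differ in their last element, so code precompiled for the first does not cover calls made with the second.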

TTFX for IIP Vector{Float64} first solve: 2.7s → 1.0s

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The try-catch in AutoSpecializeCallable prevented inlining and added
~32 bytes per call, exceeding the 64-byte @ballocated budget in
NonlinearSolveFirstOrder, QuasiNewton, and SpectralMethods tests.

Replace with explicit dispatch methods for known argument types
(Vector{Float64}, Float64, NullParameters, and ForwardDiff duals),
routing to f.fw for zero-allocation calls. Unsupported types fall
back to f.orig via vararg dispatch. Also fix @test_broken -> @test
for @inferred solve(prob) which now passes.
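
The dispatch pattern can be sketched without any dependencies. `ToyCallable` is a hypothetical stand-in for `AutoSpecializeCallable`, and its `fw` field is an ordinary function here rather than a FunctionWrappersWrapper, so the sketch shows only the routing, not the allocation behavior.

```julia
# Hypothetical stand-in for `AutoSpecializeCallable`. In the PR, `fw` is a
# FunctionWrappersWrapper; here it is a plain field so the sketch runs without dependencies.
struct ToyCallable{FW}
    fw::FW      # fast path for known argument types
    orig::Any   # type-erased original, used for everything else
end

# Explicit method for a known signature: routes to the fast path, no try-catch.
(f::ToyCallable)(du::Vector{Float64}, u::Vector{Float64}, p::Vector{Float64}) = f.fw(du, u, p)

# Vararg fallback: any other argument types go to the original function.
(f::ToyCallable)(args...) = f.orig(args...)

# Record which path each call takes.
calls = Symbol[]
fast!(du, u, p) = (push!(calls, :fw); du .= u .- p; nothing)
slow!(du, u, p) = (push!(calls, :orig); du .= u .- p; nothing)

f = ToyCallable(fast!, slow!)
f(zeros(2), [1.0, 2.0], [0.5, 0.5])                  # known types, takes the explicit method
f(zeros(Float32, 2), Float32[1, 2], Float32[0, 0])   # unsupported types, takes the fallback
println(calls)
```

Method specificity picks the path at dispatch time, so no exception handling is needed to detect an unsupported argument type.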

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Enzyme cannot differentiate through FunctionWrappers' llvmcall, causing
EnzymeMutabilityException in all IIP Vector{Float64} tests with AutoEnzyme.
Unwrap the function in construct_jacobian_cache when the AD backend is
Enzyme-based (including AutoSparse(AutoEnzyme(...))), so DI sees the raw
user function. Also apply Runic formatting to SCCNonlinearSolve files.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous commit only unwrapped for the concrete Jacobian path
(DI.prepare_jacobian/DI.jacobian). This extends the fix to the
JacobianOperator path used by Krylov solvers (GMRES, etc.) and
backslash with concrete_jac=false.

When Enzyme is used for JVP/VJP autodiff, create a modified problem
with the raw user function so SciMLJacobianOperators' DI.pushforward!/
DI.pullback! calls don't go through FunctionWrappers' llvmcall.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… operators

The TrustRegion scheme creates VecJacOperator and JacVecOperator directly
from the problem, bypassing construct_jacobian_cache. When Enzyme is the
AD backend, these operators need the unwrapped function (without
FunctionWrappers) to avoid EnzymeMutabilityException.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of fixing individual call sites (trust_region.jl VecJac/JacVec,
jacobian.jl construct_jacobian_cache), create _ad_prob with unwrapped
function early in __init for both FirstOrder and QuasiNewton. This
ensures ALL downstream AD consumers (Jacobian cache, trust region,
linesearch, forcing) receive the unwrapped problem when Enzyme is used.

- Add maybe_unwrap_prob_for_enzyme helper in NonlinearSolveBase
- FirstOrder: create _ad_prob from alg.autodiff/jvp_autodiff/vjp_autodiff
- QuasiNewton: detect Enzyme from kwargs and alg.linesearch/trustregion
- Revert trust_region.jl inline fix (now handled upstream in solve.jl)
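
A rough sketch of what such a helper could look like, under heavy assumptions: every type below (`ToyAutoEnzyme`, `ToyAutoForwardDiff`, `ToyAutoSparse`, `ToyWrappedF`, `ToyProblem`) is a hypothetical stand-in for the ADTypes backends and SciML problem types, and the signature of `maybe_unwrap_prob_for_enzyme` shown here is a guess, not the one in NonlinearSolveBase.

```julia
# Hypothetical stand-ins (assumptions): the real code uses ADTypes backends and
# SciML problem types; these toy types only mirror their shape.
struct ToyAutoEnzyme end
struct ToyAutoForwardDiff end
struct ToyAutoSparse{B}
    dense_ad::B
end

struct ToyWrappedF
    orig::Any   # original user function
end
struct ToyProblem{F}
    f::F
    u0::Vector{Float64}
end

# Detect Enzyme-based backends, including sparse wrappers around one.
is_enzyme_backend(::ToyAutoEnzyme) = true
is_enzyme_backend(ad::ToyAutoSparse) = is_enzyme_backend(ad.dense_ad)
is_enzyme_backend(::Any) = false

unwrap_f(f::ToyWrappedF) = f.orig
unwrap_f(f) = f

# Return a problem carrying the raw function if any backend is Enzyme-based;
# otherwise return the problem unchanged.
function maybe_unwrap_prob_for_enzyme(prob::ToyProblem, backends...)
    any(is_enzyme_backend, backends) || return prob
    return ToyProblem(unwrap_f(prob.f), prob.u0)
end

raw!(du, u, p) = (du .= u; nothing)
prob = ToyProblem(ToyWrappedF(raw!), [1.0])

ad_prob_fd     = maybe_unwrap_prob_for_enzyme(prob, ToyAutoForwardDiff(), nothing)
ad_prob_enzyme = maybe_unwrap_prob_for_enzyme(prob, ToyAutoSparse(ToyAutoEnzyme()))
```

Doing the check once, before any caches are built, means every downstream consumer receives the same unwrapped problem instead of each call site repeating the check.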

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The generic fallback `initialization_alg(initprob, autodiff) = nothing` in
NonlinearSolveBase gets compiled during precompilation with inferred types
like `(::Any, ::AutoForwardDiff{nothing, Nothing})`. When downstream packages
(NonlinearSolve, NonlinearSolveFirstOrder) add more specific methods, they
invalidate all precompiled MethodInstances that inferred through the fallback,
causing ~218 cascading invalidations across QuasiNewton, FirstOrder, and
SpectralMethods solver caches.

Using `Base.invokelatest` at the callsite prevents the compiler from caching
the method dispatch, eliminating these invalidations entirely. The overhead
is negligible since `_run_initialization!` with `OverrideInit` is only called
once during initialization.

Results (Julia 1.10, @snoop_invalidations):
- Before: 543 unique invalidated MIs, 708 total nodes
- After:  347 unique invalidated MIs (36% reduction), 490 total nodes (31% reduction)
- initialization_alg trees: 218 nodes → 0 nodes (100% eliminated)

TTFX improvement (IIP NonlinearProblem + NewtonRaphson):
- Master:                    3.05s
- AutoSpecialize branch:     1.19s
- AutoSpecialize + this fix: 0.93s (3.3x faster than master)

Test results:
- Core:     732 pass, 87 broken, 0 fail, 0 error
- Wrappers: 195 pass, 7 broken, 0 fail, 0 error

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>