fix: eliminate initialization_alg invalidations via invokelatest by ChrisRackauckas-Claude · Pull Request #855 · SciML/NonlinearSolve.jl

ChrisRackauckas-Claude · 2026-02-26T16:15:08Z

Summary

- Eliminates 218 cascading invalidation nodes caused by initialization_alg dispatch during package loading
- Reduces total invalidated MethodInstances from 543 → 347 (36% reduction) when loading NonlinearSolve
- Uses Base.invokelatest at the callsite in _run_initialization! to prevent the compiler from caching dispatch of the generic fallback

Note: This PR is based on the autospecialize branch from #838. It should be merged after #838 lands, or can be squashed into #838.

Root Cause

The generic fallback initialization_alg(initprob, autodiff) = nothing in NonlinearSolveBase gets compiled during precompilation with inferred types like (::Any, ::AutoForwardDiff{nothing, Nothing}). When downstream packages add more specific methods:

- NonlinearSolve adds initialization_alg(::AbstractNonlinearProblem, autodiff) → 104 invalidated nodes
- NonlinearSolveFirstOrder adds initialization_alg(::NonlinearLeastSquaresProblem, autodiff) → 114 invalidated nodes

These invalidations cascade up to all precompiled _run_initialization! instances across QuasiNewton, FirstOrder, and SpectralMethods solver caches.
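The mechanism can be illustrated with a minimal, self-contained sketch. Only `initialization_alg` and `Base.invokelatest` come from the description above; `AbstractSketchProblem`, `SketchProblem`, `run_init_static`, and `run_init_latest` are hypothetical stand-ins and not the NonlinearSolveBase source.

```julia
# Hypothetical stand-ins for illustration; not the NonlinearSolveBase source.
abstract type AbstractSketchProblem end
struct SketchProblem <: AbstractSketchProblem end

# Generic fallback, as described above.
initialization_alg(initprob, autodiff) = nothing

# Ordinary call: the compiler may infer which method is called, so compiled
# callers depend on the current method table of initialization_alg.
run_init_static(prob, autodiff) = initialization_alg(prob, autodiff)

# invokelatest call: the target is looked up dynamically at run time, so the
# caller carries no inferred dependency on initialization_alg's methods.
run_init_latest(prob, autodiff) = Base.invokelatest(initialization_alg, prob, autodiff)

# Compile and run both callers while only the fallback exists.
before_static = run_init_static(SketchProblem(), :autodiff)
before_latest = run_init_latest(SketchProblem(), :autodiff)

# A "downstream package" now adds a more specific method. Compiled code that
# inferred through the fallback (run_init_static) is invalidated and has to
# be recompiled; run_init_latest has no such cached dependency.
initialization_alg(::AbstractSketchProblem, autodiff) = :downstream_default

after_static = run_init_static(SketchProblem(), :autodiff)
after_latest = run_init_latest(SketchProblem(), :autodiff)
```

Both callers return the same values before and after the new method is added. The difference lies only in what the compiler has cached, at the price of one dynamic dispatch per call on the `invokelatest` path.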

Methodology

Following the SciML compile time blog post:

- Used @time_imports to identify slow package loads
- Used @snoop_inference to profile inference bottlenecks (biggest: ConstructionBase.tuple_or_ntuple at 0.21s, remake at 0.17s)
- Used @snoop_invalidations to identify the top invalidation sources
- Found initialization_alg as the #1 and #2 biggest invalidation trees (218 combined nodes out of 708 total)
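The invalidation step of this workflow can be sketched roughly as follows. This assumes SnoopCompileCore, SnoopCompile, and NonlinearSolve are installed and that the code runs in a fresh Julia session; it is not the exact script used for this PR, and the variable names are illustrative.

```julia
# Run in a fresh Julia session: invalidations are only recorded the first
# time the packages are loaded.
# (Load times can be inspected in a separate fresh session with
#  `@time_imports using NonlinearSolve`, available on Julia 1.8+.)
using SnoopCompileCore

# Record the invalidations triggered while loading the package.
invs = @snoop_invalidations using NonlinearSolve

# Load the analysis tools only after recording, so they add no noise.
using SnoopCompile
trees = invalidation_trees(invs)        # trees grouped by the triggering method
n_unique = length(uinvalidated(invs))   # unique invalidated MethodInstances

println("unique invalidated MethodInstances: ", n_unique)
println("number of invalidation trees: ", length(trees))
```

Inspecting the entries of `trees` shows which method definition triggered each tree, which is how a single function such as `initialization_alg` can be singled out as a source.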

Results

Invalidation reduction (Julia 1.10, `@snoop_invalidations`)

| Metric | Before | After | Change |
|---|---|---|---|
| Unique invalidated MIs | 543 | 347 | -36% |
| Total invalidated nodes | 708 | 490 | -31% |
| initialization_alg nodes | 218 | 0 | -100% |
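As a quick arithmetic check of the Change column, the percentages can be recomputed from the Before and After values. The helper `reduction_pct` is a hypothetical name introduced here for illustration.

```julia
# Percentage reduction from `before` to `after`, rounded to the nearest integer.
reduction_pct(before, after) = round(Int, 100 * (before - after) / before)

unique_mis  = reduction_pct(543, 347)   # unique invalidated MethodInstances
total_nodes = reduction_pct(708, 490)   # total invalidated nodes
init_nodes  = reduction_pct(218, 0)     # initialization_alg nodes

# The drop in total nodes equals the number of initialization_alg nodes removed.
node_drop = 708 - 490
```

The drop in total invalidated nodes (708 - 490 = 218) matches the 218 initialization_alg nodes, consistent with the whole reduction coming from this one source.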

TTFX improvement (IIP NonlinearProblem + NewtonRaphson)

| Scenario | Master | AutoSpecialize | + This Fix |
|---|---|---|---|
| First solve (TTFX) | 3.05s | 1.19s | 0.93s |
| Different function | ~1.0s | 0.29s | 0.46s |
| Runtime (2nd call) | 0.36ms | 0.08ms | 0.11ms |

Remaining invalidators (all in external packages)

The remaining 490 invalidated nodes all originate in external packages and cannot be fixed from within NonlinearSolve. They include:

- ForwardDiff convert(::Type{D}, d::D) where D<:Dual (54 nodes)
- SparseArrays _all/_any (170 nodes, 4 trees)
- RecursiveArrayTools all/any (58 nodes)
- ChainRulesCore fill, promote_rule, + (53 nodes)

Test plan

- NonlinearSolveBase tests: 16/16 pass
- Core tests (GROUP=core): 732 pass, 87 broken, 0 fail, 0 error
- Wrapper tests (GROUP=wrappers): 195 pass, 7 broken, 0 fail, 0 error
- Manual TTFX verification with IIP/OOP functions and multiple solvers

Note on Enzyme.jl PR

This PR builds on the autospecialize branch (#838). The Enzyme.jl companion PR (EnzymeAD/Enzyme.jl#2980) is still needed for removing the Enzyme-specific workaround code once merged.

🤖 Generated with Claude Code

…Base

Port the FunctionWrappersWrappers-based norecompile pattern from DiffEqBase to NonlinearSolveBase. For standard problem types (Vector{Float64} state, Vector{Float64} or NullParameters parameters), the problem function is wrapped in a FunctionWrappersWrapper with precompiled type signatures for both Float64 and ForwardDiff.Dual arguments, avoiding recompilation for each unique user function type.

Key components:

- src/autospecialize.jl: NonlinearSolveTag, wrapfun_iip/oop base methods, maybe_wrap_nonlinear_f, standardize_forwarddiff_tag fallback
- ForwardDiff extension: dual-aware wrapfun dispatches with 6 type combinations (Float64, Dual, NullParameters), tag standardization that stamps NonlinearSolveTag on AutoForwardDiff and forces chunksize=1 when the function is wrapped
- solve.jl: maybe_wrap_f wired into get_concrete_problem for all problem types (NonlinearProblem, NonlinearLeastSquaresProblem, ImmutableNonlinearProblem), using EvalFunc wrapper for invokelatest
- jacobian.jl: standardize_forwarddiff_tag called in construct_jacobian_cache so DI produces correctly-tagged duals

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

ImmutableNonlinearProblem (used by SimpleNonlinearSolve) doesn't support Setfield reconstruction with wrapped function types. Skip wrapping since SimpleNonlinearSolve's lighter solvers don't benefit from the norecompile pathway. Fixes CI adjoint test failure in SimpleNonlinearSolve. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The FunctionWrapper wrapping cannot be automatically applied at solve time because multiple code paths (∂f/∂p, ∂f/∂u, bounds transform) call ForwardDiff directly with default chunk sizes, bypassing the standardized chunksize=1 path. This caused "No matching function wrapper found!" errors whenever ForwardDiff used chunksize > 1. The infrastructure (autospecialize.jl, extension wrappers, tag standardization) remains available for targeted use. Automatic wrapping requires coordinating ALL ForwardDiff call sites to use chunksize=1. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The standardize_forwarddiff_tag calls in autodiff.jl and jacobian.jl cause dual tag ordering errors when nested ForwardDiff is used (e.g., NLLS sensitivity + inner VJP). Remove these call sites and the unused maybe_wrap_f function since automatic wrapping is not yet active. The autospecialize infrastructure (NonlinearSolveTag, wrapfun_iip/oop, ForwardDiff extension wrappers) remains available for future activation when all direct ForwardDiff call sites are standardized. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

Wire `maybe_wrap_f` into `get_concrete_problem` for NonlinearProblem and NonlinearLeastSquaresProblem (IIP). Functions are wrapped in `AutoSpecializeCallable{FW}` which holds a `FunctionWrappersWrapper` for precompiled dispatch and the original function (type-erased as `Any`) for try-catch fallback when dual tags mismatch (JVP paths, external packages).

Key changes:

- AutoSpecializeCallable uses `orig::Any` for type erasure (no EvalFunc)
- Skip OOP NLLS wrapping (return type may differ from u0)
- Standardize JVP/VJP autodiff tags in construct_jacobian_cache
- Replace AutoPolyesterForwardDiff with AutoForwardDiff{1,tag} when wrapped
- Use get_raw_f for nested ForwardDiff in NLLS VJP generation
- ForwardDiff sensitivity functions use chunksize=1 + tag when wrapped

Tests: core 727/0/0, wrapper 195/0/0, ForwardDiff 135636/0/0

OOP @inferred regresses (expected, same trade-off as DiffEqBase)

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

Reverse-mode AD backends (Zygote, Mooncake, Enzyme) cannot differentiate through FunctionWrapper internals (llvmcall). This adds:

- ChainRulesCore rrule for AutoSpecializeCallable that redirects reverse-mode AD through the original unwrapped callable
- _DISABLE_AUTOSPECIALIZE flag set in the solve_up rrule to prevent wrapping entirely during the adjoint code path
- @test_broken for IIP @inferred (same wrapping-induced regression as OOP case)

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

…type

- Remove task-local `_DISABLE_AUTOSPECIALIZE` flag entirely
- Replace with `@set prob.f.f = get_raw_f(prob.f.f)` unwrapping in rrule
- Remove parameter type restriction (any p works, mismatches fall back)
- Add idempotency check to prevent double-wrapping
- Remove `_DISABLE_AUTOSPECIALIZE` from public API

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
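The unwrapping and the idempotency check described in this commit can be sketched in a few lines. All names here (`WrappedSketch`, `wrap_sketch`, `raw_sketch`, `residual`) are hypothetical and only mirror the roles of the wrapped callable, `maybe_wrap_f`, and `get_raw_f`; the real implementation may differ.

```julia
# Hypothetical wrapper type standing in for the PR's wrapped callable.
struct WrappedSketch{F}
    inner::F
end
(w::WrappedSketch)(args...) = w.inner(args...)

# Wrapping is idempotent: an already wrapped function is returned unchanged,
# which prevents double-wrapping.
wrap_sketch(f) = WrappedSketch(f)
wrap_sketch(f::WrappedSketch) = f

# Unwrapping returns the original callable; unwrapped input passes through.
raw_sketch(f) = f
raw_sketch(f::WrappedSketch) = f.inner

residual(u, p) = u .^ 2 .- p
w = wrap_sketch(residual)
```

With dispatch on the wrapper type, wrapping twice and unwrapping an unwrapped function are both no-ops, so call sites do not need to track whether a function has already been processed.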

These are internal implementation details, not public API. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

OOP wrapping requires guessing return types which doesn't always work. Only wrap IIP functions where the return type is always Nothing.

IIP TTFX improvement (2nd/3rd function, same types):

- NewtonRaphson: 2.2-2.5x faster
- TrustRegion: 18x faster

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

The existing workload used scalar p=2.0, which produces different FunctionWrapper types than the common user case of Vector{Float64} parameters. This caused the precompiled wrappers to miss the user path. TTFX for IIP Vector{Float64} first solve: 2.7s → 1.0s Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

@test

The try-catch in AutoSpecializeCallable prevented inlining and added ~32 bytes per call, exceeding the 64-byte @ballocated budget in NonlinearSolveFirstOrder, QuasiNewton, and SpectralMethods tests. Replace with explicit dispatch methods for known argument types (Vector{Float64}, Float64, NullParameters, and ForwardDiff duals), routing to f.fw for zero-allocation calls. Unsupported types fall back to f.orig via vararg dispatch. Also fix @test_broken -> @test for @inferred solve(prob) which now passes. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
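The explicit-dispatch pattern described in this commit can be sketched without FunctionWrappers. `DispatchSketch`, `fast_path`, `fast_calls`, and `residual!` are hypothetical names: `fw` stands in for the wrapper field and `orig` for the type-erased original function, and a plain closure replaces the real FunctionWrappersWrapper.

```julia
# Hypothetical sketch; `fw` stands in for the FunctionWrappersWrapper field
# and `orig` for the type-erased original function.
struct DispatchSketch{FW}
    fw::FW
    orig::Any
end

# Known argument types are routed to the precompiled path by an explicit method.
(f::DispatchSketch)(du::Vector{Float64}, u::Vector{Float64}, p::Vector{Float64}) =
    f.fw(du, u, p)

# Everything else falls back to the original function via vararg dispatch.
(f::DispatchSketch)(args...) = f.orig(args...)

# In-place residual used for the demonstration.
residual!(du, u, p) = (du .= u .^ 2 .- p; nothing)

fast_calls = Ref(0)   # counts calls that take the fast path
fast_path = (du, u, p) -> (fast_calls[] += 1; residual!(du, u, p))
f = DispatchSketch(fast_path, residual!)

du64 = zeros(2)
f(du64, [1.0, 2.0], [1.0, 1.0])          # Float64 vectors: fast path

du32 = zeros(Float32, 2)
f(du32, Float32[1, 2], Float32[1, 1])    # Float32 vectors: fallback path
```

Because method selection happens through dispatch rather than a try-catch, the supported signature has no exception-handling frame around it, while unsupported argument types still reach the original function.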

Enzyme cannot differentiate through FunctionWrappers' llvmcall, causing EnzymeMutabilityException in all IIP Vector{Float64} tests with AutoEnzyme. Unwrap the function in construct_jacobian_cache when the AD backend is Enzyme-based (including AutoSparse(AutoEnzyme(...))), so DI sees the raw user function. Also apply Runic formatting to SCCNonlinearSolve files. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The previous commit only unwrapped for the concrete Jacobian path (DI.prepare_jacobian/DI.jacobian). This extends the fix to the JacobianOperator path used by Krylov solvers (GMRES, etc.) and backslash with concrete_jac=false. When Enzyme is used for JVP/VJP autodiff, create a modified problem with the raw user function so SciMLJacobianOperators' DI.pushforward!/ DI.pullback! calls don't go through FunctionWrappers' llvmcall. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… operators The TrustRegion scheme creates VecJacOperator and JacVecOperator directly from the problem, bypassing construct_jacobian_cache. When Enzyme is the AD backend, these operators need the unwrapped function (without FunctionWrappers) to avoid EnzymeMutabilityException. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Instead of fixing individual call sites (trust_region.jl VecJac/JacVec, jacobian.jl construct_jacobian_cache), create _ad_prob with unwrapped function early in __init for both FirstOrder and QuasiNewton. This ensures ALL downstream AD consumers (Jacobian cache, trust region, linesearch, forcing) receive the unwrapped problem when Enzyme is used.

- Add maybe_unwrap_prob_for_enzyme helper in NonlinearSolveBase
- FirstOrder: create _ad_prob from alg.autodiff/jvp_autodiff/vjp_autodiff
- QuasiNewton: detect Enzyme from kwargs and alg.linesearch/trustregion
- Revert trust_region.jl inline fix (now handled upstream in solve.jl)

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

The generic fallback `initialization_alg(initprob, autodiff) = nothing` in NonlinearSolveBase gets compiled during precompilation with inferred types like `(::Any, ::AutoForwardDiff{nothing, Nothing})`. When downstream packages (NonlinearSolve, NonlinearSolveFirstOrder) add more specific methods, they invalidate all precompiled MethodInstances that inferred through the fallback, causing ~218 cascading invalidations across QuasiNewton, FirstOrder, and SpectralMethods solver caches.

Using `Base.invokelatest` at the callsite prevents the compiler from caching the method dispatch, eliminating these invalidations entirely. The overhead is negligible since `_run_initialization!` with `OverrideInit` is only called once during initialization.

Results (Julia 1.10, @snoop_invalidations):

- Before: 543 unique invalidated MIs, 708 total nodes
- After: 347 unique invalidated MIs, 490 total nodes (36% reduction)
- initialization_alg trees: 218 nodes → 0 nodes (100% eliminated)

TTFX improvement (IIP NonlinearProblem + NewtonRaphson):

- Master: 3.05s
- AutoSpecialize branch: 1.19s
- AutoSpecialize + this fix: 0.93s (3.3x faster than master)

Test results:

- Core: 732 pass, 87 broken, 0 fail, 0 error
- Wrappers: 195 pass, 7 broken, 0 fail, 0 error

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ChrisRackauckas and others added 17 commits February 22, 2026 20:18

Remove AutoSpecialize internals from public API

1bad60d

Fix Runic formatting in bounds_transform.jl

b6917fb

ChrisRackauckas closed this Feb 26, 2026

ChrisRackauckas-Claude wants to merge 17 commits into SciML:master from ChrisRackauckas-Claude:fix/invalidations-initialization-alg
