All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.4.1 - 2026-03-14
- Powi f32 exponent encoding: powi(n) on f32 bytecode tapes silently produced wrong values and gradients for negative exponents (n <= -2). The i32 exponent was stored as u32 and then round-tripped through f32, which loses precision for values > 2^24 (every negative i32 reinterpreted as u32 is at least 2^31). All 5 dispatch sites (forward, reverse, tangent forward, tangent reverse, cross-country) now decode the exponent directly from the raw u32 via powi_exp_decode_raw, bypassing the float conversion entirely.
- taylor_powi negative base: Taylor::powi and bytecode Taylor-mode produced NaN for negative base values (e.g. (-2)^3) because the implementation used exp(n * ln(a)), which fails for ln(negative). Added taylor_powi_squaring, which uses binary exponentiation with taylor_mul and is dispatched when a[0] < 0 or |n| <= 8.
- Checkpoint position lookup: grad_checkpointed, grad_checkpointed_disk, and grad_checkpointed_with_hints used Vec::contains() for checkpoint position lookups (O(n) per step). Converted to HashSet for O(1) lookups.
- Nonsmooth Round kink detection: forward_nonsmooth now correctly detects Round kinks at half-integers (0.5, 1.5, ...) instead of at integers, matching the actual discontinuity locations of the round function. Updated the test to match.
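The sketch below replays the f32 round trip with plain Rust casts to show why only n <= -2 was affected. It assumes the old path reinterpreted the i32 as u32, converted that value to f32, and cast it back with as. The helpers encode_exp, decode_via_f32 and decode_raw are hypothetical stand-ins; they are not the crate's powi_exp_decode_raw.

```rust
// Hypothetical helpers illustrating the old and new exponent decoding;
// they are not the crate's actual functions.

/// Store an i32 exponent in a u32 operand slot by reinterpreting its bits.
fn encode_exp(n: i32) -> u32 {
    n as u32
}

/// Old path (assumed): the raw u32 passes through f32 before decoding.
fn decode_via_f32(raw: u32) -> i32 {
    let as_float = raw as f32; // rounds to the nearest f32 (24-bit significand)
    (as_float as u32) as i32 // float-to-int `as` casts saturate in Rust
}

/// New path: reinterpret the raw bits directly, with no float conversion.
fn decode_raw(raw: u32) -> i32 {
    raw as i32
}

fn main() {
    for n in [3, -1, -2, -3, -200] {
        let raw = encode_exp(n);
        println!(
            "n = {n:>4}: via f32 -> {:>4}, raw -> {:>4}",
            decode_via_f32(raw),
            decode_raw(raw)
        );
    }
}
```

A negative exponent becomes a u32 of at least 2^31. In that range f32 values are spaced 256 apart, so the conversion either snaps to a multiple of 256 or rounds up to 2^32. The value 2^32 then saturates to u32::MAX, which reads back as -1. That is why n = -1 happened to survive while small exponents such as -2 and -3 decode to -1, and -200 becomes -256.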
0.4.0 - 2026-02-26
- BytecodeTape decomposition: split the 2,689-line monolithic bytecode_tape.rs into a directory module with 10 focused submodules (forward.rs, reverse.rs, tangent.rs, jacobian.rs, sparse.rs, optimize.rs, taylor.rs, parallel.rs, serde_support.rs, thread_local.rs). Zero public API changes; benchmarks confirm no performance impact.
- Deduplicated the reverse sweep in gradient_with_buf() and sparse_jacobian_par(): both now call a shared reverse_sweep_core() instead of inlining the loop. gradient_with_buf also gains the zero-adjoint skip optimization it was previously missing.
- Bumped the nalgebra dependency from 0.33 to 0.34
- Corrected opcode variant count in documentation (44 variants, not 38/43)
- Fixed CONTRIBUTING.md MSRV reference (1.93, not 1.80)
0.3.0 - 2026-02-25
- diffop::mixed_partial(tape, x, orders) — compute any mixed partial derivative via jet coefficient extraction
- diffop::hessian(tape, x) — full Hessian via jet extraction (cross-validated against tape.hessian())
- MultiIndex — specify which mixed partial to compute (e.g., [2, 0, 1] = ∂³u/∂x₀²∂x₂)
- JetPlan::plan(n, indices) — precompute slot assignments and extraction prefactors; reuse across evaluation points
- diffop::eval_dyn(plan, tape, x) — evaluate a plan at a new point using TaylorDyn
- Pushforward grouping: multi-indices with different active variable sets get separate forward passes to avoid slot contamination
- Prime window sliding for collision-free slot assignment up to high derivative orders
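Here is a small worked example of jet coefficient extraction. It assumes each active variable is seeded along a distinct power t^{s_i} of a single Taylor variable. The slot exponents are picked by hand here; they do not come from the prime-window scheme. By the multivariate Taylor expansion, the coefficient of t^{α·s} equals ∂^α f / α! whenever no other multi-index reaches the same weighted degree. For this MultiIndex, that requires a degree-7 Taylor pass.

```latex
% [t^m] g denotes the coefficient of t^m in g(t); x_1 is inactive (\alpha_1 = 0).
f\bigl(x_0 + t^{s_0},\; x_1,\; x_2 + t^{s_2}\bigr)
  = \sum_{\alpha_0, \alpha_2 \ge 0} \frac{1}{\alpha_0!\,\alpha_2!}\,
    \frac{\partial^{\alpha_0 + \alpha_2} f}{\partial x_0^{\alpha_0}\,\partial x_2^{\alpha_2}}(x)\;
    t^{\alpha_0 s_0 + \alpha_2 s_2}

% MultiIndex [2, 0, 1] with slots s_0 = 2, s_2 = 3: target degree 2 \cdot 2 + 1 \cdot 3 = 7,
% and 2a + 3b = 7 has the single non-negative integer solution (a, b) = (2, 1).
\frac{\partial^3 f}{\partial x_0^2\,\partial x_2}(x)
  = 2!\,1!\,[t^7]\, f\bigl(x_0 + t^2,\, x_1,\, x_2 + t^3\bigr)

% Check with f = x_0^2 x_2:
(x_0 + t^2)^2 (x_2 + t^3)
  = x_0^2 x_2 + 2 x_0 x_2\, t^2 + x_0^2\, t^3 + x_2\, t^4 + 2 x_0\, t^5 + t^7
\quad\Rightarrow\quad 2!\,1!\,[t^7] = 2 = \frac{\partial^3 (x_0^2 x_2)}{\partial x_0^2\,\partial x_2}
```

Suppose the slots were s_0 = 1 and s_2 = 3 instead. The target degree would then be 5, which the pure fifth derivative in x_0 (α = (5, 0, 0)) also reaches. Avoiding such collisions is the job of the slot assignment.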
0.2.0 - 2026-02-25
- BytecodeTape SoA graph-mode AD with opcode dispatch and tape optimization (CSE, DCE, constant folding)
- BReverse<F> tape-recording reverse-mode variable
- record() / record_multi() to build tapes from closures
- Hessian computation via forward-over-reverse (hessian, hvp)
- DualVec<F, N> batched forward-mode with N tangent lanes for vectorized Hessians (hessian_vec)
- Sparsity pattern detection via bitset propagation
- Graph coloring: greedy distance-2 for Jacobians, star bicoloring for Hessians
- sparse_jacobian, sparse_hessian, sparse_hessian_vec
- CSR storage (CsrPattern, JacobianSparsityPattern, SparsityPattern)
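The sketch below illustrates bitset sparsity propagation followed by a greedy column coloring on a toy three-output function. It makes two simplifying assumptions: there are at most 64 inputs (one u64 bitset per variable), and two columns conflict whenever they share a nonzero row, which is the distance-2 condition on the bipartite row/column graph. The types and functions are hypothetical; this is not the crate's implementation.

```rust
/// Each traced value carries a bitset of the inputs it depends on.
#[derive(Clone, Copy)]
struct Deps(u64);

impl Deps {
    fn input(i: usize) -> Self {
        Deps(1u64 << i)
    }
    /// Binary ops depend on the union of their operands' inputs.
    fn binary(self, other: Deps) -> Deps {
        Deps(self.0 | other.0)
    }
    /// Unary ops (sin, exp, ...) keep the operand's dependencies.
    fn unary(self) -> Deps {
        self
    }
}

/// Toy function: y0 = x0 * x1, y1 = x2 + x3, y2 = sin(x0). Returns one bitset per row.
fn trace_pattern() -> Vec<u64> {
    let x: Vec<Deps> = (0..4).map(Deps::input).collect();
    vec![x[0].binary(x[1]).0, x[2].binary(x[3]).0, x[0].unary().0]
}

/// Greedy coloring: columns j and k conflict if some row contains both.
fn greedy_column_coloring(rows: &[u64], n_cols: usize) -> Vec<usize> {
    let mut colors = vec![usize::MAX; n_cols];
    for j in 0..n_cols {
        // Collect colors already used by columns that share a row with j.
        let mut forbidden = Vec::new();
        for &row in rows {
            if row & (1u64 << j) == 0 {
                continue;
            }
            for k in 0..n_cols {
                if k != j && row & (1u64 << k) != 0 && colors[k] != usize::MAX {
                    forbidden.push(colors[k]);
                }
            }
        }
        // Pick the smallest color that no conflicting column uses.
        colors[j] = (0..).find(|c| !forbidden.contains(c)).unwrap();
    }
    colors
}

fn main() {
    let rows = trace_pattern();
    for (i, r) in rows.iter().enumerate() {
        println!("row {i}: {r:04b}");
    }
    println!("column colors: {:?}", greedy_column_coloring(&rows, 4));
}
```

Columns that share a color can share one forward seed direction, so this toy pattern needs 2 directional passes instead of 4. A CSR layout of the same pattern would store, for each row, the sorted column indices of its set bits.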
- Taylor<F, K> const-generic Taylor coefficients with Cauchy product propagation
- TaylorDyn<F> arena-based dynamic Taylor (runtime degree)
- taylor_grad / taylor_grad_with_buf — reverse-over-Taylor for gradient + HVP + higher-order adjoints
- ode_taylor_step / ode_taylor_step_with_buf — ODE Taylor series integration via coefficient bootstrapping
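Coefficient bootstrapping for an autonomous ODE y' = f(y) relies on one fact: the k-th Taylor coefficient of y' equals (k + 1)·y_{k+1}. So once y_0..y_k are known, the k-th coefficient of f(y) yields y_{k+1}. The sketch below obtains that coefficient from a truncated Cauchy product. It assumes f(y) = y², whose solution from y(0) = 1 is 1/(1 - t), so every coefficient should come out as 1. The function names are illustrative, not the crate's API.

```rust
/// Truncated Cauchy product coefficient: (a * b)_k = sum_{j=0..k} a_j * b_{k-j}.
fn cauchy_coeff(a: &[f64], b: &[f64], k: usize) -> f64 {
    (0..=k).map(|j| a[j] * b[k - j]).sum()
}

/// Taylor coefficients y_0..y_degree of the solution of y' = y^2, y(0) = y0.
fn ode_taylor_coeffs(y0: f64, degree: usize) -> Vec<f64> {
    let mut y = vec![0.0; degree + 1];
    y[0] = y0;
    for k in 0..degree {
        // The k-th coefficient of f(y) = y*y only needs y_0..y_k, which are known.
        let fk = cauchy_coeff(&y, &y, k);
        // Matching coefficients of y' = f(y): (k + 1) * y_{k+1} = f_k.
        y[k + 1] = fk / (k as f64 + 1.0);
    }
    y
}

/// One explicit Taylor step: evaluate the truncated series at t = h (Horner's rule).
fn taylor_step(coeffs: &[f64], h: f64) -> f64 {
    coeffs.iter().rev().fold(0.0, |acc, &c| acc * h + c)
}

fn main() {
    let c = ode_taylor_coeffs(1.0, 8);
    println!("coefficients: {c:?}");
    let h = 0.1;
    println!("y(0.1) ~ {} (exact {})", taylor_step(&c, h), 1.0 / (1.0 - h));
}
```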
- laplacian — Hutchinson trace estimator for Laplacian approximation
- hessian_diagonal — exact Hessian diagonal via coordinate basis
- directional_derivatives — batched second-order directional derivatives
- laplacian_with_stats — Welford's online variance tracking
- laplacian_with_control — diagonal control variate variance reduction
- Estimator trait generalizing per-direction sample computation (Laplacian, GradientSquaredNorm)
- estimate / estimate_weighted generic pipeline
- Hutchinson divergence estimator for vector fields via Dual<F> forward mode
- Hutch++ (Meyer et al. 2021) O(1/S²) trace estimator via sketch + residual decomposition
- Importance-weighted estimation (West's 1979 algorithm)
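Hutchinson's estimator rests on E[vᵀAv] = tr(A) for random probes with E[vvᵀ] = I, such as Rademacher ±1 vectors. For the Laplacian, A is the Hessian and each vᵀHv is a second-order directional derivative. The sketch below pairs the estimator with Welford's online mean/variance update. To stay dependency-free, it makes two assumptions: it works on an explicit matrix rather than a tape, and it uses a tiny xorshift generator as a stand-in RNG.

```rust
/// Welford's online algorithm for the running mean and sample variance.
struct Welford {
    n: u64,
    mean: f64,
    m2: f64,
}

impl Welford {
    fn new() -> Self {
        Welford { n: 0, mean: 0.0, m2: 0.0 }
    }
    fn push(&mut self, x: f64) {
        self.n += 1;
        let delta = x - self.mean;
        self.mean += delta / self.n as f64;
        self.m2 += delta * (x - self.mean); // uses the updated mean
    }
    fn sample_variance(&self) -> f64 {
        if self.n < 2 { 0.0 } else { self.m2 / (self.n - 1) as f64 }
    }
}

/// Tiny xorshift64 generator (stand-in for a real RNG crate); seed must be nonzero.
struct XorShift(u64);

impl XorShift {
    fn next_sign(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        if self.0 & 1 == 0 { 1.0 } else { -1.0 }
    }
}

/// Hutchinson estimate of tr(A) from `samples` Rademacher probes v^T A v.
fn hutchinson_trace(a: &[Vec<f64>], samples: usize, seed: u64) -> Welford {
    let n = a.len();
    let mut rng = XorShift(seed);
    let mut stats = Welford::new();
    for _ in 0..samples {
        let v: Vec<f64> = (0..n).map(|_| rng.next_sign()).collect();
        // Quadratic form v^T A v.
        let q: f64 = (0..n)
            .map(|i| v[i] * (0..n).map(|j| a[i][j] * v[j]).sum::<f64>())
            .sum();
        stats.push(q);
    }
    stats
}

fn main() {
    let a = vec![vec![2.0, 1.0], vec![1.0, 3.0]];
    let s = hutchinson_trace(&a, 1000, 0x9E37_79B9_7F4A_7C15);
    println!("estimate {:.3} (exact 5), sample variance {:.3}", s.mean, s.sample_variance());
}
```

For a diagonal matrix every Rademacher probe returns the trace exactly, so the variance is zero. Only off-diagonal entries contribute noise, which is what diagonal control variates target.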
- jacobian_cross_country — Markowitz vertex elimination on the linearized computational graph
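Vertex elimination operates on the linearized graph, whose edges carry local partials. Eliminating an intermediate vertex v works as follows: for every predecessor i and successor j, add the product of the two edge weights to edge i -> j, then delete v. Once only inputs and outputs remain, the surviving edges are the Jacobian entries. The sketch below applies the Markowitz heuristic (eliminate first the vertex with the smallest predecessor count times successor count) to a hand-built graph. The graph encoding and the function names are assumptions for illustration.

```rust
use std::collections::BTreeMap;

/// Linearized graph: edge (from, to) -> local partial d(to)/d(from).
type Edges = BTreeMap<(usize, usize), f64>;

/// Eliminate intermediate vertex v: fold every path i -> v -> j into i -> j.
fn eliminate(edges: &mut Edges, v: usize) {
    let preds: Vec<(usize, f64)> = edges.iter()
        .filter(|&(&(_, to), _)| to == v)
        .map(|(&(from, _), &w)| (from, w))
        .collect();
    let succs: Vec<(usize, f64)> = edges.iter()
        .filter(|&(&(from, _), _)| from == v)
        .map(|(&(_, to), &w)| (to, w))
        .collect();
    edges.retain(|&(from, to), _| from != v && to != v);
    for &(i, a) in &preds {
        for &(j, b) in &succs {
            *edges.entry((i, j)).or_insert(0.0) += b * a;
        }
    }
}

/// Markowitz order: repeatedly eliminate the intermediate with the smallest
/// (#predecessors) * (#successors); ties go to the lowest vertex index.
fn cross_country_jacobian(mut edges: Edges, mut intermediates: Vec<usize>,
                          inputs: &[usize], outputs: &[usize]) -> Vec<Vec<f64>> {
    while !intermediates.is_empty() {
        let cost = |v: usize| {
            let p = edges.keys().filter(|&&(_, to)| to == v).count();
            let s = edges.keys().filter(|&&(from, _)| from == v).count();
            p * s
        };
        let (pos, _) = intermediates.iter().enumerate()
            .min_by_key(|&(_, &v)| (cost(v), v))
            .unwrap();
        let v = intermediates.remove(pos);
        eliminate(&mut edges, v);
    }
    // Only input -> output edges remain; they are the Jacobian entries.
    outputs.iter()
        .map(|&o| inputs.iter()
            .map(|&i| edges.get(&(i, o)).copied().unwrap_or(0.0))
            .collect::<Vec<f64>>())
        .collect()
}

/// y0 = 3*x0*x1 + x0 and y1 = 2*x0*x1 at (x0, x1) = (2, 5).
/// Vertices: 0 = x0, 1 = x1, 2 = v = x0*x1, 3 = w = 3v, 4 = y0 = w + x0, 5 = y1 = 2v.
fn example() -> Vec<Vec<f64>> {
    let edges: Edges = [
        ((0, 2), 5.0), ((1, 2), 2.0), // dv/dx0 = x1, dv/dx1 = x0
        ((2, 3), 3.0),                // dw/dv
        ((3, 4), 1.0), ((0, 4), 1.0), // dy0/dw, dy0/dx0
        ((2, 5), 2.0),                // dy1/dv
    ].into_iter().collect();
    cross_country_jacobian(edges, vec![2, 3], &[0, 1], &[4, 5])
}

fn main() {
    println!("Jacobian: {:?}", example());
}
```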
- eval_dual / partials_dual default methods on CustomOp<F> for correct second-order derivatives (HVP, Hessian) through custom ops
- forward_nonsmooth — branch tracking and kink detection for abs/min/max/signum/floor/ceil/round/trunc
- clarke_jacobian — Clarke generalized Jacobian via limiting Jacobian enumeration
- has_nontrivial_subdifferential() — two-tier classification: all 8 nonsmooth ops tracked for proximity detection; only abs/min/max enumerated in Clarke Jacobian
- KinkEntry, NonsmoothInfo, ClarkeError types
- Laurent<F, K> — singularity analysis with pole tracking; flows through BytecodeTape::forward_tangent
- grad_checkpointed — binomial Revolve checkpointing
- grad_checkpointed_online — periodic thinning for unknown step count
- grad_checkpointed_disk — disk-backed for large state vectors
- grad_checkpointed_with_hints — user-controlled checkpoint placement
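Binomial checkpointing rests on the classic counting result behind Revolve (Griewank's analysis). Under the usual Revolve conventions, s checkpoint slots with each forward step recomputed at most r times can reverse a chain of at most C(s + r, s) steps. The hypothetical helpers below, which are not part of the crate's API, evaluate that bound and find the smallest r that covers a given step count.

```rust
/// Binomial coefficient C(n, k) for k <= n, built up as C(n, 0), C(n, 1), ...
fn binomial(n: u64, k: u64) -> u128 {
    let k = k.min(n - k);
    let mut acc: u128 = 1;
    for i in 0..k {
        // acc == C(n, i) here, and C(n, i) * (n - i) / (i + 1) == C(n, i + 1) exactly.
        acc = acc * (n - i) as u128 / (i + 1) as u128;
    }
    acc
}

/// Longest chain reversible with `snaps` checkpoints and `reps` repetitions.
fn max_steps(snaps: u64, reps: u64) -> u128 {
    binomial(snaps + reps, snaps)
}

/// Smallest repetition count r with max_steps(snaps, r) >= steps.
fn min_repetitions(steps: u128, snaps: u64) -> u64 {
    assert!(snaps > 0 || steps <= 1, "need at least one checkpoint");
    let mut r = 0;
    while max_steps(snaps, r) < steps {
        r += 1;
    }
    r
}

fn main() {
    for steps in [10u128, 100, 1000, 10_000] {
        println!("{steps} steps with 5 checkpoints: r = {}", min_repetitions(steps, 5));
    }
}
```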
- wgpu backend: batched forward, gradient, sparse Jacobian, HVP, sparse Hessian (f32, Metal/Vulkan/DX12)
- CUDA backend: same operations with f32 + f64 support (NVRTC runtime compilation)
- GpuBackend trait unifying wgpu and CUDA backends behind a common interface
- Type-level AD composition: Dual<BReverse<f64>>, Taylor<BReverse<f64>, K>, DualVec<BReverse<f64>, N>
- composed_hvp convenience function for forward-over-reverse HVP
- BReverse<Dual<f64>> reverse-wrapping-forward composition via BtapeThreadLocal impls for Dual<f32> and Dual<f64>
- serde support for BytecodeTape, Laurent<F, K>, KinkEntry, NonsmoothInfo, ClarkeError
- JSON and bincode roundtrip support
- faer_support: HVP, sparse Hessian, dense/sparse solvers (LU, Cholesky)
- nalgebra_support: gradient, Hessian, Jacobian with nalgebra types
- ndarray_support: HVP, sparse Hessian, sparse Jacobian with ndarray types
- L-BFGS solver with two-loop recursion
- Newton solver with Cholesky factorization
- Trust-region solver with Steihaug-Toint CG
- Armijo line search
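The two-loop recursion applies the L-BFGS inverse-Hessian approximation to a gradient using only the stored pairs s_i = x_{i+1} - x_i and y_i = ∇f_{i+1} - ∇f_i. It uses the common initial scaling γ = sᵀy / yᵀy taken from the newest pair. The sketch below works on plain Vec<f64> vectors with hypothetical function names and returns H·g. A solver would then search along -H·g, for example with an Armijo backtracking line search.

```rust
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// L-BFGS two-loop recursion: returns H * g, where H approximates the inverse
/// Hessian built from (s_i, y_i) pairs stored oldest first.
fn two_loop(g: &[f64], s: &[Vec<f64>], y: &[Vec<f64>]) -> Vec<f64> {
    let m = s.len();
    let mut q = g.to_vec();
    let mut alpha = vec![0.0; m];
    let rho: Vec<f64> = (0..m).map(|i| 1.0 / dot(&y[i], &s[i])).collect();

    // First loop: newest pair to oldest.
    for i in (0..m).rev() {
        alpha[i] = rho[i] * dot(&s[i], &q);
        for (qk, yk) in q.iter_mut().zip(&y[i]) {
            *qk -= alpha[i] * yk;
        }
    }

    // Initial inverse Hessian H0 = gamma * I (the identity when there is no history).
    let gamma = if m > 0 {
        dot(&s[m - 1], &y[m - 1]) / dot(&y[m - 1], &y[m - 1])
    } else {
        1.0
    };
    let mut r: Vec<f64> = q.iter().map(|qk| gamma * qk).collect();

    // Second loop: oldest pair to newest.
    for i in 0..m {
        let beta = rho[i] * dot(&y[i], &r);
        for (rk, sk) in r.iter_mut().zip(&s[i]) {
            *rk += (alpha[i] - beta) * sk;
        }
    }
    r
}

fn main() {
    // Quadratic f = x0^2 + 2 x1^2 (Hessian diag(2, 4)), one curvature pair per axis.
    let s = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let y = vec![vec![2.0, 0.0], vec![0.0, 4.0]];
    println!("H * g = {:?}", two_loop(&[2.0, 4.0], &s, &y));
}
```

With one exact curvature pair per axis of this diagonal quadratic, the recursion returns the Newton step A⁻¹g = (1, 1). With an empty history it simply returns g.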
- Implicit differentiation: implicit_tangent, implicit_adjoint, implicit_jacobian, implicit_hvp, implicit_hessian
- Piggyback differentiation: tangent, adjoint, and interleaved forward-adjoint modes
- Sparse implicit differentiation via faer sparse LU (sparse-implicit feature)
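Consider a fixed-point solver z = G(z, x). Piggyback differentiation updates the tangent alongside the iterate, ż ← G_z ż + G_x. Implicit differentiation instead applies the implicit function theorem at the converged point, dz/dx = (1 - G_z)⁻¹ G_x. The sketch below compares the two on the Babylonian square-root iteration G(z, x) = (z + x/z)/2. This toy problem is chosen for illustration and uses hand-coded partials, not the crate's API.

```rust
/// Babylonian iteration for sqrt(x): z <- G(z, x) = (z + x / z) / 2.
fn g(z: f64, x: f64) -> f64 {
    0.5 * (z + x / z)
}
/// dG/dz
fn g_z(z: f64, x: f64) -> f64 {
    0.5 * (1.0 - x / (z * z))
}
/// dG/dx
fn g_x(z: f64) -> f64 {
    0.5 / z
}

/// Piggyback tangent mode: iterate the solution and its derivative together.
fn piggyback(x: f64, iters: usize) -> (f64, f64) {
    let (mut z, mut z_dot) = (1.0, 0.0);
    for _ in 0..iters {
        // The derivative update uses the current iterate; then z advances.
        z_dot = g_z(z, x) * z_dot + g_x(z);
        z = g(z, x);
    }
    (z, z_dot)
}

/// Implicit tangent at a converged z: dz/dx = G_x / (1 - G_z).
fn implicit_tangent(z: f64, x: f64) -> f64 {
    g_x(z) / (1.0 - g_z(z, x))
}

fn main() {
    let x = 4.0;
    let (z, z_dot) = piggyback(x, 30);
    // Exact values: sqrt(4) = 2 and d sqrt(x)/dx = 1 / (2 sqrt(x)) = 0.25.
    println!("z = {z}, piggyback dz/dx = {z_dot}, implicit dz/dx = {}", implicit_tangent(z, x));
}
```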
- Criterion benchmarks for Taylor mode, STDE, cross-country, sparse derivatives, nonsmooth
- Comparison benchmarks against num-dual and ad-trait (forward + reverse gradient)
- Correctness cross-check tests verifying ad-trait gradient agreement with echidna
- CI regression detection via criterion-compare-action
- Tape optimization: algebraic simplification at recording time (identity, absorbing, powi patterns)
- Tape optimization: targeted multi-output DCE (dead_code_elimination_for_outputs)
- Thread-local Adept tape pooling — grad()/vjp() reuse cleared tapes via a thread-local pool instead of per-call allocation
- Signed::signum() for BReverse<F> now records OpCode::Signum to the tape (was returning a constant)
- MSRV raised from 1.80 to 1.93
- WelfordAccumulator struct extracted, deduplicating Welford's algorithm across 4 STDE functions
- cuda_err helper extracted, replacing 72 inline .map_err closures in the CUDA backend
- create_tape_bind_group method extracted, replacing 4 duplicated bind group blocks in the wgpu backend
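Targeted multi-output DCE amounts to a backward reachability pass. First mark the requested outputs live, then walk the tape from the end and mark the operands of every live entry. The sketch below assumes a simplified tape in which each entry lists the indices of the earlier entries it reads, with inputs listing none. The Entry type and the function names are hypothetical.

```rust
/// Simplified tape entry: indices of the earlier entries it reads.
struct Entry {
    args: Vec<usize>,
}

/// Returns the indices of entries needed to compute `outputs`, in tape order.
fn live_entries(tape: &[Entry], outputs: &[usize]) -> Vec<usize> {
    let mut live = vec![false; tape.len()];
    for &o in outputs {
        live[o] = true;
    }
    // Operands always precede their users, so one reverse pass suffices.
    for i in (0..tape.len()).rev() {
        if live[i] {
            for &a in &tape[i].args {
                live[a] = true;
            }
        }
    }
    (0..tape.len()).filter(|&i| live[i]).collect()
}

/// Toy tape: 0 = x, 1 = y, 2 = sin(x), 3 = x * y, 4 = cos(y).
fn toy_tape() -> Vec<Entry> {
    vec![
        Entry { args: vec![] },
        Entry { args: vec![] },
        Entry { args: vec![0] },
        Entry { args: vec![0, 1] },
        Entry { args: vec![1] },
    ]
}

fn main() {
    let tape = toy_tape();
    println!("keep for output 3: {:?}", live_entries(&tape, &[3]));
    println!("keep for outputs 2 and 4: {:?}", live_entries(&tape, &[2, 4]));
}
```

This sketch only computes the keep set. An actual compaction pass would also have to remap the indices of the surviving entries.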
0.1.0 - 2026-02-21
- Dual<F> forward-mode dual number with all 30+ elemental operations
- Reverse<F> reverse-mode AD variable (12 bytes for f64, Copy)
- Float marker trait for f32/f64
- Scalar trait for writing AD-generic code
- Type aliases: Dual64, Dual32, Reverse64, Reverse32
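Forward mode with dual numbers carries a value together with one tangent, a + bε with ε² = 0, and each elemental rule applies the chain rule to the tangent part. The minimal sketch below implements only Add, Mul and sin on a hypothetical Dual struct, not the crate's Dual<F>. Seeding the input's tangent with 1 yields df/dx.

```rust
use std::ops::{Add, Mul};

/// Hypothetical minimal dual number: value + tangent * eps, with eps^2 = 0.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dual {
    re: f64,
    eps: f64,
}

impl Dual {
    /// Independent variable: seed dx/dx = 1.
    fn var(x: f64) -> Self {
        Dual { re: x, eps: 1.0 }
    }
    /// Constants carry no tangent.
    fn cst(c: f64) -> Self {
        Dual { re: c, eps: 0.0 }
    }
    /// Chain rule: d sin(u) = cos(u) du.
    fn sin(self) -> Self {
        Dual { re: self.re.sin(), eps: self.eps * self.re.cos() }
    }
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, o: Dual) -> Dual {
        Dual { re: self.re + o.re, eps: self.eps + o.eps }
    }
}

impl Mul for Dual {
    type Output = Dual;
    // Product rule: (a + a' eps)(b + b' eps) = ab + (a'b + ab') eps.
    fn mul(self, o: Dual) -> Dual {
        Dual { re: self.re * o.re, eps: self.eps * o.re + self.re * o.eps }
    }
}

/// f(x) = x * x + 3 * sin(x), so f'(x) = 2x + 3 cos(x).
fn f(x: Dual) -> Dual {
    x * x + Dual::cst(3.0) * x.sin()
}

fn main() {
    let out = f(Dual::var(2.0));
    println!("f(2) = {}, f'(2) = {}", out.re, out.eps);
}
```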
- Adept-style two-stack tape with precomputed partial derivatives
- Thread-local active tape with RAII guard (TapeGuard)
- Constant sentinel (u32::MAX) to avoid tape bloat from literals
- Zero-adjoint skipping in the reverse sweep
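An Adept-style tape keeps two flat stacks. The statement stack holds one entry per recorded variable (here, just the end offset of its operands). The operation stack holds one (partial, operand index) pair per operand, with partials computed at record time. The sketch below uses hypothetical names and mirrors the two optimizations listed above. Operands carrying the u32::MAX sentinel are never pushed, so literals do not reach the tape. During the reverse sweep, statements whose adjoint is zero are skipped.

```rust
/// Sentinel index marking a literal constant (never stored on the tape).
const CONSTANT: u32 = u32::MAX;

/// Hypothetical two-stack tape in the Adept style.
struct Tape {
    /// Statement stack: end offset into `ops` for each recorded variable.
    stmt_ends: Vec<usize>,
    /// Operation stack: (precomputed partial, operand index).
    ops: Vec<(f64, u32)>,
}

impl Tape {
    fn new() -> Self {
        Tape { stmt_ends: Vec::new(), ops: Vec::new() }
    }

    /// Records a variable from d(result)/d(operand) pairs and returns its index.
    /// Inputs are recorded with no operands.
    fn record(&mut self, partials: &[(f64, u32)]) -> u32 {
        for &(d, idx) in partials {
            if idx != CONSTANT {
                self.ops.push((d, idx));
            }
        }
        self.stmt_ends.push(self.ops.len());
        (self.stmt_ends.len() - 1) as u32
    }

    /// Reverse sweep: adjoints of the first `n_inputs` variables w.r.t. `output`.
    fn gradient(&self, output: u32, n_inputs: usize) -> Vec<f64> {
        let mut adj = vec![0.0; self.stmt_ends.len()];
        adj[output as usize] = 1.0;
        for i in (0..self.stmt_ends.len()).rev() {
            let a = adj[i];
            if a == 0.0 {
                continue; // zero-adjoint skip: nothing to propagate
            }
            let start = if i == 0 { 0 } else { self.stmt_ends[i - 1] };
            for &(d, idx) in &self.ops[start..self.stmt_ends[i]] {
                adj[idx as usize] += a * d;
            }
        }
        adj.truncate(n_inputs);
        adj
    }
}

/// f(x, y) = x * y + 3 * x at (2, 5): df/dx = y + 3 = 8, df/dy = x = 2.
fn example_gradient() -> Vec<f64> {
    let (xv, yv) = (2.0, 5.0);
    let mut tape = Tape::new();
    let x = tape.record(&[]);
    let y = tape.record(&[]);
    let xy = tape.record(&[(yv, x), (xv, y)]); // d(xy)/dx = y, d(xy)/dy = x
    let three_x = tape.record(&[(3.0, x), (xv, CONSTANT)]); // literal operand is dropped
    let f = tape.record(&[(1.0, xy), (1.0, three_x)]);
    tape.gradient(f, 2)
}

fn main() {
    println!("gradient = {:?}", example_gradient());
}
```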
- grad(f, x) — gradient via reverse mode
- jvp(f, x, v) — Jacobian-vector product via forward mode
- vjp(f, x, w) — vector-Jacobian product via reverse mode
- jacobian(f, x) — full Jacobian via forward mode
- Powers: recip, sqrt, cbrt, powi, powf
- Exp/Log: exp, exp2, exp_m1, ln, log2, log10, ln_1p, log
- Trig: sin, cos, tan, sin_cos, asin, acos, atan, atan2
- Hyperbolic: sinh, cosh, tanh, asinh, acosh, atanh
- Misc: abs, signum, floor, ceil, round, trunc, fract, mul_add, hypot
- num-traits: Float, Zero, One, Num, Signed, FloatConst, FromPrimitive, ToPrimitive, NumCast
- std::ops: Add, Sub, Mul, Div, Neg, Rem with assign variants
- Mixed scalar ops (Dual<f64> + f64, f64 * Reverse<f64>, etc.)
- 94 tests: forward mode, reverse mode, API, and cross-validation
- Every elemental validated against central finite differences
- Forward-vs-reverse cross-validation on Rosenbrock, Beale, Ackley, Booth, and more
- Criterion benchmarks for forward overhead and reverse gradient
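Finite-difference validation compares each analytic partial with the central difference (f(x + h·e_i) - f(x - h·e_i)) / (2h), whose truncation error is O(h²). The sketch below applies this check to the Rosenbrock function with a hand-written gradient. The step size and relative tolerance are assumptions; they are not the values used in the test suite.

```rust
/// Rosenbrock: f(x, y) = (1 - x)^2 + 100 (y - x^2)^2.
fn rosenbrock(p: &[f64]) -> f64 {
    let (x, y) = (p[0], p[1]);
    (1.0 - x).powi(2) + 100.0 * (y - x * x).powi(2)
}

/// Hand-derived gradient of the Rosenbrock function.
fn rosenbrock_grad(p: &[f64]) -> Vec<f64> {
    let (x, y) = (p[0], p[1]);
    vec![-2.0 * (1.0 - x) - 400.0 * x * (y - x * x), 200.0 * (y - x * x)]
}

/// True if every component of `grad` matches a central difference of `f` at `p`.
fn matches_central_fd(f: fn(&[f64]) -> f64, grad: &[f64], p: &[f64], h: f64, rel_tol: f64) -> bool {
    (0..p.len()).all(|i| {
        let (mut plus, mut minus) = (p.to_vec(), p.to_vec());
        plus[i] += h;
        minus[i] -= h;
        let fd = (f(&plus) - f(&minus)) / (2.0 * h);
        (fd - grad[i]).abs() <= rel_tol * grad[i].abs().max(1.0)
    })
}

fn main() {
    let p = [-1.2, 1.0];
    let ok = matches_central_fd(rosenbrock, &rosenbrock_grad(&p), &p, 1e-6, 1e-6);
    println!("analytic gradient agrees with central differences: {ok}");
}
```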