Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

0.4.1 - 2026-03-14

Fixed

Powi f32 exponent encoding: powi(n) on f32 bytecode tapes silently produced wrong values and gradients for negative exponents (n <= -2). The i32 exponent was stored as u32 and then round-tripped through f32, which cannot represent every integer above 2^24 exactly; every negative exponent maps to a u32 in that range. All five dispatch sites (forward, reverse, tangent forward, tangent reverse, cross-country) now decode the exponent directly from the raw u32 via powi_exp_decode_raw, bypassing the float conversion entirely.
taylor_powi negative base: Taylor::powi and bytecode Taylor-mode produced NaN for negative base values (e.g. (-2)^3) because the implementation used exp(n * ln(a)), and ln is undefined for a negative argument. Added taylor_powi_squaring, which uses binary exponentiation with taylor_mul and is dispatched when a[0] < 0 or |n| <= 8.
Checkpoint position lookup: grad_checkpointed, grad_checkpointed_disk, and grad_checkpointed_with_hints used Vec::contains() for checkpoint position lookups (O(n) per step). Converted to HashSet for O(1) lookups.
Nonsmooth Round kink detection: forward_nonsmooth now correctly detects Round kinks at half-integers (0.5, 1.5, ...) instead of at integers, matching the actual discontinuity locations of the round function. Updated test to match.
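
The powi exponent encoding entry above can be reproduced in isolation. The sketch below is not the crate's code: `roundtrip_via_f32` and `decode_raw` are hypothetical names that model the old lossy path and a direct raw-bits decode, assuming only the standard semantics of Rust `as` casts.

```rust
/// Old path (modeled): store the i32 exponent as u32, pass it through f32,
/// then convert back. Hypothetical helper, not the crate's API.
fn roundtrip_via_f32(n: i32) -> i32 {
    let raw = n as u32;         // two's-complement reinterpretation
    let as_float = raw as f32;  // rounds to the nearest representable f32
    let back = as_float as u32; // float-to-int `as` casts saturate
    back as i32
}

/// Fixed path (modeled): reinterpret the raw u32 bits as i32 directly.
fn decode_raw(raw: u32) -> i32 {
    raw as i32
}

fn main() {
    for n in [3, -1, -2, -3, -200] {
        println!(
            "n = {:>4}: via f32 -> {:>4}, raw decode -> {:>4}",
            n,
            roundtrip_via_f32(n),
            decode_raw(n as u32)
        );
    }
}
```

In this model, n = -1 survives only by accident: its u32 image rounds up to 2^32, which saturates back to u32::MAX. That is consistent with the entry's n <= -2 condition.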

0.4.0 - 2026-02-26

Changed

Internal Architecture

BytecodeTape decomposition: split 2,689-line monolithic bytecode_tape.rs into a directory module with 10 focused submodules (forward.rs, reverse.rs, tangent.rs, jacobian.rs, sparse.rs, optimize.rs, taylor.rs, parallel.rs, serde_support.rs, thread_local.rs). Zero public API changes; benchmarks confirm no performance impact.
Deduplicated reverse sweep in gradient_with_buf() and sparse_jacobian_par() — both now call shared reverse_sweep_core() instead of inlining the loop. gradient_with_buf gains the zero-adjoint skip optimization it was previously missing.
Bumped nalgebra dependency from 0.33 to 0.34

Fixed

Corrected opcode variant count in documentation (44 variants, not 38/43)
Fixed CONTRIBUTING.md MSRV reference (1.93, not 1.80)

0.3.0 - 2026-02-25

Added

Differential Operator Evaluation (`diffop` feature)

diffop::mixed_partial(tape, x, orders) — compute any mixed partial derivative via jet coefficient extraction
diffop::hessian(tape, x) — full Hessian via jet extraction (cross-validated against tape.hessian())
MultiIndex — specify which mixed partial to compute (e.g., [2, 0, 1] = ∂³u/∂x₀²∂x₂)
JetPlan::plan(n, indices) — precompute slot assignments and extraction prefactors; reuse across evaluation points
diffop::eval_dyn(plan, tape, x) — evaluate a plan at a new point using TaylorDyn
Pushforward grouping: multi-indices with different active variable sets get separate forward passes to avoid slot contamination
Prime window sliding for collision-free slot assignment up to high derivative orders

0.2.0 - 2026-02-25

Added

Bytecode Tape (Graph-Mode AD)

BytecodeTape SoA graph-mode AD with opcode dispatch and tape optimization (CSE, DCE, constant folding)
BReverse<F> tape-recording reverse-mode variable
record() / record_multi() to build tapes from closures
Hessian computation via forward-over-reverse (hessian, hvp)
DualVec<F, N> batched forward-mode with N tangent lanes for vectorized Hessians (hessian_vec)

Sparse Derivatives

Sparsity pattern detection via bitset propagation
Graph coloring: greedy distance-2 for Jacobians, star bicoloring for Hessians
sparse_jacobian, sparse_hessian, sparse_hessian_vec
CSR storage (CsrPattern, JacobianSparsityPattern, SparsityPattern)

Taylor Mode AD

Taylor<F, K> const-generic Taylor coefficients with Cauchy product propagation
TaylorDyn<F> arena-based dynamic Taylor (runtime degree)
taylor_grad / taylor_grad_with_buf — reverse-over-Taylor for gradient + HVP + higher-order adjoints
ode_taylor_step / ode_taylor_step_with_buf — ODE Taylor series integration via coefficient bootstrapping

Stochastic Taylor Derivative Estimators (STDE)

laplacian — Hutchinson trace estimator for Laplacian approximation
hessian_diagonal — exact Hessian diagonal via coordinate basis
directional_derivatives — batched second-order directional derivatives
laplacian_with_stats — Welford's online variance tracking
laplacian_with_control — diagonal control variate variance reduction
Estimator trait generalizing per-direction sample computation (Laplacian, GradientSquaredNorm)
estimate / estimate_weighted generic pipeline
Hutchinson divergence estimator for vector fields via Dual<F> forward mode
Hutch++ (Meyer et al. 2021) trace estimator with O(1/S²) variance via sketch + residual decomposition
Importance-weighted estimation (West's 1979 algorithm)
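
For readers unfamiliar with the online variance tracking mentioned for laplacian_with_stats, a minimal sketch of Welford's update follows. The `Welford` struct and its method names are assumptions made for this example; the crate's own accumulator may be organized differently.

```rust
/// Running mean and variance via Welford's online update.
/// Hypothetical sketch, not the crate's accumulator type.
#[derive(Default)]
struct Welford {
    count: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the running mean
}

impl Welford {
    fn push(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        // Uses the updated mean, which keeps the recurrence numerically stable.
        self.m2 += delta * (x - self.mean);
    }

    /// Unbiased sample variance; None until two samples have been seen.
    fn sample_variance(&self) -> Option<f64> {
        if self.count < 2 {
            None
        } else {
            Some(self.m2 / (self.count - 1) as f64)
        }
    }
}

fn main() {
    let mut acc = Welford::default();
    for x in [1.0, 2.0, 3.0, 4.0] {
        acc.push(x);
    }
    println!("mean = {}, variance = {:?}", acc.mean, acc.sample_variance());
}
```

The single-pass form avoids storing per-sample estimates, which matters when the number of random directions is large.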

Cross-Country Elimination

jacobian_cross_country — Markowitz vertex elimination on linearized computational graph

Custom Operations

eval_dual / partials_dual default methods on CustomOp<F> for correct second-order derivatives (HVP, Hessian) through custom ops

Nonsmooth AD

forward_nonsmooth — branch tracking and kink detection for abs/min/max/signum/floor/ceil/round/trunc
clarke_jacobian — Clarke generalized Jacobian via limiting Jacobian enumeration
has_nontrivial_subdifferential() — two-tier classification: all 8 nonsmooth ops tracked for proximity detection; only abs/min/max enumerated in Clarke Jacobian
KinkEntry, NonsmoothInfo, ClarkeError types
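
Proximity detection for round depends on where the function actually jumps, which is at half-integers rather than integers. The helper below is a hypothetical illustration of that distance computation using only the standard library; it is not the crate's forward_nonsmooth logic.

```rust
/// Distance from x to the nearest half-integer k + 0.5, that is, to the
/// nearest discontinuity of `round`. Hypothetical helper for illustration.
fn dist_to_round_kink(x: f64) -> f64 {
    // floor(x) + 0.5 is the only half-integer in [floor(x), floor(x) + 1).
    (x - x.floor() - 0.5).abs()
}

fn main() {
    // `round` jumps across 0.5 but is locally constant around 1.0.
    assert_eq!(0.49_f64.round(), 0.0);
    assert_eq!(0.51_f64.round(), 1.0);
    assert_eq!(0.99_f64.round(), 1.0);
    assert_eq!(1.01_f64.round(), 1.0);

    for x in [0.5, 1.0, -1.5, 2.49] {
        println!("x = {:>5}: distance to nearest kink = {}", x, dist_to_round_kink(x));
    }
}
```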

Laurent Series

Laurent<F, K> — singularity analysis with pole tracking, flows through BytecodeTape::forward_tangent

Checkpointing

grad_checkpointed — binomial Revolve checkpointing
grad_checkpointed_online — periodic thinning for unknown step count
grad_checkpointed_disk — disk-backed for large state vectors
grad_checkpointed_with_hints — user-controlled checkpoint placement

GPU Acceleration

wgpu backend: batched forward, gradient, sparse Jacobian, HVP, sparse Hessian (f32, Metal/Vulkan/DX12)
CUDA backend: same operations with f32 + f64 support (NVRTC runtime compilation)
GpuBackend trait unifying wgpu and CUDA backends behind a common interface

Composable Mode Nesting

Type-level AD composition: Dual<BReverse<f64>>, Taylor<BReverse<f64>, K>, DualVec<BReverse<f64>, N>
composed_hvp convenience function for forward-over-reverse HVP
BReverse<Dual<f64>> reverse-wrapping-forward composition via BtapeThreadLocal impls for Dual<f32> and Dual<f64>

Serialization

serde support for BytecodeTape, Laurent<F, K>, KinkEntry, NonsmoothInfo, ClarkeError
JSON and bincode roundtrip support

Linear Algebra Integrations

faer_support: HVP, sparse Hessian, dense/sparse solvers (LU, Cholesky)
nalgebra_support: gradient, Hessian, Jacobian with nalgebra types
ndarray_support: HVP, sparse Hessian, sparse Jacobian with ndarray types

Optimization Solvers (`echidna-optim`)

L-BFGS solver with two-loop recursion
Newton solver with Cholesky factorization
Trust-region solver with Steihaug-Toint CG
Armijo line search
Implicit differentiation: implicit_tangent, implicit_adjoint, implicit_jacobian, implicit_hvp, implicit_hessian
Piggyback differentiation: tangent, adjoint, and interleaved forward-adjoint modes
Sparse implicit differentiation via faer sparse LU (sparse-implicit feature)

Benchmarking

Criterion benchmarks for Taylor mode, STDE, cross-country, sparse derivatives, nonsmooth
Comparison benchmarks against num-dual and ad-trait (forward + reverse gradient)
Correctness cross-check tests verifying ad-trait gradient agreement with echidna
CI regression detection via criterion-compare-action

Changed

Tape optimization: algebraic simplification at recording time (identity, absorbing, powi patterns)
Tape optimization: targeted multi-output DCE (dead_code_elimination_for_outputs)
Thread-local Adept tape pooling — grad()/vjp() reuse cleared tapes via thread-local pool instead of per-call allocation
Signed::signum() for BReverse<F> now records OpCode::Signum to tape (was returning a constant)
MSRV raised from 1.80 to 1.93
WelfordAccumulator struct extracted, deduplicating Welford's algorithm across 4 STDE functions
cuda_err helper extracted, replacing 72 inline .map_err closures in CUDA backend
create_tape_bind_group method extracted, replacing 4 duplicated bind group blocks in wgpu backend

0.1.0 - 2026-02-21

Added

Core Types

Dual<F> forward-mode dual number with all 30+ elemental operations
Reverse<F> reverse-mode AD variable (12 bytes for f64, Copy)
Float marker trait for f32/f64
Scalar trait for writing AD-generic code
Type aliases: Dual64, Dual32, Reverse64, Reverse32

Tape

Adept-style two-stack tape with precomputed partial derivatives
Thread-local active tape with RAII guard (TapeGuard)
Constant sentinel (u32::MAX) to avoid tape bloat from literals
Zero-adjoint skipping in the reverse sweep
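
The following sketch shows one way a two-stack tape with precomputed partials and zero-adjoint skipping can fit together. Every type and method name here (`Statement`, `Tape`, `new_var`, `push`, `gradient`) is invented for the example, and it assumes each variable is assigned exactly once; the crate's actual layout, including its constant sentinel and thread-local guard, is not modeled.

```rust
/// One recorded statement: the variable it defines and where its operands
/// end in the operation stack. Hypothetical layout for illustration.
struct Statement {
    lhs: usize,
    ops_end: usize,
}

/// Two stacks: statements, plus parallel (partial, operand index) arrays.
#[derive(Default)]
struct Tape {
    statements: Vec<Statement>,
    partials: Vec<f64>,
    operands: Vec<usize>,
    num_vars: usize,
}

impl Tape {
    fn new_var(&mut self) -> usize {
        self.num_vars += 1;
        self.num_vars - 1
    }

    /// Record a new variable whose partials w.r.t. `deps` are already known.
    fn push(&mut self, deps: &[(f64, usize)]) -> usize {
        for &(partial, idx) in deps {
            self.partials.push(partial);
            self.operands.push(idx);
        }
        let lhs = self.new_var();
        self.statements.push(Statement { lhs, ops_end: self.partials.len() });
        lhs
    }

    /// Reverse sweep seeded at `output`. A statement whose adjoint is exactly
    /// zero cannot contribute, so its operands are never touched.
    fn gradient(&self, output: usize) -> Vec<f64> {
        let mut adj = vec![0.0; self.num_vars];
        adj[output] = 1.0;
        for (i, st) in self.statements.iter().enumerate().rev() {
            let a = adj[st.lhs];
            if a == 0.0 {
                continue; // zero-adjoint skip
            }
            let start = if i == 0 { 0 } else { self.statements[i - 1].ops_end };
            for k in start..st.ops_end {
                adj[self.operands[k]] += self.partials[k] * a;
            }
        }
        adj
    }
}

fn main() {
    // f(x, y) = x * y + sin(x) at (2, 3); `_unused` never reaches the output.
    let (xv, yv) = (2.0_f64, 3.0_f64);
    let mut t = Tape::default();
    let x = t.new_var();
    let y = t.new_var();
    let xy = t.push(&[(yv, x), (xv, y)]);
    let s = t.push(&[(xv.cos(), x)]);
    let _unused = t.push(&[(2.0 * yv, y)]); // y^2, skipped in the sweep
    let f = t.push(&[(1.0, xy), (1.0, s)]);
    let g = t.gradient(f);
    println!("df/dx = {}, df/dy = {}", g[x], g[y]);
}
```

Because partials are stored at recording time, the sweep is a pure multiply-accumulate loop with no opcode dispatch, which is the main appeal of this design.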

API

grad(f, x) — gradient via reverse mode
jvp(f, x, v) — Jacobian-vector product via forward mode
vjp(f, x, w) — vector-Jacobian product via reverse mode
jacobian(f, x) — full Jacobian via forward mode

Elemental Operations

Powers: recip, sqrt, cbrt, powi, powf
Exp/Log: exp, exp2, exp_m1, ln, log2, log10, ln_1p, log
Trig: sin, cos, tan, sin_cos, asin, acos, atan, atan2
Hyperbolic: sinh, cosh, tanh, asinh, acosh, atanh
Misc: abs, signum, floor, ceil, round, trunc, fract, mul_add, hypot

Trait Implementations

num-traits: Float, Zero, One, Num, Signed, FloatConst, FromPrimitive, ToPrimitive, NumCast
std::ops: Add, Sub, Mul, Div, Neg, Rem with assign variants
Mixed scalar ops (Dual<f64> + f64, f64 * Reverse<f64>, etc.)

Testing

94 tests: forward mode, reverse mode, API, and cross-validation
Every elemental validated against central finite differences
Forward-vs-reverse cross-validation on Rosenbrock, Beale, Ackley, Booth, and more
Criterion benchmarks for forward overhead and reverse gradient
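
To make the finite-difference validation concrete, here is a self-contained sketch of the idea: differentiate a small function with a hand-rolled dual number and compare against a central difference. The `Dual` struct, `f_dual`, `f`, and `central_diff` are all hypothetical stand-ins written for this example; the step size and tolerance are assumptions, not the values used in the crate's test suite.

```rust
use std::ops::{Add, Mul};

/// Minimal forward-mode dual number: a value plus one tangent.
/// Hypothetical stand-in for the crate's `Dual<F>`.
#[derive(Clone, Copy, Debug)]
struct Dual {
    re: f64,
    eps: f64,
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, o: Dual) -> Dual {
        Dual { re: self.re + o.re, eps: self.eps + o.eps }
    }
}

impl Mul for Dual {
    type Output = Dual;
    fn mul(self, o: Dual) -> Dual {
        // Product rule on the tangent part.
        Dual { re: self.re * o.re, eps: self.eps * o.re + self.re * o.eps }
    }
}

impl Dual {
    fn sin(self) -> Dual {
        Dual { re: self.re.sin(), eps: self.eps * self.re.cos() }
    }
}

/// f(x) = x * sin(x) + x, written once for duals and once for plain floats.
fn f_dual(x: Dual) -> Dual {
    x * x.sin() + x
}

fn f(x: f64) -> f64 {
    x * x.sin() + x
}

/// Central difference (f(x + h) - f(x - h)) / (2h); truncation error is O(h^2).
fn central_diff(f: impl Fn(f64) -> f64, x: f64, h: f64) -> f64 {
    (f(x + h) - f(x - h)) / (2.0 * h)
}

fn main() {
    let x0 = 1.3;
    let ad = f_dual(Dual { re: x0, eps: 1.0 }).eps; // seed tangent with 1
    let fd = central_diff(f, x0, 1e-5);
    println!("dual: {}, finite difference: {}", ad, fd);
    assert!((ad - fd).abs() < 1e-8);
}
```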
