feat: support dynamic mode decomposition calibrator by Archerkattri · Pull Request #1053 · vipshop/cache-dit

Archerkattri · 2026-06-13T14:41:28Z

Add a Dynamic Mode Decomposition (Prony) exponential-basis calibrator (`calibrator_type="dmd"`)

Motivation

cache-dit's calibrators currently forecast cached hidden states / residuals with the
TaylorSeer polynomial expansion. This PR adds a second, drop-in calibrator backend
with an exponential forecast basis: Dynamic Mode Decomposition (Schmid 2010), the
SVD-regularised multivariate generalisation of Prony's method (1795). (To avoid the
common collision: this is not Distribution Matching Distillation.)

Honest, family-conditional pitch. We benchmarked both bases across two diffusion
families, and no single basis wins:

On flow-matching 3D generators the exponential basis wins clearly and the lead
grows with the cache interval (numbers below). This is the regime this calibrator is
for.
On DiT-class denoising (DiT-XL/2 ImageNet-256, 250-step DDPM) the ranking inverts:
the sign-correct TaylorSeer polynomial is near-lossless (paired-noise FID drift 2.27 vs
the uncached baseline at 3.81x), while the exponential basis drifts 1.7-1.9x more than
even a near-reuse Hermite control at every interval tested. We therefore do not
claim DMD as a better default; it is an additional basis for the workloads where it
wins, default behavior unchanged.

The mechanism behind the 3D win: across denoising steps each cached feature stream
evolves under a slowly varying, near-linear operator; the exact solution class of a
linear feature-ODE is a sum of damped/oscillatory exponentials, and the exponential basis
is exact on that class where any polynomial diverges under extrapolation. Whether a given
model family's stream is in that class at the served horizons is empirical, hence the
per-family numbers below.

It plugs into the existing CalibratorConfig pattern, exactly like
TaylorSeerCalibratorConfig:

import cache_dit
from cache_dit import DMDCalibratorConfig

cache_dit.enable_cache(
    pipe,
    calibrator_config=DMDCalibratorConfig(dmd_history=6),
)

The reference implementation
(hicache-plus-plus) also ships a
training-free holdout selector (backend="auto") that backcasts a held-out snapshot with
both bases per compute window and serves the winner. We benchmarked it on this exact
split and report the honest verdict: it solves intra-run regime switches, but it does
not recover the family-level winner on DiT (both holdout modes served the exponential
arm there, FID drift 18.11 vs the corrected polynomial's 3.54), so the recommended way
to consume this calibrator is a per-family default (DMD for flow-matching
generators; TaylorSeer for DiT-class denoising), not a selector. This PR keeps the
surface minimal: one new basis.

What the calibrator does (math summary)

At each full-compute step the calibrator records the computed tensor as a snapshot
(per named stream, like the TaylorSeer states). At an approximation step it:

takes the longest uniformly spaced suffix of the snapshot history (the identified
propagator advances exactly one snapshot-spacing per application, and DBCache's dynamic
decisions can make the compute cadence non-uniform; mixed spacings would corrupt the
fit);
identifies the linear propagator A with Y_{t+1} ~ A Y_t via one economy SVD of the
[d, n] snapshot matrix (n = history <= 6, so this is cheap relative to a forward
pass) with spectrum-based rank truncation (this is what rejects noise);
eigendecomposes once per compute window (cached; refit only when a new snapshot
arrives) and forecasts the (fractional) horizon k by eigenvalue powers:
Y_{t+k} ~ Phi (lambda^k * b), b = pinv(Phi) Y_t.

Below the 4-snapshot identifiability floor (a real-valued trajectory spends two real
degrees of freedom per complex pole, so one oscillatory mode already needs three snapshot
pairs), or whenever the fit is degenerate/non-finite, it transparently falls back to the
TaylorSeer expansion it also maintains; warm-up behaves exactly like the existing
calibrator.

Changes

caching/cache_contexts/calibrators/dmd.py: new DMDCalibrator + DMDState,
mirroring the TaylorSeerCalibrator / TaylorSeerState API (mark_step_begin,
update, approximate, step, reset_cache; per-stream states keyed by name).
caching/cache_contexts/calibrators/__init__.py: new DMDCalibratorConfig
dataclass (dmd_history, dmd_rank, dmd_ridge), registered in the Calibrator
factory and _supported_calibrators.
Export chain: DMDCalibratorConfig re-exported from cache_contexts, caching, and
the top-level cache_dit namespace, alongside TaylorSeerCalibratorConfig.

No new dependencies (torch-only), no behavior change unless calibrator_type="dmd" is
selected.

Validation so far

Unit-level: on synthetic trajectories from the exponential solution class, the
calibrator's post-warm-up forecast error is ~5e-8 relative L2 where the order-1 Taylor
expansion sits at ~0.4-1.9 (same snapshots, same schedule).
Method-level (reference implementation,
hicache-plus-plus), flow-matching
3D generators: on Hunyuan3D-2.1 (Toys4K, F-score@0.05 vs uncached baseline 0.911) the
deployed polynomial arm decays 0.88 / 0.74 / 0.38 at cache interval 3 / 5 / 6 while the
exponential basis holds 0.85 / 0.86 / 0.62; exactly lossless at interval 5 on
Hunyuan3D-2-mini; on SAM3D geometry-lossless (F1 = 1.000) through interval 6 at 1.56x.
DiT-class denoising, reported for honesty (the regime where you should NOT pick this
calibrator). DiT-XL/2 ImageNet-256, 250-step DDPM, cfg 1.5, paired-noise FID-10k drift
vs the uncached baseline (lossless cache reads ~0; full ledger and protocol:
hicache-plus-plus/benchmarks/dit_imagenet/RESULTS_DIT.md):

basis i4 i6 i8

TaylorSeer (corrected, +k) 2.27 (3.81x) - -

Hermite (corrected, +k) 3.54 (3.79x) 6.46 (5.46x) 10.74 (7.21x)

exponential (DMD) 18.02 54.24 100.65

Holdout selection does not rescue DiT either: in our pre-registered A/B both holdout
modes of the reference selector served the exponential arm (drift 18.11), because the
richer exponential fit backcasts the snapshot history better even where it
extrapolates forward worse. Hence the per-family default recommendation above.
Remaining before marking ready for review: a FLUX.1-dev A/B with this exact
calibrator.

Scoping summary for reviewers: this adds an opt-in basis that wins on flow-matching
generators and is reported, with numbers, as losing on DiT-class denoising. The
per-interval tables are included so the trade-off is judged directly, not from a single
operating point.

… (`calibrator_type="dmd"`)

DefTruth · 2026-06-13T14:51:10Z

@Archerkattri Hi, thanks for your contribution! Can you show some visualize cases w/ or w/o dmd calibrator?

…verflow - Cache the horizon-free DMD eigendecomposition per snapshot window (DMDState._fit / _fit_key, invalidated when a new snapshot arrives). Skip steps now reuse one SVD/eig instead of recomputing it every step, which is what restores the intended cache speedup at large fresh intervals. - Fit DMD independently per batch item (axis 0). Flattening folded the batch into one state, so a prompt's forecast depended on the other prompts in the batch; per-item fitting keeps them independent like the Taylor path. - Move the finite check after the output-dtype cast: a finite float64 forecast can still overflow to inf in fp16, so the cast result is what gets guarded. - yapf / docformatter clean (fixes the failing pre-commit CI check).

Archerkattri · 2026-06-13T17:12:20Z

@DefTruth Thanks! e1067af has the visual cases you asked for plus the review fixes.

With / without the DMD calibrator (FLUX.1-dev, 50 steps, seed 42):

DMD vs the existing TaylorSeer calibrator (same DBCache, matched ~3.2x), which is the real case for the new basis: the exponential forecast holds where the polynomial breaks up.

calibrator, ~3.2x	LPIPS (vs uncached)	PSNR	CLIP
TaylorSeer	0.78	11.8	0.27
DMD	0.38	19.8	0.32

12 DrawBench prompts; LPIPS/PSNR are vs each method's own uncached image, CLIP is prompt alignment.

Review fixes (all three bot comments, in e1067af):

Cache the DMD fit across skip steps: the SVD/eig is fitted once per snapshot window in DMDState (_fit / _fit_key) and reused on every skipped step; only the lambda**k horizon re-advances. Verified one fit per window across N skips.
fp16 overflow after the cast: the finite check is now post-cast (a finite float64 forecast can still overflow to inf in fp16).
Batch independence: DMD fits per batch item now, so one prompt's forecast no longer depends on the others in the batch.
pre-commit (yapf / docformatter) is green.

For full disclosure, since the exponential basis invites it: I also benchmarked against Spectrum (CVPR'26, a global error-bounded Chebyshev fit) on FLUX, and it wins there (3.46x, LPIPS 0.072). Its global fit beats local forecasting of any basis on this image model, and DMD's reported wins are on flow-matching 3D generators. So the honest pitch is that DMD is a strictly better drop-in basis than the TaylorSeer calibrator already in cache-dit, not that it is SOTA on FLUX. Full numbers + scripts: RESULTS.md. Happy to mirror the three hooks to the other model ports.

DefTruth · 2026-06-14T01:12:30Z

@Archerkattri This changes is LGTM, please also add 'dmd' calibrator into example CLI and make sure it can work as expected while enable it by --dmd.

cache-dit/src/cache_dit/_utils/utils.py

Line 2198 in 33eacf7

def maybe_apply_optimization(

For example:

python -m cache_dit.generate flux # no cache
python -m cache_dit.generate flux --cache # DBCache
python -m cache_dit.generate flux --cache --taylorseer  # DBCache + Taylorseer
python -m cache_dit.generate flux --cache --dmd # DBCache + DMD

Per review: enable the DMD calibrator from `python -m cache_dit.generate` exactly like --taylorseer. --dmd selects DMDCalibratorConfig (history via --dmd-history, default 6); --taylorseer is unchanged. Verified end-to-end: python -m cache_dit.generate flux --cache --dmd --cpu-offload generates with the DMD calibrator active (optimization tag ...DMDH6_S12, image saved).

Archerkattri · 2026-06-14T06:51:23Z

Done in a0026dc. Added --dmd (and --dmd-history, default 6) to python -m cache_dit.generate, wired exactly like --taylorseer (it selects DMDCalibratorConfig; --taylorseer is unchanged, and --dmd takes precedence if both are passed).

Verified end-to-end on FLUX.1-dev, matching your example:

python -m cache_dit.generate flux                       # no cache
python -m cache_dit.generate flux --cache               # DBCache
python -m cache_dit.generate flux --cache --taylorseer  # DBCache + TaylorSeer
python -m cache_dit.generate flux --cache --dmd         # DBCache + DMD

The --dmd run generates correctly with the DMD calibrator active (optimization tag ...DMDH6_S12, image saved). pre-commit is green.

DefTruth

LGTM~ Thanks for your contribution!

Add a Dynamic Mode Decomposition (Prony) exponential-basis calibrator…

c9a6b50

… (`calibrator_type="dmd"`)

DefTruth changed the title ~~Add a Dynamic Mode Decomposition (Prony) exponential-basis calibrator (calibrator_type="dmd")~~ feat: support dmd (dynamic mode decomposition) calibrator Jun 14, 2026

DefTruth changed the title ~~feat: support dmd (dynamic mode decomposition) calibrator~~ feat: support dynamic mode decomposition calibrator Jun 14, 2026

DefTruth mentioned this pull request Jun 14, 2026

[RFC] v1.5.0 Roadmap #856

Open

11 tasks

DefTruth approved these changes Jun 14, 2026

View reviewed changes

DefTruth merged commit 6cd559c into vipshop:main Jun 14, 2026
4 checks passed

Archerkattri deleted the dmd-calibrator branch June 14, 2026 15:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: support dynamic mode decomposition calibrator#1053

feat: support dynamic mode decomposition calibrator#1053
DefTruth merged 3 commits into
vipshop:mainfrom
Archerkattri:dmd-calibrator

Archerkattri commented Jun 13, 2026

Uh oh!

DefTruth commented Jun 13, 2026

Uh oh!

Archerkattri commented Jun 13, 2026

Uh oh!

DefTruth commented Jun 14, 2026

Uh oh!

Archerkattri commented Jun 14, 2026

Uh oh!

DefTruth left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

basis	i4	i6	i8
TaylorSeer (corrected, +k)	2.27 (3.81x)	-	-
Hermite (corrected, +k)	3.54 (3.79x)	6.46 (5.46x)	10.74 (7.21x)
exponential (DMD)	18.02	54.24	100.65

Conversation

Archerkattri commented Jun 13, 2026

Add a Dynamic Mode Decomposition (Prony) exponential-basis calibrator (calibrator_type="dmd")

Motivation

What the calibrator does (math summary)

Changes

Validation so far

Uh oh!

DefTruth commented Jun 13, 2026

Uh oh!

Archerkattri commented Jun 13, 2026

Uh oh!

DefTruth commented Jun 14, 2026

Uh oh!

Archerkattri commented Jun 14, 2026

Uh oh!

DefTruth left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add a Dynamic Mode Decomposition (Prony) exponential-basis calibrator (`calibrator_type="dmd"`)