Eigenspace Tracking

Eigenspace tracking is a runtime acceleration layer for QUDA's MG-preconditioned fermion solves in long-running settings such as HMC. It addresses a well-known problem: as the gauge field evolves, the multigrid null-space drifts away from the operator's true low modes, the preconditioner becomes a worse approximate inverse, and the outer Krylov iteration count climbs. The standard remedy — periodic from-scratch MG re-setup — is expensive: hundreds of CG iterations per null vector. Eigenspace tracking maintains a persistent pool of low-mode information from cheap sources (CG-Lanczos extraction, GCR residuals, converged solutions of past solves), forecasts its evolution between solves, and uses that pool to refresh the MG null vectors on demand at a small fraction of the from-scratch cost.

The framework is exposed through QudaInvertParam / QudaHMCParam flags so any caller — Chroma, tmLQCD, internal tests — can opt in without API breakage.

When to use it

Eigenspace tracking pays off when:

The operator changes between solves (HMC trajectories, gauge cooling, anything that updates the gauge field).
The MG preconditioner exists but its null-space is going stale — symptom: per-solve outer-GCR iteration count rises as the chain progresses, plateauing well above its trajectory-1 value.
You can afford a modest pool of O(N_vec) extra spinor fields on device (24-48 vectors at the fine grid; coarse-pool variants exist where memory is tight).

It does not help if:

You are solving once on a static gauge field (no operator drift between solves; the persistent pool has nothing to amortise over).
The MG re-setup interval is already short enough that staleness never becomes a problem (then ET refresh just duplicates work the standard re-setup is doing).
You are using a non-MG-preconditioned solver in a regime where the un-preconditioned CG count is already low (no preconditioner to keep fresh).

How it works

Three subsystems cooperate:

1. Tracker pool (EigenTracker). A pool of N_pool orthonormal vectors approximating the low-eigenvalue subspace of the current operator, with cached operator applications for cheap Rayleigh-Ritz updates. The pool is seeded at the start of HMC from the existing MG null vectors; throughout the run it is enriched by absorption of converged-solution vectors and (optionally) per-solve Krylov vectors. A linear/quadratic forecast pre-rotates the pool toward the expected post-trajectory basis between accepted trajectories.

2. MG null-vector refresh. When the iteration-count safety net trips (--hmc-mg-setup-iter-ratio N, default 2.0× the early-trajectory baseline), or on a forced periodic interval (--hmc-mg-setup-interval N), the MG re-setup path consults the eigentracker. With --eigentracking-mg-refresh-iters N>=0 the path becomes pool-driven: copy the pool into MG's null-vector storage, run N cheap CG inverse-iter polish steps per vector, rebuild the prolongator and coarse Dirac. Cost is roughly 50× lower than QUDA's standard refresh (--mg-setup-maxiter-refresh 100) and roughly 1000× lower than a from-scratch setup.

3. Per-solve Krylov capture (opt-in). A single user-facing knob (--eigentracking-residual-cap N, default 0) gates two install paths through a shared TrackerScope<T> lifecycle wrapper (inv_tracker.h):

CG (non-preconditioned fallback) — zero-cost Lanczos-tridiag Ritz extraction. The install hook in inv_cg_quda.cpp records α/β and normalised residuals during the solve, after which up to N Ritz pairs of the operator are reconstructed with no extra matvecs.
GCR (MG-preconditioned outer solve) — residual stash. The install hook in inv_gcr_quda.cpp stores up to N normalised intermediate residuals per solve. We deliberately do not attempt Lanczos-tridiag extraction for GCR: its A·P_m = Q_m·T_m decomposition has T as the QR factor of A·P, not the Hessenberg of A in any basis we have, so recovering Ritz pairs of A would require either extra matvecs per stored vector or a non-trivial inversion of the preconditioner's effect on P.

Both paths feed their captured vectors into the same pool absorption / Rayleigh-Ritz machinery used by the converged-solution stash, so downstream behaviour (RR projection, MG refresh) is unchanged. The GCR capture path includes two transparent normalisations to keep the captured residual compatible with the pool:

Precision promotion. GCR runs at precision_sloppy (typically single); the pool's reference vectors and absorption kernels are at the target precision passed in by the caller (typically inv_param.cuda_prec = double). The tracker promotes each captured residual via blas::copy to the target precision before stashing — no instantiation exists for mixed-precision multiCdot between double pool and single residual.
MG-hierarchy filter. The hook fires for every GCR instance, including the coarse-grid GCR that the MG preconditioner runs at level 1+. Coarse residuals have a different (Ns, Nc) than the pool's fine-grid Wilson reference vectors, so they are silently dropped at recordIteration entry. Tracking the coarse spectrum is a separate concern handled by the nested-FGI's CoarseDeflationManager.

Site-subset is preserved by design — the pool is seeded from the half-site (even-parity) components of the MG null vectors, and inside a PC solve r_sloppy is also half-site, so the two match without any embedding step.

The cap is opt-in because in safe-mass Wilson regimes the converged-solution stash alone saturates the pool's useful low-mode content; the per-solve Krylov capture earns its keep at light masses approaching κ_c, large volumes, or trajectories long enough that mid-MD gauge drift matters within a single solve. The user explicitly opts in via --eigentracking-residual-cap N.

4D smoke verification. A 2-trajectory run on the 24³×48 m=−0.835 lattice with cap=4 showed Ritz absorbed=9 after trajectory 1 (1 converged solution + 4 GCR residuals × 2 γ₅-pass solves) growing to 27 cumulative and pool 48/24 after trajectory 2 — confirming the install path is live and stable end-to-end. dH values are paired-seed identical to the cap=0 baseline (the capture is doing pure preconditioner maintenance, not perturbing physics); iter count does not improve at this mass because the converged-solution stash already saturates the pool. The infrastructure is in place for the lighter-mass / larger-volume regimes where the dominant-solution-only path leaves modes on the table.

Architectural invariants. Four non-obvious requirements at the boundary between the GCR loop and the EigenTracker pool, each pinned by a regression test in the HMC.GCRTracker* suite:

Invariant	Mechanism
Move-assignment guard	FIFO uses "keep first N", not `vector::erase` (QUDA's `ColorSpinorField` refuses move-assignment to an already-created field as a GPU-allocation safety check)
Precision promotion	`blas::copy` to `inv_param.cuda_prec` before stash (no `multiCdot` instantiation exists for double-pool / single-residual)
Hierarchy filter	Drop residuals at `recordIteration` entry where `(Ns, Nc) ≠ (4, 3)` — the GCR install hook fires for every `GCR` instance including the MG-internal level-1+ coarse GCR; coarse residuals fail the length check inside the pool's `MultiReduce`
Site-subset preservation	The pool is seeded from the even-parity components of the MG null vectors, so it's half-site fine; PC-solve `r_sloppy` is also half-site, which matches by construction. No half→full embedding

CLI parameters

All flags are gated by --eigentracking true. Most defaults are sized at run start from n_vec (the MG null-vector count) so a typical user only sets --eigentracking true plus the refresh flag.

Master switch

Flag	Description	Default
`--eigentracking <bool>`	Enable eigenspace tracking. All other flags are no-ops if false.	`false`

Pool sizing

Flag	Description	Default
`--eigentracking-n-ev <N>`	Number of tracked eigenpairs. 0 derives from MG `n_vec`.	`0` (auto)
`--eigentracking-pool-capacity <N>`	Max pool size. 0 derives as `2 × n_ev`.	`0` (auto)
`--eigentracking-n-ritz <N>`	Ritz pairs to extract per CG solve when CG capture is on. 0 derives as `max(n_ev/2, 2)`.	`0` (auto)

Forecasting and absorption

Flag	Description	Default
`--eigentracking-forecast-order <0/1/2>`	Linear (1) / quadratic (2) extrapolation of pool basis between trajectories; 0 disables.	`1`
`--eigentracking-solution-history <N>`	Chronological initial-guess history depth (Brower et al. forecasting).	`3`
`--eigentracking-absorb-ritz <bool>`	Absorb extracted Ritz vectors into pool. False keeps the pool as RR-evolved MG null vectors only (Schwinger-style; useful for refresh at extreme mass).	`true`
`--eigentracking-fresh-interval <N>`	Trajectories between fresh TRLM rebuilds of the pool. 0 disables.	`0`

MG null-vector refresh

Flag	Description	Default
`--eigentracking-mg-refresh-iters <N>`	Pool-driven null-vector refresh on accepted-trajectory re-setup. `-1` = disabled (fall back to standard CG-based refresh). `0` = pure pool replacement (pair with `--eigentracking-absorb-ritz false`). `N>0` = hybrid pool warm-start + `N` CG inverse-iter polish steps per vector; `~5` typically matches standard re-setup quality.	`-1`

The interaction with QUDA's existing MG re-setup triggers:

Flag	Description	Default
`--hmc-mg-setup-interval <N>`	Force MG refresh every `N` accepted trajectories. 0 disables periodic refresh.	`0`
`--hmc-mg-setup-iter-ratio <r>`	Adaptive trigger: refresh when iters/solve exceeds `r ×` baseline (baseline = average of first `--hmc-mg-setup-iter-baseline-traj` accepted traj). Set 0 to disable.	`2.0`
`--hmc-mg-setup-iter-baseline-traj <N>`	Trajectories used to establish the iter-ratio baseline.	`5`

Per-solve Krylov capture (opt-in)

Flag	Description	Default
`--eigentracking-residual-cap <N>`	Cap on extra ET-pool candidates contributed by each fermion solve. Single knob covering both install paths: GCR stashes up to `N` normalised residuals; CG extracts up to `N` Lanczos-tridiag Ritz pairs. Opt-in for light-mass / large-volume regimes; default off because the converged-solution stash alone saturates the pool in safe Wilson regimes.	`0`

Initial / fresh TRLM eigensolve

Flag	Description	Default
`--eigentracking-trlm-tol <tol>`	TRLM convergence tolerance for initial / fresh-interval eigensolves.	`1e-6`
`--eigentracking-trlm-max-restarts <N>`	TRLM maximum restarts.	`100`
`--eigentracking-trlm-check-interval <N>`	TRLM iterations between convergence checks.	`10`
`--eigentracking-eig-type <trlm/blktrlm/iram/blkiram>`	Eigensolver for initial / fresh-TRLM eigensolves. Block variants operate on multiple Lanczos vectors per step (larger arithmetic intensity).	`trlm`
`--eigentracking-blk-size <N>`	Block size for block solvers; `n_ev` rounded up to a multiple.	`4`
`--eigentracking-use-poly-acc <bool>`	Chebyshev polynomial acceleration in initial TRLM (recommended for clustered or near-zero `M†M` spectra; required at critical mass without MG seeding).	`false`
`--eigentracking-poly-deg <N>`	Chebyshev polynomial degree.	`50`
`--eigentracking-a-min <a>`	Lower suppression boundary for Chebyshev acceleration; set roughly `10×` the smallest target eigenvalue.	`0.0`
`--eigentracking-a-max <a>`	Upper boundary; `0` triggers QUDA's power-iteration auto-estimate.	`0.0`

Example invocation

A typical light-mass HMC run with eigenspace tracking, MG refresh on every accepted trajectory using the ET pool, and per-solve GCR-residual capture turned on:

./tests/hmc_test --gtest_filter="HMC.Production" \
    --dim 24 24 24 48 \
    --prec double --prec-sloppy single --prec-precondition single \
    --prec-null single --prec-eigensolver single --prec-refine single \
    --mg-smoother-halo-prec single \
    --dslash-type wilson --mass -0.835 \
    --inv-type gcr --solution-type mat-pc --solve-type direct-pc --matpc even-even \
    --niter 5000 --tol 1e-9 \
    --hmc-integrator 3 --hmc-tau 1.0 --hmc-n-steps 80 --hmc-beta 5.6 \
    --inv-multigrid true --mg-levels 2 --mg-block-size 0 4 4 4 4 --mg-nvec 0 24 \
    --mg-setup-inv 0 cgnr --mg-setup-ca-basis-type 0 chebyshev \
    --mg-setup-iters 0 2 --mg-setup-tol 0 1e-6 \
    --mg-setup-maxiter 0 2000 --mg-setup-maxiter-refresh 0 100 \
    --eigentracking true \
    --eigentracking-mg-refresh-iters 2 \
    --eigentracking-residual-cap 4 \
    --hmc-mg-setup-interval 1 \
    --hmc-mg-setup-iter-ratio 2.0 \
    --hmc-mg-setup-iter-baseline-traj 5 \
    --fermion-t-boundary anti-periodic \
    --hmc-gauge-infile path/to/start.lime \
    --hmc-n-trajectories 20 \
    --hmc-momentum-seed 67890

The same invocation with --eigentracking false and the four --eigentracking-* lines removed gives a baseline against which to measure the iteration / wall-clock saving.

Results: 24x48 Wilson HMC at light mass

Test setup: 24^3 × 48, β=5.6, m=-0.835 (κ=0.158, m_π ≈ 410 MeV, anti-periodic in time), thermalised from unit gauge through a sequence of mass steps ending at the lightest mass point of the SESAM ensemble. Integrator is nested force-gradient (PQPQP-FG outer, leapfrog inner) with n_outer=80 at τ=1, calibrated to give 100% acceptance on a smooth gauge (<dH> ≈ 0.07). MG: two-level, 4^4 block size, 24 null vectors, GCR outer with tol=1e-9 in mixed double/single precision. Both runs use the same starting gauge, the same momentum seed, and the same MG configuration; only the eigentracker is toggled.

The two MG-update modes compared:

Baseline — thin update of the coarse Dirac operator after every accepted trajectory (the cheapest configuration that still keeps the coarse operator in sync with the new gauge); null vectors and prolongator frozen.
Tracked — the same thin update plus a pool-driven refresh of the 24 null vectors after every accepted trajectory (--hmc-mg-setup-interval 1 --eigentracking-mg-refresh-iters 2). The pool supplies the seed; 2 CG inverse-iter polish steps per vector; prolongator and coarse Dirac rebuilt.

Five paired trajectories:

Metric	Baseline (no ET)	Tracked (pool refresh per traj)	Δ
Wall, 5 trajectories	3804 s	2776 s	−27%
Wall per trajectory	12.7 min	9.3 min	−3.4 min
Mean iters/solve, traj 1	22.5	22.5	identical
Mean iters/solve, traj 2	26.8	22.5	−16%
Mean iters/solve, traj 3	28.5	22.2	−22%
Mean iters/solve, traj 4	29.5	22.2	−25%
Mean iters/solve, traj 5	29.5	22.2	−25%
Maximum iter inflation	`1.46×`	`1.07×`	held to baseline
Acceptance (paired seed)	3/5	3/5	identical (same seed → same accept decisions)
`<dH>`	`+0.076`	`+0.073`	tiny improvement

The baseline iteration count climbs from 22 to a plateau near 30 — that is the gauge wandering away from the operator the prolongator was built for. The tracked configuration is bounded at 1.07× over the same MD evolution. The 27% wall saving is net of the per-trajectory pool-refresh overhead.

ET internal accounting from the tracked run:

Trajectory 1:  pool seeded from 24 MG null vectors, +1 absorbed Ritz vector → pool=25
Trajectory 2:  +2 absorbed → pool=27
Trajectory 3:  +2 absorbed → pool=29
              (refresh consumes the first 24 each accept)

The two runs share the same RNG seed for both momentum sampling and Metropolis acceptance, so accept/reject decisions agree trajectory-by-trajectory; <dH> differs only in the third decimal because the better-conditioned solve in the tracked run gives slightly tighter forces. The eigentracker is doing pure preconditioner maintenance, not perturbing the underlying physics.

Limitations and when it does not help

Non-MG settings. ET's primary value is keeping the MG null vectors fresh. Without an MG preconditioner there is no null-vector refresh path; the tracker still accumulates the pool and the --eigentracking-residual-cap knob still works, but the absorbed vectors only feed the next-trajectory chronological initial guess and the pool's RR-evolved basis — useful but a smaller delta than the MG-refresh path.
Static gauge. With no operator drift between solves the persistent pool has nothing to amortise over. Use QUDA's standard EigCG / FGMRES-DR for static-gauge multi-RHS deflation instead.
Non-Wilson actions. The HMC scaffolding currently provides force kernels for Wilson only. The eigentracker itself is action-agnostic; in non-HMC settings (gauge-only updates, Chroma-driven HMC) it works on whatever Dirac operator the caller installs. Production use of QUDA's HMC for staggered / twisted-mass / domain-wall is out of scope for the present release.
Multi-shift / RHMC. The current implementation tracks one shift at a time. Multi-shift CG / RHMC integration (a single tracker that captures Ritz information for all shifts in one solve) is a planned follow-up.
Staggered KD coarse op. The pool refresh path assumes fine-grid null vectors that can be replaced by pool entries. The Kahler-Dirac transfer used for staggered fermions has no fine null vectors of that form; pool-driven refresh is currently a no-op for staggered KD. The standard KD reset path (mg->mg->resetStaggeredKD(...)) handles those operators.

QUDA calls

Eigenspace Tracking

Contents

When to use it

How it works

CLI parameters

Master switch

Pool sizing

Forecasting and absorption

MG null-vector refresh

Per-solve Krylov capture (opt-in)

Initial / fresh TRLM eigensolve

Example invocation

Results: 24x48 Wilson HMC at light mass

Limitations and when it does not help

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!