Skip to content

Eigenspace Tracking

Dean Howarth edited this page May 2, 2026 · 2 revisions

Eigenspace tracking is a runtime acceleration layer for QUDA's MG-preconditioned fermion solves in long-running settings such as HMC. It addresses a well-known problem: as the gauge field evolves, the multigrid null-space drifts away from the operator's true low modes, the preconditioner becomes a worse approximate inverse, and the outer Krylov iteration count climbs. The standard remedy — periodic from-scratch MG re-setup — is expensive: hundreds of CG iterations per null vector. Eigenspace tracking maintains a persistent pool of low-mode information from cheap sources (CG-Lanczos extraction, GCR residuals, converged solutions of past solves), forecasts its evolution between solves, and uses that pool to refresh the MG null vectors on demand at a small fraction of the from-scratch cost.

The framework is exposed through QudaInvertParam / QudaHMCParam flags so any caller — Chroma, tmLQCD, internal tests — can opt in without API breakage.

Contents

  1. When to use it
  2. How it works
  3. CLI parameters
  4. Example invocation
  5. Results: 24x48 Wilson HMC at light mass
  6. Limitations and when it does not help

When to use it

Eigenspace tracking pays off when:

  • The operator changes between solves (HMC trajectories, gauge cooling, anything that updates the gauge field).
  • The MG preconditioner exists but its null-space is going stale — symptom: per-solve outer-GCR iteration count rises as the chain progresses, plateauing well above its trajectory-1 value.
  • You can afford a modest pool of O(N_vec) extra spinor fields on device (24-48 vectors at the fine grid; coarse-pool variants exist where memory is tight).

It does not help if:

  • You are solving once on a static gauge field (no operator drift between solves; the persistent pool has nothing to amortise over).
  • The MG re-setup interval is already short enough that staleness never becomes a problem (then ET refresh just duplicates work the standard re-setup is doing).
  • You are using a non-MG-preconditioned solver in a regime where the un-preconditioned CG count is already low (no preconditioner to keep fresh).

How it works

Three subsystems cooperate:

1. Tracker pool (EigenTracker). A pool of N_pool orthonormal vectors approximating the low-eigenvalue subspace of the current operator, with cached operator applications for cheap Rayleigh-Ritz updates. The pool is seeded at the start of HMC from the existing MG null vectors; throughout the run it is enriched by absorption of converged-solution vectors and (optionally) per-solve Krylov vectors. A linear/quadratic forecast pre-rotates the pool toward the expected post-trajectory basis between accepted trajectories.

2. MG null-vector refresh. When the iteration-count safety net trips (--hmc-mg-setup-iter-ratio N, default 2.0× the early-trajectory baseline), or on a forced periodic interval (--hmc-mg-setup-interval N), the MG re-setup path consults the eigentracker. With --eigentracking-mg-refresh-iters N>=0 the path becomes pool-driven: copy the pool into MG's null-vector storage, run N cheap CG inverse-iter polish steps per vector, rebuild the prolongator and coarse Dirac. Cost is roughly 50× lower than QUDA's standard refresh (--mg-setup-maxiter-refresh 100) and roughly 1000× lower than a from-scratch setup.

3. Per-solve Krylov capture (opt-in). A single user-facing knob (--eigentracking-residual-cap N, default 0) gates two install paths through a shared TrackerScope<T> lifecycle wrapper (inv_tracker.h):

  • CG (non-preconditioned fallback) — zero-cost Lanczos-tridiag Ritz extraction. The install hook in inv_cg_quda.cpp records α/β and normalised residuals during the solve, after which up to N Ritz pairs of the operator are reconstructed with no extra matvecs.
  • GCR (MG-preconditioned outer solve) — residual stash. The install hook in inv_gcr_quda.cpp stores up to N normalised intermediate residuals per solve. We deliberately do not attempt Lanczos-tridiag extraction for GCR: its A·P_m = Q_m·T_m decomposition has T as the QR factor of A·P, not the Hessenberg of A in any basis we have, so recovering Ritz pairs of A would require either extra matvecs per stored vector or a non-trivial inversion of the preconditioner's effect on P.

Both paths feed their captured vectors into the same pool absorption / Rayleigh-Ritz machinery used by the converged-solution stash, so downstream behaviour (RR projection, MG refresh) is unchanged. The GCR capture path includes two transparent normalisations to keep the captured residual compatible with the pool:

  • Precision promotion. GCR runs at precision_sloppy (typically single); the pool's reference vectors and absorption kernels are at the target precision passed in by the caller (typically inv_param.cuda_prec = double). The tracker promotes each captured residual via blas::copy to the target precision before stashing — no instantiation exists for mixed-precision multiCdot between double pool and single residual.
  • MG-hierarchy filter. The hook fires for every GCR instance, including the coarse-grid GCR that the MG preconditioner runs at level 1+. Coarse residuals have a different (Ns, Nc) than the pool's fine-grid Wilson reference vectors, so they are silently dropped at recordIteration entry. Tracking the coarse spectrum is a separate concern handled by the nested-FGI's CoarseDeflationManager.

Site-subset is preserved by design — the pool is seeded from the half-site (even-parity) components of the MG null vectors, and inside a PC solve r_sloppy is also half-site, so the two match without any embedding step.

The cap is opt-in because in safe-mass Wilson regimes the converged-solution stash alone saturates the pool's useful low-mode content; the per-solve Krylov capture earns its keep at light masses approaching κ_c, large volumes, or trajectories long enough that mid-MD gauge drift matters within a single solve. The user explicitly opts in via --eigentracking-residual-cap N.

4D smoke verification. A 2-trajectory run on the 24³×48 m=−0.835 lattice with cap=4 showed Ritz absorbed=9 after trajectory 1 (1 converged solution + 4 GCR residuals × 2 γ₅-pass solves) growing to 27 cumulative and pool 48/24 after trajectory 2 — confirming the install path is live and stable end-to-end. dH values are paired-seed identical to the cap=0 baseline (the capture is doing pure preconditioner maintenance, not perturbing physics); iter count does not improve at this mass because the converged-solution stash already saturates the pool. The infrastructure is in place for the lighter-mass / larger-volume regimes where the dominant-solution-only path leaves modes on the table.

Architectural invariants. Four non-obvious requirements at the boundary between the GCR loop and the EigenTracker pool, each pinned by a regression test in the HMC.GCRTracker* suite:

Invariant Mechanism
Move-assignment guard FIFO uses "keep first N", not vector::erase (QUDA's ColorSpinorField refuses move-assignment to an already-created field as a GPU-allocation safety check)
Precision promotion blas::copy to inv_param.cuda_prec before stash (no multiCdot instantiation exists for double-pool / single-residual)
Hierarchy filter Drop residuals at recordIteration entry where (Ns, Nc) ≠ (4, 3) — the GCR install hook fires for every GCR instance including the MG-internal level-1+ coarse GCR; coarse residuals fail the length check inside the pool's MultiReduce
Site-subset preservation The pool is seeded from the even-parity components of the MG null vectors, so it's half-site fine; PC-solve r_sloppy is also half-site, which matches by construction. No half→full embedding

CLI parameters

All flags are gated by --eigentracking true. Most defaults are sized at run start from n_vec (the MG null-vector count) so a typical user only sets --eigentracking true plus the refresh flag.

Master switch

Flag Description Default
--eigentracking <bool> Enable eigenspace tracking. All other flags are no-ops if false. false

Pool sizing

Flag Description Default
--eigentracking-n-ev <N> Number of tracked eigenpairs. 0 derives from MG n_vec. 0 (auto)
--eigentracking-pool-capacity <N> Max pool size. 0 derives as 2 × n_ev. 0 (auto)
--eigentracking-n-ritz <N> Ritz pairs to extract per CG solve when CG capture is on. 0 derives as max(n_ev/2, 2). 0 (auto)

Forecasting and absorption

Flag Description Default
--eigentracking-forecast-order <0/1/2> Linear (1) / quadratic (2) extrapolation of pool basis between trajectories; 0 disables. 1
--eigentracking-solution-history <N> Chronological initial-guess history depth (Brower et al. forecasting). 3
--eigentracking-absorb-ritz <bool> Absorb extracted Ritz vectors into pool. False keeps the pool as RR-evolved MG null vectors only (Schwinger-style; useful for refresh at extreme mass). true
--eigentracking-fresh-interval <N> Trajectories between fresh TRLM rebuilds of the pool. 0 disables. 0

MG null-vector refresh

Flag Description Default
--eigentracking-mg-refresh-iters <N> Pool-driven null-vector refresh on accepted-trajectory re-setup. -1 = disabled (fall back to standard CG-based refresh). 0 = pure pool replacement (pair with --eigentracking-absorb-ritz false). N>0 = hybrid pool warm-start + N CG inverse-iter polish steps per vector; ~5 typically matches standard re-setup quality. -1

The interaction with QUDA's existing MG re-setup triggers:

Flag Description Default
--hmc-mg-setup-interval <N> Force MG refresh every N accepted trajectories. 0 disables periodic refresh. 0
--hmc-mg-setup-iter-ratio <r> Adaptive trigger: refresh when iters/solve exceeds r × baseline (baseline = average of first --hmc-mg-setup-iter-baseline-traj accepted traj). Set 0 to disable. 2.0
--hmc-mg-setup-iter-baseline-traj <N> Trajectories used to establish the iter-ratio baseline. 5

Per-solve Krylov capture (opt-in)

Flag Description Default
--eigentracking-residual-cap <N> Cap on extra ET-pool candidates contributed by each fermion solve. Single knob covering both install paths: GCR stashes up to N normalised residuals; CG extracts up to N Lanczos-tridiag Ritz pairs. Opt-in for light-mass / large-volume regimes; default off because the converged-solution stash alone saturates the pool in safe Wilson regimes. 0

Initial / fresh TRLM eigensolve

Flag Description Default
--eigentracking-trlm-tol <tol> TRLM convergence tolerance for initial / fresh-interval eigensolves. 1e-6
--eigentracking-trlm-max-restarts <N> TRLM maximum restarts. 100
--eigentracking-trlm-check-interval <N> TRLM iterations between convergence checks. 10
--eigentracking-eig-type <trlm/blktrlm/iram/blkiram> Eigensolver for initial / fresh-TRLM eigensolves. Block variants operate on multiple Lanczos vectors per step (larger arithmetic intensity). trlm
--eigentracking-blk-size <N> Block size for block solvers; n_ev rounded up to a multiple. 4
--eigentracking-use-poly-acc <bool> Chebyshev polynomial acceleration in initial TRLM (recommended for clustered or near-zero M†M spectra; required at critical mass without MG seeding). false
--eigentracking-poly-deg <N> Chebyshev polynomial degree. 50
--eigentracking-a-min <a> Lower suppression boundary for Chebyshev acceleration; set roughly 10× the smallest target eigenvalue. 0.0
--eigentracking-a-max <a> Upper boundary; 0 triggers QUDA's power-iteration auto-estimate. 0.0

Example invocation

A typical light-mass HMC run with eigenspace tracking, MG refresh on every accepted trajectory using the ET pool, and per-solve GCR-residual capture turned on:

./tests/hmc_test --gtest_filter="HMC.Production" \
    --dim 24 24 24 48 \
    --prec double --prec-sloppy single --prec-precondition single \
    --prec-null single --prec-eigensolver single --prec-refine single \
    --mg-smoother-halo-prec single \
    --dslash-type wilson --mass -0.835 \
    --inv-type gcr --solution-type mat-pc --solve-type direct-pc --matpc even-even \
    --niter 5000 --tol 1e-9 \
    --hmc-integrator 3 --hmc-tau 1.0 --hmc-n-steps 80 --hmc-beta 5.6 \
    --inv-multigrid true --mg-levels 2 --mg-block-size 0 4 4 4 4 --mg-nvec 0 24 \
    --mg-setup-inv 0 cgnr --mg-setup-ca-basis-type 0 chebyshev \
    --mg-setup-iters 0 2 --mg-setup-tol 0 1e-6 \
    --mg-setup-maxiter 0 2000 --mg-setup-maxiter-refresh 0 100 \
    --eigentracking true \
    --eigentracking-mg-refresh-iters 2 \
    --eigentracking-residual-cap 4 \
    --hmc-mg-setup-interval 1 \
    --hmc-mg-setup-iter-ratio 2.0 \
    --hmc-mg-setup-iter-baseline-traj 5 \
    --fermion-t-boundary anti-periodic \
    --hmc-gauge-infile path/to/start.lime \
    --hmc-n-trajectories 20 \
    --hmc-momentum-seed 67890

The same invocation with --eigentracking false and the four --eigentracking-* lines removed gives a baseline against which to measure the iteration / wall-clock saving.

Results: 24x48 Wilson HMC at light mass

Test setup: 24^3 × 48, β=5.6, m=-0.835 (κ=0.158, m_π ≈ 410 MeV, anti-periodic in time), thermalised from unit gauge through a sequence of mass steps ending at the lightest mass point of the SESAM ensemble. Integrator is nested force-gradient (PQPQP-FG outer, leapfrog inner) with n_outer=80 at τ=1, calibrated to give 100% acceptance on a smooth gauge (<dH> ≈ 0.07). MG: two-level, 4^4 block size, 24 null vectors, GCR outer with tol=1e-9 in mixed double/single precision. Both runs use the same starting gauge, the same momentum seed, and the same MG configuration; only the eigentracker is toggled.

The two MG-update modes compared:

  • Baseline — thin update of the coarse Dirac operator after every accepted trajectory (the cheapest configuration that still keeps the coarse operator in sync with the new gauge); null vectors and prolongator frozen.
  • Tracked — the same thin update plus a pool-driven refresh of the 24 null vectors after every accepted trajectory (--hmc-mg-setup-interval 1 --eigentracking-mg-refresh-iters 2). The pool supplies the seed; 2 CG inverse-iter polish steps per vector; prolongator and coarse Dirac rebuilt.

Five paired trajectories:

Metric Baseline (no ET) Tracked (pool refresh per traj) Δ
Wall, 5 trajectories 3804 s 2776 s −27%
Wall per trajectory 12.7 min 9.3 min −3.4 min
Mean iters/solve, traj 1 22.5 22.5 identical
Mean iters/solve, traj 2 26.8 22.5 −16%
Mean iters/solve, traj 3 28.5 22.2 −22%
Mean iters/solve, traj 4 29.5 22.2 −25%
Mean iters/solve, traj 5 29.5 22.2 −25%
Maximum iter inflation 1.46× 1.07× held to baseline
Acceptance (paired seed) 3/5 3/5 identical (same seed → same accept decisions)
<dH> +0.076 +0.073 tiny improvement

The baseline iteration count climbs from 22 to a plateau near 30 — that is the gauge wandering away from the operator the prolongator was built for. The tracked configuration is bounded at 1.07× over the same MD evolution. The 27% wall saving is net of the per-trajectory pool-refresh overhead.

ET internal accounting from the tracked run:

Trajectory 1:  pool seeded from 24 MG null vectors, +1 absorbed Ritz vector → pool=25
Trajectory 2:  +2 absorbed → pool=27
Trajectory 3:  +2 absorbed → pool=29
              (refresh consumes the first 24 each accept)

The two runs share the same RNG seed for both momentum sampling and Metropolis acceptance, so accept/reject decisions agree trajectory-by-trajectory; <dH> differs only in the third decimal because the better-conditioned solve in the tracked run gives slightly tighter forces. The eigentracker is doing pure preconditioner maintenance, not perturbing the underlying physics.

Limitations and when it does not help

  • Non-MG settings. ET's primary value is keeping the MG null vectors fresh. Without an MG preconditioner there is no null-vector refresh path; the tracker still accumulates the pool and the --eigentracking-residual-cap knob still works, but the absorbed vectors only feed the next-trajectory chronological initial guess and the pool's RR-evolved basis — useful but a smaller delta than the MG-refresh path.
  • Static gauge. With no operator drift between solves the persistent pool has nothing to amortise over. Use QUDA's standard EigCG / FGMRES-DR for static-gauge multi-RHS deflation instead.
  • Non-Wilson actions. The HMC scaffolding currently provides force kernels for Wilson only. The eigentracker itself is action-agnostic; in non-HMC settings (gauge-only updates, Chroma-driven HMC) it works on whatever Dirac operator the caller installs. Production use of QUDA's HMC for staggered / twisted-mass / domain-wall is out of scope for the present release.
  • Multi-shift / RHMC. The current implementation tracks one shift at a time. Multi-shift CG / RHMC integration (a single tracker that captures Ritz information for all shifts in one solve) is a planned follow-up.
  • Staggered KD coarse op. The pool refresh path assumes fine-grid null vectors that can be replaced by pool entries. The Kahler-Dirac transfer used for staggered fermions has no fine null vectors of that form; pool-driven refresh is currently a no-op for staggered KD. The standard KD reset path (mg->mg->resetStaggeredKD(...)) handles those operators.

Clone this wiki locally