-
Notifications
You must be signed in to change notification settings - Fork 113
Eigenspace Tracking
Eigenspace tracking is a runtime acceleration layer for QUDA's MG-preconditioned fermion solves in long-running settings such as HMC. It addresses a well-known problem: as the gauge field evolves, the multigrid null-space drifts away from the operator's true low modes, the preconditioner becomes a worse approximate inverse, and the outer Krylov iteration count climbs. The standard remedy — periodic from-scratch MG re-setup — is expensive: hundreds of CG iterations per null vector. Eigenspace tracking maintains a persistent pool of low-mode information from cheap sources (CG-Lanczos extraction, GCR residuals, converged solutions of past solves), forecasts its evolution between solves, and uses that pool to refresh the MG null vectors on demand at a small fraction of the from-scratch cost.
The framework is exposed through QudaInvertParam / QudaHMCParam flags so any caller — Chroma, tmLQCD, internal tests — can opt in without API breakage.
- When to use it
- How it works
- CLI parameters
- Example invocation
- Results: 24x48 Wilson HMC at light mass
- Limitations and when it does not help
Eigenspace tracking pays off when:
- The operator changes between solves (HMC trajectories, gauge cooling, anything that updates the gauge field).
- The MG preconditioner exists but its null-space is going stale — symptom: per-solve outer-GCR iteration count rises as the chain progresses, plateauing well above its trajectory-1 value.
- You can afford a modest pool of
O(N_vec)extra spinor fields on device (24-48 vectors at the fine grid; coarse-pool variants exist where memory is tight).
It does not help if:
- You are solving once on a static gauge field (no operator drift between solves; the persistent pool has nothing to amortise over).
- The MG re-setup interval is already short enough that staleness never becomes a problem (then ET refresh just duplicates work the standard re-setup is doing).
- You are using a non-MG-preconditioned solver in a regime where the un-preconditioned CG count is already low (no preconditioner to keep fresh).
Three subsystems cooperate:
1. Tracker pool (EigenTracker). A pool of N_pool orthonormal vectors approximating the low-eigenvalue subspace of the current operator, with cached operator applications for cheap Rayleigh-Ritz updates. The pool is seeded at the start of HMC from the existing MG null vectors; throughout the run it is enriched by absorption of converged-solution vectors and (optionally) per-solve Krylov vectors. A linear/quadratic forecast pre-rotates the pool toward the expected post-trajectory basis between accepted trajectories.
2. MG null-vector refresh. When the iteration-count safety net trips (--hmc-mg-setup-iter-ratio N, default 2.0× the early-trajectory baseline), or on a forced periodic interval (--hmc-mg-setup-interval N), the MG re-setup path consults the eigentracker. With --eigentracking-mg-refresh-iters N>=0 the path becomes pool-driven: copy the pool into MG's null-vector storage, run N cheap CG inverse-iter polish steps per vector, rebuild the prolongator and coarse Dirac. Cost is roughly 50× lower than QUDA's standard refresh (--mg-setup-maxiter-refresh 100) and roughly 1000× lower than a from-scratch setup.
3. Per-solve Krylov capture (opt-in). A single user-facing knob (--eigentracking-residual-cap N, default 0) gates two install paths through a shared TrackerScope<T> lifecycle wrapper (inv_tracker.h):
-
CG (non-preconditioned fallback) — zero-cost Lanczos-tridiag Ritz extraction. The install hook in
inv_cg_quda.cpprecordsα/βand normalised residuals during the solve, after which up toNRitz pairs of the operator are reconstructed with no extra matvecs. -
GCR (MG-preconditioned outer solve) — residual stash. The install hook in
inv_gcr_quda.cppstores up toNnormalised intermediate residuals per solve. We deliberately do not attempt Lanczos-tridiag extraction for GCR: itsA·P_m = Q_m·T_mdecomposition hasTas the QR factor ofA·P, not the Hessenberg ofAin any basis we have, so recovering Ritz pairs ofAwould require either extra matvecs per stored vector or a non-trivial inversion of the preconditioner's effect onP.
Both paths feed their captured vectors into the same pool absorption / Rayleigh-Ritz machinery used by the converged-solution stash, so downstream behaviour (RR projection, MG refresh) is unchanged. The GCR capture path includes two transparent normalisations to keep the captured residual compatible with the pool:
-
Precision promotion. GCR runs at
precision_sloppy(typically single); the pool's reference vectors and absorption kernels are at the target precision passed in by the caller (typicallyinv_param.cuda_prec= double). The tracker promotes each captured residual viablas::copyto the target precision before stashing — no instantiation exists for mixed-precisionmultiCdotbetween double pool and single residual. -
MG-hierarchy filter. The hook fires for every
GCRinstance, including the coarse-grid GCR that the MG preconditioner runs at level 1+. Coarse residuals have a different(Ns, Nc)than the pool's fine-grid Wilson reference vectors, so they are silently dropped atrecordIterationentry. Tracking the coarse spectrum is a separate concern handled by the nested-FGI'sCoarseDeflationManager.
Site-subset is preserved by design — the pool is seeded from the half-site (even-parity) components of the MG null vectors, and inside a PC solve r_sloppy is also half-site, so the two match without any embedding step.
The cap is opt-in because in safe-mass Wilson regimes the converged-solution stash alone saturates the pool's useful low-mode content; the per-solve Krylov capture earns its keep at light masses approaching κ_c, large volumes, or trajectories long enough that mid-MD gauge drift matters within a single solve. The user explicitly opts in via --eigentracking-residual-cap N.
4D smoke verification. A 2-trajectory run on the 24³×48 m=−0.835 lattice with cap=4 showed Ritz absorbed=9 after trajectory 1 (1 converged solution + 4 GCR residuals × 2 γ₅-pass solves) growing to 27 cumulative and pool 48/24 after trajectory 2 — confirming the install path is live and stable end-to-end. dH values are paired-seed identical to the cap=0 baseline (the capture is doing pure preconditioner maintenance, not perturbing physics); iter count does not improve at this mass because the converged-solution stash already saturates the pool. The infrastructure is in place for the lighter-mass / larger-volume regimes where the dominant-solution-only path leaves modes on the table.
Architectural invariants. Four non-obvious requirements at the boundary between the GCR loop and the EigenTracker pool, each pinned by a regression test in the HMC.GCRTracker* suite:
| Invariant | Mechanism |
|---|---|
| Move-assignment guard | FIFO uses "keep first N", not vector::erase (QUDA's ColorSpinorField refuses move-assignment to an already-created field as a GPU-allocation safety check) |
| Precision promotion |
blas::copy to inv_param.cuda_prec before stash (no multiCdot instantiation exists for double-pool / single-residual) |
| Hierarchy filter | Drop residuals at recordIteration entry where (Ns, Nc) ≠ (4, 3) — the GCR install hook fires for every GCR instance including the MG-internal level-1+ coarse GCR; coarse residuals fail the length check inside the pool's MultiReduce
|
| Site-subset preservation | The pool is seeded from the even-parity components of the MG null vectors, so it's half-site fine; PC-solve r_sloppy is also half-site, which matches by construction. No half→full embedding |
All flags are gated by --eigentracking true. Most defaults are sized at run start from n_vec (the MG null-vector count) so a typical user only sets --eigentracking true plus the refresh flag.
| Flag | Description | Default |
|---|---|---|
--eigentracking <bool> |
Enable eigenspace tracking. All other flags are no-ops if false. | false |
| Flag | Description | Default |
|---|---|---|
--eigentracking-n-ev <N> |
Number of tracked eigenpairs. 0 derives from MG n_vec. |
0 (auto) |
--eigentracking-pool-capacity <N> |
Max pool size. 0 derives as 2 × n_ev. |
0 (auto) |
--eigentracking-n-ritz <N> |
Ritz pairs to extract per CG solve when CG capture is on. 0 derives as max(n_ev/2, 2). |
0 (auto) |
| Flag | Description | Default |
|---|---|---|
--eigentracking-forecast-order <0/1/2> |
Linear (1) / quadratic (2) extrapolation of pool basis between trajectories; 0 disables. | 1 |
--eigentracking-solution-history <N> |
Chronological initial-guess history depth (Brower et al. forecasting). | 3 |
--eigentracking-absorb-ritz <bool> |
Absorb extracted Ritz vectors into pool. False keeps the pool as RR-evolved MG null vectors only (Schwinger-style; useful for refresh at extreme mass). | true |
--eigentracking-fresh-interval <N> |
Trajectories between fresh TRLM rebuilds of the pool. 0 disables. | 0 |
| Flag | Description | Default |
|---|---|---|
--eigentracking-mg-refresh-iters <N> |
Pool-driven null-vector refresh on accepted-trajectory re-setup. -1 = disabled (fall back to standard CG-based refresh). 0 = pure pool replacement (pair with --eigentracking-absorb-ritz false). N>0 = hybrid pool warm-start + N CG inverse-iter polish steps per vector; ~5 typically matches standard re-setup quality. |
-1 |
The interaction with QUDA's existing MG re-setup triggers:
| Flag | Description | Default |
|---|---|---|
--hmc-mg-setup-interval <N> |
Force MG refresh every N accepted trajectories. 0 disables periodic refresh. |
0 |
--hmc-mg-setup-iter-ratio <r> |
Adaptive trigger: refresh when iters/solve exceeds r × baseline (baseline = average of first --hmc-mg-setup-iter-baseline-traj accepted traj). Set 0 to disable. |
2.0 |
--hmc-mg-setup-iter-baseline-traj <N> |
Trajectories used to establish the iter-ratio baseline. | 5 |
| Flag | Description | Default |
|---|---|---|
--eigentracking-residual-cap <N> |
Cap on extra ET-pool candidates contributed by each fermion solve. Single knob covering both install paths: GCR stashes up to N normalised residuals; CG extracts up to N Lanczos-tridiag Ritz pairs. Opt-in for light-mass / large-volume regimes; default off because the converged-solution stash alone saturates the pool in safe Wilson regimes. |
0 |
| Flag | Description | Default |
|---|---|---|
--eigentracking-trlm-tol <tol> |
TRLM convergence tolerance for initial / fresh-interval eigensolves. | 1e-6 |
--eigentracking-trlm-max-restarts <N> |
TRLM maximum restarts. | 100 |
--eigentracking-trlm-check-interval <N> |
TRLM iterations between convergence checks. | 10 |
--eigentracking-eig-type <trlm/blktrlm/iram/blkiram> |
Eigensolver for initial / fresh-TRLM eigensolves. Block variants operate on multiple Lanczos vectors per step (larger arithmetic intensity). | trlm |
--eigentracking-blk-size <N> |
Block size for block solvers; n_ev rounded up to a multiple. |
4 |
--eigentracking-use-poly-acc <bool> |
Chebyshev polynomial acceleration in initial TRLM (recommended for clustered or near-zero M†M spectra; required at critical mass without MG seeding). |
false |
--eigentracking-poly-deg <N> |
Chebyshev polynomial degree. | 50 |
--eigentracking-a-min <a> |
Lower suppression boundary for Chebyshev acceleration; set roughly 10× the smallest target eigenvalue. |
0.0 |
--eigentracking-a-max <a> |
Upper boundary; 0 triggers QUDA's power-iteration auto-estimate. |
0.0 |
A typical light-mass HMC run with eigenspace tracking, MG refresh on every accepted trajectory using the ET pool, and per-solve GCR-residual capture turned on:
./tests/hmc_test --gtest_filter="HMC.Production" \
--dim 24 24 24 48 \
--prec double --prec-sloppy single --prec-precondition single \
--prec-null single --prec-eigensolver single --prec-refine single \
--mg-smoother-halo-prec single \
--dslash-type wilson --mass -0.835 \
--inv-type gcr --solution-type mat-pc --solve-type direct-pc --matpc even-even \
--niter 5000 --tol 1e-9 \
--hmc-integrator 3 --hmc-tau 1.0 --hmc-n-steps 80 --hmc-beta 5.6 \
--inv-multigrid true --mg-levels 2 --mg-block-size 0 4 4 4 4 --mg-nvec 0 24 \
--mg-setup-inv 0 cgnr --mg-setup-ca-basis-type 0 chebyshev \
--mg-setup-iters 0 2 --mg-setup-tol 0 1e-6 \
--mg-setup-maxiter 0 2000 --mg-setup-maxiter-refresh 0 100 \
--eigentracking true \
--eigentracking-mg-refresh-iters 2 \
--eigentracking-residual-cap 4 \
--hmc-mg-setup-interval 1 \
--hmc-mg-setup-iter-ratio 2.0 \
--hmc-mg-setup-iter-baseline-traj 5 \
--fermion-t-boundary anti-periodic \
--hmc-gauge-infile path/to/start.lime \
--hmc-n-trajectories 20 \
--hmc-momentum-seed 67890
The same invocation with --eigentracking false and the four --eigentracking-* lines removed gives a baseline against which to measure the iteration / wall-clock saving.
Test setup: 24^3 × 48, β=5.6, m=-0.835 (κ=0.158, m_π ≈ 410 MeV, anti-periodic in time), thermalised from unit gauge through a sequence of mass steps ending at the lightest mass point of the SESAM ensemble. Integrator is nested force-gradient (PQPQP-FG outer, leapfrog inner) with n_outer=80 at τ=1, calibrated to give 100% acceptance on a smooth gauge (<dH> ≈ 0.07). MG: two-level, 4^4 block size, 24 null vectors, GCR outer with tol=1e-9 in mixed double/single precision. Both runs use the same starting gauge, the same momentum seed, and the same MG configuration; only the eigentracker is toggled.
The two MG-update modes compared:
- Baseline — thin update of the coarse Dirac operator after every accepted trajectory (the cheapest configuration that still keeps the coarse operator in sync with the new gauge); null vectors and prolongator frozen.
-
Tracked — the same thin update plus a pool-driven refresh of the 24 null vectors after every accepted trajectory (
--hmc-mg-setup-interval 1 --eigentracking-mg-refresh-iters 2). The pool supplies the seed; 2 CG inverse-iter polish steps per vector; prolongator and coarse Dirac rebuilt.
Five paired trajectories:
| Metric | Baseline (no ET) | Tracked (pool refresh per traj) | Δ |
|---|---|---|---|
| Wall, 5 trajectories | 3804 s | 2776 s | −27% |
| Wall per trajectory | 12.7 min | 9.3 min | −3.4 min |
| Mean iters/solve, traj 1 | 22.5 | 22.5 | identical |
| Mean iters/solve, traj 2 | 26.8 | 22.5 | −16% |
| Mean iters/solve, traj 3 | 28.5 | 22.2 | −22% |
| Mean iters/solve, traj 4 | 29.5 | 22.2 | −25% |
| Mean iters/solve, traj 5 | 29.5 | 22.2 | −25% |
| Maximum iter inflation | 1.46× |
1.07× |
held to baseline |
| Acceptance (paired seed) | 3/5 | 3/5 | identical (same seed → same accept decisions) |
<dH> |
+0.076 |
+0.073 |
tiny improvement |
The baseline iteration count climbs from 22 to a plateau near 30 — that is the gauge wandering away from the operator the prolongator was built for. The tracked configuration is bounded at 1.07× over the same MD evolution. The 27% wall saving is net of the per-trajectory pool-refresh overhead.
ET internal accounting from the tracked run:
Trajectory 1: pool seeded from 24 MG null vectors, +1 absorbed Ritz vector → pool=25
Trajectory 2: +2 absorbed → pool=27
Trajectory 3: +2 absorbed → pool=29
(refresh consumes the first 24 each accept)
The two runs share the same RNG seed for both momentum sampling and Metropolis acceptance, so accept/reject decisions agree trajectory-by-trajectory; <dH> differs only in the third decimal because the better-conditioned solve in the tracked run gives slightly tighter forces. The eigentracker is doing pure preconditioner maintenance, not perturbing the underlying physics.
-
Non-MG settings. ET's primary value is keeping the MG null vectors fresh. Without an MG preconditioner there is no null-vector refresh path; the tracker still accumulates the pool and the
--eigentracking-residual-capknob still works, but the absorbed vectors only feed the next-trajectory chronological initial guess and the pool's RR-evolved basis — useful but a smaller delta than the MG-refresh path. - Static gauge. With no operator drift between solves the persistent pool has nothing to amortise over. Use QUDA's standard EigCG / FGMRES-DR for static-gauge multi-RHS deflation instead.
- Non-Wilson actions. The HMC scaffolding currently provides force kernels for Wilson only. The eigentracker itself is action-agnostic; in non-HMC settings (gauge-only updates, Chroma-driven HMC) it works on whatever Dirac operator the caller installs. Production use of QUDA's HMC for staggered / twisted-mass / domain-wall is out of scope for the present release.
- Multi-shift / RHMC. The current implementation tracks one shift at a time. Multi-shift CG / RHMC integration (a single tracker that captures Ritz information for all shifts in one solve) is a planned follow-up.
-
Staggered KD coarse op. The pool refresh path assumes fine-grid null vectors that can be replaced by pool entries. The Kahler-Dirac transfer used for staggered fermions has no fine null vectors of that form; pool-driven refresh is currently a no-op for staggered KD. The standard KD reset path (
mg->mg->resetStaggeredKD(...)) handles those operators.