-
Notifications
You must be signed in to change notification settings - Fork 113
HMC Utilities
QUDA includes an internal Hybrid Monte Carlo (HMC) layer that exercises the GPU-resident gauge / fermion / momentum infrastructure end-to-end. It is intended as a development and validation tool — for stress-testing solver acceleration features (notably MG with Eigenspace Tracking), for regression-testing integrator and force kernels, and for generating reproducible test gauges. It is not a production HMC implementation. Production HMC runs should continue to use Chroma, tmLQCD, MILC, or another mature lattice code that drives QUDA via its inverter / force C-API.
The HMC layer ships with verified force kernels for Wilson fermions only (the Wilson EO fermion force in
lib/hmc.cpp::computeEOFermionForce). Production HMC requires substantially more action support and machinery than is in this PR. In particular, the following are not available in QUDA's internal HMC at present:
- Clover-Wilson HMC. Clover sigma-force kernels were developed but did not pass the per-link numerical-derivative test; they have been quarantined to a separate directory pending repair. The clover operator still works in the inverter (you can run inversions and Dirac applications with
--dslash-type clovereverywhere), but the HMC force is not trustworthy and HMC runs with clover are not supported.- Twisted-mass and twisted-clover HMC.
- Staggered, HISQ, asqtad HMC.
- Domain-wall and Möbius HMC.
- Multi-flavour / RHMC / rational forces (
n_f ≠ 2).- Mass preconditioning, multiple pseudofermions per flavour.
- Stout / HEX / link smearing inside the action.
These are out of scope for the current contribution. Do not use this layer to generate physics-quality ensembles. Its purpose is to provide a self-contained, reproducible HMC harness against which solver-acceleration changes can be measured (and against which the solver-side C-API can be validated, e.g., that MG re-setup correctly handles operator updates between trajectories).
- What is provided
- Test-suite coverage
- CLI parameters
- Example invocations
- Saved gauges and checkpoints
- What this is not
Polymorphic Integrator hierarchy (include/hmc_integrator.h, lib/hmc_integrator.cpp) with a common factory + getOrCreateIntegrator cache that survives across trajectories so per-integrator persistent state (e.g. nested-FGI's coarse deflation manager) is not torn down and rebuilt every trajectory.
| Integrator | Description | Order | CLI |
|---|---|---|---|
LeapfrogIntegrator |
Standard 2nd-order PQP leapfrog. | 2 | --hmc-integrator 0 |
OmelyanIntegrator |
2nd-order minimum-norm Omelyan, λ = 0.1931833...
|
2 | --hmc-integrator 1 |
FGIntegrator |
4th-order Hessian-free force-gradient (Yin–Mawhinney / Schäfers; PQPQP_FG, λ=1/6, ξ=1/72). |
4 | --hmc-integrator 2 |
NestedFGIIntegrator |
4th-order force-gradient outer + leapfrog/Omelyan inner. Uses a LowModeForce inner approximation built from a coarse-grid deflation manager, allowing the inner timescale to skip the expensive fine-grid solve. Required when MG is enabled. |
4 | --hmc-integrator 3 |
The Hessian-free force-gradient term follows Schäfers et al. (arXiv:2501.17632 Eq. 8) with the appropriate convention mapping for QUDA's force sign convention. The implementation invalidates and rebuilds the extended-gauge halo around the gauge-displacement step inside the FG sub-step (a missing rebuild was a real bug encountered during development).
| Function | File | Purpose |
|---|---|---|
computeEOFermionForce |
lib/hmc.cpp |
Wilson even-odd fermion force; mom += dt × ∂_U log det(D†D). The only verified fermion-force kernel in this layer. |
computeFermionAction |
lib/hmc.cpp |
Pseudofermion action S_f = φ† (M†M)⁻¹ φ via the MG γ₅ two-pass trick when MG is enabled, or a direct CG on M†M otherwise. |
generateEOPseudofermion |
lib/hmc.cpp |
Sample φ = M† η from Gaussian noise η. |
| Standard QUDA gauge force |
lib/gauge_force_quda.cu (existing) |
Pure-gauge Wilson plaquette force, called from Integrator::kick. |
The C-API is hmcTrajectoryQuda (one trajectory) and hmcRunQuda (multi-trajectory loop with Metropolis accept/reject, optional thermalisation, optional checkpointing).
The HMC loop is the natural test bed for MG-preconditioner staleness. Three update modes are wired into hmcRunQuda:
| Mode | Trigger | Cost | Description |
|---|---|---|---|
| Thin update | every accepted trajectory | cheapest | Coarse Dirac rebuilt from new gauge; null vectors and prolongator preserved. |
| Standard refresh | iter-ratio trip OR --hmc-mg-setup-interval N
|
medium | Existing null vectors polished with --mg-setup-maxiter-refresh N CG iters; prolongator and coarse Dirac rebuilt. |
| Pool refresh (with Eigenspace Tracking) | as standard refresh, plus --eigentracking-mg-refresh-iters N>=0
|
medium-cheap | Existing null vectors replaced from the eigentracker pool, polished N CG iters per vector; prolongator and coarse Dirac rebuilt. |
| Full re-setup |
--hmc-mg-setup-interval N with thin_update_only=FALSE and ET disabled |
most expensive | Null vectors regenerated from random with --mg-setup-maxiter CG iters. |
The 2.0× iter-ratio adaptive trigger is QUDA's default safety net: if per-solve iters exceed 2.0× the early-trajectory baseline, the configured refresh path fires.
See the dedicated Eigenspace Tracking page. The HMC layer is the primary integration site — it is where the tracker's "operator changes between solves" assumption is non-trivially exercised.
tests/hmc_test.cpp registers the following gtest cases. They all run on small lattices (default 4^4) and complete in seconds; they form the regression net for HMC-side changes.
| Test | What it verifies |
|---|---|
HMC.ParameterDefaults |
QudaHMCParam defaults round-trip through newQudaHMCParam / checkHMCParam. |
HMC.LeapfrogTrajectory / HMC.OmelyanTrajectory / HMC.ForceGradientTrajectory / HMC.NestedFGIParameterSetup
|
Each integrator runs a single trajectory without crashing and produces a finite dH. |
HMC.MultiTrajectoryRun |
hmcRunQuda with n_traj > 1 correctly resets per-trajectory state. |
HMC.MGPreconditionedRun |
MG-preconditioned trajectory works end-to-end (single trajectory, smoke-test of all three MG-update modes). |
HMC.GaugeForceActionConsistency |
S(U + dt·F) - S(U) = dt × dS/dU + O(dt²); cross-checks the gauge force against finite-difference of the gauge action. |
HMC.DirectionalForceTest / HMC.PerLinkForceTest
|
Per-link numerical derivative of the action vs analytical force. The single most important regression test for any new force kernel. |
HMC.dHStatistics / HMC.dHScaling / HMC.dHScalingNestedFGI
|
Multi-trajectory <dH> distributions and dH ∝ τ·dt^p scaling for the integrator order p. Catches integrator scaling regressions. |
HMC.ReversibilityTest / HMC.ReversibilityAllIntegrators
|
Bit-for-bit reversibility for each integrator (U(τ) then U(-τ) returns to the starting gauge). |
HMC.Production |
Long-running production-style trajectory; configurable via CLI for in-context testing on real lattices. The test harness used for the Eigenspace Tracking results. |
The HMC layer adds the following groups to tests/hmc_test. --inv-multigrid true (and the standard MG parameter group) is required to exercise the MG path.
| Flag | Description | Default |
|---|---|---|
--hmc-integrator <0/1/2/3> |
Leapfrog / Omelyan / FG / Nested-FGI. |
2 (FG) |
--hmc-tau <τ> |
Trajectory length. | 1.0 |
--hmc-n-steps <N> |
Outer MD steps per trajectory. | 20 |
--hmc-beta <β> |
Pure-gauge coupling. | 5.6 |
--hmc-n-trajectories <N> |
Total trajectories in hmcRunQuda. |
1 |
--hmc-thermalization <N> |
Auto-accept (no Metropolis) for the first N trajectories. |
0 |
--hmc-momentum-seed <seed> |
RNG seed for momentum sampling and Metropolis draws. Same seed → reproducible accept/reject sequence (required for paired comparisons of solver flags). | 0 |
--fermion-t-boundary <periodic/anti-periodic> |
Fermion temporal BC. | anti-periodic |
| Flag | Description |
|---|---|
--hmc-gauge-infile <path> |
Load starting gauge from a LIME-format file (e.g., a Chroma cfg). |
--hmc-gauge-outfile <path> |
Write final gauge to LIME at the end of the run. |
--hmc-checkpoint <N> |
Write a gauge checkpoint every N accepted trajectories. |
--hmc-checkpoint-prefix <str> |
Prefix for checkpoint filenames. |
| Flag | Description | Default |
|---|---|---|
--hmc-n-inner-steps <N> |
Inner sub-trajectory steps per outer FG step (cheap-force scale). | 5 |
--hmc-inner-integrator <0/1> |
Inner integrator: leapfrog (0) or Omelyan (1). | 1 |
--hmc-n-defl <N> |
Coarse deflation eigenpair count for the LowModeForce inner approximation. | 8 |
--hmc-defl-refresh-interval <N> |
Inner steps between coarse-deflation Rayleigh-Ritz refreshes (0 = frozen). | 0 |
| Flag | Description | Default |
|---|---|---|
--hmc-mg-setup-interval <N> |
Force MG refresh every N accepted trajectories (0 = no periodic refresh). |
0 |
--hmc-mg-setup-iter-ratio <r> |
Adaptive trigger: refresh when iters/solve exceeds r× baseline. |
2.0 |
--hmc-mg-setup-iter-baseline-traj <N> |
Trajectories used to establish baseline. | 5 |
See Eigenspace Tracking for the full set of --eigentracking-* flags.
./tests/hmc_test --gtest_filter="HMC.ReversibilityAllIntegrators" \
--dim 4 4 4 4 \
--dslash-type wilson --mass -0.5 \
--prec double --prec-sloppy double --prec-precondition double \
--verbosity verbose
Runs reversibility checks on all four integrators in a few seconds. Use this as the smoke test after any HMC code change.
The most stringent regression for force-kernel work. Compares numerical (S(U + ε·δU) - S(U)) / ε against the analytical force, link by link.
./tests/hmc_test --gtest_filter="HMC.PerLinkForceTest" \
--dim 4 4 4 4 \
--dslash-type wilson \
--prec double --prec-sloppy double --prec-precondition double \
--verbosity verbose
24³×48 Wilson at m = -0.835 (κ = 0.158), nested FGI with 80 outer steps at τ = 1, MG with eigentracker pool refresh every accepted trajectory:
./tests/hmc_test --gtest_filter="HMC.Production" \
--dim 24 24 24 48 \
--prec double --prec-sloppy single --prec-precondition single \
--prec-null single --prec-eigensolver single --prec-refine single \
--dslash-type wilson --mass -0.835 \
--inv-type gcr --solution-type mat-pc --solve-type direct-pc --matpc even-even \
--niter 5000 --tol 1e-9 \
--hmc-integrator 3 --hmc-tau 1.0 --hmc-n-steps 80 --hmc-beta 5.6 \
--inv-multigrid true --mg-levels 2 --mg-block-size 0 4 4 4 4 --mg-nvec 0 24 \
--mg-setup-inv 0 cgnr --mg-setup-iters 0 2 \
--mg-setup-tol 0 1e-6 --mg-setup-maxiter 0 2000 --mg-setup-maxiter-refresh 0 100 \
--eigentracking true --eigentracking-mg-refresh-iters 2 \
--hmc-mg-setup-interval 1 \
--hmc-mg-setup-iter-baseline-traj 5 \
--fermion-t-boundary anti-periodic \
--hmc-gauge-infile path/to/start.lime \
--hmc-gauge-outfile path/to/end.lime \
--hmc-checkpoint 5 --hmc-checkpoint-prefix path/to/ckpt_ \
--hmc-n-trajectories 20 --hmc-momentum-seed 67890
This is the configuration used to produce the Eigenspace Tracking results. The matching baseline (no ET) drops the four --eigentracking-* lines and --hmc-mg-setup-interval 1.
LIME-format I/O is supported via the existing QUDA helpers. Gauge files written by this layer can be read by Chroma, tmLQCD, etc., and vice versa. Checkpointing writes the gauge after every N-th accepted trajectory, named <prefix>traj_<n>.lime.
To restate the upfront warning, this section is for users surveying QUDA's gauge utilities and wondering whether to use this for production HMC:
- Not a Wilson-clover HMC — clover force kernels are quarantined pending repair. Production clover HMC: use Chroma + QUDA-MG.
- Not a staggered / HISQ / domain-wall HMC — no force kernels for these actions are wired into this layer. Production: use MILC + QUDA, CPS + QUDA.
-
Not a multi-flavour HMC — single-pseudofermion
n_f = 2Wilson only. No rational approximation, no mass preconditioning, no multi-time-scale separation beyond the nested-FGI inner / outer split. - Not a HMC with smeared links in the action — pure-gauge force is Wilson plaquette only.
If you need any of the above, drive QUDA from a production HMC code (Chroma, tmLQCD, MILC, ...) via the standard inverter / force C-API; that is the supported path. The HMC utilities documented here are for solver-side development and validation work.