HMC Utilities

QUDA includes an internal Hybrid Monte Carlo (HMC) layer that exercises the GPU-resident gauge / fermion / momentum infrastructure end-to-end. It is intended as a development and validation tool — for stress-testing solver acceleration features (notably MG with Eigenspace Tracking), for regression-testing integrator and force kernels, and for generating reproducible test gauges. It is not a production HMC implementation. Production HMC runs should continue to use Chroma, tmLQCD, MILC, or another mature lattice code that drives QUDA via its inverter / force C-API.

⚠ Status: development tool only

The HMC layer ships with verified force kernels for Wilson fermions only (the Wilson EO fermion force in lib/hmc.cpp::computeEOFermionForce). Production HMC requires substantially more action support and machinery than is in this PR. In particular, the following are not available in QUDA's internal HMC at present:

Clover-Wilson HMC. Clover sigma-force kernels were developed but did not pass the per-link numerical-derivative test; they have been quarantined to a separate directory pending repair. The clover operator still works in the inverter (you can run inversions and Dirac applications with --dslash-type clover everywhere), but the HMC force is not trustworthy and HMC runs with clover are not supported.

Twisted-mass and twisted-clover HMC.

Staggered, HISQ, asqtad HMC.

Domain-wall and Möbius HMC.

Multi-flavour / RHMC / rational forces (n_f ≠ 2).

Mass preconditioning, multiple pseudofermions per flavour.

Stout / HEX / link smearing inside the action.

These are out of scope for the current contribution. Do not use this layer to generate physics-quality ensembles. Its purpose is to provide a self-contained, reproducible HMC harness against which solver-acceleration changes can be measured (and against which the solver-side C-API can be validated, e.g., that MG re-setup correctly handles operator updates between trajectories).

What is provided

Integrators

Polymorphic Integrator hierarchy (include/hmc_integrator.h, lib/hmc_integrator.cpp) with a common factory + getOrCreateIntegrator cache that survives across trajectories so per-integrator persistent state (e.g. nested-FGI's coarse deflation manager) is not torn down and rebuilt every trajectory.

Integrator	Description	Order	CLI
`LeapfrogIntegrator`	Standard 2nd-order PQP leapfrog.	2	`--hmc-integrator 0`
`OmelyanIntegrator`	2nd-order minimum-norm Omelyan, `λ = 0.1931833...`	2	`--hmc-integrator 1`
`FGIntegrator`	4th-order Hessian-free force-gradient (Yin–Mawhinney / Schäfers; PQPQP_FG, `λ=1/6`, `ξ=1/72`).	4	`--hmc-integrator 2`
`NestedFGIIntegrator`	4th-order force-gradient outer + leapfrog/Omelyan inner. Uses a `LowModeForce` inner approximation built from a coarse-grid deflation manager, allowing the inner timescale to skip the expensive fine-grid solve. Required when MG is enabled.	4	`--hmc-integrator 3`

The Hessian-free force-gradient term follows Schäfers et al. (arXiv:2501.17632 Eq. 8) with the appropriate convention mapping for QUDA's force sign convention. The implementation invalidates and rebuilds the extended-gauge halo around the gauge-displacement step inside the FG sub-step (a missing rebuild was a real bug encountered during development).

Forces and actions (Wilson EO)

Function	File	Purpose
`computeEOFermionForce`	`lib/hmc.cpp`	Wilson even-odd fermion force; `mom += dt × ∂_U log det(D†D)`. The only verified fermion-force kernel in this layer.
`computeFermionAction`	`lib/hmc.cpp`	Pseudofermion action `S_f = φ† (M†M)⁻¹ φ` via the MG γ₅ two-pass trick when MG is enabled, or a direct CG on `M†M` otherwise.
`generateEOPseudofermion`	`lib/hmc.cpp`	Sample `φ = M† η` from Gaussian noise `η`.
Standard QUDA gauge force	`lib/gauge_force_quda.cu` (existing)	Pure-gauge Wilson plaquette force, called from `Integrator::kick`.

The C-API is hmcTrajectoryQuda (one trajectory) and hmcRunQuda (multi-trajectory loop with Metropolis accept/reject, optional thermalisation, optional checkpointing).

MG re-setup integration

The HMC loop is the natural test bed for MG-preconditioner staleness. Three update modes are wired into hmcRunQuda:

Mode	Trigger	Cost	Description
Thin update	every accepted trajectory	cheapest	Coarse Dirac rebuilt from new gauge; null vectors and prolongator preserved.
Standard refresh	iter-ratio trip OR `--hmc-mg-setup-interval N`	medium	Existing null vectors polished with `--mg-setup-maxiter-refresh N` CG iters; prolongator and coarse Dirac rebuilt.
Pool refresh (with Eigenspace Tracking)	as standard refresh, plus `--eigentracking-mg-refresh-iters N>=0`	medium-cheap	Existing null vectors replaced from the eigentracker pool, polished `N` CG iters per vector; prolongator and coarse Dirac rebuilt.
Full re-setup	`--hmc-mg-setup-interval N` with `thin_update_only=FALSE` and ET disabled	most expensive	Null vectors regenerated from random with `--mg-setup-maxiter` CG iters.

The 2.0× iter-ratio adaptive trigger is QUDA's default safety net: if per-solve iters exceed 2.0× the early-trajectory baseline, the configured refresh path fires.

Eigenspace tracking

See the dedicated Eigenspace Tracking page. The HMC layer is the primary integration site — it is where the tracker's "operator changes between solves" assumption is non-trivially exercised.

Test-suite coverage

tests/hmc_test.cpp registers the following gtest cases. They all run on small lattices (default 4^4) and complete in seconds; they form the regression net for HMC-side changes.

Test	What it verifies
`HMC.ParameterDefaults`	`QudaHMCParam` defaults round-trip through `newQudaHMCParam` / `checkHMCParam`.
`HMC.LeapfrogTrajectory` / `HMC.OmelyanTrajectory` / `HMC.ForceGradientTrajectory` / `HMC.NestedFGIParameterSetup`	Each integrator runs a single trajectory without crashing and produces a finite `dH`.
`HMC.MultiTrajectoryRun`	`hmcRunQuda` with `n_traj > 1` correctly resets per-trajectory state.
`HMC.MGPreconditionedRun`	MG-preconditioned trajectory works end-to-end (single trajectory, smoke-test of all three MG-update modes).
`HMC.GaugeForceActionConsistency`	`S(U + dt·F) - S(U) = dt × dS/dU + O(dt²)`; cross-checks the gauge force against finite-difference of the gauge action.
`HMC.DirectionalForceTest` / `HMC.PerLinkForceTest`	Per-link numerical derivative of the action vs analytical force. The single most important regression test for any new force kernel.
`HMC.dHStatistics` / `HMC.dHScaling` / `HMC.dHScalingNestedFGI`	Multi-trajectory `<dH>` distributions and `dH ∝ τ·dt^p` scaling for the integrator order `p`. Catches integrator scaling regressions.
`HMC.ReversibilityTest` / `HMC.ReversibilityAllIntegrators`	Bit-for-bit reversibility for each integrator (`U(τ)` then `U(-τ)` returns to the starting gauge).
`HMC.Production`	Long-running production-style trajectory; configurable via CLI for in-context testing on real lattices. The test harness used for the Eigenspace Tracking results.

CLI parameters

The HMC layer adds the following groups to tests/hmc_test. --inv-multigrid true (and the standard MG parameter group) is required to exercise the MG path.

Trajectory configuration

Flag	Description	Default
`--hmc-integrator <0/1/2/3>`	Leapfrog / Omelyan / FG / Nested-FGI.	`2` (FG)
`--hmc-tau <τ>`	Trajectory length.	`1.0`
`--hmc-n-steps <N>`	Outer MD steps per trajectory.	`20`
`--hmc-beta <β>`	Pure-gauge coupling.	`5.6`
`--hmc-n-trajectories <N>`	Total trajectories in `hmcRunQuda`.	`1`
`--hmc-thermalization <N>`	Auto-accept (no Metropolis) for the first `N` trajectories.	`0`
`--hmc-momentum-seed <seed>`	RNG seed for momentum sampling and Metropolis draws. Same seed → reproducible accept/reject sequence (required for paired comparisons of solver flags).	`0`
`--fermion-t-boundary <periodic/anti-periodic>`	Fermion temporal BC.	anti-periodic

Gauge I/O

Flag	Description
`--hmc-gauge-infile <path>`	Load starting gauge from a LIME-format file (e.g., a Chroma `cfg`).
`--hmc-gauge-outfile <path>`	Write final gauge to LIME at the end of the run.
`--hmc-checkpoint <N>`	Write a gauge checkpoint every `N` accepted trajectories.
`--hmc-checkpoint-prefix <str>`	Prefix for checkpoint filenames.

Nested FGI inner timescale

Flag	Description	Default
`--hmc-n-inner-steps <N>`	Inner sub-trajectory steps per outer FG step (cheap-force scale).	`5`
`--hmc-inner-integrator <0/1>`	Inner integrator: leapfrog (0) or Omelyan (1).	`1`
`--hmc-n-defl <N>`	Coarse deflation eigenpair count for the LowModeForce inner approximation.	`8`
`--hmc-defl-refresh-interval <N>`	Inner steps between coarse-deflation Rayleigh-Ritz refreshes (0 = frozen).	`0`

MG re-setup interaction

Flag	Description	Default
`--hmc-mg-setup-interval <N>`	Force MG refresh every `N` accepted trajectories (0 = no periodic refresh).	`0`
`--hmc-mg-setup-iter-ratio <r>`	Adaptive trigger: refresh when iters/solve exceeds `r×` baseline.	`2.0`
`--hmc-mg-setup-iter-baseline-traj <N>`	Trajectories used to establish baseline.	`5`

See Eigenspace Tracking for the full set of --eigentracking-* flags.

Example invocations

Fast regression test (4^4 Wilson, all integrators)

./tests/hmc_test --gtest_filter="HMC.ReversibilityAllIntegrators" \
    --dim 4 4 4 4 \
    --dslash-type wilson --mass -0.5 \
    --prec double --prec-sloppy double --prec-precondition double \
    --verbosity verbose

Runs reversibility checks on all four integrators in a few seconds. Use this as the smoke test after any HMC code change.

Per-link force test

The most stringent regression for force-kernel work. Compares numerical (S(U + ε·δU) - S(U)) / ε against the analytical force, link by link.

./tests/hmc_test --gtest_filter="HMC.PerLinkForceTest" \
    --dim 4 4 4 4 \
    --dslash-type wilson \
    --prec double --prec-sloppy double --prec-precondition double \
    --verbosity verbose

Production-style trajectory with MG + Eigenspace Tracking

24³×48 Wilson at m = -0.835 (κ = 0.158), nested FGI with 80 outer steps at τ = 1, MG with eigentracker pool refresh every accepted trajectory:

./tests/hmc_test --gtest_filter="HMC.Production" \
    --dim 24 24 24 48 \
    --prec double --prec-sloppy single --prec-precondition single \
    --prec-null single --prec-eigensolver single --prec-refine single \
    --dslash-type wilson --mass -0.835 \
    --inv-type gcr --solution-type mat-pc --solve-type direct-pc --matpc even-even \
    --niter 5000 --tol 1e-9 \
    --hmc-integrator 3 --hmc-tau 1.0 --hmc-n-steps 80 --hmc-beta 5.6 \
    --inv-multigrid true --mg-levels 2 --mg-block-size 0 4 4 4 4 --mg-nvec 0 24 \
    --mg-setup-inv 0 cgnr --mg-setup-iters 0 2 \
    --mg-setup-tol 0 1e-6 --mg-setup-maxiter 0 2000 --mg-setup-maxiter-refresh 0 100 \
    --eigentracking true --eigentracking-mg-refresh-iters 2 \
    --hmc-mg-setup-interval 1 \
    --hmc-mg-setup-iter-baseline-traj 5 \
    --fermion-t-boundary anti-periodic \
    --hmc-gauge-infile path/to/start.lime \
    --hmc-gauge-outfile path/to/end.lime \
    --hmc-checkpoint 5 --hmc-checkpoint-prefix path/to/ckpt_ \
    --hmc-n-trajectories 20 --hmc-momentum-seed 67890

This is the configuration used to produce the Eigenspace Tracking results. The matching baseline (no ET) drops the four --eigentracking-* lines and --hmc-mg-setup-interval 1.

Saved gauges and checkpoints

LIME-format I/O is supported via the existing QUDA helpers. Gauge files written by this layer can be read by Chroma, tmLQCD, etc., and vice versa. Checkpointing writes the gauge after every N-th accepted trajectory, named <prefix>traj_<n>.lime.

What this is not

To restate the upfront warning, this section is for users surveying QUDA's gauge utilities and wondering whether to use this for production HMC:

Not a Wilson-clover HMC — clover force kernels are quarantined pending repair. Production clover HMC: use Chroma + QUDA-MG.
Not a staggered / HISQ / domain-wall HMC — no force kernels for these actions are wired into this layer. Production: use MILC + QUDA, CPS + QUDA.
Not a multi-flavour HMC — single-pseudofermion n_f = 2 Wilson only. No rational approximation, no mass preconditioning, no multi-time-scale separation beyond the nested-FGI inner / outer split.
Not a HMC with smeared links in the action — pure-gauge force is Wilson plaquette only.

If you need any of the above, drive QUDA from a production HMC code (Chroma, tmLQCD, MILC, ...) via the standard inverter / force C-API; that is the supported path. The HMC utilities documented here are for solver-side development and validation work.

QUDA calls

HMC Utilities

⚠ Status: development tool only

Contents

What is provided

Integrators

Forces and actions (Wilson EO)

MG re-setup integration

Eigenspace tracking

Test-suite coverage

CLI parameters

Trajectory configuration

Gauge I/O

Nested FGI inner timescale

MG re-setup interaction

Example invocations

Fast regression test (4^4 Wilson, all integrators)

Per-link force test

Production-style trajectory with MG + Eigenspace Tracking

Saved gauges and checkpoints

What this is not

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!