gmat-sweep

Run parameter sweeps and Monte Carlo dispersions over GMAT missions in parallel from Python.

What this is

A parallel orchestrator on top of gmat-run's single-run primitive. Point gmat-sweep at a working .script and either a parameter grid, an explicit run table, or a perturbation distribution, and it fans the run set across subprocess workers, aggregates each run's ReportFile (and any EphemerisFile or ContactLocator outputs) into multi-indexed pandas DataFrames, and writes a JSON Lines manifest alongside the results so any sweep is reproducible bit-for-bit. Killed sweeps reload from the manifest and re-run only the missing or failed runs.
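
The reload-and-resume behaviour rests on an append-only log. Here is a minimal sketch of the idea. It assumes hypothetical manifest entries carrying only run_id and status, while the real manifest schema records far more. Each entry is appended with an fsync, and the runs to redo are those with no entry yet or whose latest status is not ok. The helper names append_entry and runs_to_redo are illustrative, not part of gmat-sweep's API.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical manifest entries: only "run_id" and "status" are modelled here.


def append_entry(manifest: Path, entry: dict) -> None:
    """Append one JSON object as a single line, then force it to disk."""
    with manifest.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
        fh.flush()  # push Python's buffer to the OS
        os.fsync(fh.fileno())  # ask the OS to commit the bytes to storage


def runs_to_redo(manifest: Path, planned: list[int]) -> list[int]:
    """Return planned run_ids with no entry yet, or whose latest entry is not 'ok'."""
    latest: dict[int, str] = {}
    if manifest.exists():
        for line in manifest.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip a torn final line rather than refuse to resume
            latest[entry["run_id"]] = entry["status"]  # later entries win
    return [rid for rid in planned if latest.get(rid) != "ok"]


with tempfile.TemporaryDirectory() as tmp:
    manifest = Path(tmp) / "manifest.jsonl"
    append_entry(manifest, {"run_id": 0, "status": "ok"})
    append_entry(manifest, {"run_id": 1, "status": "failed"})
    # Run 2 never started (say, the sweep was killed), so it has no entry.
    print(runs_to_redo(manifest, [0, 1, 2]))  # [1, 2]
```

Because later entries override earlier ones, a successful retry of run 1 would simply append a new ok line. Nothing is rewritten in place.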

The four entry points cover the common shapes:

What this is not

  • Not a single-run runner — that's gmat-run; every gmat-sweep worker calls into it.
  • Not a way to build GMAT missions from scratch in Python — see gmatpyplus.
  • Not a .script text generator — see pygmat.
  • Not an optimiser. Gradient-, Bayesian-, and population-based optimisation (CasADi, pagmo2, scikit-optimize) is a different problem; gmat-sweep may serve as the parallel evaluator inside one, but it ships no optimiser of its own.
  • Not a workflow engine. gmat-sweep runs homogeneous parametric sweeps of one mission; Snakemake / Nextflow / Hamilton manage DAGs of heterogeneous tasks. A workflow engine can schedule a gmat-sweep step; the converse is not interesting.

Requirements

  • Python 3.10, 3.11, or 3.12.
  • gmat-run ≥ 0.3 — installed as a transitive dependency from PyPI. gmat-sweep never imports gmatpy directly; the import happens inside each worker subprocess on first call.
  • A local GMAT install. gmat-sweep does not ship GMAT binaries; it relies on gmat-run's install discovery, which honours $GMAT_ROOT or finds a build under a conventional path. Download GMAT from the SourceForge release page — see gmat-run's install guide for the unpack-and-discover steps.

Supported GMAT versions

| GMAT release | Status | CI |
| --- | --- | --- |
| R2026a | Primary development target | Exercised on every PR (Ubuntu + Windows + macOS, Python 3.10/3.11/3.12) |
| R2025a | Supported | Exercised on every PR (Ubuntu + Windows + macOS, Python 3.10/3.11/3.12) |

R2023a and R2024a were never released by the upstream GMAT project; R2025a and R2026a are the only releases supported.

Installation

pip install gmat-sweep

The [examples] extra pulls in matplotlib for the example notebooks:

pip install gmat-sweep[examples]

Quick start

from gmat_sweep import LocalJoblibPool, sweep

df = sweep(
    "mission.script",
    grid={"Sat.SMA": [7000, 7100, 7200]},
    backend=LocalJoblibPool(max_workers=8),
)
print(df)

That call runs mission.script three times — once per Sat.SMA value — each in a fresh subprocess, and returns a (run_id, time)-MultiIndexed pandas.DataFrame containing the rows from every run's ReportFile plus a __status column flagging ok / failed / skipped. A single failed run lands as a failed row with the captured GMAT stderr in the manifest — never as a silent zero-row DataFrame and never as an unhandled exception that aborts the whole sweep.
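
One way to picture how a grid becomes a run set: each axis contributes its list of values, and multiple axes combine as a cartesian product, which is how the two-axis epoch × time-of-flight notebook describes it. Each combination then gets a run_id. The sketch below assumes run_ids are assigned in itertools.product order, with the last axis varying fastest. gmat-sweep's own ordering may differ, and expand_grid is a hypothetical helper.

```python
from itertools import product


def expand_grid(grid: dict[str, list]) -> list[dict]:
    """Expand {name: values} into one overrides dict per run (cartesian product)."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]


# The quick-start grid: one axis, three values, three runs.
for run_id, overrides in enumerate(expand_grid({"Sat.SMA": [7000, 7100, 7200]})):
    print(run_id, overrides)
# 0 {'Sat.SMA': 7000}
# 1 {'Sat.SMA': 7100}
# 2 {'Sat.SMA': 7200}

# Two axes multiply: 2 x 3 = 6 runs.
print(len(expand_grid({"Sat.SMA": [7000, 7100], "TOF": [1.0, 2.0, 3.0]})))  # 6
```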

For a stochastic dispersion, swap sweep for monte_carlo and pass a perturb mapping of named distributions:

from gmat_sweep import LocalJoblibPool, monte_carlo

df = monte_carlo(
    "mission.script",
    n=1000,
    perturb={"Sat.SMA": ("normal", 7100.0, 50.0)},
    backend=LocalJoblibPool(max_workers=8),
    seed=42,
)

monte_carlo() returns the same DataFrame shape as sweep(). Per-run sub-seeds derive from seed via numpy.random.SeedSequence.spawn, so the draw is bit-reproducible and a resumed sweep samples the same values for any given run_id. See the Monte Carlo guide for the full determinism contract, and latin_hypercube for the stratified-sampling variant.
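
The determinism follows from how SeedSequence.spawn works. The i-th child of a fresh SeedSequence(seed) is equivalent to SeedSequence(seed, spawn_key=(i,)), so any single run's generator can be rebuilt without replaying the others. The sketch below makes two assumptions that are not confirmed here: run_id i maps to spawn index i, and ("normal", 7100.0, 50.0) means mean 7100 and standard deviation 50. draw_sma is a hypothetical helper.

```python
import numpy as np

SEED = 42


def draw_sma(seed: int, run_id: int, mean: float = 7100.0, sd: float = 50.0) -> float:
    """Rebuild run_id's child seed directly and draw its perturbed SMA."""
    child = np.random.SeedSequence(seed, spawn_key=(run_id,))
    return float(np.random.default_rng(child).normal(mean, sd))


# A fresh sweep: spawn all 1000 children up front and draw once per run.
children = np.random.SeedSequence(SEED).spawn(1000)
fresh = [float(np.random.default_rng(c).normal(7100.0, 50.0)) for c in children]

# A resume that only needs run 737 gets the identical value.
assert draw_sma(SEED, 737) == fresh[737]

# Extending from 1000 to 1200 runs leaves the first 1000 children unchanged.
extended = np.random.SeedSequence(SEED).spawn(1200)
assert all(a.spawn_key == b.spawn_key for a, b in zip(children, extended))
```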

By default the per-run Parquet files and the manifest land in a temporary directory whose lifetime is tied to the returned DataFrame. Pass out=Path(...) to keep them — that's also what enables resuming a killed sweep via Sweep.from_manifest(...).resume() or gmat-sweep resume <manifest>.
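
A common Python pattern ties a directory's lifetime to an object by registering the directory's cleanup with weakref.finalize against that object. The sketch below shows the pattern with a hypothetical SweepResult class standing in for the DataFrame. It illustrates one way such a lifetime could be implemented and is not a description of gmat-sweep's internals.

```python
import tempfile
import weakref
from pathlib import Path


class SweepResult:
    """Hypothetical stand-in for the returned DataFrame; only its lifetime matters here."""

    def __init__(self, out_dir: Path) -> None:
        self.out_dir = out_dir


def run_into_tempdir() -> SweepResult:
    """Write outputs into a temporary directory that lives as long as the result."""
    tmp = tempfile.TemporaryDirectory(prefix="sweep-")
    out_dir = Path(tmp.name)
    (out_dir / "manifest.jsonl").write_text("", encoding="utf-8")  # placeholder artefact
    result = SweepResult(out_dir)
    # The finalizer keeps tmp alive and calls tmp.cleanup() once result is collected
    # (or at interpreter exit, whichever happens first).
    weakref.finalize(result, tmp.cleanup)
    return result


result = run_into_tempdir()
out_dir = result.out_dir
print(out_dir.exists())  # True
del result  # in CPython, dropping the last reference runs the finalizer right away
print(out_dir.exists())  # False
```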

For multi-host sweeps, swap the local pool for DaskPool or RayPool — same sweep() / monte_carlo() / latin_hypercube() call shape, different backend=:

from gmat_sweep import sweep
from gmat_sweep.backends import DaskPool

with DaskPool(n_workers=8) as pool:
    df = sweep(
        "mission.script",
        grid={"Sat.SMA": [7000, 7100, 7200]},
        backend=pool,
    )

DaskPool and RayPool require optional extras, installed with pip install gmat-sweep[dask] or pip install gmat-sweep[ray]. See the backends page for the full set of pool patterns, and the cluster recipes for Slurm / Kubernetes / Ray autoscaling wiring.
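
One call shape can drive a local pool, Dask, or Ray because the sweep logic only needs something that accepts tasks and returns futures. Here is a minimal sketch of that separation. It uses the standard library's concurrent.futures.Executor as a stand-in abstraction, since gmat-sweep's real backend interface is not shown here. The dispatch loop also turns per-run exceptions into failed statuses instead of aborting. SerialExecutor, dispatch and fake_run are all hypothetical.

```python
from concurrent.futures import Executor, Future, ThreadPoolExecutor


class SerialExecutor(Executor):
    """Run each task inline in the calling thread (handy when debugging)."""

    def submit(self, fn, /, *args, **kwargs):
        fut = Future()
        try:
            fut.set_result(fn(*args, **kwargs))
        except Exception as exc:
            fut.set_exception(exc)
        return fut


def dispatch(run_one, runs: list[dict], executor: Executor) -> dict[int, tuple[str, object]]:
    """Submit every run, then record ("ok", result) or ("failed", error) per run_id."""
    futures = {rid: executor.submit(run_one, overrides) for rid, overrides in enumerate(runs)}
    results: dict[int, tuple[str, object]] = {}
    for rid, fut in futures.items():
        try:
            results[rid] = ("ok", fut.result())
        except Exception as exc:  # one failed run never aborts the others
            results[rid] = ("failed", repr(exc))
    return results


def fake_run(overrides: dict) -> float:
    """Stand-in for one GMAT run: returns SMA minus 6378 as a pretend altitude,
    and fails for low SMA values to exercise the error path."""
    if overrides["Sat.SMA"] < 7050:
        raise RuntimeError("simulated GMAT failure")
    return overrides["Sat.SMA"] - 6378.0


runs = [{"Sat.SMA": v} for v in (7000, 7100, 7200)]
# Same dispatch call, different executor: only the backend object changes.
for executor in (SerialExecutor(), ThreadPoolExecutor(max_workers=2)):
    with executor:
        print(dispatch(fake_run, runs, executor))
```

Swapping in ProcessPoolExecutor works the same way, with the usual caveat that run_one and its arguments must be picklable.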

A gmat-sweep console script is also installed for shell-script and CI use:

gmat-sweep run         --grid Sat.SMA=7000:7200:3 --workers 8 --out ./sweep mission.script
gmat-sweep run         --grid Sat.SMA=7000:7200:3 --backend dask --workers 8 --out ./sweep mission.script
gmat-sweep monte-carlo --n 1000 --perturb 'Sat.SMA=normal:7100:50' --seed 42 --out ./mc mission.script
gmat-sweep resume      --script mission.script --workers 8 ./mc/manifest.jsonl
gmat-sweep show        ./sweep/manifest.jsonl
gmat-sweep archive     --out ./sweep.zip ./sweep/manifest.jsonl

See the CLI reference in the docs for every subcommand and the full mini-grammar.
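
The CLI reference is the authority on the mini-grammar. The sketch below is only a hedged reading of the examples above. Sat.SMA=7000:7200:3 reproduces the quick-start grid [7000, 7100, 7200] if it means start:stop:count with both ends included. 'Sat.SMA=normal:7100:50' mirrors the Python-side tuple ("normal", 7100.0, 50.0). Under those assumptions, a parser (parse_grid_axis and parse_perturb are hypothetical) might look like:

```python
def _split_assignment(token: str) -> tuple[str, str]:
    """Split 'NAME=SPEC' into its two halves, rejecting malformed tokens."""
    name, sep, spec = token.partition("=")
    if not sep or not name or not spec:
        raise ValueError(f"expected NAME=SPEC, got {token!r}")
    return name, spec


def parse_grid_axis(token: str) -> tuple[str, list[float]]:
    """'NAME=start:stop:count' -> count evenly spaced values, both ends included."""
    name, spec = _split_assignment(token)
    start_s, stop_s, count_s = spec.split(":")
    start, stop, count = float(start_s), float(stop_s), int(count_s)
    if count < 1:
        raise ValueError(f"count must be >= 1, got {count}")
    if count == 1:
        return name, [start]
    step = (stop - start) / (count - 1)
    return name, [start + i * step for i in range(count)]


def parse_perturb(token: str) -> tuple[str, tuple]:
    """'NAME=dist:p1:p2...' -> (NAME, (dist, p1, p2, ...)) with float parameters."""
    name, spec = _split_assignment(token)
    dist, *params = spec.split(":")
    return name, (dist, *(float(p) for p in params))


print(parse_grid_axis("Sat.SMA=7000:7200:3"))   # ('Sat.SMA', [7000.0, 7100.0, 7200.0])
print(parse_perturb("Sat.SMA=normal:7100:50"))  # ('Sat.SMA', ('normal', 7100.0, 50.0))
```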

Outputs

Every sweep emits two artefacts:

  • The returned DataFrame: (run_id, time)-MultiIndexed, one column per ReportFile channel plus the __status column. Built lazily from per-run Parquet files via pyarrow's dataset API, so a 10,000-run sweep does not have to fit in memory at once.
  • A JSON Lines manifest (manifest.jsonl) — append-only, fsync'd after every entry. Records the canonical script SHA-256, software-version fingerprint, full parameter spec, and per-run status, timing, output paths, and captured stderr. A Ctrl-C mid-sweep leaves the manifest in a parseable state. See the manifest schema for the full contract.
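
To make the (run_id, time) shape concrete, here is a small pandas sketch that stitches per-run tables together with pandas.concat(keys=...). It uses in-memory frames instead of Parquet files and skips the lazy pyarrow path entirely. The channel name Sat.Altitude is an assumption for illustration, as is the representation of a failed run (one row with a NaN time). aggregate is a hypothetical helper.

```python
import math

import pandas as pd


def aggregate(per_run: dict[int, pd.DataFrame], status: dict[int, str]) -> pd.DataFrame:
    """Stack per-run ReportFile tables under a (run_id, time) MultiIndex with a __status column."""
    frames = []
    for run_id, st in status.items():
        if st == "ok":
            frame = per_run[run_id].set_index("time")
        else:
            # No report data: keep a single placeholder row so the failure stays visible.
            frame = pd.DataFrame(index=pd.Index([math.nan], name="time"))
        frames.append(frame.assign(**{"__status": st}))
    return pd.concat(frames, keys=list(status), names=["run_id", "time"])


# Made-up illustrative numbers, not GMAT output.
per_run = {
    0: pd.DataFrame({"time": [0.0, 60.0], "Sat.Altitude": [621.9, 622.4]}),
    1: pd.DataFrame({"time": [0.0, 60.0], "Sat.Altitude": [721.9, 722.3]}),
}
df = aggregate(per_run, {0: "ok", 1: "ok", 2: "failed"})
print(df)
```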

Documentation

Full docs at https://astro-tools.github.io/gmat-sweep/, including a getting-started guide, the parameter spec reference, the manifest schema, the supported-version matrix, the FAQ, and the API reference.

Runnable example notebooks:

  • Single-axis SMA scan — fifty runs across np.linspace(7000, 8000, 50) of Sat.SMA, parallel-dispatched and overlaid on a single altitude-vs-time plot.
  • Two-axis epoch × time-of-flight grid — cartesian product over Sat.Epoch and a script-level Variable TOF, contoured by per-run miss distance.
  • Surviving a kill — launch a sweep, send SIGINT mid-run, walk through inspecting the partial manifest with gmat-sweep show, then complete the sweep with Sweep.from_manifest(...).resume().
  • Monte Carlo dispersion — 1000-run Monte Carlo around a nominal injection burn over a four-axis perturbation cube, with arrival-miss histogram and a 3-σ covariance ellipse.
  • Latin hypercube vs Monte Carlo — 64-run Latin hypercube alongside a 64-run plain Monte Carlo on the same perturbation, pair-plotting the unit-cube samples to make the stratification visible.
  • Dask cluster recipe — 100-run Sat.SMA grid dispatched through a distributed.LocalCluster with DaskPool, same flow as a real dask.distributed cluster.
  • Ray autoscaling recipe — 100-run Monte Carlo dispatched through RayPool against a local ray.init(), same task model as a real autoscaling Ray cluster.
  • Sobol sensitivity — Saltelli design via sobol_sample, run through sweep(samples=...), reduced to first/total-order Sobol indices via sobol_analyze with 95 % bootstrap CIs.
  • Archive bundle — pack a finished sweep into a self-describing .zip via Sweep.archive(), inspect the layout, and re-aggregate the per-run DataFrame from the unzipped tree.
  • Extending a Monte Carlo — anchor a 100-run monte_carlo, append 200 more via monte_carlo_extend(n=200), and assert that the original 100 run_ids are preserved bit-for-bit.
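
The stratification that the Latin hypercube notebook makes visible fits in a few lines. Split each axis of the unit cube into n equal strata, place exactly one sample per stratum on every axis, and pair strata across axes at random. The sketch below is the textbook construction, not necessarily the one latin_hypercube uses. Mapping a column onto a normal perturbation through statistics.NormalDist.inv_cdf assumes ("normal", 7100.0, 50.0) means mean 7100 and standard deviation 50.

```python
from statistics import NormalDist

import numpy as np


def latin_hypercube_unit(n: int, d: int, rng: np.random.Generator) -> np.ndarray:
    """n points in [0, 1)^d with exactly one point in each of n equal strata on every axis."""
    # Row i of every column lands somewhere inside stratum [i/n, (i+1)/n).
    u = (np.arange(n)[:, None] + rng.random((n, d))) / n
    # Shuffle each column independently so strata pair up at random across axes.
    for j in range(d):
        u[:, j] = rng.permutation(u[:, j])
    return u


rng = np.random.default_rng(42)
unit = latin_hypercube_unit(64, 4, rng)

# Every axis hits each of the 64 strata exactly once.
for j in range(unit.shape[1]):
    assert sorted(np.floor(unit[:, j] * 64).astype(int).tolist()) == list(range(64))

# Map column 0 onto a normal SMA perturbation through the inverse CDF.
sma = NormalDist(mu=7100.0, sigma=50.0)
sma_samples = [sma.inv_cdf(float(p)) for p in unit[:, 0]]
```

A plain Monte Carlo draw of the same size gives no such guarantee, since several samples can fall in one stratum while others stay empty.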

Development

To work on gmat-sweep itself:

git clone https://github.com/astro-tools/gmat-sweep.git
cd gmat-sweep
uv sync --all-groups

See CONTRIBUTING.md for the full branch / PR / test workflow.

Licence

MIT. See LICENSE.