SpectraX v0.1.2: safer MPMD stage JITs + faster TensorStore restores

@erfanzar released this 07 May 20:46

Release Notes

SpectraX 0.1.2 is a stability and checkpointing release on top of the 0.1.x True MPMD runtime line.

Highlights

  • Added private stage-JIT cache handling for MPMD schedule executables.
  • Improved TensorStore checkpoint loading for large model restores.
  • Added index-only checkpoint restore support when structure sidecars are missing.
  • Added caller-provided checkpoint key aliases for framework/layout migrations.
  • Expanded serialization tests and GCS-auth test support.
  • Package lint/type checks are clean.

MPMD Runtime

  • Stage forward, backward, terminal, and scheduler body JITs now compile under a private cache scope.
  • Avoids reusing jit_stage_body-* executables from the global persistent cache across different MPMD plans, which is unsafe.
  • Keeps normal in-process executable reuse after the first compile.
  • Preserves True MPMD behavior: forward, backward, and schedule execution are still split and dispatched per stage mesh.
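
The idea behind the private cache scope can be illustrated with a small sketch. This is not SpectraX's implementation: the class PrivateStageCache, its methods, and the fields folded into the key are hypothetical names chosen for illustration. It only shows why a cache keyed by the executable name alone is unsafe across plans, and how keying on plan details still allows in-process reuse after the first compile.

```python
import hashlib
from typing import Any, Callable, Dict, Tuple


class PrivateStageCache:
    """Hypothetical in-process cache of compiled stage executables.

    Not a SpectraX API. Entries live only in this object, so nothing is
    shared through a global on-disk cache.
    """

    def __init__(self) -> None:
        self._executables: Dict[str, Any] = {}
        self.compile_count = 0

    @staticmethod
    def fingerprint(
        stage_name: str,
        mesh_shape: Tuple[int, ...],
        shardings: Tuple[str, ...],
        schedule: str,
    ) -> str:
        # Fold every plan detail that changes the compiled program into the
        # key, so two plans never collide on the stage name alone.
        payload = repr((stage_name, mesh_shape, shardings, schedule))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compile(self, key: str, compile_fn: Callable[[], Any]) -> Any:
        # Compile on first use, then reuse within this process.
        if key not in self._executables:
            self.compile_count += 1
            self._executables[key] = compile_fn()
        return self._executables[key]


cache = PrivateStageCache()

# Same stage name, different stage mesh: the keys must differ.
key_a = cache.fingerprint("jit_stage_body-0", (2, 4), ("dp", "tp"), "1f1b")
key_b = cache.fingerprint("jit_stage_body-0", (4, 2), ("dp", "tp"), "1f1b")

# Strings stand in for compiled executables in this sketch.
exe_a = cache.get_or_compile(key_a, lambda: "executable-for-plan-a")
exe_a_again = cache.get_or_compile(key_a, lambda: "never-called")
exe_b = cache.get_or_compile(key_b, lambda: "executable-for-plan-b")
```

In this sketch the second lookup for plan A reuses the stored entry, while plan B triggers its own compile even though the stage name is identical.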

Checkpointing

  • Added can_skip_structure restore path for TensorStore checkpoints that have tensorstore_index.json but no {prefix}_structure.json.
  • Added TensorStore load controls:
    • concurrent_gb
    • tensorstore_io_concurrency
    • tensorstore_copy_concurrency
    • tensorstore_cache_gb
    • tensorstore_assume_metadata
    • tensorstore_metadata_workers
    • show_progress
    • progress_every
  • Added progress reporting for large weight loads.
  • Added template-aware key aliasing so downstream frameworks can map legacy checkpoint names without baking those aliases into SpectraX.
  • Improved metadata/index handling for faster hosted checkpoint restores.
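
A minimal sketch of what caller-provided, template-aware key aliasing could look like is shown below. The function resolve_checkpoint_key, its arguments, and the example key names are assumptions made for illustration and are not the SpectraX API. The point is that the alias table is supplied by the downstream framework and is consulted only when the key the template expects is absent from the checkpoint.

```python
from typing import Dict, Iterable, List, Optional


def resolve_checkpoint_key(
    template_key: str,
    available: Iterable[str],
    aliases: Dict[str, List[str]],
) -> Optional[str]:
    """Return the stored key to read for template_key, or None if none match.

    Hypothetical helper: `aliases` maps a key expected by the model template
    to legacy names that older checkpoints may have used for it.
    """
    stored = set(available)
    # Prefer the exact name the template expects.
    if template_key in stored:
        return template_key
    # Otherwise try the caller's legacy names in the order given.
    for legacy_key in aliases.get(template_key, []):
        if legacy_key in stored:
            return legacy_key
    return None


# Example data, invented for this sketch.
checkpoint_keys = ["model/embed/kernel", "model/layers/0/attn/wq"]
caller_aliases = {
    "model/embed/embedding": ["model/embed/kernel"],
    "model/layers/0/attn/q_proj": ["model/layers/0/attn/wq"],
}

embed_source = resolve_checkpoint_key(
    "model/embed/embedding", checkpoint_keys, caller_aliases
)
exact_source = resolve_checkpoint_key(
    "model/layers/0/attn/wq", checkpoint_keys, caller_aliases
)
missing_source = resolve_checkpoint_key(
    "model/lm_head/kernel", checkpoint_keys, caller_aliases
)
```

Keeping the alias table on the caller side matches the stated goal of not baking framework-specific legacy names into the library itself.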

Validation

uv run ruff check spectrax
uv run basedpyright spectrax

Both pass on the package code.

Upgrade

pip install -U spectrax==0.1.2
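
After upgrading, one way to confirm which version is installed is to query the package metadata from the standard library. The helper name installed_version is an assumption for this sketch; importlib.metadata itself is part of Python 3.8 and later.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("spectrax")
print(f"spectrax installed version: {found}")
```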

Notes

This release intentionally does not make MPMD stage executables reusable through the global persistent disk cache. Those executables are too dependent on stage mesh, rebased shardings, schedule shape, and split jaxpr state. The safer behavior is private first-compile handling plus normal in-process reuse.

Full Changelog: v0.1.0...v0.1.2