Design discussion: Simulation checkpoint & restart support for TORAX #1894

@CodersAcademy006

Hi maintainers,

Thank you for releasing TORAX. It is a very clean and well-structured
differentiable transport code, and I have been enjoying reading through it.

I wanted to start a design discussion around adding checkpoint and restart
support for simulations, and to first check whether this aligns with the
project’s direction.

Motivation:

TORAX is well suited for long-running simulations, parameter sweeps, and
gradient-based optimization loops. In these settings, the ability to save
intermediate simulation state and later resume execution is often essential.
This would help with preempted jobs, large parameter studies, and outer-loop
optimization or control workflows.

Proposed scope:

The idea would be to add a simple checkpoint and restart interface that can
save the full simulation state including the physical state, time, and solver
context, and restore a simulation from a saved checkpoint to continue running.
This would be infrastructure only, without changing the underlying physics
or solver behavior.
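
To make the scope concrete, here is a minimal sketch of what a save/restore
pair could look like. Everything in it is an assumption for discussion:
`SimState`, `advance`, `save_checkpoint`, and `load_checkpoint` are
hypothetical names, not existing TORAX classes or functions, and NumPy's
`.npz` format is used only as a stand-in for whichever storage format is
preferred.

```python
import dataclasses
import os
import tempfile

import numpy as np


@dataclasses.dataclass
class SimState:
    """Hypothetical stand-in for a simulation state; not a TORAX class."""
    t: float                 # simulation time
    step: int                # number of completed time steps
    temperature: np.ndarray  # example profile on a radial grid
    density: np.ndarray      # example profile on a radial grid


def advance(state: SimState, dt: float) -> SimState:
    """Toy placeholder for one solver step (no real physics)."""
    return SimState(
        t=state.t + dt,
        step=state.step + 1,
        temperature=state.temperature * 0.99,
        density=state.density,
    )


def save_checkpoint(path: str, state: SimState) -> None:
    """Write the state to a temp file, then rename it over `path`.

    The rename means a job preempted mid-write does not leave a
    truncated checkpoint at `path`.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            np.savez(f, **dataclasses.asdict(state))
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise


def load_checkpoint(path: str) -> SimState:
    """Rebuild a SimState from a file written by save_checkpoint."""
    with np.load(path) as data:
        return SimState(
            t=float(data["t"]),
            step=int(data["step"]),
            temperature=data["temperature"],
            density=data["density"],
        )
```

The property I would want a real implementation to satisfy is that running
N steps, checkpointing, restoring, and running M more steps gives the same
result as running N + M steps uninterrupted.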

Storage format:

Since TORAX is a fusion transport code, standard scientific formats matter.
A reasonable default could be HDF5 (via h5py) or NetCDF, as these are commonly
used in the fusion community. I am also open to native JAX serialization
approaches if that is preferred for tighter integration with JAX workflows.

Backward compatibility:

I would expect checkpoint formats to evolve over time, so the design would
include a simple versioning scheme to support future compatibility.
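
As one possible shape for the versioning scheme, each checkpoint could carry
an integer format version, with a chain of small migration functions that
upgrade old checkpoints one version at a time. The sketch below uses only the
standard library; the `format_version` key, the `upgrade` function, and the
example v1 to v2 change (renaming `time` to `t`) are all hypothetical and
chosen purely for illustration.

```python
from typing import Any, Callable, Dict

Checkpoint = Dict[str, Any]

# Newest format version this (hypothetical) reader understands.
CURRENT_VERSION = 2


def _v1_to_v2(ckpt: Checkpoint) -> Checkpoint:
    """Assumed example change: v2 renamed the "time" field to "t"."""
    out = dict(ckpt)  # copy so the caller's dict is not mutated
    out["t"] = out.pop("time")
    out["format_version"] = 2
    return out


# Maps a version number to the function that upgrades it by one version.
MIGRATIONS: Dict[int, Callable[[Checkpoint], Checkpoint]] = {
    1: _v1_to_v2,
}


def upgrade(ckpt: Checkpoint) -> Checkpoint:
    """Apply migrations in order until the checkpoint is current."""
    version = ckpt.get("format_version")
    if not isinstance(version, int):
        raise ValueError("checkpoint has no integer 'format_version'")
    if version > CURRENT_VERSION:
        raise ValueError(
            f"checkpoint version {version} is newer than supported "
            f"version {CURRENT_VERSION}"
        )
    while version < CURRENT_VERSION:
        if version not in MIGRATIONS:
            raise ValueError(f"no migration from version {version}")
        ckpt = MIGRATIONS[version](ckpt)
        version = ckpt["format_version"]
    return ckpt
```

Refusing checkpoints that are newer than the reader, rather than attempting
to load them, seems like the safer default, since silently dropping unknown
fields could change simulation results after a restart.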

Before going further, I wanted to ask:

  1. Is checkpoint and restart functionality aligned with TORAX’s intended scope?
  2. Are there preferred storage formats or existing conventions I should follow?
  3. Are there architectural considerations I should be aware of?

If this makes sense, I would be happy to help with a small design RFC or a
prototype implementation for discussion.

Thanks for your time, and apologies if I missed any existing related work.
