Status: Open
Label: enhancement (New feature or request)
Background
Currently, if an experiment run is interrupted (e.g., by manual cancellation, a node going down, or another unexpected failure), there is no way to resume it: the whole experiment has to be restarted from scratch.
This wastes time and compute, especially for large-scale experiments.
Proposal
Implement a checkpointing / resume system that allows VERONA to continue experiments from where they left off.
Key ideas:
- Track which experiment instances have already been completed (e.g., store this in a results CSV or a lightweight database).
- On restart, check which instances are missing or incomplete.
- Continue execution only for the incomplete instances.
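A minimal sketch of these three ideas in Python, assuming each experiment instance has a stable string ID and that one result row is appended to a CSV file as soon as an instance finishes. The names `load_completed_ids`, `run_with_resume`, `run_instance` and the `instance_id` column are hypothetical and chosen for illustration; they are not part of VERONA's existing API.

```python
import csv
import os
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical column name; VERONA's real results schema may differ.
ID_COLUMN = "instance_id"


def load_completed_ids(results_csv: Path) -> set[str]:
    """Return the IDs of instances that already have a complete result row."""
    if not results_csv.exists():
        return set()
    with results_csv.open(newline="") as f:
        # Rows with missing columns (e.g. a line cut off by a crash) are
        # treated as incomplete, so those instances will be re-run.
        return {
            row[ID_COLUMN]
            for row in csv.DictReader(f)
            if None not in row.values()
        }


def run_with_resume(
    instance_ids: Iterable[str],
    run_instance: Callable[[str], dict],
    results_csv: Path,
) -> list[str]:
    """Run only the instances missing from results_csv; return the IDs run now."""
    completed = load_completed_ids(results_csv)
    pending = [i for i in instance_ids if i not in completed]
    write_header = not results_csv.exists() or results_csv.stat().st_size == 0
    ran = []
    for instance_id in pending:
        result = run_instance(instance_id)  # assumed to return the same keys every time
        row = {ID_COLUMN: instance_id, **result}
        # Append and fsync after every instance, so an interruption loses
        # at most the instance that was in progress.
        with results_csv.open("a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(row))
            if write_header:
                writer.writeheader()
                write_header = False
            writer.writerow(row)
            f.flush()
            os.fsync(f.fileno())
        ran.append(instance_id)
    return ran
```

Because completion is recorded per instance rather than per run, calling `run_with_resume` again with the same instance list after a crash skips everything already in the CSV. A lightweight database such as SQLite could replace the CSV if concurrent workers need to write results safely.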
Benefits
- Saves time by avoiding re-running completed instances.
- Makes VERONA more robust to interruptions and cluster instability.
- Improves user experience for long-running experiments.