Open
Description
Does `torchtune` support saving multiple checkpoints per epoch? AFAICT we only save checkpoints at epoch boundaries, but please correct me if I'm wrong.
If not, would it be feasible to extend `torchtune` to save multiple checkpoints per epoch? One potential approach could be to pass a `step` variable to `save_checkpoint` here and either save subdirectories (e.g. `epoch_i/step_j/`) or update the current checkpoint naming to include the step number (e.g. `epoch_i_step_j/`).
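
To make the two naming options concrete, here is a rough sketch of how the output directories could be derived. All of the names below (`checkpoint_dir`, `should_save`, `planned_checkpoints`, `save_every_n_steps`) are hypothetical and not part of torchtune, and the sketch assumes nothing about the actual `save_checkpoint` signature. It only shows the path layout and a possible step-interval check.

```python
from pathlib import Path
from typing import List, Optional


def checkpoint_dir(
    output_dir: str, epoch: int, step: Optional[int] = None, nested: bool = True
) -> Path:
    """Hypothetical helper: build the directory for one checkpoint.

    step=None keeps an epoch-only name (epoch_i/).
    nested=True gives epoch_i/step_j/, nested=False gives epoch_i_step_j/.
    """
    base = Path(output_dir)
    if step is None:
        return base / f"epoch_{epoch}"
    if nested:
        return base / f"epoch_{epoch}" / f"step_{step}"
    return base / f"epoch_{epoch}_step_{step}"


def should_save(step: int, save_every_n_steps: int) -> bool:
    """Hypothetical check: save on every N-th step (1-indexed within the epoch)."""
    return save_every_n_steps > 0 and step > 0 and step % save_every_n_steps == 0


def planned_checkpoints(
    output_dir: str,
    num_epochs: int,
    steps_per_epoch: int,
    save_every_n_steps: int,
    nested: bool = True,
) -> List[Path]:
    """List the directories a training loop would write to under this scheme."""
    paths = []
    for epoch in range(num_epochs):
        for step in range(1, steps_per_epoch + 1):
            # Mid-epoch checkpoints; the last step is covered by the epoch checkpoint.
            if step < steps_per_epoch and should_save(step, save_every_n_steps):
                paths.append(checkpoint_dir(output_dir, epoch, step, nested))
        # Epoch-boundary checkpoint, as is saved today.
        paths.append(checkpoint_dir(output_dir, epoch))
    return paths
```

With the nested layout, the mid-epoch checkpoints live under the same `epoch_i/` directory as the epoch-boundary checkpoint, so cleaning up an epoch is a single directory removal. The flat layout keeps every checkpoint as a sibling directory, which may be simpler for any tooling that already globs on `epoch_*`.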