Saving multiple checkpoints per epoch #2285

Open
@EugenHotaj

Description

Does torchtune support saving multiple checkpoints per epoch? AFAICT we only save checkpoints at epoch boundaries but please correct me if I'm wrong.

If not, would it be feasible to extend torchtune to save multiple checkpoints per epoch? One potential approach could be to pass a step variable to save_checkpoint here and either save to subdirectories (e.g. epoch_i/step_j/) or update the current checkpoint naming to include the step number (e.g. epoch_i_step_j/).
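
To make the two layouts concrete, here is a minimal sketch of what I have in mind. It is not torchtune code: `checkpoint_dir`, `should_save`, and the `save_every_n_steps` setting are hypothetical names made up for illustration, and the loop only records where a checkpoint would be written instead of calling save_checkpoint.

```python
from pathlib import Path


def checkpoint_dir(output_dir: str, epoch: int, step: int, nested: bool = True) -> Path:
    """Return the directory for a mid-epoch checkpoint (hypothetical helper)."""
    root = Path(output_dir)
    if nested:
        # Layout 1: one subdirectory per step inside the epoch directory.
        return root / f"epoch_{epoch}" / f"step_{step}"
    # Layout 2: a flat directory whose name includes the step number.
    return root / f"epoch_{epoch}_step_{step}"


def should_save(step: int, save_every_n_steps: int) -> bool:
    """True when a checkpoint is due after `step` completed steps in the epoch."""
    return save_every_n_steps > 0 and step % save_every_n_steps == 0


steps_per_epoch = 6
saved = []
for epoch in range(2):
    for step in range(1, steps_per_epoch + 1):
        # The training step would run here.
        if should_save(step, save_every_n_steps=3):
            # A real recipe would pass epoch and step through to its checkpointer here.
            saved.append(checkpoint_dir("out", epoch, step).as_posix())

print(saved)
```

The nested layout keeps all checkpoints for an epoch together, while the flat layout is closer to a per-epoch naming scheme and may be easier to glob. Either way, resuming from a mid-epoch checkpoint would presumably also need the dataloader position saved, which this sketch does not cover.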

Labels

enhancement: New feature or request
triaged: This issue has been assigned an owner and appropriate label