
Fix trainer.test() logging incorrect epoch from checkpoint (#20052) #21714

Open
devraj2307 wants to merge 3 commits into Lightning-AI:master from devraj2307:issue-20052-test-ckpt-state-sync

devraj2307 commented May 13, 2026

What does this PR do?

Fixes #20052

Restores the checkpoint epoch and global step when running Trainer.validate(), Trainer.test(), or Trainer.predict() with an explicit ckpt_path.

Previously, these entry points restored the model weights from the requested checkpoint but left the fit-loop progress counters unchanged. As a result, hooks and loggers still observed the last in-memory training epoch/step instead of the values stored in the selected checkpoint.

This PR:

  • restores the fit-loop progress that backs trainer.current_epoch and trainer.global_step when a checkpoint is loaded for evaluation or prediction
  • adds a regression test covering trainer.test(ckpt_path=...) with an older checkpoint
  • updates restore-path expectations to reflect the corrected global_step behavior during evaluation
  • adds a changelog entry

No breaking changes.
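
To make the before/after behavior concrete, here is a minimal toy sketch in plain Python. It is not Lightning's implementation: `ToyTrainer`, `restore_for_eval`, and the `restore_progress` flag are hypothetical names invented for illustration, and the epoch/step numbers are arbitrary. The only detail borrowed from Lightning is that a checkpoint dictionary carries `epoch`, `global_step`, and `state_dict` entries.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTrainer:
    """Hypothetical stand-in for a trainer; not Lightning's implementation."""

    current_epoch: int = 0
    global_step: int = 0
    weights: dict = field(default_factory=dict)

    def restore_for_eval(self, checkpoint: dict, restore_progress: bool) -> None:
        # Weights are always taken from the requested checkpoint.
        self.weights = dict(checkpoint["state_dict"])
        if restore_progress:
            # Behavior described in this PR: the progress counters
            # follow the selected checkpoint as well.
            self.current_epoch = checkpoint["epoch"]
            self.global_step = checkpoint["global_step"]


# An older checkpoint; training then continued past it in memory.
old_ckpt = {"epoch": 1, "global_step": 20, "state_dict": {"w": 0.5}}

# Previous behavior: weights restored, counters left at the in-memory values.
before = ToyTrainer(current_epoch=4, global_step=50, weights={"w": 0.9})
before.restore_for_eval(old_ckpt, restore_progress=False)

# Behavior after this PR: counters match the checkpoint being evaluated.
after = ToyTrainer(current_epoch=4, global_step=50, weights={"w": 0.9})
after.restore_for_eval(old_ckpt, restore_progress=True)

print(before.current_epoch, before.global_step)  # 4 50
print(after.current_epoch, after.global_step)    # 1 20
```

In both cases the weights come from the older checkpoint; only the second case lets a logger that reads the epoch and step see the values belonging to that checkpoint.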


📚 Documentation preview 📚: https://pytorch-lightning--21714.org.readthedocs.build/en/21714/

github-actions bot added the pl (Generic label for PyTorch Lightning package) label May 13, 2026

Development

Successfully merging this pull request may close these issues.

trainer.test() with given checkpoint logs last epoch instead of checkpoint epoch

1 participant