Could you please provide an example of how to resume training? #64

Hi, I tried saving a checkpoint and resuming training from it. The parameters appear to load, but the result after resuming is worse than training from scratch.
Here is the code I modified:

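import torch

# Resume: restore model, optimizer, and epoch counter before the training loop.
if resume:
  checkpoint = torch.load(checkpoint_path)
  # The checkpoint stores model.module.state_dict() (keys without a "module."
  # prefix). If model is already wrapped in DataParallel at this point, load
  # with model.module.load_state_dict(...) so the keys match.
  model.load_state_dict(checkpoint['model_state_dict'])
  optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
  start_epoch = checkpoint['epoch']  # saved as epoch + 1, i.e. the next epoch to run
  loss = checkpoint['loss']
  model.train()

# Save: written at the end of each epoch.
torch.save({
  'epoch': epoch + 1,
  'model_state_dict': model.module.state_dict(),
  'optimizer_state_dict': optimizer.state_dict(),
  'loss': loss,
  }, checkpoint_path)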
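
For comparison, a fuller save/resume pair might look like the sketch below. It is only an assumption, not something confirmed in this issue, that the gap comes from state the snippet above does not restore, such as a learning-rate scheduler or the RNG state. The names `save_checkpoint`, `load_checkpoint`, and `scheduler` are hypothetical and do not come from this repository.

```python
import torch
from torch import nn

# Wrapper types whose real network lives under `.module`.
_WRAPPERS = (nn.DataParallel, nn.parallel.DistributedDataParallel)


def _unwrap(model):
    """Return the underlying network so saved keys carry no 'module.' prefix."""
    return model.module if isinstance(model, _WRAPPERS) else model


def save_checkpoint(path, model, optimizer, scheduler, epoch, loss):
    """Save everything needed to continue training after `epoch` (hypothetical helper)."""
    torch.save({
        "epoch": epoch + 1,  # next epoch to run
        "model_state_dict": _unwrap(model).state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        # Assumption: training uses an LR scheduler; without its state the
        # learning rate schedule restarts from the beginning on resume.
        "scheduler_state_dict": scheduler.state_dict(),
        "rng_state": torch.get_rng_state(),  # CPU RNG only
        "loss": loss,
    }, path)


def load_checkpoint(path, model, optimizer, scheduler):
    """Restore state saved by save_checkpoint; returns (start_epoch, last_loss)."""
    checkpoint = torch.load(path, map_location="cpu")
    _unwrap(model).load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    scheduler.load_state_dict(checkpoint["scheduler_state_dict"])
    torch.set_rng_state(checkpoint["rng_state"])
    return checkpoint["epoch"], checkpoint["loss"]
```

With helpers like these, the training loop would start at the restored epoch, for example `for epoch in range(start_epoch, num_epochs):`, and call `model.train()` before the first batch.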
