Hi Alasdair,
Thank you for your great work. I have two questions and hope you can help me out.
1. How many epochs of training are required to obtain the results shown in the paper? I retrained the `5_transformer_roberta` variant from scratch and got a CIDEr score of 35.6 after 35 epochs on the GoodNews dataset, which is much lower than the 48.5 reported in the paper. What confuses me is that the paper states "training is stopped after the model has seen 6.6 million examples. This is equivalent to 16 epochs on GoodNews and 9 epochs on NYTimes800k" (see the arithmetic sketch after this list). I also noticed that in the provided checkpoint files for `5_transformer_roberta`, `metrics_epoch_99.json` reports `"best_epoch": 99`. Does this mean the best-performing checkpoint was obtained after training for 99 epochs?
2. Can the training process be stopped early? If so, could you give me some guidance on how that works? (I sketch the kind of early stopping I have in mind below.)
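
To make the first question concrete, here is the arithmetic I am doing with the numbers quoted above (the per-epoch sizes are only implied by the quoted sentence, not taken from the datasets themselves):

```python
# Epoch sizes implied by the quoted sentence: 6.6M examples seen equals
# 16 GoodNews epochs and 9 NYTimes800k epochs.
examples_seen = 6_600_000
goodnews_epoch_size = examples_seen / 16   # ~412,500 examples per epoch
nytimes_epoch_size = examples_seen / 9     # ~733,333 examples per epoch

# By the same arithmetic, my 35-epoch GoodNews run has already seen far
# more data than the paper's stated budget, yet scores much lower:
print(35 * goodnews_epoch_size)  # ~14.4 million examples vs. 6.6 million
```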
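
And for the second question, this is the kind of patience-based early stopping I have in mind, as a minimal self-contained sketch. `train_one_epoch`, `evaluate`, and the `patience` value are placeholders I made up, not this repo's actual code or config keys; if the trainer follows AllenNLP conventions I would expect something like a `patience` option, but I could not confirm how it applies here:

```python
import random

def train_one_epoch(model):
    """Placeholder for one pass over the training data."""

def evaluate(model):
    """Placeholder validation step; returns a score such as CIDEr."""
    return random.random()

model = None              # stand-in for the real model object
max_epochs = 100
patience = 5              # assumed value: epochs to wait without improvement
best_score = float("-inf")
best_epoch = 0

for epoch in range(max_epochs):
    train_one_epoch(model)
    score = evaluate(model)
    if score > best_score:
        # New best validation score: remember it (and save a checkpoint here).
        best_score, best_epoch = score, epoch
    elif epoch - best_epoch >= patience:
        # No improvement for `patience` consecutive epochs: stop training.
        print(f"Early stop at epoch {epoch}; best epoch was {best_epoch}.")
        break
```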
Thanks!