Hi Alasdair,
Thank you for your great work. I have two questions and hope you can help me out.
1. How many epochs of training are required to obtain the results shown in the paper? I retrained the `5_transformer_roberta` variant from scratch and got a CIDEr score of 35.6 after 35 epochs on the GoodNews dataset, which is much lower than the 48.5 reported in the paper. What confuses me is that the paper states "training is stopped after the model has seen 6.6 million examples. This is equivalent to 16 epochs on GoodNews and 9 epochs on NYTimes800k" (see the arithmetic sketch after this list). I also noticed that in the provided checkpoint files for `5_transformer_roberta`, `metrics_epoch_99.json` reports `"best_epoch": 99`. Does this mean the best-performing checkpoint was obtained after training for 99 epochs?
2. Can the training process be stopped early? If so, could you give me some guidance on how that works? (I sketch the kind of early stopping I have in mind below.)
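
To make the first question concrete, here is the arithmetic I am doing with the numbers quoted above (the per-epoch sizes are only implied by the quoted sentence, not taken from the datasets themselves):

```python
# Epoch sizes implied by the quoted sentence: 6.6M examples seen equals
# 16 GoodNews epochs and 9 NYTimes800k epochs.
examples_seen = 6_600_000
goodnews_epoch_size = examples_seen / 16   # ~412,500 examples per epoch
nytimes_epoch_size = examples_seen / 9     # ~733,333 examples per epoch

# By the same arithmetic, my 35-epoch GoodNews run has already seen far
# more data than the paper's stated budget, yet scores much lower:
print(35 * goodnews_epoch_size)  # ~14.4 million examples vs. 6.6 million
```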
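
And for the second question, this is the kind of patience-based early stopping I have in mind, as a minimal self-contained sketch. `train_one_epoch`, `evaluate`, and the `patience` value are placeholders I made up, not this repo's actual code or config keys; if the trainer follows AllenNLP conventions I would expect something like a `patience` option, but I could not confirm how it applies here:

```python
import random

def train_one_epoch(model):
    """Placeholder for one pass over the training data."""

def evaluate(model):
    """Placeholder validation step; returns a score such as CIDEr."""
    return random.random()

model = None              # stand-in for the real model object
max_epochs = 100
patience = 5              # assumed value: epochs to wait without improvement
best_score = float("-inf")
best_epoch = 0

for epoch in range(max_epochs):
    train_one_epoch(model)
    score = evaluate(model)
    if score > best_score:
        # New best validation score: remember it (and save a checkpoint here).
        best_score, best_epoch = score, epoch
    elif epoch - best_epoch >= patience:
        # No improvement for `patience` consecutive epochs: stop training.
        print(f"Early stop at epoch {epoch}; best epoch was {best_epoch}.")
        break
```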
Thanks!