There are many reasons this could be the case, but diagnosing it would require a lot more detail about your setup. One simple thing to check is whether you're finetuning with dropout enabled, as this is often quite important.
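(As a generic illustration of the point above, not this repo's code: finetuning frameworks often default dropout to 0 and expect you to override it, e.g. to 0.1, when finetuning. Inverted dropout itself is a few lines; names here are illustrative.)

```python
import random

def dropout(x, rate, training, seed=None):
    """Inverted dropout: during training, zero each unit with probability
    `rate` and scale survivors by 1/(1-rate); at eval time it's a no-op."""
    if not training or rate == 0.0:
        return list(x)
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [(v / keep) if rng.random() < keep else 0.0 for v in x]
```

With `rate=0.0` this is an identity even in training mode, which is why a config that silently leaves dropout at 0 can change finetuning behavior quite a bit.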
-
What is the training_eval-accuracy that is reported in the event logs?
Here is what I am seeing during training on tweet classification task:

Training essentially converges after 1000 steps. The "training_eval" accuracy (on a 3-way classification task) is reported as around 0.84, which would be an amazing score, since RoBERTa Large gets around 0.74 on the same set.
However, doing a "real evaluation" on the same dataset (performed after saving the checkpoints in the training script) reveals a totally different situation:

The score of around 0.68 is not very impressive. I have also looked through the stored predictions and calculated accuracy/F1 manually, and can confirm that this metric is correct.
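(For anyone reproducing the manual check mentioned above: with integer-encoded labels it is a few lines of pure Python, and should match sklearn's `accuracy_score` / `f1_score(average="macro")`.)

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```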
I am trying to figure out why it is doing so badly on this task, and want to understand what the training_eval accuracy is really reporting.