I trained on four A100 GPUs with a total batch size of 36.
After 300,000 training iterations, this is the model's result:

which is quite different from the result reported in your paper:

I did not change the code. What could be the cause?