Regarding the model training process, how low does the loss need to drop to match the performance reported in the paper?
Regarding the model training process, how low does the loss need to drop to match the performance reported in the paper?