@MaksTarnavskyi
I am interested in your paper “Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction” and would like to reproduce your results, but I have some questions about the experimental details.
Neither the paper nor the GitHub repository specifies the GPU configuration or the hyperparameters used for each training stage. Could you please share this information?
Also, I ran into a strange problem while training: in the first stage, GPU memory usage started out small but grew steadily, and I eventually got out-of-memory errors even on a 32 GB V100. Do you know what might cause this and how to solve it?
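For what it's worth, the symptom (memory starting small and growing steadily) is consistent with length-sorted batching: if the data loader sorts sentences by length and batches a fixed number of sentences, early batches hold short sequences and later ones hold progressively longer ones, so padded batch size keeps growing until it overflows. One common mitigation is to cap batches by total padded tokens instead of sentence count. Below is a minimal, framework-agnostic sketch of that idea (`make_token_batches` is a hypothetical helper I wrote for illustration, not a function from this repository):

```python
def make_token_batches(lengths, max_tokens):
    """Group example indices into batches whose padded size
    (batch_size * longest_sequence) stays at or below max_tokens.

    lengths: list of sequence lengths, one per example.
    Returns a list of batches, each a list of example indices.
    """
    # Sort by length so padding waste within a batch is small.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])

    batches, current, current_max = [], [], 0
    for i in order:
        current_max = max(current_max, lengths[i])
        # Would adding this example push the padded batch over the cap?
        if current and current_max * (len(current) + 1) > max_tokens:
            batches.append(current)
            current, current_max = [i], lengths[i]
        else:
            current.append(i)
    if current:
        batches.append(current)
    return batches


# Example: with a cap of 6 padded tokens, long sentences no longer
# share a batch with many others, so peak memory stays bounded.
print(make_token_batches([1, 2, 3, 10], max_tokens=6))
```

With a fixed sentence-count batch sampler, the batch containing the length-10 example would be just as "wide" as the cap allows times the sentence count; with a token cap it is emitted alone. If you are on PyTorch, logging `torch.cuda.max_memory_allocated()` every few hundred steps alongside the current batch's max sequence length should confirm or rule out this explanation.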