@MaksTarnavskyi
I am interested in your paper “Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction” and would like to reproduce your results, but I have some questions about the experimental details.
Neither the paper nor the GitHub repository specifies the GPU configuration or the hyperparameters used for each training stage. Could you please share this information?
Also, I ran into a strange problem while training: in the first stage, GPU memory usage started out small but grew steadily, and I eventually got out-of-memory errors even on a 32 GB V100. Do you know what might cause this and how to solve it?
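For what it's worth, the symptom (memory starting small and growing steadily) is consistent with length-sorted batching: if the data loader sorts sentences by length and batches a fixed number of sentences, early batches hold short sequences and later ones hold progressively longer ones, so padded batch size keeps growing until it overflows. One common mitigation is to cap batches by total padded tokens instead of sentence count. Below is a minimal, framework-agnostic sketch of that idea (`make_token_batches` is a hypothetical helper I wrote for illustration, not a function from this repository):

```python
def make_token_batches(lengths, max_tokens):
    """Group example indices into batches whose padded size
    (batch_size * longest_sequence) stays at or below max_tokens.

    lengths: list of sequence lengths, one per example.
    Returns a list of batches, each a list of example indices.
    """
    # Sort by length so padding waste within a batch is small.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])

    batches, current, current_max = [], [], 0
    for i in order:
        current_max = max(current_max, lengths[i])
        # Would adding this example push the padded batch over the cap?
        if current and current_max * (len(current) + 1) > max_tokens:
            batches.append(current)
            current, current_max = [i], lengths[i]
        else:
            current.append(i)
    if current:
        batches.append(current)
    return batches


# Example: with a cap of 6 padded tokens, long sentences no longer
# share a batch with many others, so peak memory stays bounded.
print(make_token_batches([1, 2, 3, 10], max_tokens=6))
```

With a fixed sentence-count batch sampler, the batch containing the length-10 example would be just as "wide" as the cap allows times the sentence count; with a token cap it is emitted alone. If you are on PyTorch, logging `torch.cuda.max_memory_allocated()` every few hundred steps alongside the current batch's max sequence length should confirm or rule out this explanation.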