some detail of gector-large #186

@liuxin99

@MaksTarnavskyi
I am interested in your paper “Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction” and I want to reproduce your results. However, I have some questions about the experimental details of your paper.

Neither the paper nor the GitHub repository specifies the GPU configuration or the hyperparameters used for each stage of training. Could you please share this information with me?

I also ran into a strange problem while training the model. In the first stage of training, GPU memory usage was very low at first but then gradually increased. Even on a V100 32 GB GPU, I got out-of-memory errors. Do you know what might cause this problem and how to solve it?
