Questions about MATE-KD #2

@jinxinglu

Hi, MATE-KD is excellent work on knowledge distillation for NLP. I have a question about the code for this paper.

In Section 4.1 of the paper, the authors say that two different teacher models (RoBERTa-large and BERT-base) are used in the two steps, but the code appears to use only one teacher model. Is that right?

Also, shouldn't the two steps be trained separately? In the code, the training procedure updates the generator's parameters for 10 steps and then updates the student model for 100 steps. That seems weird to me.
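
To make sure I am reading the schedule correctly, here is a minimal sketch of the alternation as I understand it. The function name and its parameters are my own illustration, not copied from the repository, and it only models which component is updated at each step, not the actual losses:

```python
def alternating_schedule(total_steps, gen_steps=10, student_steps=100):
    """Yield which component is updated at each training step.

    Hypothetical helper (not from the repository): it models the pattern
    described above, i.e. gen_steps generator updates followed by
    student_steps student updates, repeated until total_steps is reached.
    """
    period = gen_steps + student_steps
    for step in range(total_steps):
        # The first gen_steps positions of every period go to the generator,
        # the remaining positions go to the student.
        yield "generator" if step % period < gen_steps else "student"


schedule = list(alternating_schedule(220))
print(schedule.count("generator"), schedule.count("student"))
```

If that is the intended behavior, the two steps are interleaved in a single run rather than trained as two separate stages, which is what I would like to confirm.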
