- Paper: Knowledge distillation: A good teacher is patient and consistent
- Paper Link: https://arxiv.org/abs/2106.05237
Description
- The paper focuses on two important aspects of knowledge distillation: consistency and patience.
- Consistency is framed as function matching: the authors argue that knowledge distillation should not just match the teacher's predictions on the original target data, but should also increase the support of the data distribution. They use mixup augmentation, interpolating between data points (and optionally using out-of-domain images), so that the student matches the teacher's function across a much wider region of input space, with both networks seeing the exact same augmented view of each sample (see the sketch after this list).
- The other component of the training recipe is patience: knowledge distillation benefits from very long training schedules, far longer than typical supervised training.
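
Below is a minimal sketch of a "consistent" function-matching distillation step, assuming a PyTorch setup; `student`, `teacher`, `optimizer`, and the mixup helper are hypothetical stand-ins for illustration, not the authors' actual code. The key points it illustrates are that teacher and student receive the identical mixed batch, and that the loss matches the teacher's full predictive distribution; "patience" then amounts to running this step for a very long schedule.

```python
import torch
import torch.nn.functional as F

def mixup(x, alpha=1.0):
    """Interpolate each image with a randomly permuted partner (mixup)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0), device=x.device)
    return lam * x + (1.0 - lam) * x[perm]

def distill_step(student, teacher, x, optimizer, temperature=1.0):
    """One function-matching step: teacher and student see the SAME mixed batch."""
    x_mixed = mixup(x)                      # consistency: identical augmented view
    with torch.no_grad():
        t_logits = teacher(x_mixed)         # teacher is frozen
    s_logits = student(x_mixed)
    # KL divergence between softened teacher and student distributions
    loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```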
Results:
