Dear Authors,
Unfortunately, my training results in poor evaluation scores. Could you share roughly what the expected loss should be for:
- (Phase 1) Encoder-Only Knowledge Distillation
- (Phase 2) Prompt-in-the-Loop Knowledge Distillation
For Phase 1, my loss is 0.0010.
For Phase 2, my loss is 0.1139.
I suspect Phase 2 is causing the problem. Can you confirm?
Thank you in advance!