Description
Llama 3 models fine-tuned with QuanTA produce repetitive, meaningless phrases when evaluated on IFEval; the sentence-level repetition degrades IFEval scores. The same issue does not occur with Llama 2 models under an identical QuanTA configuration.
Other fine-tuning methods (LoRA, full fine-tuning) applied to the same Llama 3 model do not exhibit this behavior, so the issue appears specific to QuanTA.
Reproduction
- Fine-tune a Llama 3 model with QuanTA.
- Evaluate on IFEval (lm-evaluation-harness).
- Observe sentence-level repetition in generated outputs.
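For step 2, the evaluation can be run with the `lm_eval` CLI along these lines (the model path and batch size are placeholders for this setup, not values from the report):

```shell
lm_eval --model hf \
  --model_args pretrained=/path/to/quanta-finetuned-llama3 \
  --tasks ifeval \
  --batch_size 8 \
  --output_path results/ \
  --log_samples
```

`--log_samples` keeps the per-sample generations so the repetition in step 3 can be inspected directly.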
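Step 3 can be checked mechanically with a small script over the logged generations. This is a minimal sketch; the sentence splitter is a naive punctuation heuristic and the sample string is hypothetical, chosen only to resemble the degenerate looping described above.

```python
import re
from collections import Counter

def repeated_sentences(text: str, min_repeats: int = 3) -> dict:
    """Return sentences that occur at least `min_repeats` times in `text`.

    Splits naively on ., !, ? — crude, but enough to flag the
    sentence-level looping seen in the QuanTA/Llama 3 outputs.
    """
    sentences = [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(sentences)
    return {s: n for s, n in counts.items() if n >= min_repeats}

# Hypothetical degenerate output resembling the observed failure mode.
sample = "The answer is clear. The answer is clear. The answer is clear. Done."
print(repeated_sentences(sample))  # {'the answer is clear': 3}
```

An IFEval run is buggy in the reported sense if this returns a non-empty dict for a noticeable fraction of samples.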