Open
Description
The evaluator’s MMLU 5-shot evaluation is currently broken. When a prompt exceeds the context length limit, vLLM truncates the input, which breaks few-shot evaluation. As a result, we cannot get valid MMLU 5-shot metrics for the latest checkpoints.
Tasks
- Investigate how vLLM handles input truncation during few-shot evaluation.
- Adjust evaluator configuration or preprocessing so that MMLU 5-shot prompts fit within the allowed context length.
- Verify that MMLU 5-shot results are produced and logged correctly.
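
One possible preprocessing approach for the second task is sketched below: build each prompt with as many few-shot examples as fit, dropping examples from the end until the token count is within budget. This is only a sketch under assumptions, not the evaluator's actual code. `build_few_shot_prompt`, `count_tokens`, and `max_prompt_tokens` are hypothetical names; `count_tokens` stands in for the model's real tokenizer, and `max_prompt_tokens` is assumed to be the context limit (for example vLLM's `max_model_len`) minus the tokens reserved for generation.

```python
from typing import Callable, Sequence, Tuple


def build_few_shot_prompt(
    header: str,
    shots: Sequence[str],
    question: str,
    count_tokens: Callable[[str], int],  # hypothetical: wraps the model tokenizer
    max_prompt_tokens: int,              # assumed: context limit minus generation budget
) -> Tuple[str, int]:
    """Return (prompt, shots_used), keeping as many leading shots as fit."""
    # Try the full set of shots first, then drop one from the end each round.
    for k in range(len(shots), -1, -1):
        prompt = header + "".join(shots[:k]) + question
        if count_tokens(prompt) <= max_prompt_tokens:
            return prompt, k
    # Even the zero-shot prompt does not fit; fail loudly instead of truncating.
    raise ValueError("question alone exceeds the token budget")


if __name__ == "__main__":
    # Whitespace splitting is a stand-in for a real tokenizer.
    whitespace_count = lambda s: len(s.split())
    shots = ["Q: x y\nA: z\n"] * 5
    prompt, used = build_few_shot_prompt(
        "Answer the question.\n", shots, "Q: a b\nA:", whitespace_count, 20
    )
    print(used, whitespace_count(prompt))
```

Dropping examples means some items are no longer strictly 5-shot, so `shots_used` should probably be logged alongside the metric; raising the context limit instead would avoid that trade-off if the model supports it.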
Acceptance Criteria
- Evaluator can successfully run MMLU 5-shot evaluation without truncation errors.
- Correct metrics are reported for MMLU 5-shot in both WandB and InfluxDB.
- Zero-shot and other task evaluations remain unaffected.
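
To check the first criterion before running a full evaluation, a pre-flight pass over the prompts could flag any that would still exceed the context limit. The sketch below assumes the same hypothetical `count_tokens` callable as a stand-in for the model tokenizer; `find_overlong_prompts`, `max_model_len`, and `max_new_tokens` are illustrative parameter names, not the evaluator's API.

```python
from typing import Callable, List, Sequence


def find_overlong_prompts(
    prompts: Sequence[str],
    count_tokens: Callable[[str], int],  # hypothetical: wraps the model tokenizer
    max_model_len: int,                  # assumed context limit of the served model
    max_new_tokens: int = 1,             # tokens reserved for the generated answer
) -> List[int]:
    """Return indices of prompts that do not fit within the context limit."""
    budget = max_model_len - max_new_tokens
    return [i for i, p in enumerate(prompts) if count_tokens(p) > budget]


if __name__ == "__main__":
    # Whitespace splitting is a stand-in for a real tokenizer.
    bad = find_overlong_prompts(["a b", "a b c d"], lambda s: len(s.split()), 4)
    print(bad)
```

An empty result means no prompt should be truncated under these assumptions; a non-empty one points at the items to inspect.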