Fix MMLU 5-shot evaluation in evaluator #603

@joellidin

Description

The evaluator's MMLU 5-shot evaluation currently fails. When a prompt exceeds the context length limit, vLLM truncates the input, which breaks few-shot evaluation. As a result, we cannot get valid MMLU 5-shot metrics for the latest checkpoints.

Tasks

  • Investigate how vLLM handles input truncation during few-shot evaluation.
  • Adjust evaluator configuration or preprocessing so that MMLU 5-shot prompts fit within the allowed context length.
  • Verify that MMLU 5-shot results are produced and logged correctly.
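
One possible shape for the preprocessing task above is sketched here. It is not the evaluator's actual code: `fit_few_shot_prompt`, `whitespace_token_count`, and the `max_prompt_tokens` budget are hypothetical names introduced for illustration, and the whitespace counter is only a stand-in for the model's real tokenizer. The idea is to build the prompt with as many few-shot examples as fit and to report how many were kept, so that nothing is cut silently downstream.

```python
from typing import Callable, Sequence


def fit_few_shot_prompt(
    shots: Sequence[str],
    question: str,
    max_prompt_tokens: int,
    count_tokens: Callable[[str], int],
    separator: str = "\n\n",
) -> tuple[str, int]:
    """Return (prompt, shots_used), keeping as many leading shots as fit.

    Hypothetical helper: `count_tokens` should wrap the tokenizer of the
    model under evaluation so the count matches what the engine sees.
    """
    # Try the full set of shots first, then drop one shot at a time.
    for n_shots in range(len(shots), -1, -1):
        prompt = separator.join([*shots[:n_shots], question])
        if count_tokens(prompt) <= max_prompt_tokens:
            return prompt, n_shots
    # Even the bare question is too long for the budget.
    raise ValueError("question alone exceeds the prompt token budget")


def whitespace_token_count(text: str) -> int:
    """Stand-in token counter for the example; not a real tokenizer."""
    return len(text.split())


if __name__ == "__main__":
    shots = ["Q: one two A: B"] * 5
    prompt, used = fit_few_shot_prompt(
        shots, "Q: three four A:", 20, whitespace_token_count
    )
    print(f"kept {used} of {len(shots)} shots")
```

A prompt that keeps fewer than five examples is no longer strictly 5-shot, so logging `shots_used` next to the metric would make any reduction visible. The budget passed as `max_prompt_tokens` would presumably need to leave room for generated tokens below the configured context length.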

Acceptance Criteria

  • Evaluator can successfully run MMLU 5-shot evaluation without truncation errors.
  • Correct metrics are reported for MMLU 5-shot in both WandB and InfluxDB.
  • Zero-shot and other task evaluations remain unaffected.
