Pull requests: EleutherAI/lm-evaluation-harness

Pull requests list

fix: clean up MasakhaNEWS prompt whitespace and typo
#3580 opened Feb 11, 2026 by Mr-Neutr0n
New NorEval tasks
#3572 opened Feb 9, 2026 by davda54
Update of NorEval implementation
#3571 opened Feb 9, 2026 by davda54 Draft
MMLU PRO chat variant
#3568 opened Feb 6, 2026 by anmarques
feat(tasks): add Persian XNLI evaluation task
#3553 opened Feb 3, 2026 by jayvenn21
Add Intel Gaudi support
#3550 opened Feb 3, 2026 by 12010486
feat(tasks): add LongProc benchmark (6 task types, 16 configs)
#3544 opened Feb 1, 2026 by xiye17
4 tasks done
feat(task): add MMLU-CF contamination-free benchmark
#3542 opened Jan 31, 2026 by fistyee
add french and korean gsm8k
#3541 opened Jan 30, 2026 by bknyaz
Added pass@k and avg@k metrics to AIME benchmark
#3510 opened Jan 21, 2026 by annafontanaa
[TASKS] add tasks from GDN paper
#3507 opened Jan 21, 2026 by mayank31398
3 tasks done
Hineni
#3506 opened Jan 20, 2026 by Kevinobote
Presets
#3494 opened Jan 13, 2026 by baberabb
feat: support local directory as dataset_path
#3485 opened Jan 5, 2026 by fanjingxiang
Fix utils.py for MATH500 evaluation
#3478 opened Dec 24, 2025 by sheriyuo