Pull requests: EleutherAI/lm-evaluation-harness
fix: Update WatsonxLLM class mapping and errors
#3591
opened Feb 17, 2026 by
Rafal-Chrzanowski-IBM
Fix correctness issues in Arabic normalization and prompt loading
#3589
opened Feb 15, 2026 by
RinZ27
add GreekMMLU (official native-sourced benchmark) task configuration
#3581
opened Feb 11, 2026 by
mersinkonomi
fix: clean up MasakhaNEWS prompt whitespace and typo
#3580
opened Feb 11, 2026 by
Mr-Neutr0n
Add jfinqa: Japanese Financial Numerical Reasoning QA (1000 questions)
#3570
opened Feb 8, 2026 by
ajtgjmdjp
6 tasks done
fix(bigbench): add group for task discovery (bigbench)
#3556
opened Feb 4, 2026 by
jayvenn21
feat (tasks): add AAII GPQA Diamond tasks, extraction regex, and reasoning/non-reasoning wrappers
#3547
opened Feb 3, 2026 by
saint1729
feat(tasks): add LongProc benchmark (6 task types, 16 configs)
#3544
opened Feb 1, 2026 by
xiye17
4 tasks done
Adds llm-as-a-judge support via new metric
#3534
opened Jan 27, 2026 by
jbross-ibm-research
Implement new translation tasks for google WMT24++ datasets
#3480
opened Dec 25, 2025 by
grzegorz-aniol