Pull requests: Aleph-Alpha-Research/eval-framework

Pull requests list

chore(main): release 0.2.14 autorelease: pending
#192 opened Mar 4, 2026 by github-actions bot
feat: Adds GSM8k with Olmes parity
#191 opened Mar 3, 2026 by prabhuteja12 Draft
5 of 13 tasks
feat: Feature to have metric aggregators like Pass@K
#190 opened Mar 2, 2026 by prabhuteja12
3 of 13 tasks
feat: add MultiPL HumanEval & MBPP tasks
#189 opened Mar 2, 2026 by tfburns
6 of 12 tasks
feat: Nucleus sampling for OpenAI, vLLM
#187 opened Mar 1, 2026 by prabhuteja12
4 of 13 tasks
feat: Add the OLMES variant of the MBPP task
#186 opened Feb 26, 2026 by tfburns
6 of 12 tasks
feat: add OLMES variant of BigCodeBench
#184 opened Feb 26, 2026 by tfburns
7 of 12 tasks
fix: OLMES matching effort (MC Task Suite)
#182 opened Feb 24, 2026 by fsschneider
7 of 12 tasks
Update citation year and add version+author to README
#159 opened Jan 26, 2026 by tfburns
1 task done
chore: Bump pyasn1 from 0.6.1 to 0.6.2 in the uv group across 1 directory dependencies python:uv
#157 opened Jan 16, 2026 by dependabot bot
docs: add LLM as judge guide
#151 opened Jan 12, 2026 by AhmedHammam-AA
fix(main): duplicated task that are actually the same
#144 opened Jan 7, 2026 by benureau
3 of 13 tasks
Remove leading space in ground truth formatting
#129 opened Dec 10, 2025 by SohirMaskey
3 of 13 tasks
hardcoded date for consistent evals
#99 opened Nov 4, 2025 by GrS-AA Draft
13 tasks
Refactor Dataloading
#13 opened Aug 26, 2025 by bastitx Draft
1 of 13 tasks