## PR Checklist
- [X] Use descriptive commit messages.
- [X] Provide tests for your changes.
- [X] Update any related documentation and include any relevant screenshots.
- [X] Check if changes need to be made to docs (README or any guides in `/docs/`).
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [X] Feature
- [ ] Bug Fix
- [ ] Optimization
- [X] Documentation Update
## Description
Adds the OLMES variant of the HumanEval benchmark as the task `HumanEval_OLMES`.
## Added/updated tests?
- [X] Yes
- [ ] No, and this is why: _please replace this line with details on why tests have not been included_
- [ ] I need help with writing tests
- File: [src/eval_framework/tasks/benchmarks/humaneval.py](../../src/eval_framework/tasks/benchmarks/humaneval.py) | [View on GitHub](https://github.com/Aleph-Alpha-Research/eval-framework/blob/main/src/eval_framework/tasks/benchmarks/humaneval.py)
- Link to dataset: [https://huggingface.co/datasets/openai/openai_humaneval](https://huggingface.co/datasets/openai/openai_humaneval)
More detailed documentation, with prompt examples and ground truth completions, can be generated with `uv run -m eval_framework.utils.generate_task_docs --add-prompt-examples --only-tasks "HumanEval_OLMES"`.
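For reviewers unfamiliar with the dataset, here is a minimal sketch of how a HumanEval-style completion is typically checked. The field names (`prompt`, `canonical_solution`, `test`, `entry_point`) match the columns of `openai/openai_humaneval`; the record contents and the `passes` helper are hypothetical and are not the scoring code added in this PR.

```python
# Toy record shaped like a row of openai/openai_humaneval.
# Field names follow the dataset; the contents are invented for illustration.
record = {
    "task_id": "Toy/0",
    "prompt": 'def add(a: int, b: int) -> int:\n    """Return the sum of a and b."""\n',
    "canonical_solution": "    return a + b\n",
    "test": (
        "def check(candidate):\n"
        "    assert candidate(1, 2) == 3\n"
        "    assert candidate(-1, 1) == 0\n"
    ),
    "entry_point": "add",
}


def passes(record: dict, completion: str) -> bool:
    """Hypothetical helper: True if prompt + completion passes check()."""
    # The function signature comes from the prompt, the body from the model.
    program = record["prompt"] + completion + "\n" + record["test"]
    namespace: dict = {}
    try:
        # Real harnesses run this step in a sandbox; never exec untrusted
        # model output directly like this outside a throwaway example.
        exec(program, namespace)
        namespace["check"](namespace[record["entry_point"]])
    except Exception:
        return False
    return True


good = passes(record, record["canonical_solution"])  # reference solution
bad = passes(record, "    return a - b\n")            # deliberately wrong body
print(good, bad)
```

The reference solution passes and the wrong body fails on the first assertion, so the sketch prints `True False`.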