Commit 434609e (1 parent: 45644e1)

Enhance benchmark registration doc (#2117)

* Enhance benchmark registration doc
* Fix command for launching Inspect viewer: updated command from `inspect viewer` to `inspect view` in documentation.
* Update docs/hub/eval-results.md

Co-authored-by: Daniel van Strien <davanstrien@users.noreply.github.com>

2 files changed (+6, −4 lines)

docs/hub/eval-results.md (3 additions, 1 deletion)

```diff
@@ -16,12 +16,14 @@ Dataset repos can be defined as **Benchmarks** (e.g., [AIME](https://huggingface
 To register your dataset as a benchmark:
 
 1. Create a dataset repo containing your evaluation data
-2. Add an `eval.yaml` file to the repo root with your benchmark configuration
+2. Add an `eval.yaml` file to the repo root with your [benchmark configuration](https://inspect.aisi.org.uk/tasks.html#hugging-face)
 3. The file is validated at push time
 4. (**Beta**) Get in touch so we can add it to the allow-list.
 
 The `eval.yaml` format is based on [Inspect AI](https://inspect.aisi.org.uk/), enabling reproducible evaluations. See the [Evaluating models with Inspect](https://huggingface.co/docs/inference-providers/guides/evaluation-inspect-ai) guide for details on running evaluations.
 
+Examples can be found in these benchmarks: [SimpleQA](https://huggingface.co/datasets/OpenEvals/SimpleQA/blob/main/eval.yaml), [AIME 24](https://huggingface.co/datasets/OpenEvals/aime_24/blob/main/eval.yaml), [MuSR](https://huggingface.co/datasets/OpenEvals/MuSR/blob/main/eval.yaml)
+
 <!-- TODO: Add example of eval.yaml file -->
 
 ## Model Evaluation Results
```
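The TODO in the hunk above asks for an `eval.yaml` example, and the commit does not show the schema that is validated at push time. As a purely hypothetical sketch (every field name below is an assumption, not the validated format; the linked SimpleQA, AIME 24, and MuSR files show the real thing), a benchmark configuration might look roughly like:

```yaml
# Hypothetical eval.yaml sketch. Field names are assumptions, not the
# schema validated at push time; see the benchmark configuration docs
# and the SimpleQA / AIME 24 / MuSR example files for the real format.
tasks:
  - name: my_benchmark          # assumed: display name of the task
    dataset:
      samples: data/test.jsonl  # assumed: path to samples in this repo
    scorer: exact_match         # assumed: how model output is graded
```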

docs/inference-providers/guides/evaluation-inspect-ai.md (3 additions, 3 deletions)

````diff
@@ -67,7 +67,7 @@ Once it finishes, we'll see the evaluation results:
 Besides the command line report, Inspect comes with a nice viewer UI. We can launch it with the following command:
 
 ```bash
-inspect viewer
+inspect view
 ```
 ![Screenshot of inspect viewer results with gpt-oss-20b](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers-guides/evals-guide-first-eval-viewer.png)
 
@@ -92,7 +92,7 @@ If everything went well we will see the evaluations running in parallel for each
 
 
 ```bash
-inspect viewer
+inspect view
 ```
 ![Screenshot of inspect viewer results with gpt-oss-20b](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers-guides/evals-guide-model-bench-viewer.png)
 
@@ -250,4 +250,4 @@ inspect eval animal_or_else.py --model hf-inference-providers/Qwen/Qwen3-VL-30B-
 # Next Steps
 * Explore [Inspect's documentation](https://inspect.aisi.org.uk/) to learn more about model evaluation.
 * Check out the [lighteval](https://github.com/huggingface/lighteval) library. It comes with over [1,000 tasks](https://huggingface.co/spaces/OpenEvals/open_benchmark_index), so you don't have to write any code, and it gives you several quality-of-life features for quickly running evaluations.
-* Browse models available through Inference Providers to find the best model for your needs and run your own evaluations.
+* Browse models available through Inference Providers to find the best model for your needs and run your own evaluations.
````

(The final hunk changes only the trailing newline at end of file; the line text is identical.)
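The fix above swaps the subcommand from `inspect viewer` to `inspect view`. As an illustrative sketch only, one could wrap the corrected command in a small guard that checks the CLI is on PATH before launching; `launch_viewer` is a hypothetical helper, not part of Inspect's API:

```python
import shutil
import subprocess


def launch_viewer(binary: str = "inspect") -> bool:
    """Launch the Inspect viewer UI if the CLI is installed.

    The subcommand is "view" (not "viewer"), the typo this commit fixes.
    This is a hypothetical convenience wrapper, not an Inspect API.
    """
    if shutil.which(binary) is None:
        # Guard: the CLI may not be on PATH in a fresh environment.
        print(f"'{binary}' not found on PATH; it ships with the inspect-ai package.")
        return False
    # Blocks until the viewer process exits; raises on a non-zero exit code.
    subprocess.run([binary, "view"], check=True)
    return True
```

Calling `launch_viewer()` starts the viewer when the CLI is available and returns `False` (with an install hint) otherwise.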
