Commit 434609e (1 parent: 45644e1)

Enhance benchmark registration doc (#2117)

* Enhance benchmark registration doc
* Fix command for launching Inspect viewer: updated command from `inspect viewer` to `inspect view` in documentation.
* Update docs/hub/eval-results.md

Co-authored-by: Daniel van Strien <davanstrien@users.noreply.github.com>

2 files changed (+6, −4 lines)

docs/hub/eval-results.md (3 additions, 1 deletion)

```diff
@@ -16,12 +16,14 @@ Dataset repos can be defined as **Benchmarks** (e.g., [AIME](https://huggingface
 To register your dataset as a benchmark:
 
 1. Create a dataset repo containing your evaluation data
-2. Add an `eval.yaml` file to the repo root with your benchmark configuration
+2. Add an `eval.yaml` file to the repo root with your [benchmark configuration](https://inspect.aisi.org.uk/tasks.html#hugging-face)
 3. The file is validated at push time
 4. (**Beta**) Get in touch so we can add it to the allow-list.
 
 The `eval.yaml` format is based on [Inspect AI](https://inspect.aisi.org.uk/), enabling reproducible evaluations. See the [Evaluating models with Inspect](https://huggingface.co/docs/inference-providers/guides/evaluation-inspect-ai) guide for details on running evaluations.
 
+Examples can be found in these benchmarks: [SimpleQA](https://huggingface.co/datasets/OpenEvals/SimpleQA/blob/main/eval.yaml), [AIME 24](https://huggingface.co/datasets/OpenEvals/aime_24/blob/main/eval.yaml), [MuSR](https://huggingface.co/datasets/OpenEvals/MuSR/blob/main/eval.yaml)
+
 <!-- TODO: Add example of eval.yaml file -->
 
 ## Model Evaluation Results
```
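The TODO in the hunk above asks for an `eval.yaml` example, and the commit does not show the schema that is validated at push time. As a purely hypothetical sketch (every field name below is an assumption, not the validated format; the linked SimpleQA, AIME 24, and MuSR files show the real thing), a benchmark configuration might look roughly like:

```yaml
# Hypothetical eval.yaml sketch. Field names are assumptions, not the
# schema validated at push time; see the benchmark configuration docs
# and the SimpleQA / AIME 24 / MuSR example files for the real format.
tasks:
  - name: my_benchmark          # assumed: display name of the task
    dataset:
      samples: data/test.jsonl  # assumed: path to samples in this repo
    scorer: exact_match         # assumed: how model output is graded
```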

docs/inference-providers/guides/evaluation-inspect-ai.md (3 additions, 3 deletions)

````diff
@@ -67,7 +67,7 @@ Once it finishes, we'll see the evaluation results:
 Besides the command line report, Inspect comes with a nice viewer UI. We can launch it with the following command:
 
 ```bash
-inspect viewer
+inspect view
 ```
 ![Screenshot of inspect viewer results with gpt-oss-20b](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers-guides/evals-guide-first-eval-viewer.png)
 
@@ -92,7 +92,7 @@ If everything went well we will see the evaluations running in parallel for each
 
 
 ```bash
-inspect viewer
+inspect view
 ```
 ![Screenshot of inspect viewer results with gpt-oss-20b](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers-guides/evals-guide-model-bench-viewer.png)
 
@@ -250,4 +250,4 @@ inspect eval animal_or_else.py --model hf-inference-providers/Qwen/Qwen3-VL-30B-
 # Next Steps
 * Explore [Inspect's documentation](https://inspect.aisi.org.uk/) to learn more about model evaluation.
 * Check out the [lighteval](https://github.com/huggingface/lighteval) library. It comes with over [1,000 tasks](https://huggingface.co/spaces/OpenEvals/open_benchmark_index), so you don't have to write any code, and it gives you several quality-of-life features for quickly running evaluations.
-* Browse models available through Inference Providers to find the best model for your needs and run your own evaluations.
+* Browse models available through Inference Providers to find the best model for your needs and run your own evaluations.
````

(The final hunk changes only the trailing newline at end of file; the line text is identical.)
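The fix above swaps the subcommand from `inspect viewer` to `inspect view`. As an illustrative sketch only, one could wrap the corrected command in a small guard that checks the CLI is on PATH before launching; `launch_viewer` is a hypothetical helper, not part of Inspect's API:

```python
import shutil
import subprocess


def launch_viewer(binary: str = "inspect") -> bool:
    """Launch the Inspect viewer UI if the CLI is installed.

    The subcommand is "view" (not "viewer"), the typo this commit fixes.
    This is a hypothetical convenience wrapper, not an Inspect API.
    """
    if shutil.which(binary) is None:
        # Guard: the CLI may not be on PATH in a fresh environment.
        print(f"'{binary}' not found on PATH; it ships with the inspect-ai package.")
        return False
    # Blocks until the viewer process exits; raises on a non-zero exit code.
    subprocess.run([binary, "view"], check=True)
    return True
```

Calling `launch_viewer()` starts the viewer when the CLI is available and returns `False` (with an install hint) otherwise.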
