Improve eval-results badge descriptions and add moderation note
Clarify the badge table descriptions based on the actual moon-landing implementation, and add a tip about how community scores can be moderated via the PR lifecycle.
`docs/hub/eval-results.md` (8 additions, 5 deletions)
````diff
@@ -66,14 +66,14 @@ Or, with only the required attributes:
     value: 0.412
 ```
 
-Results display badges based on their metadata in the YAML file:
+Results display badges based on their metadata:
 
 | Badge | Condition |
 |-------|-----------|
-| verified | A `verifyToken` is valid (evaluation ran in HF Jobs with inspect-ai) |
-| community | Result submitted via open PR (not merged to main) |
-| leaderboard | Links to the benchmark dataset |
-| source | Links to evaluation logs or external source |
+| verified | The result has a valid `verifyToken` (reproduced using [Inspect AI](https://inspect.aisi.org.uk/) on HF Jobs) |
+| community | The result was submitted via an open PR (not merged to main) |
+| leaderboard | The benchmark dataset has a leaderboard to link to |
+| source | An external source URL is provided (e.g. evaluation logs, paper) |
 
 For more details on how to format this data, check out the [Eval Results](https://github.com/huggingface/hub-docs/blob/main/eval_results.yaml) specifications.
 
````
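For context, the badges above key off fields in a model card's `model-index` metadata. The sketch below is illustrative only, assuming the field names used elsewhere in the hub-docs eval-results spec (`model-index`, `metrics`, `verifyToken`, `source`); the model name, dataset, and URL are placeholders, and the authoritative schema is the linked `eval_results.yaml` specification.

```yaml
# Illustrative model card metadata (assumed shape, not the authoritative schema).
model-index:
- name: my-model                # hypothetical model name
  results:
  - task:
      type: text-generation
    dataset:
      name: HellaSwag           # hypothetical benchmark dataset
      type: hellaswag
    metrics:
    - type: accuracy
      value: 0.412              # the required attribute shown in the diff above
      verified: true
      verifyToken: "..."        # elided; issued by the verification run
    source:
      name: Evaluation logs
      url: https://example.com/logs   # placeholder external source URL
```

Under this reading, a valid `verifyToken` drives the verified badge, and the presence of a `source` URL drives the source badge.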
```diff
@@ -86,3 +86,6 @@ Anyone can submit evaluation results to any model via Pull Request:
 4. The PR will show as "community-provided" on the model page while open.
 
 For help evaluating a model, see the [Evaluating models with Inspect](https://huggingface.co/docs/inference-providers/guides/evaluation-inspect-ai) guide.
+
+> [!TIP]
+> Community scores are visible while the PR is open. If a score is disputed, the model author can close the PR to remove it. The goal is to surface existing evaluation data transparently while building toward a fully reproducible standard via verified scores.
```