Commit f89f93b

Improve eval-results badge descriptions and add moderation note
Clarify badge table descriptions based on actual moon-landing implementation and add a tip about how community scores can be moderated via PR lifecycle.
1 parent 326abbe commit f89f93b

1 file changed: +8 -5 lines changed


docs/hub/eval-results.md

Lines changed: 8 additions & 5 deletions
@@ -66,14 +66,14 @@ Or, with only the required attributes:
 value: 0.412
 ```
 
-Results display badges based on their metadata in the YAML file:
+Results display badges based on their metadata:
 
 | Badge | Condition |
 |-------|-----------|
-| verified | A `verifyToken` is valid (evaluation ran in HF Jobs with inspect-ai) |
-| community | Result submitted via open PR (not merged to main) |
-| leaderboard | Links to the benchmark dataset |
-| source | Links to evaluation logs or external source |
+| verified | The result has a valid `verifyToken` (reproduced using [Inspect AI](https://inspect.aisi.org.uk/) on HF Jobs) |
+| community | The result was submitted via an open PR (not merged to main) |
+| leaderboard | The benchmark dataset has a leaderboard to link to |
+| source | An external source URL is provided (e.g. evaluation logs, paper) |
 
 For more details on how to format this data, check out the [Eval Results](https://github.com/huggingface/hub-docs/blob/main/eval_results.yaml) specifications.
 
@@ -86,3 +86,6 @@ Anyone can submit evaluation results to any model via Pull Request:
 4. The PR will show as "community-provided" on the model page while open.
 
 For help evaluating a model, see the [Evaluating models with Inspect](https://huggingface.co/docs/inference-providers/guides/evaluation-inspect-ai) guide.
+
+> [!TIP]
+> Community scores are visible while the PR is open. If a score is disputed, the model author can close the PR to remove it. The goal is to surface existing evaluation data transparently while building toward a fully reproducible standard via verified scores.
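
The badge conditions in the updated table can be read as four independent checks. The following is a minimal sketch of that reading, not the Hub's actual implementation: the `EvalResult` class and all of its field names are hypothetical, and it assumes `verifyToken` validation and PR state are resolved elsewhere and passed in as booleans.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvalResult:
    """Hypothetical, simplified view of one eval result (not the real schema)."""
    value: float
    verify_token_valid: bool = False         # assumption: token was validated elsewhere
    from_open_pr: bool = False               # assumption: submitting PR is open, not merged to main
    benchmark_has_leaderboard: bool = False  # assumption: benchmark dataset exposes a leaderboard
    source_url: Optional[str] = None         # external source, e.g. evaluation logs or a paper


def badges(result: EvalResult) -> List[str]:
    """Return the badges implied by the table; each condition is checked independently."""
    out: List[str] = []
    if result.verify_token_valid:
        out.append("verified")
    if result.from_open_pr:
        out.append("community")
    if result.benchmark_has_leaderboard:
        out.append("leaderboard")
    if result.source_url:
        out.append("source")
    return out


if __name__ == "__main__":
    # A result submitted via an open PR, with a link to its logs.
    example = EvalResult(value=0.412, from_open_pr=True,
                         source_url="https://example.com/logs")
    print(badges(example))
```

Under this reading, closing a disputed PR flips `from_open_pr` to false, which removes the community badge along with the score, matching the moderation note in the tip.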
