Improve eval-results badge descriptions and add moderation note
Clarify the badge table descriptions based on the actual moon-landing implementation, and add a tip about how community scores can be moderated via the PR lifecycle.
`docs/hub/eval-results.md` (8 additions, 5 deletions)
````diff
@@ -66,14 +66,14 @@ Or, with only the required attributes:
     value: 0.412
 ```
 
-Results display badges based on their metadata in the YAML file:
+Results display badges based on their metadata:
 
 | Badge | Condition |
 |-------|-----------|
-| verified | A `verifyToken` is valid (evaluation ran in HF Jobs with inspect-ai) |
-| community | Result submitted via open PR (not merged to main) |
-| leaderboard | Links to the benchmark dataset |
-| source | Links to evaluation logs or external source |
+| verified | The result has a valid `verifyToken` (reproduced using [Inspect AI](https://inspect.aisi.org.uk/) on HF Jobs) |
+| community | The result was submitted via an open PR (not merged to main) |
+| leaderboard | The benchmark dataset has a leaderboard to link to |
+| source | An external source URL is provided (e.g. evaluation logs, paper) |
 
 For more details on how to format this data, check out the [Eval Results](https://github.com/huggingface/hub-docs/blob/main/eval_results.yaml) specifications.
 
````
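For context, the badges above key off fields in a model card's `model-index` metadata. The sketch below is illustrative only, assuming the field names used elsewhere in the hub-docs eval-results spec (`model-index`, `metrics`, `verifyToken`, `source`); the model name, dataset, and URL are placeholders, and the authoritative schema is the linked `eval_results.yaml` specification.

```yaml
# Illustrative model card metadata (assumed shape, not the authoritative schema).
model-index:
- name: my-model                # hypothetical model name
  results:
  - task:
      type: text-generation
    dataset:
      name: HellaSwag           # hypothetical benchmark dataset
      type: hellaswag
    metrics:
    - type: accuracy
      value: 0.412              # the required attribute shown in the diff above
      verified: true
      verifyToken: "..."        # elided; issued by the verification run
    source:
      name: Evaluation logs
      url: https://example.com/logs   # placeholder external source URL
```

Under this reading, a valid `verifyToken` drives the verified badge, and the presence of a `source` URL drives the source badge.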
```diff
@@ -86,3 +86,6 @@ Anyone can submit evaluation results to any model via Pull Request:
 4. The PR will show as "community-provided" on the model page while open.
 
 For help evaluating a model, see the [Evaluating models with Inspect](https://huggingface.co/docs/inference-providers/guides/evaluation-inspect-ai) guide.
+
+> [!TIP]
+> Community scores are visible while the PR is open. If a score is disputed, the model author can close the PR to remove it. The goal is to surface existing evaluation data transparently while building toward a fully reproducible standard via verified scores.
```