Commit 2604ea3

Update benchmark dataset links in eval-results.md (#2118)
1 parent a5b09bf commit 2604ea3

File tree

1 file changed: +2 −2 lines changed


docs/hub/eval-results.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ The Hub provides a decentralized system for tracking model evaluation results. B
 
 ## Benchmark Datasets
 
-Dataset repos can be defined as **Benchmarks** (e.g., [AIME](https://huggingface.co/datasets/aime-ai/aime), [HLE](https://huggingface.co/datasets/cais/hle), [GPQA](https://huggingface.co/datasets/Idavidrein/gpqa)). These display a "Benchmark" tag and automatically aggregate evaluation results from model repos across the Hub and display a leaderboard of top models.
+Dataset repos can be defined as **Benchmarks** (e.g., [AIME](https://huggingface.co/datasets/OpenEvals/aime_24), [HLE](https://huggingface.co/datasets/cais/hle), [GPQA](https://huggingface.co/datasets/Idavidrein/gpqa)). These display a "Benchmark" tag and automatically aggregate evaluation results from model repos across the Hub and display a leaderboard of top models.
 
 ![Benchmark Dataset](https://huggingface.co/huggingface/documentation-images/resolve/main/evaluation-results/benchmark-preview.png)
 
@@ -82,4 +82,4 @@ Anyone can submit evaluation results to any model via Pull Request:
 3. Add a `.eval_results/*.yaml` file with your results.
 4. The PR will show as "community-provided" on the model page while open.
 
-For help evaluating a model, see the [Evaluating models with Inspect](https://huggingface.co/docs/inference-providers/guides/evaluation-inspect-ai) guide.
+For help evaluating a model, see the [Evaluating models with Inspect](https://huggingface.co/docs/inference-providers/guides/evaluation-inspect-ai) guide.

0 commit comments
