98 changes: 98 additions & 0 deletions course/02_evaluate-hub-model.md
# Week 1: Evaluate a Hub Model

📣 TASK: Add evaluation results to model cards across the Hub. Together, we're building a distributed leaderboard of open source model performance.

> [!NOTE]
> Bonus XP for contributing to the leaderboard application. Open a PR [on the Hub](https://huggingface.co/spaces/hf-skills/distributed-leaderboard/discussions) or [on GitHub](https://github.com/huggingface/skills/blob/main/apps/evals-leaderboard/app.py) to earn your (bonus) XP.

## Why This Matters

Model cards without evaluation data are hard to compare. By adding structured eval results to model card metadata, we make models easier to compare and review. Your contributions power leaderboards and help the community find the best models for their needs. And because this work is distributed, everyone's evaluation results are shared back with the community.

## Goals

- Add eval scores to the 100 trending models on the Hub
- Include benchmarks such as AIME 2025, BigBenchHard, LiveCodeBench, MMLU, and ARC on trending models
- It's fine to include only a subset of the benchmarks available for a model
- Build a leaderboard application that shows the evaluation results for the trending models

## XP Tiers

Taking part is simple: we need model authors to show evaluation results in their model cards. This is a clean-up job!

| Tier | XP | Description | What Counts |
|-----------------|-------|---------------------------------------------------------------|-----------------------------------------------|
| 🐢 Contributor | 1 XP | Extract evaluation results for one benchmark and update the model card. | Any PR on the repo with evaluation data. |
| 🐕 Evaluator | 5 XP | Import scores from third-party benchmarks like Artificial Analysis. | Previously unreported benchmark scores and merged PRs. |
| 🦁 Advanced | 20 XP | Run your own evaluation with inspect-ai and publish results. | Original eval run and merged PR. |
| 🐉 Bonus | 20 XP | Contribute to the leaderboard application. | Any merged PR on the Hub or GitHub. |
| 🤢 Slop | -20 XP | Opening unhelpful PRs. | Duplicate PRs, incorrect eval scores, incorrect benchmark scores |

> [!WARNING]
> This hackathon is about advancing the state of open source AI. We want useful PRs that help everyone out, not PRs opened just to farm metrics.

## The Skill

Use `hugging-face-evaluation/` for this quest. Key capabilities:

- Extract evaluation tables from existing README content posted by model authors.
- Import benchmark scores from [Artificial Analysis](https://artificialanalysis.ai/).
- Run your own evals with [inspect-ai](https://github.com/UKGovernmentBEIS/inspect_ai) on [HF Jobs](https://huggingface.co/docs/huggingface_hub/en/guides/jobs).
- Update model-index metadata in the model card.

>[!NOTE]
> Take a look at the [SKILL.md](https://github.com/huggingface/skills/blob/main/skills/hugging-face-evaluation/SKILL.md) for more details.
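
For reference, `model-index` metadata lives in the model card's YAML front matter. A minimal sketch of the structure (the model name, dataset id, score, and source here are illustrative placeholders, not real results):

```yaml
model-index:
- name: model-name
  results:
  - task:
      type: text-generation
    dataset:
      name: MMLU            # human-readable benchmark name
      type: cais/mmlu       # Hub dataset id
    metrics:
    - type: accuracy
      value: 68.2           # illustrative score, not a real result
    source:
      name: Self-reported (model card README)
      url: https://huggingface.co/model-author/model-name
```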

### Extract Evaluation Tables from README

1. Pick a Hub model without evaluation data from *trending models* on the Hub
2. Use the skill to extract or add a benchmark score
3. Create a PR (or push directly if you own the model)

The agent will use this script to extract evaluation tables from the model's README.

```bash
python skills/hugging-face-evaluation/scripts/evaluation_manager.py extract-readme \
--repo-id "model-author/model-name" --dry-run
```

### Import Scores from Artificial Analysis

1. Find a model with benchmark data on external sites
2. Use `import-aa` to fetch scores from Artificial Analysis API
3. Create a PR with properly attributed evaluation data

The agent will use this script to fetch scores from Artificial Analysis API and add them to the model card.

```bash
python skills/hugging-face-evaluation/scripts/evaluation_manager.py import-aa \
--creator-slug "anthropic" --model-name "claude-sonnet-4" \
--repo-id "target/model" --create-pr
```

### Run Your Own Evaluation with inspect-ai and Publish Results

1. Choose an eval task (MMLU, GSM8K, HumanEval, etc.)
2. Run the evaluation on HF Jobs infrastructure
3. Update the model card with your results and methodology

The agent will use this script to run the evaluation on HF Jobs infrastructure and update the model card with the results.

```bash
HF_TOKEN=$HF_TOKEN hf jobs uv run skills/hugging-face-evaluation/scripts/inspect_eval_uv.py \
--flavor a10g-small --secret HF_TOKEN=$HF_TOKEN \
-- --model "meta-llama/Llama-2-7b-hf" --task "mmlu"
```

## Tips

- Always use `--dry-run` first to preview changes before pushing
- Check for transposed tables where models are rows and benchmarks are columns
- Be careful with PRs for models you don't own; most maintainers appreciate eval contributions, but be respectful
- Manually validate the extracted scores and close incorrect PRs if needed

## Resources

- [SKILL.md](../skills/hugging-face-evaluation/SKILL.md) — Full skill documentation
- [Example Usage](../skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md) — Worked examples
- [Metric Mapping](../skills/hugging-face-evaluation/examples/metric_mapping.json) — Standard metric types
74 changes: 74 additions & 0 deletions course/03_publish-hub-dataset.md
# Week 2: Publish a Hub Dataset

Create and share high-quality datasets on the Hub. Good data is the foundation of good models—help the community by contributing datasets others can train on.

## Why This Matters

The best open source models are built on openly available datasets. By publishing well-documented, properly structured datasets, you're directly enabling the next generation of model development. Quality matters more than quantity.

## The Skill

Use `hugging-face-datasets/` for this quest. Key capabilities:

- Initialize dataset repos with proper structure
- Multi-format support: chat, classification, QA, completion, tabular
- Template-based validation for data quality
- Streaming uploads without downloading entire datasets

```bash
# Quick setup with a template
python skills/hugging-face-datasets/scripts/dataset_manager.py quick_setup \
--repo_id "your-username/dataset-name" --template chat
```

## XP Tiers

### 🐢 Starter — 50 XP

**Upload a small, clean dataset with a complete dataset card.**

1. Create a dataset with ≤1,000 rows
2. Write a dataset card covering: license, splits, and data provenance
3. Upload to the Hub under the hackathon organization (or your own account)

**What counts:** Clean data, clear documentation, proper licensing.

```bash
python skills/hugging-face-datasets/scripts/dataset_manager.py init \
--repo_id "hf-skills/your-dataset-name"

python skills/hugging-face-datasets/scripts/dataset_manager.py add_rows \
--repo_id "hf-skills/your-dataset-name" \
--template classification \
--rows_json "$(cat your_data.json)"
```
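
As a sketch of what `your_data.json` might hold for the classification template (the `text`/`label` field names are an assumption; check the skill's `templates/` for the exact schema it validates):

```python
import json

# Hypothetical rows for the classification template; the text/label
# field names are assumed, so verify against the skill's templates/.
rows = [
    {"text": "The battery lasts all day.", "label": "positive"},
    {"text": "Screen cracked within a week.", "label": "negative"},
]

# Write the file that --rows_json reads via "$(cat your_data.json)".
with open("your_data.json", "w") as f:
    json.dump(rows, f, indent=2)

print(f"Wrote {len(rows)} rows")  # → Wrote 2 rows
```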

### 🐕 Standard — 100 XP

**Publish a conversational dataset with a complete dataset card.**

1. Create a dataset with ≤1,000 rows
2. Write a dataset card covering: license and splits.
3. Upload to the Hub under the hackathon organization.

**What counts:** Clean data, clear documentation, proper licensing.
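
A conversational row typically follows the chat `messages` format. A minimal sketch (the exact schema that `--template chat` validates may differ; see the skill's `templates/`):

```python
import json

# Hypothetical conversational row in the common chat format; the exact
# schema enforced by --template chat may differ (see templates/).
row = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
    ]
}

# add_rows takes --rows_json, so serialize a list of rows:
print(json.dumps([row]))
```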

### 🦁 Advanced — 200 XP

**Translate a dataset into multiple languages and publish it on the Hub.**

1. Find a dataset on the Hub
2. Translate the dataset into multiple languages
3. Publish the translated datasets on the Hub under the hackathon organization

**What counts:** Translated datasets and merged PRs.

## Resources

- [SKILL.md](../skills/hugging-face-datasets/SKILL.md) — Full skill documentation
- [Templates](../skills/hugging-face-datasets/templates/) — JSON templates for each format
- [Examples](../skills/hugging-face-datasets/examples/) — Sample data and system prompts

---

**Next Quest:** [Supervised Fine-Tuning](04_sft-finetune-hub.md)
33 changes: 33 additions & 0 deletions course/04_sft-finetune-hub.md
# Week 3: Supervised Fine-Tuning on the Hub

Fine-tune and share models on the Hub. Take a base model, train it on your data, and publish the result for the community to use.

## Why This Matters

Fine-tuning is how we adapt foundation models to specific tasks. By sharing fine-tuned models—along with your training methodology—you're giving the community ready-to-use solutions and reproducible recipes they can learn from.

## The Skill

Use `hugging-face-model-trainer/` for this quest. Key capabilities:

- **SFT** (Supervised Fine-Tuning) — Standard instruction tuning
- **DPO** (Direct Preference Optimization) — Alignment from preference data
- **GRPO** (Group Relative Policy Optimization) — Online RL training
- Cloud GPU training on HF Jobs—no local setup required
- Trackio integration for real-time monitoring
- GGUF conversion for local deployment

Your coding agent uses `hf_jobs()` to submit training scripts directly to HF infrastructure.

## XP Tiers

We'll announce the XP tiers for this quest soon.

## Resources

- [SKILL.md](../skills/hugging-face-model-trainer/SKILL.md) — Full skill documentation
- [SFT Example](../skills/hugging-face-model-trainer/scripts/train_sft_example.py) — Production SFT template
- [DPO Example](../skills/hugging-face-model-trainer/scripts/train_dpo_example.py) — Production DPO template
- [GRPO Example](../skills/hugging-face-model-trainer/scripts/train_grpo_example.py) — Production GRPO template
- [Training Methods](../skills/hugging-face-model-trainer/references/training_methods.md) — Method selection guide
- [Hardware Guide](../skills/hugging-face-model-trainer/references/hardware_guide.md) — GPU selection
121 changes: 121 additions & 0 deletions course/README.md
---
title: README
emoji: 🐠
colorFrom: yellow
colorTo: gray
sdk: static
pinned: false
---

# Humanity's Last Hackathon (of 2025)

<img src="https://github.com/huggingface/skills/raw/main/assets/banner.png" alt="Humanity's Last Hackathon (of 2025)" width="100%">

Welcome to our hackathon!

Whether you’re a tooled-up ML engineer, a classicist NLP dev, or an AGI-pilled vibe coder, this hackathon is going to be hard work! We’re going to take the latest and greatest coding agents
and use them to level up open source AI. After all, **why use December to relax and spend time with loved ones, when you can solve AI for all humanity?** Jokes aside, this hackathon is not
about learning skills from zero or breaking things down into their simplest components. It’s about collaborating, shipping, and making a difference for the open source community.

## What We're Building

Over four weeks, we're using coding agents to level up the open source AI ecosystem:

- **Week 1** — Evaluate models and build a distributed leaderboard
- **Week 2** — Create high-quality datasets for the community
- **Week 3** — Fine-tune and share models on the Hub
- **Week 4** — Sprint to the finish line together

Every contribution earns XP. Top contributors make the leaderboard. Winners get prizes!

Here's the schedule:

| Date | Event | Link |
|------|-------|------|
| Dec 2 (Mon) | Week 1 Quest Released | [Evaluate a Hub Model](02_evaluate-hub-model.md) |
| Dec 4 (Wed) | Livestream 1 | [Q&A 1](https://youtube.com/live/rworGSh-Rgk?feature=share) |
| Dec 9 (Mon) | Week 2 Quest Released | [Publish a Hub Dataset](03_publish-hub-dataset.md) |
| Dec 11 (Wed) | Livestream 2 | TBA |
| Dec 16 (Mon) | Week 3 Quest Released | [Supervised Fine-Tuning](04_sft-finetune-hub.md) |
| Dec 18 (Wed) | Livestream 3 | TBA |
| Dec 23 (Mon) | Week 4 Community Sprint | TBA |
| Dec 31 (Tue) | Hackathon Ends | TBA |

## Getting Started

### 1. Join the Organization

Join [hf-skills](https://huggingface.co/organizations/hf-skills/share/KrqrmBxkETjvevFbfkXeezcyMbgMjjMaOp) on Hugging Face. This is where your contributions are tracked and shown on the leaderboard.

### 2. Set Up Your Coding Agent

Use whatever coding agent you prefer:

- **Claude Code** — `claude` in your terminal
- **Codex** — `codex` CLI
- **Gemini CLI** — `gemini` in your terminal
- **Cursor / Windsurf** — IDE-based agents
- **Open source** — aider, continue, etc.

The skills in this repo work with any agent that can read markdown instructions and run Python scripts. To install the skills, follow the instructions in the [README](../README.md).

### 3. Get Your HF Token

Most quests require a Hugging Face token with write access:

```bash
# mac/linux
curl -LsSf https://hf.co/cli/install.sh | bash

# windows
powershell -ExecutionPolicy ByPass -c "irm https://hf.co/cli/install.ps1 | iex"

# Login (creates/stores your token)
hf auth login
```

Logging in stores your token locally so the CLI and client libraries can pick it up; export it as `HF_TOKEN` if a script expects the environment variable.

### 4. Clone the Skills Repo

```bash
git clone https://github.com/huggingface/skills.git
cd skills
```

Point your coding agent at the relevant configuration. Check the [README](../README.md) for instructions on how to use the skills with your coding agent.

## Your First Quest

**Week 1 is live!** Head to [02_evaluate-hub-model.md](02_evaluate-hub-model.md) to start evaluating models and climb the leaderboard.

<iframe
src="https://hf-skills-hacker-leaderboard.hf.space"
frameborder="0"
width="850"
height="450"
></iframe>

[Leaderboard](https://hf-skills-hacker-leaderboard.hf.space)

## Earning XP

Each quest has three core tiers (exact XP varies by quest):

| Tier | What it takes | XP |
|------|---------------|-----|
| 🐢 | Complete the basics | 50-75 XP |
| 🐕 | Go deeper with more features | 100-125 XP |
| 🦁 | Ship something impressive | 200-225 XP |

You can complete multiple tiers, and you can complete the same quest multiple times with different models/datasets/spaces.

## Getting Help

- [Discord](https://discord.com/channels/879548962464493619/1442881667986624554) — Join the Hugging Face Discord for real-time help
- [Livestreams](https://www.youtube.com/@HuggingFace/streams) — Weekly streams with walkthroughs and Q&A
- [Issues](https://github.com/huggingface/skills/issues) — Open an issue in this repo if you're stuck

To join the hackathon, join the organization on the Hub and set up your coding agent.

Ready? Let's ship some AI. 🚀