98 changes: 98 additions & 0 deletions course/02_evaluate-hub-model.md
# Week 1: Evaluate a Hub Model

📣 TASK: Add evaluation results to model cards across the Hub. Together, we're building a distributed leaderboard of open source model performance.

> [!NOTE]
> Bonus XP for contributing to the leaderboard application. Open a PR [on the Hub](https://huggingface.co/spaces/hf-skills/distributed-leaderboard/discussions) or [on GitHub](https://github.com/huggingface/skills/blob/main/apps/evals-leaderboard/app.py) to earn your (bonus) XP.

## Why This Matters

Model cards without evaluation data are hard to compare. By adding structured eval results to model card metadata, we make models easier to compare and review. Your contributions power leaderboards and help the community find the best models for their needs. And because this work is distributed, everyone's evaluation results are shared back with the community.

## Goals

- Add eval scores to the 100 trending models on the Hub
- Include benchmarks such as AIME 2025, BigBenchHard, LiveCodeBench, MMLU, and ARC on trending models
- It's fine to include only a subset of the benchmarks available for a model
- Build a leaderboard application that shows the evaluation results for the trending models

## XP Tiers

Taking part is simple: we need model authors to show evaluation results in their model cards. This is a clean-up job!

| Tier | XP | Description | What Counts |
|-----------------|-------|---------------------------------------------------------------|-----------------------------------------------|
| 🐢 Contributor | 1 XP | Extract evaluation results for one benchmark and update the model card. | Any PR on the repo with evaluation data. |
| 🐕 Evaluator | 5 XP | Import scores from third-party benchmarks like Artificial Analysis. | Previously unreported benchmark scores and merged PRs. |
| 🦁 Advanced | 20 XP | Run your own evaluation with inspect-ai and publish results. | Original eval run and merged PR. |
| 🐉 Bonus | 20 XP | Contribute to the leaderboard application. | Any merged PR on the Hub or GitHub. |
| 🤢 Slop | -20 XP | Opening unhelpful PRs. | Duplicate PRs, incorrect eval scores, incorrect benchmark scores |

> [!WARNING]
> This hackathon is about advancing the state of open source AI. We want useful PRs that help everyone out, not PRs opened just to farm metrics.

## The Skill

Use `hugging-face-evaluation/` for this quest. Key capabilities:

- Extract evaluation tables from existing README content posted by model authors.
- Import benchmark scores from [Artificial Analysis](https://artificialanalysis.ai/).
- Run your own evals with [inspect-ai](https://github.com/UKGovernmentBEIS/inspect_ai) on [HF Jobs](https://huggingface.co/docs/huggingface_hub/en/guides/jobs).
- Update model-index metadata in the model card.

>[!NOTE]
> Take a look at the [SKILL.md](https://github.com/huggingface/skills/blob/main/skills/hugging-face-evaluation/SKILL.md) for more details.
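
For reference, `model-index` metadata lives in the model card's YAML front matter. A minimal sketch of the structure (the model name, dataset id, score, and source here are illustrative placeholders, not real results):

```yaml
model-index:
- name: model-name
  results:
  - task:
      type: text-generation
    dataset:
      name: MMLU            # human-readable benchmark name
      type: cais/mmlu       # Hub dataset id
    metrics:
    - type: accuracy
      value: 68.2           # illustrative score, not a real result
    source:
      name: Self-reported (model card README)
      url: https://huggingface.co/model-author/model-name
```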

### Extract Evaluation Tables from README

1. Pick a Hub model without evaluation data from *trending models* on the Hub
2. Use the skill to extract or add a benchmark score
3. Create a PR (or push directly if you own the model)

The agent will use this script to extract evaluation tables from the model's README.

```bash
python skills/hugging-face-evaluation/scripts/evaluation_manager.py extract-readme \
--repo-id "model-author/model-name" --dry-run
```

### Import Scores from Artificial Analysis

1. Find a model with benchmark data on external sites
2. Use `import-aa` to fetch scores from Artificial Analysis API
3. Create a PR with properly attributed evaluation data

The agent will use this script to fetch scores from Artificial Analysis API and add them to the model card.

```bash
python skills/hugging-face-evaluation/scripts/evaluation_manager.py import-aa \
--creator-slug "anthropic" --model-name "claude-sonnet-4" \
--repo-id "target/model" --create-pr
```

### Run Your Own Evaluation with inspect-ai and Publish Results

1. Choose an eval task (MMLU, GSM8K, HumanEval, etc.)
2. Run the evaluation on HF Jobs infrastructure
3. Update the model card with your results and methodology

The agent will use this script to run the evaluation on HF Jobs infrastructure and update the model card with the results.

```bash
HF_TOKEN=$HF_TOKEN hf jobs uv run skills/hugging-face-evaluation/scripts/inspect_eval_uv.py \
--flavor a10g-small --secret HF_TOKEN=$HF_TOKEN \
-- --model "meta-llama/Llama-2-7b-hf" --task "mmlu"
```

## Tips

- Always use `--dry-run` first to preview changes before pushing
- Check for transposed tables where models are rows and benchmarks are columns
- Be careful with PRs for models you don't own; most maintainers appreciate eval contributions, but be respectful
- Manually validate the extracted scores and close incorrect PRs if needed

## Resources

- [SKILL.md](../skills/hugging-face-evaluation/SKILL.md) — Full skill documentation
- [Example Usage](../skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md) — Worked examples
- [Metric Mapping](../skills/hugging-face-evaluation/examples/metric_mapping.json) — Standard metric types
74 changes: 74 additions & 0 deletions course/03_publish-hub-dataset.md
# Week 2: Publish a Hub Dataset

Create and share high-quality datasets on the Hub. Good data is the foundation of good models—help the community by contributing datasets others can train on.

## Why This Matters

The best open source models are built on openly available datasets. By publishing well-documented, properly structured datasets, you're directly enabling the next generation of model development. Quality matters more than quantity.

## The Skill

Use `hugging-face-datasets/` for this quest. Key capabilities:

- Initialize dataset repos with proper structure
- Multi-format support: chat, classification, QA, completion, tabular
- Template-based validation for data quality
- Streaming uploads without downloading entire datasets

```bash
# Quick setup with a template
python skills/hugging-face-datasets/scripts/dataset_manager.py quick_setup \
--repo_id "your-username/dataset-name" --template chat
```

## XP Tiers

### 🐢 Starter — 50 XP

**Upload a small, clean dataset with a complete dataset card.**

1. Create a dataset with ≤1,000 rows
2. Write a dataset card covering: license, splits, and data provenance
3. Upload to the Hub under the hackathon organization (or your own account)

**What counts:** Clean data, clear documentation, proper licensing.

```bash
python skills/hugging-face-datasets/scripts/dataset_manager.py init \
--repo_id "hf-skills/your-dataset-name"

python skills/hugging-face-datasets/scripts/dataset_manager.py add_rows \
--repo_id "hf-skills/your-dataset-name" \
--template classification \
--rows_json "$(cat your_data.json)"
```
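
As a sketch of what `your_data.json` might hold for the classification template (the `text`/`label` field names are an assumption; check the skill's `templates/` for the exact schema it validates):

```python
import json

# Hypothetical rows for the classification template; the text/label
# field names are assumed, so verify against the skill's templates/.
rows = [
    {"text": "The battery lasts all day.", "label": "positive"},
    {"text": "Screen cracked within a week.", "label": "negative"},
]

# Write the file that --rows_json reads via "$(cat your_data.json)".
with open("your_data.json", "w") as f:
    json.dump(rows, f, indent=2)

print(f"Wrote {len(rows)} rows")  # → Wrote 2 rows
```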

### 🐕 Standard — 100 XP

**Publish a conversational dataset with a complete dataset card.**

1. Create a dataset with ≤1,000 rows
2. Write a dataset card covering: license and splits.
3. Upload to the Hub under the hackathon organization.

**What counts:** Clean data, clear documentation, proper licensing.
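
A conversational row typically follows the chat `messages` format. A minimal sketch (the exact schema that `--template chat` validates may differ; see the skill's `templates/`):

```python
import json

# Hypothetical conversational row in the common chat format; the exact
# schema enforced by --template chat may differ (see templates/).
row = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
    ]
}

# add_rows takes --rows_json, so serialize a list of rows:
print(json.dumps([row]))
```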

### 🦁 Advanced — 200 XP

**Translate a dataset into multiple languages and publish it on the Hub.**

1. Find a dataset on the Hub
2. Translate the dataset into multiple languages
3. Publish the translated datasets on the Hub under the hackathon organization

**What counts:** Translated datasets and merged PRs.

## Resources

- [SKILL.md](../skills/hugging-face-datasets/SKILL.md) — Full skill documentation
- [Templates](../skills/hugging-face-datasets/templates/) — JSON templates for each format
- [Examples](../skills/hugging-face-datasets/examples/) — Sample data and system prompts

---

**Next Quest:** [Supervised Fine-Tuning](04_sft-finetune-hub.md)
33 changes: 33 additions & 0 deletions course/04_sft-finetune-hub.md
# Week 3: Supervised Fine-Tuning on the Hub

Fine-tune and share models on the Hub. Take a base model, train it on your data, and publish the result for the community to use.

## Why This Matters

Fine-tuning is how we adapt foundation models to specific tasks. By sharing fine-tuned models—along with your training methodology—you're giving the community ready-to-use solutions and reproducible recipes they can learn from.

## The Skill

Use `hugging-face-model-trainer/` for this quest. Key capabilities:

- **SFT** (Supervised Fine-Tuning) — Standard instruction tuning
- **DPO** (Direct Preference Optimization) — Alignment from preference data
- **GRPO** (Group Relative Policy Optimization) — Online RL training
- Cloud GPU training on HF Jobs—no local setup required
- Trackio integration for real-time monitoring
- GGUF conversion for local deployment

Your coding agent uses `hf_jobs()` to submit training scripts directly to HF infrastructure.

## XP Tiers

We'll announce the XP tiers for this quest soon.

## Resources

- [SKILL.md](../skills/hugging-face-model-trainer/SKILL.md) — Full skill documentation
- [SFT Example](../skills/hugging-face-model-trainer/scripts/train_sft_example.py) — Production SFT template
- [DPO Example](../skills/hugging-face-model-trainer/scripts/train_dpo_example.py) — Production DPO template
- [GRPO Example](../skills/hugging-face-model-trainer/scripts/train_grpo_example.py) — Production GRPO template
- [Training Methods](../skills/hugging-face-model-trainer/references/training_methods.md) — Method selection guide
- [Hardware Guide](../skills/hugging-face-model-trainer/references/hardware_guide.md) — GPU selection
121 changes: 121 additions & 0 deletions course/README.md
---
title: README
emoji: 🐠
colorFrom: yellow
colorTo: gray
sdk: static
pinned: false
---

# Humanity's Last Hackathon (of 2025)

<img src="https://github.com/huggingface/skills/raw/main/assets/banner.png" alt="Humanity's Last Hackathon (of 2025)" width="100%">

Welcome to our hackathon!

Whether you’re a tooled-up ML engineer, a classicist NLP dev, or an AGI-pilled vibe coder, this hackathon is going to be hard work! We’re going to take the latest and greatest coding agents
and use them to level up open source AI. After all, **why use December to relax and spend time with loved ones, when you can solve AI for all humanity?** Jokes aside, this hackathon is not
about learning skills from zero or breaking things down into their simplest components. It’s about collaborating, shipping, and making a difference for the open source community.

## What We're Building

Over four weeks, we're using coding agents to level up the open source AI ecosystem:

- **Week 1** — Evaluate models and build a distributed leaderboard
- **Week 2** — Create high-quality datasets for the community
- **Week 3** — Fine-tune and share models on the Hub
- **Week 4** — Sprint to the finish line together

Every contribution earns XP. Top contributors make the leaderboard. Winners get prizes!

Here's the schedule:

| Date | Event | Link |
|------|-------|------|
| Dec 2 (Mon) | Week 1 Quest Released | [Evaluate a Hub Model](02_evaluate-hub-model.md) |
| Dec 4 (Wed) | Livestream 1 | [Q&A 1](https://youtube.com/live/rworGSh-Rgk?feature=share) |
| Dec 9 (Mon) | Week 2 Quest Released | [Publish a Hub Dataset](03_publish-hub-dataset.md) |
| Dec 11 (Wed) | Livestream 2 | TBA |
| Dec 16 (Mon) | Week 3 Quest Released | [Supervised Fine-Tuning](04_sft-finetune-hub.md) |
| Dec 18 (Wed) | Livestream 3 | TBA |
| Dec 23 (Mon) | Week 4 Community Sprint | TBA |
| Dec 31 (Tue) | Hackathon Ends | TBA |

## Getting Started

### 1. Join the Organization

Join [hf-skills](https://huggingface.co/organizations/hf-skills/share/KrqrmBxkETjvevFbfkXeezcyMbgMjjMaOp) on Hugging Face. This is where your contributions are tracked and shown on the leaderboard.

### 2. Set Up Your Coding Agent

Use whatever coding agent you prefer:

- **Claude Code** — `claude` in your terminal
- **Codex** — `codex` CLI
- **Gemini CLI** — `gemini` in your terminal
- **Cursor / Windsurf** — IDE-based agents
- **Open source** — aider, continue, etc.

The skills in this repo work with any agent that can read markdown instructions and run Python scripts. To install the skills, follow the instructions in the [README](../README.md).

### 3. Get Your HF Token

Most quests require a Hugging Face token with write access:

```bash
# mac/linux
curl -LsSf https://hf.co/cli/install.sh | bash

# windows
powershell -ExecutionPolicy ByPass -c "irm https://hf.co/cli/install.ps1 | iex"

# Login (creates/stores your token)
hf auth login
```

Logging in stores your token locally so the CLI and client libraries can pick it up; export it as `HF_TOKEN` if a script expects the environment variable.

### 4. Clone the Skills Repo

```bash
git clone https://github.com/huggingface/skills.git
cd skills
```

Point your coding agent at the relevant configuration. Check the [README](../README.md) for instructions on how to use the skills with your coding agent.

## Your First Quest

**Week 1 is live!** Head to [02_evaluate-hub-model.md](02_evaluate-hub-model.md) to start evaluating models and climb the leaderboard.

<iframe
src="https://hf-skills-hacker-leaderboard.hf.space"
frameborder="0"
width="850"
height="450"
></iframe>

[Leaderboard](https://hf-skills-hacker-leaderboard.hf.space)

## Earning XP

Each quest has three core tiers (exact XP varies by quest):

| Tier | What it takes | XP |
|------|---------------|-----|
| 🐢 | Complete the basics | 50-75 XP |
| 🐕 | Go deeper with more features | 100-125 XP |
| 🦁 | Ship something impressive | 200-225 XP |

You can complete multiple tiers, and you can complete the same quest multiple times with different models/datasets/spaces.

## Getting Help

- [Discord](https://discord.com/channels/879548962464493619/1442881667986624554) — Join the Hugging Face Discord for real-time help
- [Livestreams](https://www.youtube.com/@HuggingFace/streams) — Weekly streams with walkthroughs and Q&A
- [Issues](https://github.com/huggingface/skills/issues) — Open an issue in this repo if you're stuck

To join the hackathon, join the organization on the Hub and set up your coding agent.

Ready? Let's ship some AI. 🚀