Commit 3f3d54f

Update examples/README.md: remove old references and reorganize (#84)
* Update examples/README.md: remove old references and reorganize
* Delete verify_ppo.py
1 parent 325cf9f commit 3f3d54f

2 files changed

Lines changed: 4 additions & 135 deletions

examples/README.md

Lines changed: 4 additions & 6 deletions
@@ -16,16 +16,14 @@ This directory contains examples, demos, and helper scripts for using the Open-R
 ## Examples Overview
 
 ### Supervised Fine-Tuning (SFT)
-* **[Hello World SFT Sandbox](sft/hello-world):** A minimal fine-tuning pipeline that trains a model to output a specific, constant target answer (e.g., "foo") for a set of hardcoded questions, serving as an introductory "hello world" for basic API interactions.
 * **[Pig Latin Translation](sft/pig-latin):** Teaches a model to perform specialized Pig Latin transformations, demonstrating custom token-level targets and loss masks.
 * **[Text-to-SQL SFT](sft/text-to-sql):** Adapts Gemma 3 into a specialized database query assistant capable of generating SQL statements.
-* **[FunctionGemma](sft/function-gemma):** A recipe specifically targeted at fine-tuning tool-use capabilities, enabling models to reliably select and invoke functions.
-* **[Tinker Cookbook Recipes](tinker-cookbook):** Shows how to run Thinking Machines' `tinker_cookbook` recipes against Open-RL APIs.
+* **[FunctionGemma](sft/function-gemma):** Provides a recipe specifically targeted at fine-tuning tool-use capabilities, enabling models to reliably select and invoke functions.
 
 ### Reinforcement Learning (RL)
-* **[PPO Math Verification](rl):** Implements Proximal Policy Optimization (PPO) with advantages to verify step-by-step math reasoning paths.
-* **[RLVR Demo](rl/rlvr):** Showcases Reinforcement Learning with Verifiable Rewards (RLVR) on geography tasks, using deterministic format verification as the primary reward signal.
 * **[Text-to-SQL RL](rl/text-to-sql):** Runs the Gemma 4 SFT+RL recipe with SQL execution rewards and curve plotting.
-* **[Tinker RL Basic K8s Jobs](rl/tinker-rl-basic):** Example Kubernetes Job manifests for deploying scalable, distributed RL workloads to a multi-tenant cluster.
+
+### Tinker Cookbook
+* **[Tinker Cookbook Recipes](tinker-cookbook):** Examples showing how to run [Tinker Cookbook](https://github.com/thinking-machines-lab/tinker-cookbook) recipes with Open-RL.
 
 ---

examples/rl/verify_ppo.py

Lines changed: 0 additions & 129 deletions
This file was deleted.
