Commit 2ec21ac

Add interactive GRPO notebook and update README

- Add grpo_lora-interactive-notebook.ipynb for single-GPU GRPO training directly in the workbench
- Include "Test the Trained Model" section with dynamic checkpoint loading
- Update README to document both interactive and distributed execution modes
- Update workbench requirements for interactive mode (8 CPU, 64Gi memory)
- Remove custom reward function appendix from both notebooks (out of scope)

Signed-off-by: Fiona-Waters <fiwaters6@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent d207981 commit 2ec21ac

3 files changed

Lines changed: 905 additions & 297 deletions

examples/fine-tuning/grpo/README.md

Lines changed: 16 additions & 9 deletions
@@ -22,11 +22,16 @@ The ART backend time-shares a single GPU between vLLM (inference) and Unsloth (t
 
 The example uses the [Agent-Ark/Toucan-1.5M](https://huggingface.co/datasets/Agent-Ark/Toucan-1.5M) dataset, which contains tool-calling conversations. The reward function verifies that the model produces syntactically correct tool calls with the expected function name and arguments.
 
-## Execution mode
+## Execution Modes
 
-GRPO runs as a **single-GPU TrainJob** submitted via the Kubeflow SDK. ART is single-GPU by design and manages its own vLLM subprocess internally.
+This example provides two notebooks:
 
-The notebook submits a `TrainJob` from a lightweight workbench, and the training runs on a dedicated GPU pod managed by Kubeflow Trainer.
+| Mode | Notebook | Description |
+|------|----------|-------------|
+| **Interactive** | [`grpo_lora-interactive-notebook.ipynb`](./grpo_lora-interactive-notebook.ipynb) | Runs GRPO training directly on the workbench GPU. Best for exploration, prototyping, and quick iteration. |
+| **Distributed** | [`grpo_lora-kubeflow-trainjob.ipynb`](./grpo_lora-kubeflow-trainjob.ipynb) | Submits a Kubeflow TrainJob from a lightweight workbench. Training runs on a dedicated GPU pod. Best for production workloads. |
+
+ART is single-GPU by design and manages its own vLLM subprocess internally.
 
 To learn more about execution modes for other algorithms, see the [fine-tuning execution modes overview](../README.md#execution-modes).
 
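
Reviewer note: the context lines in the hunk above describe the reward function only in prose (syntactically correct tool calls with the expected function name and arguments). A minimal sketch of that kind of check might look like the following. It is not the notebooks' actual reward function: the name `tool_call_reward`, the JSON shape with `name` and `arguments` keys, and the 0.0 / 0.5 / 1.0 scoring are all assumptions made for illustration.

```python
import json


def tool_call_reward(completion: str, expected_name: str, expected_args: dict) -> float:
    """Hypothetical reward for a single tool call emitted as a JSON object.

    Assumed scoring (not taken from the notebooks):
      0.0 -> output is not a JSON object, or the function name is wrong
      0.5 -> function name matches but the arguments differ
      1.0 -> function name and arguments both match
    """
    try:
        call = json.loads(completion)
    except json.JSONDecodeError:
        # Syntactically invalid output earns no reward.
        return 0.0
    if not isinstance(call, dict):
        # Valid JSON, but not an object describing a tool call.
        return 0.0
    if call.get("name") != expected_name:
        return 0.0
    if call.get("arguments") != expected_args:
        return 0.5
    return 1.0
```

Returning partial credit for a correct function name is one possible design choice; a stricter all-or-nothing score would be an equally plausible reading of the README text.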

@@ -52,13 +57,13 @@ to seamlessly run fine-tuning jobs.
 
 | Image Type | Use Case | GPU | CPU | Memory |
 |------------|----------|-----|-----|--------|
-| Training \| Jupyter \| PyTorch \| CPU Python | Job submission and monitoring | None | 2 cores | 8Gi |
-| Training \| Jupyter \| PyTorch \| CUDA \| Python | Job submission + model evaluation | 1× GPU | 2 cores | 8Gi |
+| Training \| Jupyter \| PyTorch \| CPU Python | Distributed mode: job submission and monitoring | None | 2 cores | 8Gi |
+| Training \| Jupyter \| PyTorch \| CUDA \| Python | Interactive mode, or distributed + model evaluation | 1× GPU (40GB+ VRAM) | 8 cores | 64Gi |
 
 > [!NOTE]
 >
-> - The workbench does not run the training itself — it submits a TrainJob and monitors progress.
-> - A GPU on the workbench is only needed if you want to load and test the fine-tuned LoRA adapter after training completes.
+> - **Distributed mode**: The workbench submits a TrainJob and monitors progress. A GPU on the workbench is only needed to test the fine-tuned LoRA adapter after training completes.
+> - **Interactive mode**: Training runs directly on the workbench GPU. The workbench needs an A100, H100, or L40S (40GB+ VRAM) with sufficient CPU and memory.
 
 ### Training Pod Requirements
 
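
Reviewer note: the updated table in the hunk above can be read as a sizing rule for the workbench. The sketch below encodes that reading as a small check. It assumes the table values are minimums, and the `Workbench` type and `supported_modes` function are hypothetical names introduced here, not part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workbench:
    cpu_cores: int
    memory_gib: int
    gpu_vram_gb: int = 0  # 0 means no GPU is attached


def supported_modes(wb: Workbench) -> list[str]:
    """Return the execution modes this workbench could host.

    Thresholds are copied from the README table and treated as minimums
    (an assumption): distributed needs 2 cores and 8Gi with no GPU,
    interactive needs 8 cores, 64Gi, and a GPU with 40GB+ VRAM.
    """
    modes = []
    if wb.cpu_cores >= 2 and wb.memory_gib >= 8:
        modes.append("distributed")
    if wb.cpu_cores >= 8 and wb.memory_gib >= 64 and wb.gpu_vram_gb >= 40:
        modes.append("interactive")
    return modes
```

For example, `supported_modes(Workbench(cpu_cores=8, memory_gib=64, gpu_vram_gb=24))` reports only distributed mode, because a 24GB GPU falls below the 40GB VRAM figure stated for interactive training.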

@@ -135,10 +140,12 @@ to seamlessly run fine-tuning jobs.
 
 ![](../images/05.png)
 
-### Running the example notebook
+### Running the example notebooks
 
 - From the workbench, clone this repository: `https://github.com/red-hat-data-services/red-hat-ai-examples.git`
-- Navigate to the `examples/fine-tuning/grpo` directory and open the [`grpo_lora-kubeflow-trainjob.ipynb`](./grpo_lora-kubeflow-trainjob.ipynb) notebook.
+- Navigate to the `examples/fine-tuning/grpo` directory and open the notebook for your preferred execution mode:
+- **Interactive**: [`grpo_lora-interactive-notebook.ipynb`](./grpo_lora-interactive-notebook.ipynb) — runs training directly on the workbench GPU
+- **Distributed**: [`grpo_lora-kubeflow-trainjob.ipynb`](./grpo_lora-kubeflow-trainjob.ipynb) — submits a TrainJob via Kubeflow Trainer
 
 > [!NOTE]
 >
