- Add grpo_lora-interactive-notebook.ipynb for single-GPU GRPO training
directly in the workbench
- Include "Test the Trained Model" section with dynamic checkpoint loading
- Update README to document both interactive and distributed execution modes
- Update workbench requirements for interactive mode (8 CPU, 64Gi memory)
- Remove custom reward function appendix from both notebooks (out of scope)
Signed-off-by: Fiona-Waters <fiwaters6@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
examples/fine-tuning/grpo/README.md
Lines changed: 16 additions & 9 deletions
@@ -22,11 +22,16 @@ The ART backend time-shares a single GPU between vLLM (inference) and Unsloth (t
 The example uses the [Agent-Ark/Toucan-1.5M](https://huggingface.co/datasets/Agent-Ark/Toucan-1.5M) dataset, which contains tool-calling conversations. The reward function verifies that the model produces syntactically correct tool calls with the expected function name and arguments.

-## Execution mode
+## Execution Modes

-GRPO runs as a **single-GPU TrainJob** submitted via the Kubeflow SDK. ART is single-GPU by design and manages its own vLLM subprocess internally.
+This example provides two notebooks:

-The notebook submits a `TrainJob` from a lightweight workbench, and the training runs on a dedicated GPU pod managed by Kubeflow Trainer.
+| Mode | Notebook | Description |
+|------|----------|-------------|
+| **Interactive** | [`grpo_lora-interactive-notebook.ipynb`](./grpo_lora-interactive-notebook.ipynb) | Runs GRPO training directly on the workbench GPU. Best for exploration, prototyping, and quick iteration. |
+| **Distributed** | [`grpo_lora-kubeflow-trainjob.ipynb`](./grpo_lora-kubeflow-trainjob.ipynb) | Submits a Kubeflow TrainJob from a lightweight workbench. Training runs on a dedicated GPU pod. Best for production workloads. |
+
+ART is single-GPU by design and manages its own vLLM subprocess internally.

 To learn more about execution modes for other algorithms, see the [fine-tuning execution modes overview](../README.md#execution-modes).
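The README text in this hunk describes the reward function only in prose: it checks that a tool call is syntactically correct and carries the expected function name and arguments. A minimal sketch of that kind of check follows. It is not the notebook's actual reward function. It assumes, purely for illustration, that a tool call is serialized as a JSON object with `name` and `arguments` keys and that the reward is binary; the function name `tool_call_reward` is hypothetical.

```python
import json


def tool_call_reward(completion: str, expected_name: str, expected_args: dict) -> float:
    """Return 1.0 if `completion` is a well-formed tool call that matches
    the expected function name and arguments, otherwise 0.0.

    Assumption (not from the README): the tool call is a JSON object of
    the form {"name": ..., "arguments": {...}}.
    """
    try:
        call = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0  # not syntactically valid JSON

    if not isinstance(call, dict):
        return 0.0  # valid JSON, but not an object
    if call.get("name") != expected_name:
        return 0.0  # wrong or missing function name
    if call.get("arguments") != expected_args:
        return 0.0  # wrong or missing arguments
    return 1.0


if __name__ == "__main__":
    good = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
    print(tool_call_reward(good, "get_weather", {"city": "Paris"}))
    print(tool_call_reward("get_weather(Paris)", "get_weather", {"city": "Paris"}))
```

A real reward function might instead give partial credit (for example, for a correct name with wrong arguments); the binary score here only keeps the sketch short.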
@@ -52,13 +57,13 @@ to seamlessly run fine-tuning jobs.
 | Image Type | Use Case | GPU | CPU | Memory |
 |------------|----------|-----|-----|--------|
-| Training \| Jupyter \| PyTorch \| CPU Python | Job submission and monitoring | None | 2 cores | 8Gi |
-| Training \| Jupyter \| PyTorch \| CUDA \| Python | Job submission + model evaluation | 1× GPU | 2 cores | 8Gi |
+| Training \| Jupyter \| PyTorch \| CPU Python | Distributed mode: job submission and monitoring | None | 2 cores | 8Gi |
+| Training \| Jupyter \| PyTorch \| CUDA \| Python | Interactive mode, or distributed + model evaluation | 1× GPU (40GB+ VRAM) | 8 cores | 64Gi |

 > [!NOTE]
 >
-> - The workbench does not run the training itself — it submits a TrainJob and monitors progress.
-> - A GPU on the workbench is only needed if you want to load and test the fine-tuned LoRA adapter after training completes.
+> - **Distributed mode**: The workbench submits a TrainJob and monitors progress. A GPU on the workbench is only needed to test the fine-tuned LoRA adapter after training completes.
+> - **Interactive mode**: Training runs directly on the workbench GPU. The workbench needs an A100, H100, or L40S (40GB+ VRAM) with sufficient CPU and memory.

 ### Training Pod Requirements
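The interactive-mode workbench row added in the hunk above (1× GPU with 40GB+ VRAM, 8 cores, 64Gi) can be turned into a simple pre-flight check. The sketch below is illustrative only: `WorkbenchResources` and `interactive_shortfalls` are hypothetical names, and the caller is assumed to supply nominal resource values (for example, as configured for the workbench) rather than values probed from the hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkbenchResources:
    gpu_vram_gb: float  # nominal GPU memory, 0 if the workbench has no GPU
    cpu_cores: int
    memory_gi: float


# Minimums taken from the interactive-mode row of the README table.
INTERACTIVE_MINIMUM = WorkbenchResources(gpu_vram_gb=40, cpu_cores=8, memory_gi=64)


def interactive_shortfalls(
    actual: WorkbenchResources,
    minimum: WorkbenchResources = INTERACTIVE_MINIMUM,
) -> list[str]:
    """Return one message per resource that falls below the minimum.

    An empty list means the workbench meets the documented minimums.
    """
    problems = []
    if actual.gpu_vram_gb < minimum.gpu_vram_gb:
        problems.append(f"GPU VRAM {actual.gpu_vram_gb}GB < {minimum.gpu_vram_gb}GB")
    if actual.cpu_cores < minimum.cpu_cores:
        problems.append(f"CPU cores {actual.cpu_cores} < {minimum.cpu_cores}")
    if actual.memory_gi < minimum.memory_gi:
        problems.append(f"memory {actual.memory_gi}Gi < {minimum.memory_gi}Gi")
    return problems


if __name__ == "__main__":
    # A workbench sized for distributed mode (no GPU, 2 cores, 8Gi).
    small = WorkbenchResources(gpu_vram_gb=0, cpu_cores=2, memory_gi=8)
    for message in interactive_shortfalls(small):
        print(message)
```

If the list is non-empty, the distributed notebook is the safer choice, since it only needs the lightweight workbench and runs training on a dedicated GPU pod.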
@@ -135,10 +140,12 @@ to seamlessly run fine-tuning jobs.
-### Running the example notebook
+### Running the example notebooks

 - From the workbench, clone this repository: `https://github.com/red-hat-data-services/red-hat-ai-examples.git`
-- Navigate to the `examples/fine-tuning/grpo` directory and open the [`grpo_lora-kubeflow-trainjob.ipynb`](./grpo_lora-kubeflow-trainjob.ipynb) notebook.
+- Navigate to the `examples/fine-tuning/grpo` directory and open the notebook for your preferred execution mode:
+  - **Interactive**: [`grpo_lora-interactive-notebook.ipynb`](./grpo_lora-interactive-notebook.ipynb) — runs training directly on the workbench GPU
+  - **Distributed**: [`grpo_lora-kubeflow-trainjob.ipynb`](./grpo_lora-kubeflow-trainjob.ipynb) — submits a TrainJob via Kubeflow Trainer
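The mode-to-notebook mapping in the steps above can also be written down as a small lookup, which may be handy in setup scripts. This is only a sketch: the helper name `notebook_path` is hypothetical, and the returned path is relative to the root of the cloned repository.

```python
from pathlib import PurePosixPath

# Directory and file names as given in the README.
EXAMPLE_DIR = PurePosixPath("examples/fine-tuning/grpo")
NOTEBOOKS = {
    "interactive": "grpo_lora-interactive-notebook.ipynb",
    "distributed": "grpo_lora-kubeflow-trainjob.ipynb",
}


def notebook_path(mode: str) -> PurePosixPath:
    """Return the repository-relative notebook path for an execution mode."""
    key = mode.strip().lower()
    if key not in NOTEBOOKS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(NOTEBOOKS)}")
    return EXAMPLE_DIR / NOTEBOOKS[key]


if __name__ == "__main__":
    print(notebook_path("interactive"))
    print(notebook_path("distributed"))
```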