Commit 3e65e9d

doc trl integration
Signed-off-by: cmunley1 <cmunley@nvidia.com>
1 parent 8a9332c · commit 3e65e9d

File tree: 4 files changed (+299, −251 lines)

docs/contribute/rl-framework-integration/index.md

Lines changed: 14 additions & 1 deletion
@@ -8,9 +8,22 @@ These guides cover how to integrate NeMo Gym into a new RL training framework. U
 - Contributing NeMo Gym integration for a training framework that does not have one yet
 
 :::{tip}
-Just want to train models? Use {ref}`NeMo RL <training-nemo-rl-grpo-index>` instead.
+Just want to train models? See existing integrations:
+- {ref}`NeMo RL <training-nemo-rl-grpo-index>` - Multi-step and multi-turn RL training at scale
+- {ref}`TRL (Hugging Face) <training-trl>` - GRPO training with distributed training support
+- {ref}`Unsloth <training-unsloth>` - Fast, memory-efficient training for single-step tasks
 :::
 
+## Existing Integrations
+
+NeMo Gym currently integrates with the following RL training frameworks:
+
+**[NeMo RL](https://github.com/NVIDIA-NeMo/RL)**: NVIDIA's RL training framework, purpose-built for large-scale frontier model training. Provides full support for multi-step and multi-turn environments with production-grade distributed training capabilities.
+
+**[TRL](https://github.com/huggingface/trl)**: Hugging Face's transformer reinforcement learning library. Supports GRPO with single and multi-turn NeMo Gym environments using vLLM generation, multi-environment training, and distributed training via Accelerate and DeepSpeed. See the {ref}`TRL tutorial <training-trl>` for usage examples.
+
+**[Unsloth](https://github.com/unslothai/unsloth)**: Fast, memory-efficient fine-tuning library. Supports optimized GRPO with single and multi-turn NeMo Gym environments including low precision, parameter-efficient fine-tuning, and training in notebook environments. See the {ref}`Unsloth tutorial <training-unsloth>` for getting started.
+
 ## Prerequisites
 
 Before integrating Gym into your training framework, ensure you have:
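The new TRL entry above describes GRPO training with vLLM generation and distributed training via Accelerate and DeepSpeed. For orientation only, a minimal TRL GRPO loop looks roughly like the sketch below; the model id, dataset, and `reward_from_gym_env` function are hypothetical placeholders standing in for what the NeMo Gym integration provides, and the actual wiring is covered in the linked TRL tutorial.

```python
# Minimal TRL GRPO sketch for orientation only -- not the actual NeMo Gym wiring.
# Assumes `trl` and `datasets` are installed; names below are placeholders.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def reward_from_gym_env(completions, **kwargs):
    """Hypothetical stand-in for a NeMo Gym verifier scoring each completion."""
    return [1.0 if "42" in completion else 0.0 for completion in completions]

train_dataset = Dataset.from_list(
    [{"prompt": "What is 6 * 7? Reply with just the number."}] * 64
)

args = GRPOConfig(
    output_dir="grpo-sketch",
    per_device_train_batch_size=8,
    num_generations=8,   # completions sampled per prompt for the GRPO group
    use_vllm=False,      # the documented integration generates with vLLM
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # placeholder model id
    reward_funcs=reward_from_gym_env,
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```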

docs/index.md

Lines changed: 25 additions & 245 deletions
@@ -101,197 +101,51 @@ Detailed walkthrough of running your first training environment.
 :::{grid-item-card} {octicon}`iterations;1.5em;sd-mr-1` Rollout Collection
 :link: get-started/rollout-collection
 :link-type: doc
-Collect and view rollouts.
+Collect and view rollouts
 +++
 {bdg-secondary}`rollouts` {bdg-secondary}`training-data`
 :::
 
-:::{grid-item-card} {octicon}`play;1.5em;sd-mr-1` First Training Run
-:link: get-started/first-training-run
-:link-type: doc
-Train your first model using collected rollouts.
-+++
-{bdg-secondary}`training` {bdg-secondary}`grpo`
-:::
-
-::::
-
-## Server Components
-
-Configure and customize the three server components of a training environment.
-
-::::{grid} 1 2 2 2
-:gutter: 1 1 1 2
-
-:::{grid-item-card} {octicon}`cpu;1.5em;sd-mr-1` Model Server
-:link: model-server/index
-:link-type: doc
-Configure LLM inference backends: vLLM, OpenAI, Azure.
-+++
-{bdg-secondary}`inference` {bdg-secondary}`vllm` {bdg-secondary}`openai`
-:::
-
-:::{grid-item-card} {octicon}`tools;1.5em;sd-mr-1` Resources Server
-:link: resources-server/index
-:link-type: doc
-Define tasks, tools, and verification logic.
-+++
-{bdg-secondary}`tools` {bdg-secondary}`verification`
-:::
-
-:::{grid-item-card} {octicon}`workflow;1.5em;sd-mr-1` Agent Server
-:link: agent-server/index
-:link-type: doc
-Orchestrate rollout lifecycle and tool calling.
-+++
-{bdg-secondary}`agents` {bdg-secondary}`orchestration`
-:::
-
-:::{grid-item-card} {octicon}`database;1.5em;sd-mr-1` Data
-:link: data/index
-:link-type: doc
-Prepare and validate training datasets.
-+++
-{bdg-secondary}`datasets` {bdg-secondary}`jsonl`
-:::
-
 ::::
 
-## Environment Tutorials
+<!-- This section needs to match the content in docs/tutorials/index.md -->
+## Tutorials
 
-Learn how to build custom training environments for various RL scenarios.
+Hands-on tutorials to build and customize your training environments.
 
 ::::{grid} 1 2 2 2
 :gutter: 1 1 1 2
 
-:::{grid-item-card} {octicon}`plus-circle;1.5em;sd-mr-1` Creating Environments
-:link: environment-tutorials/creating-training-environment
-:link-type: doc
-Build a complete training environment from scratch.
-+++
-{bdg-primary}`beginner` {bdg-secondary}`foundational`
-:::
-
-:::{grid-item-card} {octicon}`iterations;1.5em;sd-mr-1` Multi-Step
-:link: environment-tutorials/multi-step
-:link-type: doc
-Sequential tool calling workflows.
-+++
-{bdg-secondary}`multi-step` {bdg-secondary}`tools`
-:::
-
-:::{grid-item-card} {octicon}`comment-discussion;1.5em;sd-mr-1` Multi-Turn
-:link: environment-tutorials/multi-turn
+:::{grid-item-card} {octicon}`tools;1.5em;sd-mr-1` Build a Resource Server
+:link: tutorials/creating-resource-server
 :link-type: doc
-Conversational training environments.
+Implement or integrate existing tools and define task verification logic.
 +++
-{bdg-secondary}`multi-turn` {bdg-secondary}`dialogue`
+{bdg-primary}`beginner` {bdg-secondary}`30 min` {bdg-secondary}`custom-environments` {bdg-secondary}`tools`
 :::
 
-:::{grid-item-card} {octicon}`law;1.5em;sd-mr-1` LLM-as-a-Judge
-:link: environment-tutorials/llm-as-judge
-:link-type: doc
-LLM-based response verification.
+:::{grid-item-card} {octicon}`workflow;1.5em;sd-mr-1` Offline Training with Rollouts
+:link: offline-training-w-rollouts
+:link-type: ref
+Transform rollouts into training data for {term}`supervised fine-tuning (SFT) <SFT (Supervised Fine-Tuning)>` and {term}`direct preference optimization (DPO) <DPO (Direct Preference Optimization)>`.
 +++
-{bdg-secondary}`verification` {bdg-secondary}`llm-judge`
+{bdg-secondary}`sft` {bdg-secondary}`dpo`
 :::
 
-::::
-
-```{button-ref} environment-tutorials/index
-:ref-type: doc
-:color: secondary
-:class: sd-rounded-pill
-
-View all environment tutorials →
-```
-
-## Training Tutorials
-
-Train models using NeMo Gym with various RL frameworks.
-
-::::{grid} 1 2 2 2
-:gutter: 1 1 1 2
-
-:::{grid-item-card} {octicon}`rocket;1.5em;sd-mr-1` NeMo RL with GRPO
+:::{grid-item-card} {octicon}`workflow;1.5em;sd-mr-1` GRPO with NeMo RL
 :link: training-nemo-rl-grpo-index
 :link-type: ref
-Multi-node GRPO training for production workloads.
+Learn how to set up NeMo Gym and NeMo RL training environments, run tests, prepare data, and launch single-node and multi-node training runs.
 +++
-{bdg-primary}`recommended` {bdg-secondary}`grpo` {bdg-secondary}`multi-node`
+{bdg-primary}`training` {bdg-secondary}`rl` {bdg-secondary}`grpo` {bdg-secondary}`multi-step`
 :::
 
 :::{grid-item-card} {octicon}`zap;1.5em;sd-mr-1` Unsloth
 :link: training-unsloth
 :link-type: ref
-Fast, memory-efficient fine-tuning on single GPU.
-+++
-{bdg-secondary}`unsloth` {bdg-secondary}`efficient`
-:::
-
-:::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` TRL
-:link: training-tutorials/trl
-:link-type: doc
-HuggingFace TRL integration for PPO and DPO.
-+++
-{bdg-secondary}`trl` {bdg-secondary}`huggingface`
-:::
-
-:::{grid-item-card} {octicon}`server;1.5em;sd-mr-1` VeRL
-:link: training-tutorials/verl
-:link-type: doc
-VeRL framework for research workflows.
-+++
-{bdg-secondary}`verl` {bdg-secondary}`research`
-:::
-
-:::{grid-item-card} {octicon}`gear;1.5em;sd-mr-1` NeMo Customizer
-:link: training-tutorials/nemo-customizer
-:link-type: doc
-Enterprise training with NeMo Customizer.
+Fast, memory-efficient fine-tuning for single-step tasks: math, structured outputs, instruction following, reasoning gym and more.
 +++
-{bdg-secondary}`nemo-customizer` {bdg-secondary}`enterprise`
-:::
-
-:::{grid-item-card} {octicon}`file;1.5em;sd-mr-1` Offline Training
-:link: offline-training-w-rollouts
-:link-type: ref
-SFT and DPO from collected rollouts.
-+++
-{bdg-secondary}`sft` {bdg-secondary}`dpo`
-:::
-
-::::
-
-```{button-ref} training-tutorials/index
-:ref-type: doc
-:color: secondary
-:class: sd-rounded-pill
-
-View all training tutorials →
-```
-
-## Infrastructure
-
-Deploy and scale NeMo Gym for production workloads.
-
-::::{grid} 1 2 2 2
-:gutter: 1 1 1 2
-
-:::{grid-item-card} {octicon}`server;1.5em;sd-mr-1` Deployment Topology
-:link: infrastructure/deployment-topology
-:link-type: doc
-Production deployment patterns and configurations.
-+++
-{bdg-secondary}`deployment` {bdg-secondary}`topology`
-:::
-
-:::{grid-item-card} {octicon}`broadcast;1.5em;sd-mr-1` Distributed Computing with Ray
-:link: infrastructure/ray-distributed
-:link-type: doc
-Scale with Ray clusters for high-throughput rollout collection.
-+++
-{bdg-secondary}`ray` {bdg-secondary}`distributed`
+{bdg-primary}`training` {bdg-secondary}`unsloth` {bdg-secondary}`single-step`
 :::
 
 ::::
@@ -335,8 +189,6 @@ Home <self>
 
 Overview <about/index.md>
 Concepts <about/concepts/index>
-🟡 Architecture <about/architecture>
-🟡 Performance <about/performance>
 Ecosystem <about/ecosystem>
 ```
 
@@ -348,91 +200,19 @@ Ecosystem <about/ecosystem>
 Quickstart <get-started/index>
 Detailed Setup Guide <get-started/detailed-setup.md>
 Rollout Collection <get-started/rollout-collection.md>
-🟡 First Training Run <get-started/first-training-run.md>
-```
-
-```{toctree}
-:caption: Model Server
-:hidden:
-:maxdepth: 1
-
-🟡 Overview <model-server/index>
-🟡 vLLM <model-server/vllm>
-🟡 OpenAI <model-server/openai>
-🟡 Azure OpenAI <model-server/azure-openai>
-🟡 Responses API <model-server/responses-native>
-```
-
-```{toctree}
-:caption: Resources Server
-:hidden:
-:maxdepth: 1
-
-🟡 Overview <resources-server/index>
-🟡 Integrate Python Tools <resources-server/integrate-python-tools>
-🟡 Integrate APIs <resources-server/integrate-apis>
-🟡 Containerize <resources-server/containerize>
-🟡 Profile <resources-server/profile>
-```
-
-```{toctree}
-:caption: Agent Server
-:hidden:
-:maxdepth: 1
-
-🟡 Overview <agent-server/index>
-🟡 Integrate Agents <agent-server/integrate-agents/index>
-```
-
-```{toctree}
-:caption: Data
-:hidden:
-:maxdepth: 1
-
-🟡 Overview <data/index>
-🟡 Prepare and Validate <data/prepare-validate>
-🟡 Download from Hugging Face <data/download-huggingface>
-```
-
-```{toctree}
-:caption: Environment Tutorials
-:hidden:
-:maxdepth: 1
-
-🟡 Overview <environment-tutorials/index>
-🟡 Creating Training Environment <environment-tutorials/creating-training-environment>
-🟡 Multi-Step <environment-tutorials/multi-step>
-🟡 Multi-Turn <environment-tutorials/multi-turn>
-🟡 User Modeling <environment-tutorials/user-modeling>
-🟡 Multi-Node Docker <environment-tutorials/multi-node-docker>
-🟡 LLM as Judge <environment-tutorials/llm-as-judge>
-🟡 RLHF Reward Models <environment-tutorials/rlhf-reward-models>
-```
-
-```{toctree}
-:caption: Training Tutorials
-:hidden:
-:maxdepth: 1
-
-🟡 Overview <training-tutorials/index>
-🟡 Nemotron Nano <training-tutorials/nemotron-nano>
-🟡 Nemotron Super <training-tutorials/nemotron-super>
-NeMo RL GRPO <tutorials/nemo-rl-grpo/index.md>
-Unsloth Training <tutorials/unsloth-training>
-🟡 TRL <training-tutorials/trl>
-🟡 VERL <training-tutorials/verl>
-🟡 NeMo Customizer <training-tutorials/nemo-customizer>
-Offline Training <tutorials/offline-training-w-rollouts>
 ```
 
 ```{toctree}
-:caption: Infrastructure
+:caption: Tutorials
 :hidden:
 :maxdepth: 1
 
-🟡 Overview <infrastructure/index>
-🟡 Deployment Topology <infrastructure/deployment-topology>
-🟡 Ray Distributed <infrastructure/ray-distributed>
+tutorials/index.md
+tutorials/creating-resource-server
+tutorials/offline-training-w-rollouts
+tutorials/nemo-rl-grpo/index.md
+tutorials/trl-training
+tutorials/unsloth-training
 ```
 
 ```{toctree}

docs/tutorials/index.md

Lines changed: 9 additions & 5 deletions
@@ -1,7 +1,3 @@
----
-orphan: true
----
-
 (tutorials-index)=
 
 # NeMo Gym Tutorials
@@ -64,10 +60,18 @@ Learn how to set up NeMo Gym and NeMo RL training environments, run tests, prepa
 {bdg-primary}`training` {bdg-secondary}`rl` {bdg-secondary}`grpo` {bdg-secondary}`multi-step`
 :::
 
+:::{grid-item-card} {octicon}`rocket;1.5em;sd-mr-1` TRL (Hugging Face)
+:link: training-trl
+:link-type: ref
+Train models using Hugging Face TRL with GRPO in NeMo Gym environments. Supports multi-step tool calling, multi-environment and distributed training.
++++
+{bdg-primary}`training` {bdg-secondary}`trl` {bdg-secondary}`grpo` {bdg-secondary}`multi-step`
+:::
+
 :::{grid-item-card} {octicon}`zap;1.5em;sd-mr-1` Unsloth
 :link: training-unsloth
 :link-type: ref
-Fast, memory-efficient fine-tuning for single-step tasks: math, structured outputs, instruction following, reasoning gym and more.
+Fast, memory-efficient GRPO in NeMo-Gym environments, including multi-step tool calling and multi-environment training.
 +++
 {bdg-primary}`training` {bdg-secondary}`unsloth` {bdg-secondary}`single-step`
 :::
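The updated Unsloth card advertises memory-efficient GRPO with low-precision loading and parameter-efficient fine-tuning. As a rough sketch of that setup, assuming Unsloth's published `FastLanguageModel` API and not the tutorial's exact values (the model id and LoRA hyperparameters are illustrative):

```python
# Illustrative Unsloth setup, not the tutorial's exact configuration.
# Loads a 4-bit base model and attaches LoRA adapters; the resulting model can
# then be handed to TRL's GRPOTrainer with a NeMo Gym-backed reward function.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-3B-Instruct",  # placeholder model id
    max_seq_length=2048,
    load_in_4bit=True,   # low-precision loading to fit a single GPU
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                # LoRA rank (illustrative)
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    use_gradient_checkpointing="unsloth",
)
```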
