docs/contribute/rl-framework-integration/index.md
These guides cover how to integrate NeMo Gym into a new RL training framework, including:

- Contributing a NeMo Gym integration for a training framework that does not yet have one
:::{tip}
Just want to train models? See existing integrations:
- {ref}`NeMo RL <training-nemo-rl-grpo-index>` - Multi-step and multi-turn RL training at scale
- {ref}`TRL (Hugging Face) <training-trl>` - GRPO training with distributed training support
- {ref}`Unsloth <training-unsloth>` - Fast, memory-efficient training for single-step tasks
:::
## Existing Integrations
NeMo Gym currently integrates with the following RL training frameworks:
**[NeMo RL](https://github.com/NVIDIA-NeMo/RL)**: NVIDIA's RL training framework, purpose-built for large-scale frontier model training. Provides full support for multi-step and multi-turn environments with production-grade distributed training capabilities.
**[TRL](https://github.com/huggingface/trl)**: Hugging Face's transformer reinforcement learning library. Supports GRPO with single and multi-turn NeMo Gym environments using vLLM generation, multi-environment training, and distributed training via Accelerate and DeepSpeed. See the {ref}`TRL tutorial <training-trl>` for usage examples.
**[Unsloth](https://github.com/unslothai/unsloth)**: Fast, memory-efficient fine-tuning library. Supports optimized GRPO with single and multi-turn NeMo Gym environments including low precision, parameter-efficient fine-tuning, and training in notebook environments. See the {ref}`Unsloth tutorial <training-unsloth>` for getting started.
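GRPO-style trainers such as TRL's `GRPOTrainer` score each sampled completion with a reward callable that returns one float per completion. As a rough sketch of that contract (the function name and exact-match scoring logic below are hypothetical illustrations, not part of any framework's API):

```python
def exact_match_reward(completions, answers, **kwargs):
    """Hypothetical reward function: 1.0 when the reference answer appears
    in the completion, else 0.0. Illustrates the per-completion scoring
    contract GRPO-style trainers expect (one float per sampled completion)."""
    return [
        1.0 if answer in completion else 0.0
        for completion, answer in zip(completions, answers)
    ]
```

In a real integration, the environment (rather than a string match) produces the reward signal; this sketch only shows the shape of the interface.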
## Prerequisites
Before integrating Gym into your training framework, ensure you have:
:::{grid-item-card} {octicon}`workflow;1.5em;sd-mr-1` Offline Training with Rollouts
:link: offline-training-w-rollouts
:link-type: ref

Transform rollouts into training data for {term}`supervised fine-tuning (SFT) <SFT (Supervised Fine-Tuning)>` and {term}`direct preference optimization (DPO) <DPO (Direct Preference Optimization)>`.
:::
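A minimal sketch of the rollout-to-SFT transformation described above, assuming each rollout is a dict with `prompt`, `response`, and `reward` fields (a hypothetical schema for illustration, not NeMo Gym's actual rollout format): keep only high-reward rollouts as prompt/completion pairs.

```python
def rollouts_to_sft(rollouts, reward_threshold=1.0):
    """Filter rollouts by reward and emit SFT-style prompt/completion pairs.

    The dict schema used here is a hypothetical placeholder; real rollout
    records carry additional fields (e.g. tool calls, turn boundaries).
    """
    return [
        {"prompt": r["prompt"], "completion": r["response"]}
        for r in rollouts
        if r["reward"] >= reward_threshold
    ]
```

The same filtering idea extends to DPO: pair a high-reward and a low-reward response to the same prompt as chosen/rejected examples.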