README.md (1 addition, 0 deletions)
@@ -283,6 +283,7 @@ Welcome to register your awesome project built with `verl` for other developers' reference:
 - [DAPO](https://dapo-sia.github.io/): the fully open source SOTA RL algorithm that beats DeepSeek-R1-zero-32B
 - [NoisyRollout](https://github.com/NUS-TRAIL/NoisyRollout): Reinforcing Visual Reasoning with Data Augmentation
+- [SPEAR](https://github.com/TencentYoutuResearch/SPEAR): **Self-imitation** with **Progressive Exploration** for Agentic Reinforcement Learning (ICLR 2026)
docs/advance/ppo_lora.rst (7 additions, 1 deletion)
@@ -1,7 +1,7 @@
 RL(HF) algorithms with LoRA Support
 ===========================================
 
-Last updated: 12/17/2025.
+Last updated: 02/03/2026.
 
 We support LoRA (Low-Rank Adaptation) for reinforcement learning algorithms such as PPO, GRPO, and others.
@@ -42,6 +42,8 @@ FSDP Backend Usage Guide
 - `actor_rollout_ref.model.lora_adapter_path`: string, path to a pretrained LoRA adapter directory.
   If provided, loads the existing adapter instead of creating a new one. Enables multi-stage training from previously saved adapters.
   The directory must contain `adapter_model.safetensors` and `adapter_config.json`.
+- `actor_rollout_ref.model.lora.merge`: bool, whether to merge LoRA adapters into the base model weights before transferring them to vLLM.
+  If True, LoRA adapters are merged into the base model weights before the transfer; if False, only the adapters are transferred. This option is currently supported **only for engine-based rollout workers** (i.e. vLLM engine workers using the new worker implementation with ``trainer.use_legacy_worker_impl`` disabled) and is not available with the legacy worker implementation.
 
 5. Recommended options:
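To make the new `lora.merge` option concrete: merging means folding the low-rank update into the base weights so the rollout engine receives a single weight matrix. Below is a minimal numerical sketch of the standard LoRA merge, W' = W + (alpha/r) * (B @ A), in plain Python with toy 2x2 matrices. This is an illustration of the arithmetic only, not verl or vLLM code; the function names are hypothetical.

```python
def matmul(X, Y):
    """Naive matrix product of nested lists (illustration only)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA update into base weights: W + (alpha/r) * (B @ A).

    W: (d_out x d_in) base weights, A: (r x d_in), B: (d_out x r).
    Hypothetical helper sketching the merge semantics, not a verl API.
    """
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

# Toy example: zero base weights, rank r=1, alpha=2, so scale=2.
W = [[0.0, 0.0], [0.0, 0.0]]
A = [[1.0, 0.0]]          # r x d_in
B = [[1.0], [2.0]]        # d_out x r
merged = merge_lora(W, A, B, alpha=2, r=1)  # [[2.0, 0.0], [4.0, 0.0]]
```

With `lora.merge=False`, only the small A and B matrices would be shipped to the engine instead of the full merged W'.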
@@ -137,6 +139,10 @@ Make sure you use Megatron-Bridge later than 0.2.0, and we recommend using `th
   # Path to pre-trained LoRA adapter weights (null to train from scratch)
   adapter_path: null
 
+  # Whether to fully shard LoRA adapters. Defaults to False
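The `lora_adapter_path` documentation above requires the directory to contain `adapter_model.safetensors` and `adapter_config.json`. A small sketch of that precondition as a check, useful before launching a multi-stage run; `is_valid_adapter_dir` is a hypothetical helper for illustration, not part of verl.

```python
from pathlib import Path

# Files the docs say a saved LoRA adapter directory must contain.
REQUIRED_ADAPTER_FILES = ("adapter_model.safetensors", "adapter_config.json")

def is_valid_adapter_dir(path):
    """Return True if `path` looks like a saved LoRA adapter directory
    per the `lora_adapter_path` requirement (hypothetical helper)."""
    p = Path(path)
    return p.is_dir() and all((p / name).is_file() for name in REQUIRED_ADAPTER_FILES)
```

Running such a check up front turns a mid-training load failure into an immediate, readable error.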