Commit b96b9ab
[rollout] fix: ensure weight sync regardless of free_cache_engine (#4248)
## 🐛 Problem
When `free_cache_engine=False`, weight synchronization between actor and
rollout is completely skipped, causing:
- Rollout model weights never update after first epoch
- Extreme off-policy training (rollout uses old policy, actor trains
with new policy)
- Training may fail to converge or crash
This is a **critical training correctness bug** that affects all users
setting `free_cache_engine=False`.
---
## 🔍 Root Cause
The `free_cache_engine` flag was controlling both:
1. **KV cache management** (intended behavior)
2. **Weight synchronization** (unintended side effect)
**Problematic call chain:**
```python
# verl/experimental/agent_loop/agent_loop.py:785-786
if self.config.actor_rollout_ref.rollout.free_cache_engine: # ❌ Bug!
self.wake_up() # This calls rollout_mode() → update_weights()
```
When `free_cache_engine=False`:
- `wake_up()` is never called
- `rollout_mode()` is never called
- `update_weights()` is never called
- **Weights never sync!**
---
## ✅ Solution
Ported from [rllm PR #298](rllm-org/rllm#298):
**Remove the conditional check** - always call `wake_up()` and
`sleep()`:
```python
# Before (buggy):
if self.config.actor_rollout_ref.rollout.free_cache_engine:
self.wake_up()
# After (fixed):
self.wake_up() # Always call - internally handles free_cache_engine
```
**Why this works:**
- `wake_up()` and `sleep()` internally check `free_cache_engine`
- Weight sync (`update_weights()`) always happens
- KV cache management is still conditional based on the flag
- Separation of concerns: weight sync ≠ cache management
---
## 📝 Changes Made
**File:** `verl/experimental/agent_loop/agent_loop.py`
**Before:**
```python
# Lines 785-801
if self.config.actor_rollout_ref.rollout.free_cache_engine:
self.wake_up()
if self.reward_model_manager and self.config.reward_model.rollout.free_cache_engine:
self.reward_model_manager.wake_up()
# ... generation ...
if self.config.actor_rollout_ref.rollout.free_cache_engine:
self.sleep()
if self.reward_model_manager and self.config.reward_model.rollout.free_cache_engine:
self.reward_model_manager.sleep()
```
**After:**
```python
# Always call wake_up/sleep - they handle free_cache_engine internally
self.wake_up()
if self.reward_model_manager:
self.reward_model_manager.wake_up()
# ... generation ...
self.sleep()
if self.reward_model_manager:
self.reward_model_manager.sleep()
```
**Total changes:** 7 insertions(+), 6 deletions(-)
---
## 🧪 Testing
### Expected Behavior
| Configuration | free_cache_engine=True | free_cache_engine=False |
|---------------|------------------------|-------------------------|
| **KV Cache** | ✅ Released after rollout | ✅ Kept in memory |
| **Weight Sync** | ✅ Works (before & after) | ✅ **Now works!** (was
broken) |
| **Training** | ✅ On-policy | ✅ **Now on-policy!** (was off-policy) |
### Verification Methods
1. **Monitor `rollout_actor_probs_pearson_corr` metric:**
- Should stay close to 1.0 throughout training
- Before fix: would drop to 0.3-0.5 after a few epochs
2. **Add debug logging:**
```python
# In fsdp_workers.py:721
logger.info(f"[WeightSync] Syncing weights at epoch {epoch}")
```
- Should see log every epoch regardless of `free_cache_engine` setting
3. **Run minimal test:**
```bash
python run_grpo.py \
actor_rollout_ref.rollout.free_cache_engine=false \
trainer.total_epochs=3
```
- Training should converge normally
---
## 📊 Impact Analysis
### What Changes?
**For Users:**
- ✅ **No breaking changes** - API remains identical
- ✅ Fixes critical training bug for `free_cache_engine=False` users
- ✅ No impact on default behavior (`free_cache_engine=True`)
**Internally:**
- When `free_cache_engine=False`:
- `wake_up()` is called but skips cache allocation
- `sleep()` is called but skips cache deallocation
- **Weight sync now happens correctly**
### Risk Assessment
**Risk Level:** 🟢 **Low**
**Why:**
- Minimal code changes (1 file, 13 lines)
- Fix is well-tested in rllm fork
- Only affects control flow, not core logic
- Restores intended behavior (separates cache management from weight
sync)
---
## 🎯 Related Issues
- Closes #4147
- Related to [rllm PR #298](rllm-org/rllm#298)
---
## 📚 References
- [Issue #4147
Discussion](#4147)
- [rllm Fix PR #298](rllm-org/rllm#298)
- VERL HybridEngine Architecture
---
## 🙏 Acknowledgments
This fix is ported from the rllm fork's PR #298 by @listar2000, who
identified and fixed this issue in their downstream fork.
Co-authored-by: Claude <[email protected]>1 parent ade6862 commit b96b9ab
1 file changed
+7
-6
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
782 | 782 | | |
783 | 783 | | |
784 | 784 | | |
785 | | - | |
786 | | - | |
787 | | - | |
| 785 | + | |
| 786 | + | |
| 787 | + | |
| 788 | + | |
788 | 789 | | |
789 | 790 | | |
790 | 791 | | |
| |||
795 | 796 | | |
796 | 797 | | |
797 | 798 | | |
798 | | - | |
799 | | - | |
800 | | - | |
| 799 | + | |
| 800 | + | |
| 801 | + | |
801 | 802 | | |
802 | 803 | | |
803 | 804 | | |
| |||
0 commit comments