Description
In recipe/fully_async_policy/fully_async_rollouter.py, there is an inconsistency in how total_epochs is referenced:
- Line ~85: the total_rollout_steps calculation uses self.config.trainer.total_epochs
- Line ~400+: the epoch loop uses self.config.rollout.total_epochs
This is confusing: users who set rollout.total_epochs expect it to control how long rollout runs, but the rollout step limit is actually calculated from trainer.total_epochs.
Code Location
File: recipe/fully_async_policy/fully_async_rollouter.py
# Line ~85: Uses trainer.total_epochs
self.total_rollout_steps = len(self.train_dataloader) * self.config.trainer.total_epochs
if self.config.rollout.total_rollout_steps is not None:
    self.total_rollout_steps = min(self.config.rollout.total_rollout_steps, self.total_rollout_steps)

# Line ~400+: Uses rollout.total_epochs
for epoch in range(self.config.rollout.total_epochs):
    ...  # rollout loop body omitted
Reproduction
- Configure rollout.total_epochs=50 in the experiment script
- Leave trainer.total_epochs at default (30)
- Run fully async training
Expected: The experiment runs for 50 epochs
Actual: The experiment terminates after ~30 epochs' worth of samples
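The following sketch mimics the reproduction under simplified assumptions. The config is a plain SimpleNamespace rather than verl's real config object, the dataloader is represented only by its length (64 batches, as in the logs below), and the assumption that the rollouter stops once total_rollout_steps is reached is modeled by a simple counter. This is not the rollouter's actual code; it only illustrates why the loop ends after 30 epochs even though it is set up to run 50.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the merged config; only the fields relevant here.
config = SimpleNamespace(
    trainer=SimpleNamespace(total_epochs=30),  # default, not overridden
    rollout=SimpleNamespace(total_epochs=50, total_rollout_steps=3200),
)
num_batches = 64  # stands in for len(self.train_dataloader)

# Current behaviour: the step limit is derived from trainer.total_epochs.
total_rollout_steps = num_batches * config.trainer.total_epochs
if config.rollout.total_rollout_steps is not None:
    total_rollout_steps = min(config.rollout.total_rollout_steps, total_rollout_steps)


def simulate_rollout(step_limit: int, num_batches: int, total_epochs: int) -> int:
    """Count rollout steps until the (assumed) step limit stops the loop."""
    steps = 0
    for _ in range(total_epochs):  # the epoch loop iterates over rollout.total_epochs
        for _ in range(num_batches):
            if steps >= step_limit:
                return steps
            steps += 1
    return steps


steps = simulate_rollout(total_rollout_steps, num_batches, config.rollout.total_epochs)
print(steps, steps / num_batches)  # 1920 30.0
```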
Evidence
From experiment logs:
'total_rollout_steps': 3200 # Config value (intended)
[FullyAsyncRollouter] Total rollout steps: 1920 # Actual value used!
Calculation:
- len(train_dataloader) = 64
- trainer.total_epochs = 30 (default, not overridden)
- Calculated: 64 * 30 = 1920
- min(3200, 1920) = 1920 ← This is less than intended
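The calculation above can be expressed as a small helper. compute_total_rollout_steps is a hypothetical name used only for illustration; it mirrors the two statements at line ~85, taking the epoch count as a parameter so both the current and the intended values can be compared.

```python
from typing import Optional


def compute_total_rollout_steps(num_batches: int, total_epochs: int,
                                step_cap: Optional[int] = None) -> int:
    """Batches per epoch times epochs, optionally capped by rollout.total_rollout_steps."""
    steps = num_batches * total_epochs
    if step_cap is not None:
        steps = min(step_cap, steps)
    return steps


# Current behaviour: epochs taken from trainer.total_epochs (30).
print(compute_total_rollout_steps(64, 30, step_cap=3200))  # 1920
# Intended behaviour: epochs taken from rollout.total_epochs (50).
print(compute_total_rollout_steps(64, 50, step_cap=3200))  # 3200
```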
Suggested Fix
# Change from:
self.total_rollout_steps = len(self.train_dataloader) * self.config.trainer.total_epochs
# To:
self.total_rollout_steps = len(self.train_dataloader) * self.config.rollout.total_epochs
This ensures consistency with the epoch loop and respects the user's rollout.total_epochs configuration.
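A minimal sketch of the suggested fix, again using a hypothetical SimpleNamespace config in place of verl's real one. step_limit is an illustrative name, not a function in the codebase. It checks that once the limit is derived from rollout.total_epochs, changing trainer.total_epochs no longer affects how many rollout steps are allowed.

```python
from types import SimpleNamespace

# Hypothetical config mirroring the reproduction: rollout.total_epochs=50.
config = SimpleNamespace(
    trainer=SimpleNamespace(total_epochs=30),
    rollout=SimpleNamespace(total_epochs=50, total_rollout_steps=3200),
)
num_batches = 64  # stands in for len(self.train_dataloader)


def step_limit(config, num_batches: int) -> int:
    """Suggested fix: use rollout.total_epochs, the field the epoch loop iterates over."""
    steps = num_batches * config.rollout.total_epochs
    if config.rollout.total_rollout_steps is not None:
        steps = min(config.rollout.total_rollout_steps, steps)
    return steps


limits = []
for trainer_epochs in (10, 30, 100):
    config.trainer.total_epochs = trainer_epochs  # no longer influences the limit
    limits.append(step_limit(config, num_batches))
print(limits)  # [3200, 3200, 3200]
```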
Environment
- verl version: latest main branch
- Config file: recipe/fully_async_policy/config/fully_async_ppo_trainer.yaml
Related
The config file fully_async_ppo_trainer.yaml inherits from ppo_trainer.yaml, which defines trainer.total_epochs: 30, while rollout.total_epochs is separately set to 10. Two independent epoch settings like these can easily diverge and cause confusion.
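Independently of the fix, a guard could surface this divergence early. The sketch below is a hypothetical helper, not part of verl: it compares the two fields on a SimpleNamespace stand-in for the config and emits a standard Python warning when they differ.

```python
import warnings
from types import SimpleNamespace


def check_epoch_settings(config) -> bool:
    """Warn if trainer.total_epochs and rollout.total_epochs disagree; return True if consistent."""
    trainer_epochs = config.trainer.total_epochs
    rollout_epochs = config.rollout.total_epochs
    if trainer_epochs != rollout_epochs:
        warnings.warn(
            f"trainer.total_epochs={trainer_epochs} differs from "
            f"rollout.total_epochs={rollout_epochs}; the fully async rollouter "
            "iterates over rollout.total_epochs",
            stacklevel=2,
        )
        return False
    return True


# Values from the base configs described above: 30 vs 10.
cfg = SimpleNamespace(trainer=SimpleNamespace(total_epochs=30),
                      rollout=SimpleNamespace(total_epochs=10))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    consistent = check_epoch_settings(cfg)
print(consistent, len(caught))  # False 1
```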