
recipe/fully_async Bug: Inconsistent total_epochs usage causes early experiment termination #4735

@FarrenZhang

Description


In recipe/fully_async_policy/fully_async_rollouter.py, there is an inconsistency in how total_epochs is referenced:

  1. Line ~85: total_rollout_steps calculation uses self.config.trainer.total_epochs
  2. Line ~400+: The epoch loop uses self.config.rollout.total_epochs

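As a result, setting rollout.total_epochs does not change the rollout step limit, which is still calculated from trainer.total_epochs. When trainer.total_epochs is smaller than rollout.total_epochs, the rollouter hits its step limit and stops before the epoch loop finishes.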

Code Location

File: recipe/fully_async_policy/fully_async_rollouter.py

# Line ~85: Uses trainer.total_epochs
self.total_rollout_steps = len(self.train_dataloader) * self.config.trainer.total_epochs
if self.config.rollout.total_rollout_steps is not None:
    self.total_rollout_steps = min(self.config.rollout.total_rollout_steps, self.total_rollout_steps)

# Line ~400+: Uses rollout.total_epochs
for epoch in range(self.config.rollout.total_epochs):
    ...  # epoch loop body omitted

Reproduction

  1. Configure rollout.total_epochs=50 in the experiment script
  2. Leave trainer.total_epochs at default (30)
  3. Run fully async training

Expected: Experiment runs for 50 epochs
Actual: Experiment terminates after ~30 epochs worth of samples

Evidence

From experiment logs:
'total_rollout_steps': 3200 # Config value (intended)
[FullyAsyncRollouter] Total rollout steps: 1920 # Actual value used!

Calculation:

  • len(train_dataloader) = 64
  • trainer.total_epochs = 30 (default, not overridden)
  • Calculated: 64 * 30 = 1920
  • min(3200, 1920) = 1920 ← This is less than intended
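The calculation above can be reproduced with a small standalone function. This is a sketch that mirrors the quoted rollouter logic using the numbers from the logs; the function name total_rollout_steps and the constants are illustrative and are not part of verl.

```python
from typing import Optional


def total_rollout_steps(num_batches: int, epochs: int, cap: Optional[int]) -> int:
    """Illustrative mirror of the rollouter logic: batches * epochs, optionally capped."""
    steps = num_batches * epochs
    if cap is not None:
        # rollout.total_rollout_steps can only lower the limit, never raise it
        steps = min(cap, steps)
    return steps


NUM_BATCHES = 64      # len(train_dataloader), from the logs
TRAINER_EPOCHS = 30   # trainer.total_epochs (default, not overridden)
ROLLOUT_EPOCHS = 50   # rollout.total_epochs (set by the user)
STEP_CAP = 3200       # rollout.total_rollout_steps, from the logs

# Current behavior: epoch count taken from trainer.total_epochs
current = total_rollout_steps(NUM_BATCHES, TRAINER_EPOCHS, STEP_CAP)
# Behavior with the suggested fix: epoch count taken from rollout.total_epochs
fixed = total_rollout_steps(NUM_BATCHES, ROLLOUT_EPOCHS, STEP_CAP)

print(current)  # 1920
print(fixed)    # 3200
```

Because the cap is applied with min(), raising rollout.total_rollout_steps cannot compensate for a smaller trainer.total_epochs; only changing which epoch setting feeds the calculation does.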

Suggested Fix

# Change from:
self.total_rollout_steps = len(self.train_dataloader) * self.config.trainer.total_epochs

# To:
self.total_rollout_steps = len(self.train_dataloader) * self.config.rollout.total_epochs

This ensures consistency with the epoch loop and respects the user's rollout.total_epochs configuration.

Environment

  • verl version: latest main branch
  • Config file: recipe/fully_async_policy/config/fully_async_ppo_trainer.yaml

Related

The config file fully_async_ppo_trainer.yaml inherits from ppo_trainer.yaml, which defines trainer.total_epochs: 30, while rollout.total_epochs is separately set to 10. Having two independent epoch settings can easily cause confusion.
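One way to surface this confusion early is a startup check that warns when the two epoch settings disagree. The sketch below is hypothetical: check_epoch_settings is not a verl function, and types.SimpleNamespace stands in for the real config object, assuming only that it exposes trainer.total_epochs and rollout.total_epochs as attributes.

```python
import warnings
from types import SimpleNamespace


def check_epoch_settings(config) -> int:
    """Hypothetical guard: return rollout.total_epochs and warn if
    trainer.total_epochs holds a different value."""
    rollout_epochs = config.rollout.total_epochs
    trainer_epochs = config.trainer.total_epochs
    if rollout_epochs != trainer_epochs:
        # Emits a UserWarning; it does not change either setting
        warnings.warn(
            f"rollout.total_epochs ({rollout_epochs}) differs from "
            f"trainer.total_epochs ({trainer_epochs}); the fully async "
            "epoch loop uses rollout.total_epochs.",
            stacklevel=2,
        )
    return rollout_epochs


# Stand-in config matching the reproduction steps
config = SimpleNamespace(
    trainer=SimpleNamespace(total_epochs=30),
    rollout=SimpleNamespace(total_epochs=50),
)
epochs = check_epoch_settings(config)  # warns, returns 50
```

A warning rather than an exception keeps existing configs running while still making the mismatch visible in the logs.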
