
[trainer,ckpt,rollout] fix: wake up rollout replicas when actor update is skipped by critic warmup#5590

Open
lwzhenglittle wants to merge 1 commit into verl-project:main from lwzhenglittle:fix-rollout-wake-up-without-update-actor

Conversation


@lwzhenglittle commented Mar 14, 2026

When free_cache_engine=True, PPO sleeps rollout replicas after each rollout. Normally the actor update path calls checkpoint_manager.update_weights(), which also wakes the rollout engine back up for the next step.

However, when actor update is skipped because trainer.critic_warmup > global_steps, that wake-up never happens. As a result, the next rollout may start with the inference engine still in sleep state and trigger downstream vLLM runtime errors.

This change adds a lightweight checkpoint manager wake-up path and invokes it at the end of the PPO step only when actor update is skipped by critic warmup. For colocated and standalone replicas it directly calls replica.wake_up(); for hybrid replicas it falls back to update_weights(global_steps).

What does this PR do?

Fixes a PPO trainer bug on the critic_warmup skip-actor path.

When actor_rollout_ref.rollout.free_cache_engine=True, rollout replicas are put into sleep state after each rollout to release weights / KV cache. In the normal path, the step later enters actor update and calls checkpoint_manager.update_weights(), which also wakes the rollout engine back up.

However, if actor update is skipped because trainer.critic_warmup > global_steps, that update_weights() call is skipped as well. The rollout engine then remains asleep into the next step, which can surface as downstream vLLM / Ascend NPU runtime failures instead of a clear logical error.

This PR adds a lightweight rollout wake-up path and calls it only on the critic_warmup skip-actor branch.
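The control flow above can be sketched as follows. This is a minimal, self-contained illustration of the skip-actor branch; names such as `ppo_step` and `FakeCheckpointManager` are assumptions for the sketch, not the actual verl identifiers.

```python
# Illustrative sketch of the critic_warmup skip-actor branch; names are
# hypothetical stand-ins, not the real verl trainer API.
class FakeCheckpointManager:
    def __init__(self):
        self.calls = []

    def wake_up_replicas(self, global_steps=None):
        # Lightweight recovery path added by this PR.
        self.calls.append(("wake_up", global_steps))

    def update_weights(self, global_steps):
        # Normal path: weight sync also wakes the rollout engine.
        self.calls.append(("update_weights", global_steps))


def ppo_step(ckpt_mgr, critic_warmup, free_cache_engine, global_steps):
    """One PPO step after rollout; replicas are asleep if free_cache_engine."""
    if critic_warmup > global_steps:
        # Actor update is skipped, so update_weights() never runs; without an
        # explicit wake-up the replicas would stay asleep into the next rollout.
        if free_cache_engine:
            ckpt_mgr.wake_up_replicas(global_steps=global_steps)
    else:
        ckpt_mgr.update_weights(global_steps)


mgr = FakeCheckpointManager()
ppo_step(mgr, critic_warmup=5, free_cache_engine=True, global_steps=1)
ppo_step(mgr, critic_warmup=5, free_cache_engine=True, global_steps=10)
print(mgr.calls)  # [('wake_up', 1), ('update_weights', 10)]
```

The key point is that the wake-up only fires on the warmup branch, so the normal actor-update path is untouched.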

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: https://github.com/verl-project/verl/pulls?q=is%3Apr+free_cache_engine
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

I encountered this bug on Ascend NPU, so I validated the fix on Ascend NPU only. I currently only have access to Ascend NPU hardware and do not have an NVIDIA GPU environment available, so I could not verify this fix on NVIDIA GPU locally.

Before this fix, the trainer crashed after step 1 (logs: link; crash at line 1051).

After this fix, the run completes step 2 (logs: link; step 2 finishes at line 914).

API and Usage Example

No API or config change.

Design & Code Changes

  • Add checkpoint_manager.wake_up_replicas(global_steps=None) as a lightweight recovery path.
  • For colocated / standalone rollout replicas, directly call replica.wake_up().
  • For hybrid rollout replicas, fall back to trainer.update_weights(global_steps) semantics via checkpoint_manager.update_weights(global_steps).
  • In PPO training, when actor update is skipped because trainer.critic_warmup > global_steps
    and free_cache_engine=True, call the lightweight wake-up path at the end of the step.
  • Keep the default behavior unchanged for the normal actor-update path.
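The design above can be sketched as a small async snippet. This is an illustrative assumption of the new path, not the exact verl code: `CheckpointManagerSketch`, `FakeReplica`, and the replica method signatures are hypothetical, while `RolloutMode` and the hybrid fallback mirror the PR description.

```python
# Hypothetical sketch of checkpoint_manager.wake_up_replicas(); the real verl
# signatures may differ.
import asyncio
from enum import Enum


class RolloutMode(Enum):
    COLOCATED = "colocated"
    STANDALONE = "standalone"
    HYBRID = "hybrid"


class CheckpointManagerSketch:
    def __init__(self, replicas):
        self.replicas = replicas

    async def update_weights(self, global_steps):
        # Full weight sync; as a side effect it also wakes the rollout engine.
        for replica in self.replicas:
            await replica.update_weights(global_steps)

    async def wake_up_replicas(self, global_steps=None):
        # Hybrid replicas share weights with the trainer, so a bare wake_up()
        # is not sufficient: fall back to a full update_weights() instead.
        if any(r.rollout_mode == RolloutMode.HYBRID for r in self.replicas):
            await self.update_weights(global_steps=global_steps)
            return
        # Colocated / standalone replicas just need to be woken up directly.
        await asyncio.gather(*(r.wake_up() for r in self.replicas))


class FakeReplica:
    def __init__(self, mode):
        self.rollout_mode = mode
        self.events = []

    async def wake_up(self):
        self.events.append("wake_up")

    async def update_weights(self, global_steps):
        self.events.append(("update_weights", global_steps))


replicas = [FakeReplica(RolloutMode.STANDALONE), FakeReplica(RolloutMode.COLOCATED)]
asyncio.run(CheckpointManagerSketch(replicas).wake_up_replicas())
print([r.events for r in replicas])  # [['wake_up'], ['wake_up']]
```

For hybrid replicas the same call routes through `update_weights(global_steps)`, which is why the trainer passes `global_steps` down even on the skip-actor branch.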

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.


CLAassistant commented Mar 14, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request correctly fixes a bug where rollout replicas were not woken up when actor updates were skipped during critic warmup. The introduction of a dedicated wake_up_replicas method is a clean solution. The logic to handle hybrid replicas by falling back to a full weight update is sound, and the changes in the trainer correctly invoke this new path. The addition of a test case to cover this scenario is also a great inclusion. I have one suggestion to improve the robustness of the new API.

Comment on lines +415 to +418

    if any(replica.rollout_mode == RolloutMode.HYBRID for replica in self.replicas):
        await self.update_weights(global_steps=global_steps)
        return
    await self._direct_wake_up_replicas()
Severity: high

The global_steps parameter is optional (global_steps: int = None), which could lead to issues. If wake_up_replicas is called without global_steps for a manager with hybrid replicas, it will fall back to self.update_weights(global_steps=None). This could cause downstream errors if components like the learning rate scheduler expect an integer step count and do not handle None.

To improve robustness and make the API contract clearer, it's better to enforce that global_steps is provided when hybrid replicas are present.

Suggested change:

    # Before
    if any(replica.rollout_mode == RolloutMode.HYBRID for replica in self.replicas):
        await self.update_weights(global_steps=global_steps)
        return
    await self._direct_wake_up_replicas()

    # After
    if any(replica.rollout_mode == RolloutMode.HYBRID for replica in self.replicas):
        if global_steps is None:
            raise ValueError("`global_steps` must be provided for hybrid replicas.")
        await self.update_weights(global_steps=global_steps)
        return
    await self._direct_wake_up_replicas()

