[trainer,ckpt,rollout] fix: wake up rollout replicas when actor update is skipped by critic warmup#5590
Conversation
When free_cache_engine=True, PPO sleeps rollout replicas after each rollout. Normally the actor update path calls checkpoint_manager.update_weights(), which also wakes the rollout engine back up for the next step.

However, when the actor update is skipped because trainer.critic_warmup > global_steps, that wake-up never happens. As a result, the next rollout may start with the inference engine still in sleep state and trigger downstream vLLM runtime errors.

This change adds a lightweight checkpoint-manager wake-up path and invokes it at the end of the PPO step only when the actor update is skipped by critic warmup. For colocated and standalone replicas it directly calls replica.wake_up(); for hybrid replicas it falls back to update_weights(global_steps).
Code Review
This pull request correctly fixes a bug where rollout replicas were not woken up when actor updates were skipped during critic warmup. The introduction of a dedicated wake_up_replicas method is a clean solution. The logic to handle hybrid replicas by falling back to a full weight update is sound, and the changes in the trainer correctly invoke this new path. The addition of a test case to cover this scenario is also a great inclusion. I have one suggestion to improve the robustness of the new API.
```python
if any(replica.rollout_mode == RolloutMode.HYBRID for replica in self.replicas):
    await self.update_weights(global_steps=global_steps)
    return
await self._direct_wake_up_replicas()
```
The global_steps parameter is optional (global_steps: int = None), which could lead to issues. If wake_up_replicas is called without global_steps for a manager with hybrid replicas, it will fall back to self.update_weights(global_steps=None). This could cause downstream errors if components like the learning rate scheduler expect an integer step count and do not handle None.
To improve robustness and make the API contract clearer, it's better to enforce that global_steps is provided when hybrid replicas are present.
```diff
 if any(replica.rollout_mode == RolloutMode.HYBRID for replica in self.replicas):
+    if global_steps is None:
+        raise ValueError("`global_steps` must be provided for hybrid replicas.")
     await self.update_weights(global_steps=global_steps)
     return
 await self._direct_wake_up_replicas()
```
What does this PR do?
Fixes a PPO trainer bug on the `critic_warmup` skip-actor path.

When `actor_rollout_ref.rollout.free_cache_engine=True`, rollout replicas are put into sleep state after each rollout to release weights / KV cache. In the normal path, the step later enters actor update and calls `checkpoint_manager.update_weights()`, which also wakes the rollout engine back up.

However, if the actor update is skipped because `trainer.critic_warmup > global_steps`, that `update_weights()` call is skipped as well. The rollout engine then remains asleep into the next step, which can surface as downstream vLLM / Ascend NPU runtime failures instead of a clear logical error.

This PR adds a lightweight rollout wake-up path and calls it only on the `critic_warmup` skip-actor branch.

Checklist Before Starting
- Format the PR title as `[{modules}] {type}: {description}` (this will be checked by the CI).
  - `{modules}` include `fsdp`, `megatron`, `veomni`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`, `fully_async`, `one_step_off`, like `[megatron, fsdp, doc]`.
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`.
  - If the PR is breaking, add `[BREAKING]` to the beginning of the title, e.g. `[BREAKING][fsdp, megatron] feat: dynamic batching`.

Test
I encountered this bug on Ascend NPU, so I validated the fix on Ascend NPU only. I currently have access only to Ascend NPU hardware and do not have an NVIDIA GPU environment available, so I could not verify this fix on NVIDIA GPU locally.
Before this fix, the trainer crashed after step 1 (logs: link; crash at line 1051).
After this fix, training finishes step 2 (logs: link; step 2 completes at line 914).
API and Usage Example
No API or config change.
Design & Code Changes
- Adds `checkpoint_manager.wake_up_replicas(global_steps=None)` as a lightweight recovery path.
- Colocated and standalone replicas are woken directly via `replica.wake_up()`.
- Hybrid replicas keep the existing `update_weights(global_steps)` semantics via `checkpoint_manager.update_weights(global_steps)`.
- In the trainer, when `trainer.critic_warmup > global_steps` and `free_cache_engine=True`, the lightweight wake-up path is called at the end of the step.

Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- Run `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`.
- `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
- If the PR changes the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`. Not related.