Avoid blocking Ray registry lookups for policy loss resolution by taivu1998 · Pull Request #1644 · NovaSky-AI/SkyRL

taivu1998 · 2026-05-10T10:51:53Z

Summary

This PR removes policy-loss registry actor lookups from policy worker hot paths. Policy workers now receive a serialized policy-loss registry snapshot during construction and resolve both configured losses and request-time overrides from local process state.

Root Cause

PolicyWorkerBase and the Megatron request-time loss override path called PolicyLossRegistry.get(...) inside Ray worker actors. When Ray is initialized, that method synchronizes with the named registry actor through a blocking ray.get(...) call. This triggers Ray's warning about blocking calls inside async actors and can stall actor execution while the lookup waits.
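The effect of a blocking call inside an async actor can be illustrated without Ray. The following is a minimal sketch using only asyncio: time.sleep stands in for the blocking ray.get(...), and heartbeat, blocking_lookup, and main are hypothetical names that do not appear in SkyRL.

```python
import asyncio
import time


async def heartbeat(log):
    # Stand-in for other work scheduled on the same event loop.
    for i in range(3):
        log.append(f"tick {i}")
        await asyncio.sleep(0.01)


async def blocking_lookup(log):
    # Assumption: time.sleep plays the role of a blocking ray.get(...).
    # It never yields to the event loop, so nothing else can run meanwhile.
    time.sleep(0.05)
    log.append("lookup done")


async def main():
    log = []
    task = asyncio.create_task(heartbeat(log))
    await asyncio.sleep(0)      # yield once so the heartbeat records "tick 0"
    await blocking_lookup(log)  # the loop is frozen for the whole call
    await task
    return log


log = asyncio.run(main())
print(log)
```

The heartbeat's second tick is due after 10 ms, but it is recorded only after "lookup done" because the blocking call holds the event loop for 50 ms.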

Changes

Add local-only registry lookup plus serialized snapshot/install helpers.
Snapshot the policy-loss registry on control paths before policy worker actors are created.
Pass the snapshot through PPO trainer, SFT trainer, and SkyRLTrainBackend policy actor construction.
Resolve FSDP and Megatron policy losses through worker-local lookup instead of actor-backed lookup.
Keep direct worker construction compatible by locally repopulating built-in policy losses without actor sync.
Add regression coverage for local lookup, snapshot round-trip behavior, worker-local resolution, and registry-test isolation.
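The changes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the method names get_local, snapshot_serialized, and install_serialized come from the review summary, while LocalLossRegistry, PolicyWorkerSketch, and the two placeholder losses are hypothetical. The PR text does not say which serializer is used; the standard-library marshal module is chosen here only to keep the sketch self-contained, and it handles only plain functions without closures.

```python
import marshal
import types
from typing import Callable, Dict, Optional


def regular_loss(log_prob, old_log_prob, advantage):
    # Placeholder arithmetic on floats; real policy losses operate on tensors.
    return -advantage * (log_prob - old_log_prob)


def scaled_loss(log_prob, old_log_prob, advantage):
    # Placeholder for a user-registered loss.
    return -0.5 * advantage * (log_prob - old_log_prob)


class LocalLossRegistry:
    """Hypothetical process-local registry; nothing here talks to a Ray actor."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable] = {}

    def register(self, name: str, func: Callable) -> None:
        self._functions[name] = func

    def get_local(self, name: str) -> Callable:
        # Look up a loss in this process only.
        if name not in self._functions:
            raise ValueError(f"Unknown policy loss {name!r}")
        return self._functions[name]

    def snapshot_serialized(self) -> bytes:
        # Serialize each function's code object (assumption: no closures).
        codes = {name: fn.__code__ for name, fn in self._functions.items()}
        return marshal.dumps(codes)

    def install_serialized(self, payload: bytes) -> None:
        # Rebuild functions from the snapshot and add them locally.
        for name, code in marshal.loads(payload).items():
            self._functions[name] = types.FunctionType(code, globals(), name)


class PolicyWorkerSketch:
    """Hypothetical worker that resolves losses from local state only."""

    def __init__(self, configured_loss: str, snapshot: Optional[bytes] = None):
        self.registry = LocalLossRegistry()
        # Built-ins are repopulated locally so direct construction still works.
        self.registry.register("regular", regular_loss)
        if snapshot is not None:
            self.registry.install_serialized(snapshot)
        self.loss_fn = self.registry.get_local(configured_loss)

    def resolve(self, override: Optional[str] = None) -> Callable:
        # Request-time overrides also use the local lookup.
        return self.registry.get_local(override) if override else self.loss_fn


# Control path: snapshot the registry before any worker is created.
control_registry = LocalLossRegistry()
control_registry.register("regular", regular_loss)
control_registry.register("scaled", scaled_loss)
snapshot = control_registry.snapshot_serialized()

worker = PolicyWorkerSketch("scaled", snapshot)
direct_worker = PolicyWorkerSketch("regular")  # no snapshot, built-ins only
```

In this sketch the worker built from the snapshot can resolve both its configured loss and a request-time override, while the directly constructed worker knows only the built-in loss and raises ValueError for anything else.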

Validation

uv run --isolated --extra skyrl-train --extra dev --with transformers==5.2.0 pytest tests/backends/skyrl_train/utils/test_ppo_utils.py tests/train/test_trainer.py
- 44 passed, 1 warning
uv run --isolated --extra dev --with ruff ruff check skyrl/backends/skyrl_train/utils/ppo_utils.py skyrl/backends/skyrl_train/workers/worker.py skyrl/backends/skyrl_train/workers/megatron/megatron_model_wrapper.py skyrl/backends/skyrl_train/workers/megatron/megatron_worker.py skyrl/backends/skyrl_train_backend.py skyrl/train/sft_trainer.py skyrl/train/trainer.py tests/backends/skyrl_train/utils/test_ppo_utils.py tests/train/test_trainer.py
- passed
git diff --check
- passed

The remaining warning is Ray's existing accelerator environment variable FutureWarning; it is not introduced by this change.

gemini-code-assist

Code Review

This pull request introduces a mechanism to snapshot and serialize the policy loss registry, enabling workers to resolve loss functions locally and avoid unnecessary synchronization with Ray actors. Key changes include the addition of get_local, snapshot_serialized, and install_serialized methods to the BaseFunctionRegistry, and updates to PolicyWorkerBase and MegatronModelWrapper to utilize these local lookups. Trainers now capture a registry snapshot during initialization and pass it to workers. I have no feedback to provide.

Avoid Ray actor lookup for policy loss resolution

8ed4953

taivu1998 marked this pull request as ready for review May 11, 2026 03:11

gemini-code-assist Bot reviewed May 11, 2026

taivu1998 wants to merge 1 commit into NovaSky-AI:main from taivu1998:tdv/issue-1485-policy-loss-registry
