Avoid blocking Ray registry lookups for policy loss resolution #1644

Open
taivu1998 wants to merge 1 commit into NovaSky-AI:main from taivu1998:tdv/issue-1485-policy-loss-registry

@taivu1998

Summary

Fixes #1485.

This PR removes policy-loss registry actor lookups from policy worker hot paths. Policy workers now receive a serialized policy-loss registry snapshot during construction and resolve both configured losses and request-time overrides from local process state.

Root Cause

PolicyWorkerBase and the Megatron request-time loss override path called PolicyLossRegistry.get(...) inside Ray worker actors. When Ray is initialized, that method synchronizes with the named registry actor using blocking ray.get(...), which triggers Ray's warning about using blocking calls inside async actors and can interfere with actor execution.
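The effect described above can be reproduced without Ray. The following is a minimal sketch, assuming only that async actor methods share a single asyncio event loop: `blocking_registry_get` is a hypothetical stand-in for a lookup that waits on a remote actor (the role `ray.get(...)` plays here), and `ticker` stands in for any other work the actor should keep doing. None of these names come from the PR.

```python
import asyncio
import time


def blocking_registry_get(name: str) -> str:
    """Hypothetical stand-in for a lookup that blocks on a remote actor."""
    time.sleep(0.05)  # blocks the thread, and with it the event loop
    return f"loss:{name}"


async def ticker(ticks: list) -> None:
    """Background task representing other work scheduled on the same loop."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0)  # yield to the event loop after every tick


async def handle_request(ticks: list) -> int:
    """Count how many ticks happen while the blocking lookup is in progress."""
    before = len(ticks)
    blocking_registry_get("regular")  # no await here, so nothing else can run
    return len(ticks) - before


async def main() -> int:
    ticks: list = []
    task = asyncio.create_task(ticker(ticks))
    await asyncio.sleep(0)  # give the ticker a chance to start
    ticks_during_lookup = await handle_request(ticks)
    task.cancel()
    return ticks_during_lookup


ticks_during_lookup = asyncio.run(main())
print(ticks_during_lookup)
```

The background task makes no progress for the full duration of the lookup, so the script prints 0. Resolving the loss from process-local state removes the wait entirely rather than making it asynchronous.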

Changes

  • Add local-only registry lookup plus serialized snapshot/install helpers.
  • Snapshot the policy-loss registry on control paths before policy worker actors are created.
  • Pass the snapshot through PPO trainer, SFT trainer, and SkyRLTrainBackend policy actor construction.
  • Resolve FSDP and Megatron policy losses through worker-local lookup instead of actor-backed lookup.
  • Keep direct worker construction compatible by locally repopulating built-in policy losses without actor sync.
  • Add regression coverage for local lookup, snapshot round-trip behavior, worker-local resolution, and registry-test isolation.
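The snapshot flow in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the method names `get_local`, `snapshot_serialized`, and `install_serialized` are taken from the review summary, but their signatures, the `LocalFunctionRegistry` and `PolicyWorkerSketch` classes, the loss names, and the use of `pickle` are all hypothetical. Standard-library reducers stand in for real loss functions so that the snapshot pickles by reference.

```python
import math
import pickle
import statistics
from typing import Callable, Dict, Optional


class LocalFunctionRegistry:
    """Hypothetical, simplified registry; not the project's BaseFunctionRegistry."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable] = {}

    def register(self, name: str, fn: Callable) -> None:
        self._functions[name] = fn

    def get_local(self, name: str) -> Callable:
        # Process-local lookup only: no actor call, no blocking wait.
        try:
            return self._functions[name]
        except KeyError:
            raise ValueError(f"unknown policy loss: {name!r}") from None

    def snapshot_serialized(self) -> bytes:
        # Assumption: pickle is used purely for illustration.
        return pickle.dumps(dict(self._functions))

    def install_serialized(self, payload: bytes) -> None:
        self._functions.update(pickle.loads(payload))


class PolicyWorkerSketch:
    """Hypothetical worker that receives the snapshot at construction time."""

    def __init__(self, loss_name: str, registry_snapshot: Optional[bytes] = None) -> None:
        self.registry = LocalFunctionRegistry()
        if registry_snapshot is not None:
            self.registry.install_serialized(registry_snapshot)
        self.loss_fn = self.registry.get_local(loss_name)

    def resolve_loss(self, override: Optional[str] = None) -> Callable:
        # Request-time overrides also resolve from local state.
        return self.loss_fn if override is None else self.registry.get_local(override)


# Control path: register losses and snapshot before any worker is created.
controller_registry = LocalFunctionRegistry()
controller_registry.register("token_mean", statistics.fmean)  # stand-in loss
controller_registry.register("token_sum", math.fsum)  # stand-in loss
snapshot = controller_registry.snapshot_serialized()

# Worker side: construction installs the snapshot; lookups stay local.
worker = PolicyWorkerSketch("token_mean", snapshot)
configured = worker.resolve_loss()([1.0, 2.0, 3.0])
overridden = worker.resolve_loss("token_sum")([1.0, 2.0, 3.0])
```

Because the snapshot is taken once on the control path and shipped as bytes, a worker never needs to contact the registry actor after construction. One trade-off of this design is that losses registered after the snapshot is taken are not visible to workers that have already been built.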

Validation

  • uv run --isolated --extra skyrl-train --extra dev --with transformers==5.2.0 pytest tests/backends/skyrl_train/utils/test_ppo_utils.py tests/train/test_trainer.py
    • 44 passed, 1 warning
  • uv run --isolated --extra dev --with ruff ruff check skyrl/backends/skyrl_train/utils/ppo_utils.py skyrl/backends/skyrl_train/workers/worker.py skyrl/backends/skyrl_train/workers/megatron/megatron_model_wrapper.py skyrl/backends/skyrl_train/workers/megatron/megatron_worker.py skyrl/backends/skyrl_train_backend.py skyrl/train/sft_trainer.py skyrl/train/trainer.py tests/backends/skyrl_train/utils/test_ppo_utils.py tests/train/test_trainer.py
    • passed
  • git diff --check
    • passed

The remaining warning is Ray's existing accelerator environment variable FutureWarning; it is not introduced by this change.

taivu1998 marked this pull request as ready for review on May 11, 2026, 03:11

gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a mechanism to snapshot and serialize the policy loss registry, enabling workers to resolve loss functions locally and avoid unnecessary synchronization with Ray actors. Key changes include the addition of get_local, snapshot_serialized, and install_serialized methods to the BaseFunctionRegistry, and updates to PolicyWorkerBase and MegatronModelWrapper to utilize these local lookups. Trainers now capture a registry snapshot during initialization and pass it to workers. I have no feedback to provide.

Development

Successfully merging this pull request may close these issues.

PolicyLossRegistry.get() causes "blocking ray.get inside async actor" log warnings