Avoid blocking Ray registry lookups for policy loss resolution#1644
Open
taivu1998 wants to merge 1 commit into
Open
Avoid blocking Ray registry lookups for policy loss resolution#1644taivu1998 wants to merge 1 commit into
taivu1998 wants to merge 1 commit into
Conversation
Contributor
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to snapshot and serialize the policy loss registry, enabling workers to resolve loss functions locally and avoid unnecessary synchronization with Ray actors. Key changes include the addition of get_local, snapshot_serialized, and install_serialized methods to the BaseFunctionRegistry, and updates to PolicyWorkerBase and MegatronModelWrapper to utilize these local lookups. Trainers now capture a registry snapshot during initialization and pass it to workers. I have no feedback to provide.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #1485.
This PR removes policy-loss registry actor lookups from policy worker hot paths. Policy workers now receive a serialized policy-loss registry snapshot during construction and resolve both configured losses and request-time overrides from local process state.
Root Cause
PolicyWorkerBaseand the Megatron request-time loss override path calledPolicyLossRegistry.get(...)inside Ray worker actors. When Ray is initialized, that method synchronizes with the named registry actor using blockingray.get(...), which triggers Ray's warning about using blocking calls inside async actors and can interfere with actor execution.Changes
SkyRLTrainBackendpolicy actor construction.Validation
uv run --isolated --extra skyrl-train --extra dev --with transformers==5.2.0 pytest tests/backends/skyrl_train/utils/test_ppo_utils.py tests/train/test_trainer.py44 passed, 1 warninguv run --isolated --extra dev --with ruff ruff check skyrl/backends/skyrl_train/utils/ppo_utils.py skyrl/backends/skyrl_train/workers/worker.py skyrl/backends/skyrl_train/workers/megatron/megatron_model_wrapper.py skyrl/backends/skyrl_train/workers/megatron/megatron_worker.py skyrl/backends/skyrl_train_backend.py skyrl/train/sft_trainer.py skyrl/train/trainer.py tests/backends/skyrl_train/utils/test_ppo_utils.py tests/train/test_trainer.pygit diff --checkThe remaining warning is Ray's existing accelerator environment variable
FutureWarning; it is not introduced by this change.