[BREAKING][worker, rollout, vllm] feat: implement vLLM colocated training-inference rollout with process separation #4280
Draft: jianjunzhong wants to merge 36 commits into volcengine:main from jianjunzhong:refactor/vllm_sep_proc
Conversation
What does this PR do?
Refactor vLLM co-located training-inference rollout from single-process to multi-process architecture. This refactoring separates training and inference into different processes, enabling better resource isolation and paving the way for future checkpoint-engine integration (in roadmap #3624).
Key Changes:
- Transform `vLLMAsyncRollout` into `ServerAdapter`, a client-side adapter that communicates with the inference executor.
- Replace `ExternalZeroMQDistributedExecutor` with `vLLMMultiprocExecutor`, a new multiproc executor that serves as the inference backend.

Checklist Before Starting
- Format the PR title as `[{modules}] {type}: {description}` (this will be checked by the CI).
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`. Separate multiple modules with commas, like `[megatron, fsdp, doc]`.
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`.
  - If this PR breaks any API, prepend `[BREAKING]` to the beginning of the title, e.g. `[BREAKING][fsdp, megatron] feat: dynamic batching`.

Test
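As a quick local sanity check of the title convention from the checklist, one could run a small script like the following (illustrative only; the CI's actual check may differ — only the module and type lists are taken from the checklist):

```python
import re

# Module and type vocabularies copied from the PR checklist.
MODULES = {
    "fsdp", "megatron", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool",
    "ckpt", "doc", "data",
}
TYPES = {"feat", "fix", "refactor", "chore", "test"}

def title_ok(title: str) -> bool:
    """Check `[{modules}] {type}: {description}` with an optional [BREAKING] prefix."""
    m = re.fullmatch(r"(?:\[BREAKING\])?\[([^\]]+)\] *(\w+): +(.+)", title)
    if m is None:
        return False
    modules = [s.strip() for s in m.group(1).split(",")]
    return all(mod in MODULES for mod in modules) and m.group(2) in TYPES

assert title_ok("[BREAKING][fsdp, megatron] feat: dynamic batching")
assert not title_ok("update stuff")
```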
API and Usage Example
This refactoring maintains full backward compatibility with existing vLLM rollout APIs. No changes are required to user code.
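One way to sanity-check a drop-in replacement like this is an interface-compatibility test: assert that the new class exposes every public method of the old one. The sketch below uses hypothetical stand-in classes (the real pair would be `vLLMAsyncRollout` and `ServerAdapter`; the method names `resume`, `release`, `update_weights`, and `execute_method` are taken from this PR):

```python
# Stand-in for the legacy rollout's public surface (hypothetical).
class OldRollout:
    def resume(self): ...
    def release(self): ...
    def update_weights(self, weights): ...

# Stand-in for the replacement; it may add new methods, but must keep
# every public method of the old class for backward compatibility.
class NewServerAdapter:
    def resume(self): ...
    def release(self): ...
    def update_weights(self, weights): ...
    def execute_method(self, name, *args, **kwargs): ...  # new, allowed

def public_surface(cls):
    """Names of all public callable attributes of a class."""
    return {n for n in dir(cls) if not n.startswith("_") and callable(getattr(cls, n))}

missing = public_surface(OldRollout) - public_surface(NewServerAdapter)
assert not missing, f"drop-in replacement is missing: {missing}"
```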
Key API Components:
- `ServerAdapter` (replaces `vLLMAsyncRollout`): the client-side replacement for the `vLLMAsyncRollout` class.
- `vLLMMultiprocExecutor` (replaces `ExternalZeroMQDistributedExecutor`): inherits from vLLM's `MultiprocExecutor`.

Design & Code Changes
Architecture Overview

In the original `AsyncActorRolloutRefWorker`, the training engine and inference engine shared the same process, and the vLLM inference engine received weight updates directly through parameter passing. `ExternalZeroMQDistributedExecutor` acted as a client, sending instructions to all `AsyncActorRolloutRefWorker` inference engines via ZMQ to execute operations such as `init_worker`, `load_model`, `init_device`, and `generate`. Operations such as `wake_up`, `sleep`, and weight updates were executed directly in `vLLMAsyncRollout` without going through `ExternalZeroMQDistributedExecutor`.

This PR makes two changes:
- Transform `vLLMAsyncRollout` into `ServerAdapter`, which serves as a client for communicating with the executor. Weight updates are based on CUDA IPC, with handles passed through ZeroMQ to the inference engine.
- Deprecate the original `ExternalZeroMQDistributedExecutor` class and create a new `vLLMMultiprocExecutor` class that inherits from `MultiprocExecutor`. It acts as a server receiving instructions from `local_rank=0`; all inference-engine operations are uniformly broadcast to all inference workers through `vLLMMultiprocExecutor`'s RPC broadcast MQ.

Detailed Code Changes
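The client/server split described above can be sketched with a minimal, stdlib-only stand-in. Threads and `queue.Queue` replace real worker processes and ZeroMQ, and all class names below are illustrative; only the method names `execute_method`, `sleep`, `wake_up`, `release`, and `resume` follow this PR:

```python
import queue
import threading

class FakeWorker:
    """Stand-in for one vLLM inference worker (illustrative)."""
    def __init__(self, rank: int):
        self.rank = rank
        self.asleep = False

    def sleep(self):
        self.asleep = True

    def wake_up(self):
        self.asleep = False

class BroadcastExecutor:
    """Sketch of the broadcast-RPC idea: one client call is fanned out
    to every worker over a per-worker message queue."""
    def __init__(self, num_workers: int):
        self.workers = [FakeWorker(r) for r in range(num_workers)]
        self._queues = [queue.Queue() for _ in range(num_workers)]
        for rank in range(num_workers):
            threading.Thread(target=self._worker_loop, args=(rank,), daemon=True).start()

    def _worker_loop(self, rank: int):
        # Each worker drains its queue and executes the named method.
        while True:
            method, args, kwargs, done = self._queues[rank].get()
            getattr(self.workers[rank], method)(*args, **kwargs)
            done.set()

    def execute_method(self, method: str, *args, **kwargs):
        # Enqueue the same RPC for every worker, then wait for all acks.
        events = []
        for q in self._queues:
            done = threading.Event()
            q.put((method, args, kwargs, done))
            events.append(done)
        for done in events:
            done.wait()

class ServerAdapterSketch:
    """Client side: calls from the training process are forwarded to the
    executor instead of touching an in-process engine directly."""
    def __init__(self, executor: BroadcastExecutor):
        self._executor = executor

    def release(self):
        self._executor.execute_method("sleep")

    def resume(self):
        self._executor.execute_method("wake_up")

executor = BroadcastExecutor(num_workers=4)
adapter = ServerAdapterSketch(executor)
adapter.release()
assert all(w.asleep for w in executor.workers)
adapter.resume()
assert not any(w.asleep for w in executor.workers)
```

A real implementation would also need message serialization and error propagation across the process boundary; the queue-based transport here only illustrates the single-client, broadcast-to-all-workers control flow.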
- `verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py`: changes to `vLLMAsyncRollout` → `ServerAdapter`
  - Removed the `inference_engine` attribute and related initialization (`_init_worker`, `_load_model`, `_init_device`)
  - Removed the server-side ZMQ code (`address`, `get_zeromq_address()`, `_init_zeromq()`, `_loop_forever()`, `_execute_method()`)
  - Added client-side ZMQ code (`set_executor_zmq_address()`, `_init_zmq_client()`, `execute_method()`)
  - Updated `resume()`, `release()`, `update_weights()` to send messages via executor
  - Added weight-update handle plumbing (`get_update_weights_zmq_handle()`, `set_update_weights_zmq_handles()`)
- `verl/workers/rollout/vllm_rollout/vllm_multiproc_executor.py`: new file, core components:
  - `vLLMWorkerProc`: extends vLLM's `WorkerProc` with custom initialization
    - Overrides the `__init__` method to adapt to verl's initialization requirements (e.g. `VERL_VLLM_FP8_QUANT_ENABLED`)
    - `make_worker_process` static method (modified from vLLM's implementation)
    - `worker_main` static method to run worker initialization and execution loops
  - `vLLMMultiprocExecutor`: extends vLLM's `MultiprocExecutor`
- `verl/workers/fsdp_workers.py`: changes to `AsyncActorRolloutRefWorker`
  - Removed the `get_zeromq_address()` method (no longer needed)
  - Added `set_executor_zmq_address()`: sets the ZMQ address for executor communication
  - Added `set_update_weights_zmq_handles()`: configures IPC handles for weight updates
  - Added `get_update_weights_zmq_handle()`: retrieves the handle for weight synchronization
- `verl/workers/rollout/vllm_rollout/utils.py`: new class `vLLMColocateWorkerExtension`, passed to vLLM via the `--worker_extension_cls` parameter
- `verl/workers/rollout/vllm_rollout/vllm_async_server.py`: changes to `vLLMHttpServerBase.launch_server()`
  - Uses `vLLMMultiprocExecutor` instead of `ExternalZeroMQDistributedExecutor`
  - Uses the `--worker_extension_cls` parameter to pass `vLLMColocateWorkerExtension`
  - Added the `VERL_VLLM_EXECUTOR_ZMQ_ADDRESS` environment variable
  - Removed the `VERL_VLLM_ZMQ_ADDRESSES` environment variable (no longer needed)

Checklist Before Submitting
Important
Please check all of the following items before requesting a review; otherwise the reviewer might deprioritize this PR.

- Run `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`.
- Request CI in the `ci-request` channel of the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)