@jianjunzhong commented Nov 25, 2025

What does this PR do?

Refactor the vLLM co-located training-inference rollout from a single-process to a multi-process architecture. This separates training and inference into different processes, enabling better resource isolation and paving the way for future checkpoint-engine integration (roadmap #3624).

Key Changes:

  • Transform vLLMAsyncRollout into ServerAdapter - a client-side adapter that communicates with the inference executor
  • Replace ExternalZeroMQDistributedExecutor with vLLMMultiprocExecutor - a new multiproc executor that serves as the inference backend
  • Implement CUDA IPC-based weight updates via ZeroMQ for efficient parameter synchronization between training and inference processes

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

This refactoring maintains full backward compatibility with existing vLLM rollout APIs. No changes are required to user code.

Key API Components:

  1. ServerAdapter (replaces vLLMAsyncRollout):

    • Acts as a client-side adapter for communicating with the inference executor
    • Manages CUDA IPC-based weight updates
    • Provides the same interface as the previous vLLMAsyncRollout class (a minimal sketch of the adapter-to-executor RPC follows this list)
  2. vLLMMultiprocExecutor (replaces ExternalZeroMQDistributedExecutor):

    • Inherits from vLLM's MultiprocExecutor
    • Handles RPC communication with training workers
    • Manages inference worker processes
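
To make the client/executor split concrete, here is a minimal sketch of the adapter-to-executor RPC path. It is illustrative only: the class name ServerAdapterSketch, the (method, args, kwargs) message format, and the REQ/REP socket pattern are assumptions, not the exact verl implementation.

```python
# Sketch only: a client-side adapter that forwards method calls to an
# inference executor over ZeroMQ. Names and message format are assumptions.
import pickle

import zmq


class ServerAdapterSketch:
    def __init__(self):
        self._ctx = zmq.Context()
        self._socket = None

    def set_executor_zmq_address(self, address: str) -> None:
        # Connect a REQ socket to the executor's REP endpoint.
        self._socket = self._ctx.socket(zmq.REQ)
        self._socket.connect(address)

    def execute_method(self, method: str, *args, **kwargs):
        # Serialize the call, send it to the executor, and wait for the reply.
        self._socket.send(pickle.dumps((method, args, kwargs)))
        return pickle.loads(self._socket.recv())


# Usage: operations like resume/release/update_weights reduce to execute_method calls.
# adapter = ServerAdapterSketch()
# adapter.set_executor_zmq_address("tcp://127.0.0.1:5555")
# adapter.execute_method("wake_up")
```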

Design & Code Changes

Architecture Overview

  1. Before (Single-Process Architecture)
  • Single-Process Design

In the original AsyncActorRolloutRefWorker, the training engine and inference engine shared the same process. The vLLM inference engine directly received weight updates through parameter passing.

(figure: single-process design)

  • Communication Architecture

ExternalZeroMQDistributedExecutor acted as a client, sending instructions to all AsyncActorRolloutRefWorker inference engines via ZMQ to execute operations such as init_worker, load_model, init_device, and generate. Operations such as wake_up, sleep, and weight updates were executed directly in vLLMAsyncRollout without going through ExternalZeroMQDistributedExecutor.

(figure: single-process communication architecture)

  2. After (Multi-Process Architecture)
  • Multi-Process Design

vLLMAsyncRollout is transformed into ServerAdapter, which serves as a client for communicating with the executor. Weight updates are based on CUDA IPC: IPC handles are passed over ZeroMQ to the inference engine.

(figure: multi-process design)

  • Communication Architecture

The original ExternalZeroMQDistributedExecutor class is deprecated and replaced by a new vLLMMultiprocExecutor class that inherits from MultiprocExecutor. It acts as a server receiving operations from the training worker at local_rank=0; all inference-engine operations are broadcast uniformly to the inference workers through vLLMMultiprocExecutor's RPC broadcast MQ (a sketch of this server loop follows the figure below).

(figure: multi-process communication architecture)
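
For illustration, the executor-side counterpart could look roughly like the following. This is a sketch under assumptions: broadcast_to_workers is a hypothetical stand-in for the executor's RPC broadcast (in vLLM terms, something like collective_rpc), and the message format mirrors the client sketch above.

```python
# Sketch only: the executor side binds a ZeroMQ REP socket, receives
# (method, args, kwargs) messages from the training worker at local_rank=0,
# and fans each call out to all inference workers.
import pickle

import zmq


def serve_executor(zmq_address: str, broadcast_to_workers):
    ctx = zmq.Context()
    socket = ctx.socket(zmq.REP)
    socket.bind(zmq_address)
    while True:
        method, args, kwargs = pickle.loads(socket.recv())
        # broadcast_to_workers is a hypothetical callable standing in for the
        # executor's RPC broadcast to every inference worker process.
        results = broadcast_to_workers(method, args, kwargs)
        socket.send(pickle.dumps(results))
```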

Detailed Code Changes

  1. verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py: Changes to vLLMAsyncRollout → ServerAdapter
  • Removed: inference_engine attribute and related initialization (_init_worker, _load_model, _init_device)
  • Removed: ZMQ server functionality (address, get_zeromq_address(), _init_zeromq(), _loop_forever(), _execute_method())
  • Added: ZMQ client functionality (set_executor_zmq_address(), _init_zmq_client(), execute_method())
  • Modified: resume(), release(), update_weights() to send messages via executor
  • Added: CUDA IPC handle management (get_update_weights_zmq_handle(), set_update_weights_zmq_handles())
  2. verl/workers/rollout/vllm_rollout/vllm_multiproc_executor.py: New file - Core Components
  • vLLMWorkerProc: Extends vLLM's WorkerProc with custom initialization

    • Rewrites __init__ method to adapt to verl's initialization requirements:
      • Applies FP8 quantization patches if enabled via VERL_VLLM_FP8_QUANT_ENABLED
      • Applies vocabulary size monkey patch for logits computation
    • Rewrites make_worker_process static method (modified from vLLM's implementation)
    • Rewrites worker_main static method to run worker initialization and execution loops
    • Handles graceful shutdown, using death monitoring to detect when the parent process exits (see the sketch after the note below)
  • vLLMMultiprocExecutor: Extends vLLM's MultiprocExecutor

    • Inherits multiproc execution capabilities from vLLM
    • Adds ZMQ communication with training workers
    • Broadcasts RPC commands to all inference workers
    • Manages lifecycle of inference worker processes

Note: Once vLLM updates make_worker_process and worker_main to class methods of WorkerProc, these two overrides will be removed.
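
The parent-death monitoring mentioned above can be sketched as follows. This shows only the general pipe-based pattern and is not the exact vLLM or verl code; start_death_monitor and death_read_fd are hypothetical names.

```python
# Sketch only: the parent keeps the write end of a pipe; when the parent
# exits, the read end sees EOF and the worker can shut itself down instead
# of being orphaned.
import os
import threading


def start_death_monitor(death_read_fd: int, shutdown_event: threading.Event) -> None:
    def _watch():
        with os.fdopen(death_read_fd, "rb") as pipe:
            pipe.read()  # blocks until the parent closes its end (i.e. exits)
        shutdown_event.set()

    threading.Thread(target=_watch, daemon=True, name="parent-death-monitor").start()


# The worker main loop would then check `shutdown_event` between RPCs and
# exit gracefully once it is set.
```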

  3. verl/workers/fsdp_workers.py: Changes to AsyncActorRolloutRefWorker
  • Removed: get_zeromq_address() method (no longer needed)
  • Added: set_executor_zmq_address() - sets ZMQ address for executor communication
  • Added: set_update_weights_zmq_handles() - configures IPC handles for weight updates
  • Added: get_update_weights_zmq_handle() - retrieves handle for weight synchronization
  4. verl/workers/rollout/vllm_rollout/utils.py: New class - vLLMColocateWorkerExtension
  • Worker extension class for vLLM instances
  • Integrates via --worker_extension_cls parameter
  • Enables the CUDA IPC-based weight update mechanism (the sketch after this list illustrates the idea)
  • Based on vLLM PR #24295 implementation
  5. verl/workers/rollout/vllm_rollout/vllm_async_server.py: Changes to vLLMHttpServerBase.launch_server()
  • Modified to use vLLMMultiprocExecutor instead of ExternalZeroMQDistributedExecutor
  • Added --worker_extension_cls parameter to pass vLLMColocateWorkerExtension
  • Generates and sets VERL_VLLM_EXECUTOR_ZMQ_ADDRESS environment variable
  • Distributes executor ZMQ address to all training workers
  • Retrieves and configures update weights ZMQ handles
  • Removed: VERL_VLLM_ZMQ_ADDRESSES environment variable (no longer needed)
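
To summarize the weight-update path end to end, here is a minimal sketch of the CUDA IPC mechanism (based on the idea in vLLM PR #24295, but with hypothetical helper names send_weight/recv_weight and an illustrative ZeroMQ message format): the training process serializes a CUDA IPC handle for each tensor and ships it over ZeroMQ, and the inference process rebuilds the tensor from the handle (zero-copy on the same GPU) and copies it into the model.

```python
# Sketch only: CUDA IPC-based weight transfer between two co-located processes.
import pickle

import torch
import zmq
from torch.multiprocessing.reductions import reduce_tensor


def send_weight(socket: zmq.Socket, name: str, tensor: torch.Tensor) -> None:
    # reduce_tensor returns (rebuild_fn, args) describing a CUDA IPC handle;
    # the tuple is picklable and lets another local process map the same memory.
    handle = reduce_tensor(tensor)
    socket.send(pickle.dumps((name, handle)))
    socket.recv()  # wait for the inference side to acknowledge


def recv_weight(socket: zmq.Socket, model: torch.nn.Module) -> None:
    name, (rebuild_fn, args) = pickle.loads(socket.recv())
    shared = rebuild_fn(*args)  # maps the training process's GPU memory, no copy
    model.get_parameter(name).data.copy_(shared)  # load into the inference weights
    socket.send(b"ok")
```

Note that CUDA IPC handles are only valid between processes on the same node and the same device, which matches the co-located training-inference setup.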

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@jianjunzhong force-pushed the refactor/vllm_sep_proc branch from 51c8ad9 to 714a32f on November 27, 2025, 14:59
@jianjunzhong changed the title from "[BREAKING][worker, rollout, vllm] feat: implement vLLM co-located training-inference rollout with process separation" to "[WIP][BREAKING][worker, rollout, vllm] feat: implement vLLM co-located training-inference rollout with process separation" on Nov 28, 2025
@jianjunzhong force-pushed the refactor/vllm_sep_proc branch from ba4512b to ca088a2 on December 7, 2025, 14:44
@jianjunzhong force-pushed the refactor/vllm_sep_proc branch 3 times, most recently from ef46ad3 to 2d5b9f1 on December 11, 2025, 01:45
@jianjunzhong force-pushed the refactor/vllm_sep_proc branch from 796366c to 6e2e0aa on December 11, 2025, 07:06
@jianjunzhong marked this pull request as ready for review on December 16, 2025, 07:02
@jianjunzhong changed the title from "[WIP][BREAKING][worker, rollout, vllm] feat: implement vLLM co-located training-inference rollout with process separation" to "[BREAKING][worker, rollout, vllm] feat: implement vLLM colocated training-inference rollout with process separation" on Dec 16, 2025
@jianjunzhong marked this pull request as draft on December 16, 2025, 14:08
@jianjunzhong force-pushed the refactor/vllm_sep_proc branch from 96c6d23 to d427468 on December 23, 2025, 09:25
@jianjunzhong force-pushed the refactor/vllm_sep_proc branch from 9f9fa58 to eb6fb52 on December 24, 2025, 13:22