
[Feature] Support Stage Based Deployment CLI #939

Open
wuhang2014 wants to merge 9 commits into vllm-project:main from wuhang2014:stagecli

Conversation

@wuhang2014 (Contributor) commented Jan 25, 2026


Purpose

Background is described in #870.

For now, only single-node deployment with the multiprocessing backend is supported:

  • Multi-node deployment is not supported;
  • The Ray backend is not supported;
  • DP for diffusion models is not supported.

Test Plan

model: Qwen3-Omni

deployment CLI:

  • stage-0
CUDA_VISIBLE_DEVICES=4,5,6,7 vllm serve /data/models/Qwen3-Omni-30B-A3B-Instruct/ --omni --stage-id 0 --data-parallel-size 2
  • stage-1
CUDA_VISIBLE_DEVICES=2 vllm serve /data/models/Qwen3-Omni-30B-A3B-Instruct/ --omni --stage-id 1 --headless
  • stage-2
CUDA_VISIBLE_DEVICES=3 vllm serve /data/models/Qwen3-Omni-30B-A3B-Instruct/ --omni --stage-id 2 --headless

test script:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
          {
            "role": "user",
            "content": [
              { "type": "text", "text": "What’s in this image?" },
              {
                "type": "image_url",
                "image_url": {
                  "url": "file:///data/wuhang/dog-4988985_960_720.jpg"
                }
              }
            ]
          }
    ],
    "audio": { "voice": "alloy", "format": "wav" }
  }'

Test Result

(wuhang) (base) root@huawei:/data/wuhang/vllm-omni/examples/online_serving/qwen3_omni# python openai_chat_completion_client_for_multimodal_generation.py --query-type use_image --model /data/models/Qwen3-Omni-30B-A3B-Instruct/ --image-path /data/wuhang/dog-4988985_960_720.jpg 
Chat completion output from text: Based on the image provided, here is a detailed description of its content:

This is a professionally taken, close-up photograph of a happy dog lying in a field of green grass.

*   **Main Subject:** The central focus is a Pembroke Welsh Corgi. It has a classic tan and white coat, with tan fur covering its head, ears, and back, and white fur on its chest, neck, and muzzle.
*   **Expression and Pose:** The corgi is lying down but looking directly at the camera with an alert and joyful expression. Its mouth is open in what appears to be a smile, with its pink tongue slightly visible. Its large, erect ears are pointed forward, indicating it is attentive.
*   **Setting and Lighting:** The dog is in a lush, sunlit grassy area. The lighting suggests it's either early morning or late afternoon (golden hour), casting a warm, soft glow over the scene. The background is softly blurred (a shallow depth of field), showing out-of-focus trees and foliage, which helps to emphasize the dog as the main subject.
*   **Details:** The corgi is wearing a dark green collar around its neck.
Audio saved to audio_0.wav
(wuhang) (base) root@huawei:/data/wuhang/vllm-omni/examples/online_serving/qwen3_omni# ls -l
total 2920
-rw-r--r-- 1 root root 2918954 Jan 26 08:57 audio_0.wav
-rw-r--r-- 1 root root   19876 Jan 22 12:00 gradio_demo.py
-rw-r--r-- 1 root root   16995 Jan 25 11:14 openai_chat_completion_client_for_multimodal_generation.py
-rw-r--r-- 1 root root    1177 Jan 22 12:00 qwen3_omni_moe_thinking.yaml
-rw-r--r-- 1 root root    7166 Jan 22 12:00 README.md
-rw-r--r-- 1 root root    4359 Jan 22 12:00 run_curl_multimodal_generation.sh
-rwxr-xr-x 1 root root    6123 Jan 22 12:00 run_gradio_demo.sh
(wuhang) (base) root@huawei:/data/wuhang/vllm-omni/examples/online_serving/qwen3_omni# 

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


@hsliuustc0106 (Collaborator) left a comment:

  1. Silent error handling: multiple `except Exception: pass` blocks.

    • Fix: add logging, e.g. `except Exception as e: logger.debug(f"Error: {e}")`
  2. Log spam: `logger.info()` in hot paths (line 1466).

    • Fix: change to `logger.debug()`
  3. Incomplete PR description: the "Test Result" section is empty.

    • Fix: add actual test output and performance metrics.
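The first fix suggested above can be sketched as follows. This is a minimal illustration of the pattern, using a hypothetical `close_queue` helper rather than the PR's actual code:

```python
import logging

logger = logging.getLogger(__name__)


def close_queue_silently(queue) -> None:
    # Before: the broad except swallows every diagnostic.
    try:
        queue.close()
    except Exception:
        pass


def close_queue_logged(queue) -> None:
    # After: still tolerant in the cleanup path, but the failure is recorded
    # at debug level so it can be surfaced when troubleshooting.
    try:
        queue.close()
    except Exception as e:
        logger.debug("Error while closing queue: %s", e)
```

The broad `except Exception` stays, since cleanup paths should not raise; the only change is that failures are no longer invisible.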

Copilot AI (Contributor) left a comment:

Pull request overview

This PR implements stage-based deployment CLI support for vLLM-Omni, enabling independent deployment of pipeline stages across processes using ZMQ-based IPC. This is part of the larger effort described in issue #870 to support data parallelism for pipeline stages.

Changes:

  • Added ZMQ-based queue utilities to replace multiprocessing queues for inter-stage communication
  • Implemented headless mode for deploying individual stages independently
  • Added dynamic port allocation and handshake protocol for stage coordination
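The ZMQ-based queue could look roughly like the sketch below. The class name `ZmqQueue` and the PUSH/PULL transport are assumptions based on the change summary; the PR's actual zmq_utils.py may differ:

```python
import pickle

import zmq


class ZmqQueue:
    """Sketch of a one-directional queue over a ZMQ PUSH/PULL socket pair.

    The writing end binds a PUSH socket at `endpoint`; reading ends connect
    with PULL sockets, so stages running in separate processes can exchange
    pickled Python objects much like they would with mp.Queue.
    """

    def __init__(self, endpoint: str, writer: bool) -> None:
        self._ctx = zmq.Context.instance()
        self._sock = self._ctx.socket(zmq.PUSH if writer else zmq.PULL)
        if writer:
            self._sock.bind(endpoint)
        else:
            self._sock.connect(endpoint)

    def put(self, item) -> None:
        self._sock.send(pickle.dumps(item))

    def get(self):
        return pickle.loads(self._sock.recv())

    def close(self) -> None:
        self._sock.close(linger=0)
```

In-process, the same pair can be exercised with an `inproc://` endpoint; across processes a `tcp://` or `ipc://` endpoint would be used instead.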

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 47 comments.

Summary per file:

  • vllm_omni/entrypoints/zmq_utils.py: new file providing the ZMQ queue wrapper and handshake utilities for stage communication
  • vllm_omni/entrypoints/omni_stage.py: modified to support both ZMQ and multiprocessing queues; added cleanup handlers and queue spec support
  • vllm_omni/entrypoints/omni.py: added ZMQ context management, a handshake server for stage coordination, and dynamic port allocation
  • vllm_omni/entrypoints/cli/serve.py: added headless mode and stage-id CLI arguments for independent stage deployment
  • vllm_omni/entrypoints/async_omni.py: updated cleanup handlers to support ZMQ queues
  • pyproject.toml: added the pyzmq>=25.0.0 dependency
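For the dynamic port allocation mentioned above, a common pattern (an assumption about the approach, not the PR's exact code) is to bind to port 0 and let the OS pick an unused port:

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    # Binding to port 0 asks the kernel for any unused ephemeral port; the
    # chosen port is read back and can then be handed to a ZMQ endpoint.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]
```

Note this pattern has an inherent race (the port could be taken between the probe and the later bind), which is one reason a handshake protocol is useful for confirming the endpoints stages actually ended up on.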


@wuhang2014 force-pushed the stagecli branch 5 times, most recently from 4e7aff3 to 9e39c1f (February 5, 2026 10:25)
@wuhang2014 marked this pull request as ready for review (February 5, 2026 10:27)
@chatgpt-codex-connector (bot) left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ff2d5c10ba


@hsliuustc0106 requested a review from Copilot (February 5, 2026 15:05)
def attach_queues(self, in_q: mp.Queue, out_q: mp.Queue) -> None:
def attach_queues(
    self,
    in_q: mp.queues.Queue | ZmqQueue | str | None,
Collaborator:

Do we still use `mp.queues.Queue`?

Contributor (Author):

I still require mp.queues.Queue, as verification with the Ray backend is not planned.
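The widened signature might normalize its inputs along the lines below. `ZmqQueue` is a stand-in stub here, and treating a bare string as a ZMQ address is an assumption; this is a sketch, not the PR's implementation:

```python
import multiprocessing as mp


class ZmqQueue:
    """Stand-in stub for the PR's ZMQ-backed queue."""

    def __init__(self, address: str) -> None:
        self.address = address


class OmniStage:
    def attach_queues(
        self,
        in_q: "mp.queues.Queue | ZmqQueue | str | None",
        out_q: "mp.queues.Queue | ZmqQueue | str | None",
    ) -> None:
        # A bare string is treated as a ZMQ address and wrapped, so one
        # entry point serves both multiprocessing and ZMQ deployments;
        # queue objects (mp or ZMQ) and None pass through unchanged.
        self.in_q = ZmqQueue(in_q) if isinstance(in_q, str) else in_q
        self.out_q = ZmqQueue(out_q) if isinstance(out_q, str) else out_q
```

Keeping `mp.queues.Queue` in the union matches the reply above: the multiprocessing path stays supported alongside the new ZMQ path.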

if not stage_configs:
    if default_stage_cfg_factory is not None:
        default_stage_cfg = default_stage_cfg_factory()
        stage_configs = OmegaConf.create(default_stage_cfg)
Collaborator:
Do we still need to use OmegaConf?

Contributor (Author):
Yes. OmegaConf can only be removed from here once we use dataclasses directly, which @lishunyang12 is currently working on.

        default_stage_cfg = default_stage_cfg_factory()
        stage_configs = OmegaConf.create(default_stage_cfg)
    else:
        stage_configs = []
Collaborator:
What's the logic here?

Contributor (Author):
The function load_and_resolve_stage_configs just reduces duplicated code; there are no logic changes, and this PR is not tied to any configuration refactor.
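Based on the snippet in the thread, the deduplicated helper presumably behaves along these lines. To keep the sketch self-contained, an injectable `create` callable stands in for `OmegaConf.create`; the function name and parameters are assumptions from the discussion:

```python
def load_and_resolve_stage_configs(
    stage_configs,
    default_stage_cfg_factory=None,
    create=lambda cfg: cfg,  # in the real code this would be OmegaConf.create
):
    # Fall back to factory-provided defaults only when nothing was given;
    # with no factory either, resolve to an empty stage list.
    if not stage_configs:
        if default_stage_cfg_factory is not None:
            default_stage_cfg = default_stage_cfg_factory()
            stage_configs = create(default_stage_cfg)
        else:
            stage_configs = []
    return stage_configs
```

Explicitly provided configs pass through untouched, which matches the claim that the helper only consolidates duplicated fallback logic.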

Copilot AI (Contributor) left a comment:
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Signed-off-by: wuhang <wuhang6@huawei.com>
Copilot AI (Contributor) left a comment:
Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.


