
[Feature] Support Stage Based Deployment CLI #939

Open
wuhang2014 wants to merge 9 commits into vllm-project:main from wuhang2014:stagecli

Conversation

@wuhang2014 (Contributor) commented Jan 25, 2026


Purpose

Background is described in #870.

For now, only single-node deployment with the multiprocessing backend is supported:

  • Multi-node deployment is not supported;
  • The Ray backend is not supported;
  • DP for diffusion models is not supported.

Test Plan

model: Qwen3-Omni

deployment CLI:

  • stage-0
CUDA_VISIBLE_DEVICES=4,5,6,7 vllm serve /data/models/Qwen3-Omni-30B-A3B-Instruct/ --omni --stage-id 0 --data-parallel-size 2
  • stage-1
CUDA_VISIBLE_DEVICES=2 vllm serve /data/models/Qwen3-Omni-30B-A3B-Instruct/ --omni --stage-id 1 --headless
  • stage-2
CUDA_VISIBLE_DEVICES=3 vllm serve /data/models/Qwen3-Omni-30B-A3B-Instruct/ --omni --stage-id 2 --headless

test script:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
          {
            "role": "user",
            "content": [
              { "type": "text", "text": "What’s in this image?" },
              {
                "type": "image_url",
                "image_url": {
                  "url": "file:///data/wuhang/dog-4988985_960_720.jpg"
                }
              }
            ]
          }
    ],
    "audio": { "voice": "alloy", "format": "wav" }
  }'

Test Result

(wuhang) (base) root@huawei:/data/wuhang/vllm-omni/examples/online_serving/qwen3_omni# python openai_chat_completion_client_for_multimodal_generation.py --query-type use_image --model /data/models/Qwen3-Omni-30B-A3B-Instruct/ --image-path /data/wuhang/dog-4988985_960_720.jpg 
Chat completion output from text: Based on the image provided, here is a detailed description of its content:

This is a professionally taken, close-up photograph of a happy dog lying in a field of green grass.

*   **Main Subject:** The central focus is a Pembroke Welsh Corgi. It has a classic tan and white coat, with tan fur covering its head, ears, and back, and white fur on its chest, neck, and muzzle.
*   **Expression and Pose:** The corgi is lying down but looking directly at the camera with an alert and joyful expression. Its mouth is open in what appears to be a smile, with its pink tongue slightly visible. Its large, erect ears are pointed forward, indicating it is attentive.
*   **Setting and Lighting:** The dog is in a lush, sunlit grassy area. The lighting suggests it's either early morning or late afternoon (golden hour), casting a warm, soft glow over the scene. The background is softly blurred (a shallow depth of field), showing out-of-focus trees and foliage, which helps to emphasize the dog as the main subject.
*   **Details:** The corgi is wearing a dark green collar around its neck.
Audio saved to audio_0.wav
(wuhang) (base) root@huawei:/data/wuhang/vllm-omni/examples/online_serving/qwen3_omni# ls -l
total 2920
-rw-r--r-- 1 root root 2918954 Jan 26 08:57 audio_0.wav
-rw-r--r-- 1 root root   19876 Jan 22 12:00 gradio_demo.py
-rw-r--r-- 1 root root   16995 Jan 25 11:14 openai_chat_completion_client_for_multimodal_generation.py
-rw-r--r-- 1 root root    1177 Jan 22 12:00 qwen3_omni_moe_thinking.yaml
-rw-r--r-- 1 root root    7166 Jan 22 12:00 README.md
-rw-r--r-- 1 root root    4359 Jan 22 12:00 run_curl_multimodal_generation.sh
-rwxr-xr-x 1 root root    6123 Jan 22 12:00 run_gradio_demo.sh
(wuhang) (base) root@huawei:/data/wuhang/vllm-omni/examples/online_serving/qwen3_omni# 

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


@hsliuustc0106 (Collaborator) left a comment:

  1. Silent error handling: multiple `except Exception: pass` blocks.

    • Fix: add logging, e.g. `except Exception as e: logger.debug(f"Error: {e}")`
  2. Log spam: `logger.info()` in hot paths (line 1466).

    • Fix: change to `logger.debug()`
  3. Incomplete PR description: the "Test Result" section is empty.

    • Fix: add actual test output and performance metrics.
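The first fix suggested above can be sketched as follows. This is a minimal illustration of the pattern, using a hypothetical `close_queue` helper rather than the PR's actual code:

```python
import logging

logger = logging.getLogger(__name__)


def close_queue_silently(queue) -> None:
    # Before: the broad except swallows every diagnostic.
    try:
        queue.close()
    except Exception:
        pass


def close_queue_logged(queue) -> None:
    # After: still tolerant in the cleanup path, but the failure is recorded
    # at debug level so it can be surfaced when troubleshooting.
    try:
        queue.close()
    except Exception as e:
        logger.debug("Error while closing queue: %s", e)
```

The broad `except Exception` stays, since cleanup paths should not raise; the only change is that failures are no longer invisible.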

Copilot AI (Contributor) left a comment:

Pull request overview

This PR implements stage-based deployment CLI support for vLLM-Omni, enabling independent deployment of pipeline stages across processes using ZMQ-based IPC. This is part of the larger effort described in issue #870 to support data parallelism for pipeline stages.

Changes:

  • Added ZMQ-based queue utilities to replace multiprocessing queues for inter-stage communication
  • Implemented headless mode for deploying individual stages independently
  • Added dynamic port allocation and handshake protocol for stage coordination
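The ZMQ-based queue could look roughly like the sketch below. The class name `ZmqQueue` and the PUSH/PULL transport are assumptions based on the change summary; the PR's actual zmq_utils.py may differ:

```python
import pickle

import zmq


class ZmqQueue:
    """Sketch of a one-directional queue over a ZMQ PUSH/PULL socket pair.

    The writing end binds a PUSH socket at `endpoint`; reading ends connect
    with PULL sockets, so stages running in separate processes can exchange
    pickled Python objects much like they would with mp.Queue.
    """

    def __init__(self, endpoint: str, writer: bool) -> None:
        self._ctx = zmq.Context.instance()
        self._sock = self._ctx.socket(zmq.PUSH if writer else zmq.PULL)
        if writer:
            self._sock.bind(endpoint)
        else:
            self._sock.connect(endpoint)

    def put(self, item) -> None:
        self._sock.send(pickle.dumps(item))

    def get(self):
        return pickle.loads(self._sock.recv())

    def close(self) -> None:
        self._sock.close(linger=0)
```

In-process, the same pair can be exercised with an `inproc://` endpoint; across processes a `tcp://` or `ipc://` endpoint would be used instead.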

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 47 comments.

Summary per file:

  • vllm_omni/entrypoints/zmq_utils.py: new file providing the ZMQ queue wrapper and handshake utilities for stage communication
  • vllm_omni/entrypoints/omni_stage.py: modified to support both ZMQ and multiprocessing queues; added cleanup handlers and queue spec support
  • vllm_omni/entrypoints/omni.py: added ZMQ context management, a handshake server for stage coordination, and dynamic port allocation
  • vllm_omni/entrypoints/cli/serve.py: added headless mode and stage-id CLI arguments for independent stage deployment
  • vllm_omni/entrypoints/async_omni.py: updated cleanup handlers to support ZMQ queues
  • pyproject.toml: added the pyzmq>=25.0.0 dependency
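For the dynamic port allocation mentioned above, a common pattern (an assumption about the approach, not the PR's exact code) is to bind to port 0 and let the OS pick an unused port:

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    # Binding to port 0 asks the kernel for any unused ephemeral port; the
    # chosen port is read back and can then be handed to a ZMQ endpoint.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]
```

Note this pattern has an inherent race (the port could be taken between the probe and the later bind), which is one reason a handshake protocol is useful for confirming the endpoints stages actually ended up on.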


@wuhang2014 force-pushed the stagecli branch 5 times, most recently from 4e7aff3 to 9e39c1f (February 5, 2026 10:25)
@wuhang2014 marked this pull request as ready for review (February 5, 2026 10:27)
@chatgpt-codex-connector (bot) left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ff2d5c10ba


@hsliuustc0106 requested a review from Copilot (February 5, 2026 15:05)
def attach_queues(self, in_q: mp.Queue, out_q: mp.Queue) -> None:
def attach_queues(
    self,
    in_q: mp.queues.Queue | ZmqQueue | str | None,
Collaborator:

Do we still use `mp.queues.Queue`?

Contributor (Author):

I still require mp.queues.Queue, as verification with the Ray backend is not planned.
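The widened signature might normalize its inputs along the lines below. `ZmqQueue` is a stand-in stub here, and treating a bare string as a ZMQ address is an assumption; this is a sketch, not the PR's implementation:

```python
import multiprocessing as mp


class ZmqQueue:
    """Stand-in stub for the PR's ZMQ-backed queue."""

    def __init__(self, address: str) -> None:
        self.address = address


class OmniStage:
    def attach_queues(
        self,
        in_q: "mp.queues.Queue | ZmqQueue | str | None",
        out_q: "mp.queues.Queue | ZmqQueue | str | None",
    ) -> None:
        # A bare string is treated as a ZMQ address and wrapped, so one
        # entry point serves both multiprocessing and ZMQ deployments;
        # queue objects (mp or ZMQ) and None pass through unchanged.
        self.in_q = ZmqQueue(in_q) if isinstance(in_q, str) else in_q
        self.out_q = ZmqQueue(out_q) if isinstance(out_q, str) else out_q
```

Keeping `mp.queues.Queue` in the union matches the reply above: the multiprocessing path stays supported alongside the new ZMQ path.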

if not stage_configs:
    if default_stage_cfg_factory is not None:
        default_stage_cfg = default_stage_cfg_factory()
        stage_configs = OmegaConf.create(default_stage_cfg)
Collaborator:
Do we still need to use OmegaConf?

Contributor (Author):
Yes. OmegaConf can only be removed from here once we use dataclasses directly, which @lishunyang12 is currently working on.

        default_stage_cfg = default_stage_cfg_factory()
        stage_configs = OmegaConf.create(default_stage_cfg)
    else:
        stage_configs = []
Collaborator:
What's the logic here?

Contributor (Author):
The function load_and_resolve_stage_configs just reduces duplicated code; there are no logic changes, and this PR is not tied to any configuration refactor.
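Based on the snippet in the thread, the deduplicated helper presumably behaves along these lines. To keep the sketch self-contained, an injectable `create` callable stands in for `OmegaConf.create`; the function name and parameters are assumptions from the discussion:

```python
def load_and_resolve_stage_configs(
    stage_configs,
    default_stage_cfg_factory=None,
    create=lambda cfg: cfg,  # in the real code this would be OmegaConf.create
):
    # Fall back to factory-provided defaults only when nothing was given;
    # with no factory either, resolve to an empty stage list.
    if not stage_configs:
        if default_stage_cfg_factory is not None:
            default_stage_cfg = default_stage_cfg_factory()
            stage_configs = create(default_stage_cfg)
        else:
            stage_configs = []
    return stage_configs
```

Explicitly provided configs pass through untouched, which matches the claim that the helper only consolidates duplicated fallback logic.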

Copilot AI (Contributor) left a comment:
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Signed-off-by: wuhang <wuhang6@huawei.com>
Copilot AI (Contributor) left a comment:
Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.


