[Feature] Support Stage Based Deployment CLI #939
wuhang2014 wants to merge 9 commits into vllm-project:main
Conversation
hsliuustc0106
left a comment
- **Silent error handling** - multiple `except Exception: pass` blocks. Fix: add logging, e.g. `except Exception as e: logger.debug(f"Error: {e}")`
- **Log spam** - `logger.info()` in hot paths (line 1466). Fix: change to `logger.debug()`
- **PR description incomplete** - the "Test Result" section is empty. Fix: add actual test output and performance metrics.
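The silent-exception fix suggested above can be sketched as follows (the `cleanup` function and resource objects are illustrative, not from the PR's actual code):

```python
import logging

logger = logging.getLogger(__name__)


def cleanup(resources):
    """Close each resource, logging failures instead of swallowing them."""
    for res in resources:
        try:
            res.close()
        except Exception as e:
            # Previously `except Exception: pass` -- errors vanished silently.
            # logger.debug keeps hot paths quiet while remaining diagnosable.
            logger.debug("Error closing %s: %s", res, e)
```

This keeps shutdown best-effort (one bad resource does not abort the loop) while leaving a trace when debug logging is enabled.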
Force-pushed from 95fffb4 to 0e1105f
Pull request overview
This PR implements stage-based deployment CLI support for vLLM-Omni, enabling independent deployment of pipeline stages across processes using ZMQ-based IPC. This is part of the larger effort described in issue #870 to support data parallelism for pipeline stages.
Changes:
- Added ZMQ-based queue utilities to replace multiprocessing queues for inter-stage communication
- Implemented headless mode for deploying individual stages independently
- Added dynamic port allocation and handshake protocol for stage coordination
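A minimal sketch of what a ZMQ-backed replacement for `mp.Queue` could look like (the class name matches the PR's description, but the constructor, socket pattern, and methods here are assumptions, not the actual `zmq_utils.py` API):

```python
import pickle
import zmq


class ZmqQueue:
    """One-directional queue over ZMQ PUSH/PULL sockets (illustrative sketch)."""

    def __init__(self, address: str, bind: bool):
        self._ctx = zmq.Context.instance()
        # In this sketch the producer binds a PUSH socket and the consumer
        # connects a PULL socket to the same endpoint.
        self._sock = self._ctx.socket(zmq.PUSH if bind else zmq.PULL)
        if bind:
            self._sock.bind(address)
        else:
            self._sock.connect(address)

    def put(self, item) -> None:
        self._sock.send(pickle.dumps(item))

    def get(self):
        return pickle.loads(self._sock.recv())

    def close(self) -> None:
        self._sock.close(linger=0)
```

Unlike `mp.Queue`, such a queue works across independently launched processes (the headless-mode case), since the endpoint is just a TCP/IPC address rather than an inherited file descriptor.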
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 47 comments.
Show a summary per file
| File | Description |
|---|---|
| vllm_omni/entrypoints/zmq_utils.py | New file providing ZMQ queue wrapper and handshake utilities for stage communication |
| vllm_omni/entrypoints/omni_stage.py | Modified to support both ZMQ and multiprocessing queues, added cleanup handlers and queue spec support |
| vllm_omni/entrypoints/omni.py | Added ZMQ context management, handshake server for stage coordination, and dynamic port allocation |
| vllm_omni/entrypoints/cli/serve.py | Added headless mode and stage-id CLI arguments for independent stage deployment |
| vllm_omni/entrypoints/async_omni.py | Updated cleanup handlers to support ZMQ queues |
| pyproject.toml | Added pyzmq>=25.0.0 dependency |
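The dynamic port allocation mentioned for `omni.py` can lean on a pyzmq built-in; a hedged sketch (the function name, socket type, and endpoint are illustrative):

```python
import zmq


def bind_handshake_server(ctx: zmq.Context) -> "tuple[zmq.Socket, int]":
    """Bind a REP socket on a free TCP port chosen by the OS.

    The returned port can be handed to child stage processes so they know
    where to connect for the coordination handshake.
    """
    sock = ctx.socket(zmq.REP)
    # bind_to_random_port picks an unused port and returns it.
    port = sock.bind_to_random_port("tcp://127.0.0.1")
    return sock, port
```

Letting the OS pick the port avoids collisions when several stages (or several deployments) start on the same node.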
Force-pushed from 4e7aff3 to 9e39c1f
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ff2d5c10ba
```diff
-def attach_queues(self, in_q: mp.Queue, out_q: mp.Queue) -> None:
+def attach_queues(
+    self,
+    in_q: mp.queues.Queue | ZmqQueue | str | None,
```
Reviewer: do we still use `mp.queues.Queue`?

Author: I still require `mp.queues.Queue`, as verification with the Ray backend is not planned.
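One plausible way such a union-typed argument could be normalized before use (the helper name, the placeholder `ZmqQueue`, and the string-means-endpoint convention are assumptions for illustration, not the PR's implementation):

```python
import multiprocessing as mp


class ZmqQueue:
    """Placeholder standing in for the PR's ZMQ-backed queue (sketch only)."""

    def __init__(self, address: str):
        self.address = address


def resolve_queue(q: "mp.queues.Queue | ZmqQueue | str | None"):
    """Normalize the queue argument: queue objects pass through unchanged,
    a string is treated as a ZMQ endpoint to wrap, and None stays None."""
    if isinstance(q, str):
        return ZmqQueue(q)
    return q  # mp.queues.Queue, ZmqQueue, or None
```

Keeping `mp.queues.Queue` in the union preserves the existing single-process multiprocessing path while the string/ZMQ forms serve the new headless deployment.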
```python
if not stage_configs:
    if default_stage_cfg_factory is not None:
        default_stage_cfg = default_stage_cfg_factory()
        stage_configs = OmegaConf.create(default_stage_cfg)
```
Reviewer: do we still need to use OmegaConf?

Author: Yes. OmegaConf can be removed here only once we use dataclasses directly, which @lishunyang12 is currently working on.
```python
        default_stage_cfg = default_stage_cfg_factory()
        stage_configs = OmegaConf.create(default_stage_cfg)
    else:
        stage_configs = []
```
Reviewer: what's the logic here?

Author: The function `load_and_resolve_stage_configs` simply reduces duplicated code; there are no logical changes. This PR is also not related to any configuration refactor.
Signed-off-by: wuhang <wuhang6@huawei.com>
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.
Purpose
Background is described in #870.
For now, only single-node, multiprocessing deployment is supported.
Test Plan
model: Qwen3-Omni
deployment CLI:
test script:
Test Result
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.