[Doc] add design docs for async chunk in qwen3-omni by R2-Y · Pull Request #962 · vllm-project/vllm-omni

R2-Y · 2026-01-26T09:11:09Z


Purpose

Add design docs for async chunk in Qwen3-Omni.

Test Plan

Test Result


hsliuustc0106 · 2026-01-31T07:33:46Z

docs/design/feature/async_chunk_design.md

@@ -0,0 +1,419 @@
+# Async Chunking


For the async-chunk-arch PNG, please use a larger font.

hsliuustc0106 · 2026-01-31T07:37:12Z

docs/design/feature/async_chunk_design.md

+|--------|----------------------------|------------------------|-------------|-------------|-------------|------------------------|-------------|-------------|-------------|-------------|
+|single request | text | text + audio | True | 10 | 10 | 1 | 268.27 | 1268.83 | 20.28 | 1363.31 |
+|single request | text | text + audio | False | 10 | 10 | 1 | 56.73 | 1407.34 | 24.57 | 1408.03 |
+|single request | text | text + audio | True | 2500 | 900 | 1 | 380.03 | 1910.39 | 8.82 | 15650.26 |


Is the E2E time even worse?

The total number of generated tokens differs: with `async_chunk` set to False, 942 tokens are generated, whereas with `async_chunk` set to True, 1732 tokens are generated. We should probably update the data so that both configurations generate the same number of tokens.

Or remove E2EL.

hsliuustc0106 · 2026-01-31T07:38:17Z

docs/design/feature/async_chunk_design.md

+- **Queue Coordination**: Temporary queues (waiting_for_chunk_waiting_requests, waiting_for_chunk_running_requests) keep requests out of base scheduler until chunk is ready, then restore
+
+## Performance
+1. **Reduced Latency**: Next stage can start processing immediately


You didn't mention throughput, memory, or GPU utilization in the following table.

hsliuustc0106

I think we need to put the data table at the beginning; drawing some histograms for comparison would be better.
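The **Queue Coordination** bullet quoted in the diff excerpt above can be illustrated with a minimal Python sketch. This is not the vllm-omni scheduler: the `Request` and `ChunkAwareScheduler` classes, the `chunk_ready` flag, and the `park_requests_without_chunk` / `restore_ready_requests` methods are hypothetical; only the two temporary queue names come from the diff. It shows requests whose upstream chunk has not arrived being moved out of the base scheduler's `waiting` / `running` queues and put back once the chunk is ready, so they do not block other requests.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    request_id: str
    # Hypothetical flag: True once the upstream stage has delivered the next chunk.
    chunk_ready: bool = False


@dataclass
class ChunkAwareScheduler:
    """Hypothetical sketch of the temporary-queue idea; not the vllm-omni code."""

    waiting: deque = field(default_factory=deque)
    running: deque = field(default_factory=deque)
    waiting_for_chunk_waiting_requests: deque = field(default_factory=deque)
    waiting_for_chunk_running_requests: deque = field(default_factory=deque)

    def park_requests_without_chunk(self) -> None:
        # Move requests whose chunk has not arrived out of the base queues,
        # so the base scheduler only sees requests it can actually schedule.
        pairs = ((self.waiting, self.waiting_for_chunk_waiting_requests),
                 (self.running, self.waiting_for_chunk_running_requests))
        for base, parked in pairs:
            keep = deque()
            while base:
                req = base.popleft()
                (keep if req.chunk_ready else parked).append(req)
            base.extend(keep)  # preserves the original order of ready requests

    def restore_ready_requests(self) -> None:
        # Return each parked request to the queue it came from once its chunk is ready.
        pairs = ((self.waiting_for_chunk_waiting_requests, self.waiting),
                 (self.waiting_for_chunk_running_requests, self.running))
        for parked, base in pairs:
            still_parked = deque()
            while parked:
                req = parked.popleft()
                (base if req.chunk_ready else still_parked).append(req)
            parked.extend(still_parked)
```

Keeping one temporary queue per base queue means a request that was already running goes back to `running` rather than being re-admitted through `waiting`, which matches the "then restore" wording in the diff.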

hsliuustc0106 · 2026-01-31T07:50:52Z

No need to run CI for docs.

amy-why-3459 · 2026-01-31T09:09:19Z

I think we need to put the data table at the beginning; drawing some histograms for comparison would be better.

Fixed.

hsliuustc0106

For the figure plot, we should add three separate figures (TTFT, TTFP, and TPOT), using only two colors.

hsliuustc0106 · 2026-01-31T10:53:09Z

docs/design/feature/async_chunk_design.md

+3. **IO-Compute Overlap**: Chunk retrieval happens asynchronously while other requests compute
+4. **Non-blocking Scheduler**: Requests waiting for chunks don't block the entire scheduler
+
+| Scenario | Input Modality | Output Modality | async_chunk | Input tokens num | Output tokens num | Request num |  TTFT(ms) |  TTFP(ms) |


TPOT is missing.

Signed-off-by: Rein Yang <ruiruyang2@gmail.com> Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>

amy-why-3459 · 2026-02-04T03:50:41Z

docs/design/feature/async_chunk_design.md

+
+The `async_chunk` feature enables asynchronous, chunked processing of data across multiple stages in a multi-stage pipeline (e.g., Qwen3-Omni with Thinker → Talker → Code2Wav stages). Instead of waiting for a complete stage output before forwarding to the next stage, this feature allows stages to process and forward data in chunks as it becomes available, significantly reducing latency and improving throughput.
+
+**Chunk Size Definition**


The chunk size is defined as the `num_scheduled_tokens` of each step for each request. Since `num_scheduled_tokens` can differ between steps and between requests, so can the chunk size. For example, if `num_scheduled_tokens` is 1 in the decode phase, the chunk size is 1.
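A minimal Python sketch of this definition, under stated assumptions: the `upstream_stage`, `downstream_stage`, and `run_pipeline` coroutines and the per-step `schedule` list are hypothetical stand-ins, not vllm-omni APIs, and an `asyncio.Queue` stands in for whatever inter-stage transport the real pipeline uses. The upstream stage emits one chunk per scheduler step whose length equals that step's `num_scheduled_tokens`, and the downstream stage starts consuming as soon as the first chunk arrives instead of waiting for the complete output.

```python
import asyncio
from typing import List, Optional


async def upstream_stage(items: List[int], schedule: List[int],
                         out_q: asyncio.Queue) -> None:
    """Emit one chunk per scheduler step.

    schedule[i] plays the role of num_scheduled_tokens for step i
    (hypothetical values; a real scheduler decides these per request and step).
    """
    pos = 0
    for num_scheduled_tokens in schedule:
        # Chunk size == num_scheduled_tokens of this step.
        chunk = items[pos:pos + num_scheduled_tokens]
        pos += num_scheduled_tokens
        await out_q.put(chunk)  # forward immediately, not after the whole output
        await asyncio.sleep(0)  # yield so the downstream stage can run concurrently
    await out_q.put(None)  # end-of-stream marker


async def downstream_stage(in_q: asyncio.Queue, received: List[List[int]]) -> None:
    # Begin processing as soon as the first chunk is available.
    while (chunk := await in_q.get()) is not None:
        received.append(chunk)


async def run_pipeline(items: List[int], schedule: List[int]) -> List[List[int]]:
    q: asyncio.Queue = asyncio.Queue()
    received: List[List[int]] = []
    await asyncio.gather(upstream_stage(items, schedule, q),
                         downstream_stage(q, received))
    return received
```

For example, `asyncio.run(run_pipeline(list(range(8)), [5, 1, 1, 1]))` returns `[[0, 1, 2, 3, 4], [5], [6], [7]]`: one 5-item chunk from a prefill-like step followed by three 1-item chunks, matching the decode-phase case where `num_scheduled_tokens` is 1.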

Signed-off-by: Rein Yang <ruiruyang2@gmail.com>

congw729 · 2026-02-06T03:36:27Z

No need to run CI for docs.

It seems that adding [skip ci] to the commit message or PR title will skip Buildkite CI.

R2-Y changed the title ~~[WIP] add design docs for async chunk in qwen3-omni~~ [WIP] [Doc] add design docs for async chunk in qwen3-omni Jan 26, 2026

R2-Y force-pushed the async_chunk_doc branch from 8fa4d7f to 85a73ad on January 26, 2026 09:27

amy-why-3459 mentioned this pull request Jan 27, 2026

[RFC]: Support async computation and communication across stages by chunks JiusiServe/vllm-omni#1

Open

1 task

R2-Y force-pushed the async_chunk_doc branch 4 times, most recently from a25a6a9 to 493de9c on January 30, 2026 01:48

amy-why-3459 force-pushed the async_chunk_doc branch from 493de9c to 1ea7425 on January 31, 2026 07:26

hsliuustc0106 reviewed Jan 31, 2026


amy-why-3459 force-pushed the async_chunk_doc branch 3 times, most recently from 0f3f013 to 3ed3864 on January 31, 2026 09:08

amy-why-3459 force-pushed the async_chunk_doc branch 2 times, most recently from a2b35b5 to 99c9da1 on January 31, 2026 09:24

hsliuustc0106 reviewed Jan 31, 2026


R2-Y changed the title ~~[WIP] [Doc] add design docs for async chunk in qwen3-omni~~ [Doc] add design docs for async chunk in qwen3-omni Feb 2, 2026

R2-Y force-pushed the async_chunk_doc branch from 017005d to 57df3b2 on February 2, 2026 03:35

add design docs for async chunk in qwen3-omni

50b8f35

Signed-off-by: Rein Yang <ruiruyang2@gmail.com> Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>

R2-Y force-pushed the async_chunk_doc branch 2 times, most recently from b82b385 to 3185731 on February 2, 2026 12:30

hsliuustc0106 mentioned this pull request Feb 3, 2026

[WIP] [Model] Step-Audio2 #464

Open

5 tasks

amy-why-3459 reviewed Feb 4, 2026


R2-Y force-pushed the async_chunk_doc branch from 3185731 to 419962c on February 4, 2026 07:38

remove duplicate content & add performance diagram

075fde5

Signed-off-by: Rein Yang <ruiruyang2@gmail.com>

R2-Y force-pushed the async_chunk_doc branch from 419962c to 075fde5 on February 4, 2026 07:52

Merge branch 'main' into async_chunk_doc

b0032fc

linyueqian mentioned this pull request Feb 5, 2026

[RFC]: Qwen3-TTS Production Ready - February Milestone #938

Open

Merge branch 'main' into async_chunk_doc

640cafa

