[Refactor] Reuse upstream Qwen3MoeSparseMoeBlock #1202

Merged
Isotr0py merged 2 commits into vllm-project:main from gcanlin:qwen3-moe
Feb 6, 2026

Conversation

@gcanlin
Contributor

@gcanlin gcanlin commented Feb 4, 2026

Purpose

Now that vLLM-Omni has upgraded to vLLM v0.15.0, we can reuse the change from vllm-project/vllm#32082 and safely remove the redundant talker MoE.

Test Plan

bash run_single_prompt.sh

Test Result

INFO 02-04 15:27:33 [log_utils.py:550] {'type': 'request_level_metrics',
INFO 02-04 15:27:33 [log_utils.py:550]  'request_id': '0_93100e0c-8a35-4d81-8927-feb32c55b762',
INFO 02-04 15:27:33 [log_utils.py:550]  'e2e_time_ms': 32853.352546691895,
INFO 02-04 15:27:33 [log_utils.py:550]  'e2e_tpt': 98.65871635643211,
INFO 02-04 15:27:33 [log_utils.py:550]  'e2e_total_tokens': 333,
INFO 02-04 15:27:33 [log_utils.py:550]  'transfers_total_time_ms': 28.17392349243164,
INFO 02-04 15:27:33 [log_utils.py:550]  'transfers_total_bytes': 6988121,
INFO 02-04 15:27:33 [log_utils.py:550]  'stages': {0: {'stage_gen_time_ms': 7010.502099990845,
INFO 02-04 15:27:33 [log_utils.py:550]                 'num_tokens_out': 15,
INFO 02-04 15:27:33 [log_utils.py:550]                 'num_tokens_in': 265},
INFO 02-04 15:27:33 [log_utils.py:550]             1: {'stage_gen_time_ms': 5290.610313415527, 'num_tokens_out': 53},
INFO 02-04 15:27:33 [log_utils.py:550]             2: {'stage_gen_time_ms': 93.02067756652832, 'num_tokens_out': 0}}}
Processed prompts: 100%|██████████████████████████████████| 1/1 [00:32<00:00, 32.85s/req, est. speed stage-2 tok/s: 10.14, avg e2e_lat: 0.0ms]
INFO 02-04 15:27:33 [omni.py:860] [Summary] {'e2e_requests': 1,
INFO 02-04 15:27:33 [omni.py:860]  'e2e_total_time_ms': 32854.35175895691,
INFO 02-04 15:27:33 [omni.py:860]  'e2e_sum_time_ms': 32853.352546691895,
INFO 02-04 15:27:33 [omni.py:860]  'e2e_total_tokens': 333,
INFO 02-04 15:27:33 [omni.py:860]  'e2e_avg_time_per_request_ms': 32853.352546691895,
INFO 02-04 15:27:33 [omni.py:860]  'e2e_avg_tokens_per_s': 10.135951864478159,
INFO 02-04 15:27:33 [omni.py:860]  'wall_time_ms': 32854.35175895691,
INFO 02-04 15:27:33 [omni.py:860]  'final_stage_id': {'0_93100e0c-8a35-4d81-8927-feb32c55b762': 2},
INFO 02-04 15:27:33 [omni.py:860]  'stages': [{'stage_id': 0,
INFO 02-04 15:27:33 [omni.py:860]              'requests': 1,
INFO 02-04 15:27:33 [omni.py:860]              'tokens': 280,
INFO 02-04 15:27:33 [omni.py:860]              'total_time_ms': 17102.519512176514,
INFO 02-04 15:27:33 [omni.py:860]              'avg_time_per_request_ms': 17102.519512176514,
INFO 02-04 15:27:33 [omni.py:860]              'avg_tokens_per_s': 16.37185677821609},
INFO 02-04 15:27:33 [omni.py:860]             {'stage_id': 1,
INFO 02-04 15:27:33 [omni.py:860]              'requests': 1,
INFO 02-04 15:27:33 [omni.py:860]              'tokens': 53,
INFO 02-04 15:27:33 [omni.py:860]              'total_time_ms': 15376.47795677185,
INFO 02-04 15:27:33 [omni.py:860]              'avg_time_per_request_ms': 15376.47795677185,
INFO 02-04 15:27:33 [omni.py:860]              'avg_tokens_per_s': 3.4468231378472876},
INFO 02-04 15:27:33 [omni.py:860]             {'stage_id': 2,
INFO 02-04 15:27:33 [omni.py:860]              'requests': 1,
INFO 02-04 15:27:33 [omni.py:860]              'tokens': 0,
INFO 02-04 15:27:33 [omni.py:860]              'total_time_ms': 102.47135162353516,
INFO 02-04 15:27:33 [omni.py:860]              'avg_time_per_request_ms': 102.47135162353516,
INFO 02-04 15:27:33 [omni.py:860]              'avg_tokens_per_s': 0.0}],
INFO 02-04 15:27:33 [omni.py:860]  'transfers': [{'from_stage': 0,
INFO 02-04 15:27:33 [omni.py:860]                 'to_stage': 1,
INFO 02-04 15:27:33 [omni.py:860]                 'samples': 1,
INFO 02-04 15:27:33 [omni.py:860]                 'total_bytes': 5964365,
INFO 02-04 15:27:33 [omni.py:860]                 'total_time_ms': 12.66622543334961,
INFO 02-04 15:27:33 [omni.py:860]                 'tx_mbps': 3767.0985923216504,
INFO 02-04 15:27:33 [omni.py:860]                 'rx_samples': 1,
INFO 02-04 15:27:33 [omni.py:860]                 'rx_total_bytes': 5964365,
INFO 02-04 15:27:33 [omni.py:860]                 'rx_total_time_ms': 10.818004608154297,
INFO 02-04 15:27:33 [omni.py:860]                 'rx_mbps': 4410.695107675761,
INFO 02-04 15:27:33 [omni.py:860]                 'total_samples': 1,
INFO 02-04 15:27:33 [omni.py:860]                 'total_transfer_time_ms': 24.20973777770996,
INFO 02-04 15:27:33 [omni.py:860]                 'total_mbps': 1970.8978444174388},
INFO 02-04 15:27:33 [omni.py:860]                {'from_stage': 1,
INFO 02-04 15:27:33 [omni.py:860]                 'to_stage': 2,
INFO 02-04 15:27:33 [omni.py:860]                 'samples': 1,
INFO 02-04 15:27:33 [omni.py:860]                 'total_bytes': 1023756,
INFO 02-04 15:27:33 [omni.py:860]                 'total_time_ms': 1.355886459350586,
INFO 02-04 15:27:33 [omni.py:860]                 'tx_mbps': 6040.364179108845,
INFO 02-04 15:27:33 [omni.py:860]                 'rx_samples': 1,
INFO 02-04 15:27:33 [omni.py:860]                 'rx_total_bytes': 1023756,
INFO 02-04 15:27:33 [omni.py:860]                 'rx_total_time_ms': 1.9745826721191406,
INFO 02-04 15:27:33 [omni.py:860]                 'rx_mbps': 4147.736185292441,
INFO 02-04 15:27:33 [omni.py:860]                 'total_samples': 1,
INFO 02-04 15:27:33 [omni.py:860]                 'total_transfer_time_ms': 3.9641857147216797,
INFO 02-04 15:27:33 [omni.py:860]                 'total_mbps': 2066.010169398689}]}
Adding requests:   0%|                                                                                                  | 0/1 [00:32<?, ?it/s]
query type: use_audio
Request ID: 0_93100e0c-8a35-4d81-8927-feb32c55b762, Text saved to output_audio/0_93100e0c-8a35-4d81-8927-feb32c55b762.txt
Request ID: 0_93100e0c-8a35-4d81-8927-feb32c55b762, Saved audio to output_audio/output_0_93100e0c-8a35-4d81-8927-feb32c55b762.wav
output-2.mp4
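
As a sanity check on the log above, the derived rates can be recomputed from the raw counts and timings. This is a minimal sketch, assuming the rates are tokens divided by elapsed seconds and the transfer rate is megabits per second; the helper names are hypothetical and are not part of vLLM-Omni.

```python
# Raw values copied from the log above.
E2E_TOKENS = 333
E2E_SUM_TIME_MS = 32853.352546691895

STAGES = {
    0: {"tokens": 280, "total_time_ms": 17102.519512176514},
    1: {"tokens": 53, "total_time_ms": 15376.47795677185},
}

# Stage 0 -> stage 1 transfer (sender side).
TX_BYTES = 5964365
TX_TIME_MS = 12.66622543334961


def tokens_per_second(tokens: int, time_ms: float) -> float:
    """Assumed definition: tokens divided by elapsed seconds."""
    return tokens / (time_ms / 1000.0)


def megabits_per_second(num_bytes: int, time_ms: float) -> float:
    """Assumed definition: bits divided by elapsed seconds, in units of 1e6."""
    return num_bytes * 8 / (time_ms / 1000.0) / 1e6


e2e_rate = tokens_per_second(E2E_TOKENS, E2E_SUM_TIME_MS)
stage_rates = {
    stage_id: tokens_per_second(s["tokens"], s["total_time_ms"])
    for stage_id, s in STAGES.items()
}
tx_rate = megabits_per_second(TX_BYTES, TX_TIME_MS)
ms_per_token = E2E_SUM_TIME_MS / E2E_TOKENS

print(f"e2e tokens/s:   {e2e_rate:.4f}")
print(f"stage tokens/s: {stage_rates}")
print(f"tx Mbps:        {tx_rate:.2f}")
print(f"ms per token:   {ms_per_token:.4f}")
```

Under these assumptions the recomputed values agree with the logged `e2e_avg_tokens_per_s`, the per-stage `avg_tokens_per_s`, `tx_mbps`, and `e2e_tpt`. Stage 0's 280 tokens equal its 265 input tokens plus 15 output tokens.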

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin
Contributor Author

gcanlin commented Feb 4, 2026

cc @Isotr0py

@hsliuustc0106
Collaborator

How about the accuracy?

@gcanlin
Contributor Author

gcanlin commented Feb 5, 2026

How about the accuracy?

I uploaded the audio in the description, and the output is correct. Before this PR, we added SharedFusedMoE through a monkey patch. Removing that patch makes the code cleaner and easier to maintain.
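
To illustrate the difference between the two approaches in general terms, here is a minimal, self-contained sketch. Every name in it (`upstream`, `UpstreamMoeBlock`, `PatchedMoeBlock`, and the two builder functions) is a hypothetical stand-in, not the actual vLLM or vLLM-Omni code.

```python
import types

# Hypothetical stand-in for an upstream module; not vLLM's real layout.
upstream = types.SimpleNamespace()


class UpstreamMoeBlock:
    """Placeholder for a block provided by the upstream library."""

    def forward(self, x):
        return [v * 2 for v in x]


upstream.MoeBlock = UpstreamMoeBlock


class PatchedMoeBlock(UpstreamMoeBlock):
    """Downstream copy that must be kept in sync with upstream by hand."""

    def forward(self, x):
        return [v * 2 for v in x]


def build_with_monkey_patch():
    # Monkey-patch style: swap the upstream attribute, build, then restore.
    original = upstream.MoeBlock
    upstream.MoeBlock = PatchedMoeBlock  # global side effect
    try:
        return upstream.MoeBlock()
    finally:
        upstream.MoeBlock = original


def build_with_reuse():
    # Reuse style: instantiate the upstream class directly, no side effects.
    return upstream.MoeBlock()


patched = build_with_monkey_patch()
reused = build_with_reuse()
print(type(patched).__name__, type(reused).__name__)
```

Both builders return an object with the same behaviour here, but the reuse path has no duplicated logic and no temporary mutation of shared module state.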

@Isotr0py Isotr0py enabled auto-merge (squash) February 5, 2026 03:00
@Isotr0py Isotr0py added the ready label to trigger buildkite CI label Feb 5, 2026
@Isotr0py Isotr0py merged commit 285d71b into vllm-project:main Feb 6, 2026
6 of 7 checks passed
