
[Bugfix] Fix Qwen2.5-omni/Qwen3-omni mm_processor cache for audio_in_video request#36800

Open
Isotr0py wants to merge 3 commits intovllm-project:mainfrom
Isotr0py:fix-qwen-omni-cache
Conversation

@Isotr0py (Member) commented Mar 11, 2026

Purpose

  • Follow-up to Support online use_audio_in_video #36319
  • The Qwen2.5-Omni and Qwen3-Omni processors' apply_prompt_updates is incorrect for use_audio_in_video=True when the mm processor cache is involved. The second request in a multi-turn conversation then fails, because processing falls back to the use_audio_in_video=False code path:
(APIServer pid=820051)   File "/home/mozf/develop-projects/vllm/vllm/renderers/base.py", line 715, in process_for_engine
(APIServer pid=820051)     engine_prompt = self._process_singleton(prompt)
(APIServer pid=820051)                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=820051)   File "/home/mozf/develop-projects/vllm/vllm/renderers/base.py", line 691, in _process_singleton
(APIServer pid=820051)     return self._process_tokens(prompt)  # type: ignore[arg-type]
(APIServer pid=820051)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=820051)   File "/home/mozf/develop-projects/vllm/vllm/renderers/base.py", line 636, in _process_tokens
(APIServer pid=820051)     inputs = self._process_multimodal(
(APIServer pid=820051)              ^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=820051)   File "/home/mozf/develop-projects/vllm/vllm/renderers/base.py", line 622, in _process_multimodal
(APIServer pid=820051)     mm_inputs = mm_processor.apply(mm_processor_inputs, mm_timing_ctx)
(APIServer pid=820051)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=820051)   File "/home/mozf/develop-projects/vllm/vllm/multimodal/processing/processor.py", line 1663, in apply
(APIServer pid=820051)     prompt_ids, mm_placeholders = self._maybe_apply_prompt_updates(
(APIServer pid=820051)                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=820051)   File "/home/mozf/develop-projects/vllm/vllm/model_executor/models/qwen2_5_omni_thinker.py", line 646, in _maybe_apply_prompt_updates
(APIServer pid=820051)     prompt_ids, mm_placeholders = self._apply_prompt_updates(
(APIServer pid=820051)                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=820051)   File "/home/mozf/develop-projects/vllm/vllm/multimodal/processing/processor.py", line 1539, in _apply_prompt_updates
(APIServer pid=820051)     assert update_idx is not None, (
(APIServer pid=820051)            ^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=820051) AssertionError: Failed to apply prompt replacement for mm_items['audio'][0]
  • In Support online use_audio_in_video #36319, we bypassed the mm processor cache for use_audio_in_video=True by overriding _cached_apply_hf_processor, because a cache hit would cause the request to fail.
  • This PR corrects use_audio_in_video detection when mm processor cache is hit.
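The detection problem can be sketched as follows. On a cache hit, cached video items come back as None placeholders (as in the `mm_kwargs["video"]` check visible in the review below), so treating a placeholder as "no audio in video" silently flips the flag. This is an illustrative sketch with hypothetical helper names, not the actual PR code:

```python
# Hedged sketch: infer use_audio_in_video without defaulting to False
# on cache hits. The key "use_audio_in_video" and the None-placeholder
# convention follow snippets quoted in this PR; the helper itself is
# hypothetical.

def detect_use_audio_in_video(mm_kwargs: dict) -> bool:
    """Return True if any video item requests audio_in_video.

    On a cache hit, cached items appear as None placeholders and must
    be skipped, not interpreted as use_audio_in_video=False.
    """
    for item in mm_kwargs.get("video", []):
        if item is None:
            # Cache hit: kwargs were stripped; skip rather than assume
            # the flag is False for this item.
            continue
        flag = item.get("use_audio_in_video")
        if flag is not None and bool(flag):
            return True
    return False
```

With detection like this, a cached second request stays on the use_audio_in_video=True code path instead of tripping the prompt-replacement assertion shown in the traceback above.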

Test Plan

pytest -s -v tests/entrypoints/openai/test_video.py -k test_online_audio_in_video

Test Result

Tests should pass.



Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
@Isotr0py Isotr0py requested a review from sighingnow as a code owner March 11, 2026 16:21
@mergify mergify bot added qwen Related to Qwen models bug Something isn't working labels Mar 11, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug in the Qwen2.5-Omni and Qwen3-Omni models related to the mm_processor cache when audio_in_video=True. The fix involves correcting the use_audio_in_video detection when the mm processor cache is hit. The changes include modifications to qwen2_5_omni_thinker.py and qwen3_omni_moe_thinker.py to handle the cache logic correctly. I have identified a high severity issue in qwen2_5_omni_thinker.py related to the removal of unused imports, which could potentially break existing code if those imports were being used elsewhere.

I am having trouble creating individual review comments, so my feedback is included below.

vllm/model_executor/models/qwen2_5_omni_thinker.py (83-84)

high

high: Removing ProcessorInputs and TimingContext imports might break other parts of the code if they are used elsewhere. It's better to ensure that these imports are not used anywhere else before removing them.

from vllm.multimodal.processing import (
    BaseDummyInputsBuilder,
    ProcessorInputs,
    TimingContext,
)

@DarkLight1337
Member

Can you add a regression test?

use_audio_in_video = bool(use_audio_in_video_tensor.item())
break
# for multimodality cache
if any([item is None for item in mm_kwargs["video"]]):

Suggested change
if any([item is None for item in mm_kwargs["video"]]):
if any(item is None for item in mm_kwargs["video"]):

Avoid repeated iteration (same below)
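The suggested change above relies on `any()` short-circuiting: with a generator expression it stops at the first truthy result, while `any([...])` first materializes the whole list and evaluates every element. A small illustration (not from the PR) that counts how many items are actually checked:

```python
def count_checks_list(items):
    """any() over a list comprehension: every item is evaluated first."""
    checks = 0

    def is_none(x):
        nonlocal checks
        checks += 1
        return x is None

    any([is_none(x) for x in items])  # builds the full list before any() runs
    return checks


def count_checks_gen(items):
    """any() over a generator expression: stops at the first truthy result."""
    checks = 0

    def is_none(x):
        nonlocal checks
        checks += 1
        return x is None

    any(is_none(x) for x in items)  # short-circuits on the first None
    return checks
```

For `[None, 1, 2, 3]` the list form checks all four items while the generator form checks only the first, which is the "repeated iteration" the reviewer is pointing at.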
