
[Bugfix] Fix Qwen2.5-omni/Qwen3-omni mm_processor cache for audio_in_video request#36800

Open
Isotr0py wants to merge 3 commits intovllm-project:mainfrom
Isotr0py:fix-qwen-omni-cache
Conversation

@Isotr0py (Member) commented Mar 11, 2026

Purpose

  • Follow-up to Support online use_audio_in_video #36319
  • The Qwen2.5-Omni and Qwen3-Omni processors' apply_prompt_updates is incorrect for use_audio_in_video=True when the mm processor cache is involved. The second request in a multi-turn conversation then fails, because processing falls back to the use_audio_in_video=False code path:
(APIServer pid=820051)   File "/home/mozf/develop-projects/vllm/vllm/renderers/base.py", line 715, in process_for_engine
(APIServer pid=820051)     engine_prompt = self._process_singleton(prompt)
(APIServer pid=820051)                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=820051)   File "/home/mozf/develop-projects/vllm/vllm/renderers/base.py", line 691, in _process_singleton
(APIServer pid=820051)     return self._process_tokens(prompt)  # type: ignore[arg-type]
(APIServer pid=820051)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=820051)   File "/home/mozf/develop-projects/vllm/vllm/renderers/base.py", line 636, in _process_tokens
(APIServer pid=820051)     inputs = self._process_multimodal(
(APIServer pid=820051)              ^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=820051)   File "/home/mozf/develop-projects/vllm/vllm/renderers/base.py", line 622, in _process_multimodal
(APIServer pid=820051)     mm_inputs = mm_processor.apply(mm_processor_inputs, mm_timing_ctx)
(APIServer pid=820051)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=820051)   File "/home/mozf/develop-projects/vllm/vllm/multimodal/processing/processor.py", line 1663, in apply
(APIServer pid=820051)     prompt_ids, mm_placeholders = self._maybe_apply_prompt_updates(
(APIServer pid=820051)                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=820051)   File "/home/mozf/develop-projects/vllm/vllm/model_executor/models/qwen2_5_omni_thinker.py", line 646, in _maybe_apply_prompt_updates
(APIServer pid=820051)     prompt_ids, mm_placeholders = self._apply_prompt_updates(
(APIServer pid=820051)                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=820051)   File "/home/mozf/develop-projects/vllm/vllm/multimodal/processing/processor.py", line 1539, in _apply_prompt_updates
(APIServer pid=820051)     assert update_idx is not None, (
(APIServer pid=820051)            ^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=820051) AssertionError: Failed to apply prompt replacement for mm_items['audio'][0]
  • In Support online use_audio_in_video #36319, we bypassed the mm processor cache for use_audio_in_video=True by overriding _cached_apply_hf_processor, because a cache hit would cause the request to fail.
  • This PR corrects use_audio_in_video detection when mm processor cache is hit.
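The detection problem can be sketched as follows. On a cache hit, cached video items come back as None placeholders (as in the `mm_kwargs["video"]` check visible in the review below), so treating a placeholder as "no audio in video" silently flips the flag. This is an illustrative sketch with hypothetical helper names, not the actual PR code:

```python
# Hedged sketch: infer use_audio_in_video without defaulting to False
# on cache hits. The key "use_audio_in_video" and the None-placeholder
# convention follow snippets quoted in this PR; the helper itself is
# hypothetical.

def detect_use_audio_in_video(mm_kwargs: dict) -> bool:
    """Return True if any video item requests audio_in_video.

    On a cache hit, cached items appear as None placeholders and must
    be skipped, not interpreted as use_audio_in_video=False.
    """
    for item in mm_kwargs.get("video", []):
        if item is None:
            # Cache hit: kwargs were stripped; skip rather than assume
            # the flag is False for this item.
            continue
        flag = item.get("use_audio_in_video")
        if flag is not None and bool(flag):
            return True
    return False
```

With detection like this, a cached second request stays on the use_audio_in_video=True code path instead of tripping the prompt-replacement assertion shown in the traceback above.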

Test Plan

pytest -s -v tests/entrypoints/openai/test_video.py -k test_online_audio_in_video

Test Result

Tests should pass.



Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
@Isotr0py Isotr0py requested a review from sighingnow as a code owner March 11, 2026 16:21
@mergify mergify bot added qwen Related to Qwen models bug Something isn't working labels Mar 11, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug in the Qwen2.5-Omni and Qwen3-Omni models related to the mm_processor cache when audio_in_video=True. The fix involves correcting the use_audio_in_video detection when the mm processor cache is hit. The changes include modifications to qwen2_5_omni_thinker.py and qwen3_omni_moe_thinker.py to handle the cache logic correctly. I have identified a high severity issue in qwen2_5_omni_thinker.py related to the removal of unused imports, which could potentially break existing code if those imports were being used elsewhere.

I am having trouble creating individual review comments, so my feedback is included below.

vllm/model_executor/models/qwen2_5_omni_thinker.py (83-84)

high

high: Removing ProcessorInputs and TimingContext imports might break other parts of the code if they are used elsewhere. It's better to ensure that these imports are not used anywhere else before removing them.

from vllm.multimodal.processing import (
    BaseDummyInputsBuilder,
    ProcessorInputs,
    TimingContext,
)

@DarkLight1337
Member

Can you add a regression test?

use_audio_in_video = bool(use_audio_in_video_tensor.item())
break
# for multimodality cache
if any([item is None for item in mm_kwargs["video"]]):

Suggested change
if any([item is None for item in mm_kwargs["video"]]):
if any(item is None for item in mm_kwargs["video"]):

Avoid repeated iteration (same below)
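The suggested change above relies on `any()` short-circuiting: with a generator expression it stops at the first truthy result, while `any([...])` first materializes the whole list and evaluates every element. A small illustration (not from the PR) that counts how many items are actually checked:

```python
def count_checks_list(items):
    """any() over a list comprehension: every item is evaluated first."""
    checks = 0

    def is_none(x):
        nonlocal checks
        checks += 1
        return x is None

    any([is_none(x) for x in items])  # builds the full list before any() runs
    return checks


def count_checks_gen(items):
    """any() over a generator expression: stops at the first truthy result."""
    checks = 0

    def is_none(x):
        nonlocal checks
        checks += 1
        return x is None

    any(is_none(x) for x in items)  # short-circuits on the first None
    return checks
```

For `[None, 1, 2, 3]` the list form checks all four items while the generator form checks only the first, which is the "repeated iteration" the reviewer is pointing at.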
