[Bugfix] Fix Qwen2.5-omni/Qwen3-omni mm_processor cache for audio_in_video request #36800
Isotr0py wants to merge 3 commits into vllm-project:main
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Code Review
This pull request addresses a bug in the Qwen2.5-Omni and Qwen3-Omni models related to the mm_processor cache when audio_in_video=True. The fix involves correcting the use_audio_in_video detection when the mm processor cache is hit. The changes include modifications to qwen2_5_omni_thinker.py and qwen3_omni_moe_thinker.py to handle the cache logic correctly. I have identified a high severity issue in qwen2_5_omni_thinker.py related to the removal of unused imports, which could potentially break existing code if those imports were being used elsewhere.
I am having trouble creating individual review comments.
vllm/model_executor/models/qwen2_5_omni_thinker.py (83-84)
high: Removing ProcessorInputs and TimingContext imports might break other parts of the code if they are used elsewhere. It's better to ensure that these imports are not used anywhere else before removing them.
from vllm.multimodal.processing import (
    BaseDummyInputsBuilder,
    ProcessorInputs,
    TimingContext,
)
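One way to run the check the review asks for is to scan the module for remaining references to the removed names. The sketch below is illustrative only: `names_still_used` is a hypothetical helper, and `sample` is a made-up module body, not the contents of qwen2_5_omni_thinker.py. It only finds plain name references, so names that appear solely inside string annotations would be missed.

```python
import ast


def names_still_used(source: str, names: set[str]) -> set[str]:
    """Return the subset of `names` referenced as identifiers in `source`.

    Import statements are stored as ast.alias nodes, not ast.Name nodes,
    so the import line itself does not count as a use.
    """
    tree = ast.parse(source)
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in names:
            used.add(node.id)
    return used


# Hypothetical module body used only to exercise the helper.
sample = '''
from vllm.multimodal.processing import (
    BaseDummyInputsBuilder,
    ProcessorInputs,
    TimingContext,
)

class Builder(BaseDummyInputsBuilder):
    pass
'''

used = names_still_used(
    sample, {"BaseDummyInputsBuilder", "ProcessorInputs", "TimingContext"}
)
print(sorted(used))
```

Names that do not show up in the result are candidates for safe removal from that file; a linter that reports unused imports would give the same signal.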
Can you add a regression test?
use_audio_in_video = bool(use_audio_in_video_tensor.item())
break
# for mutilmodality cache
if any([item is None for item in mm_kwargs["video"]]):
Suggested change:
- if any([item is None for item in mm_kwargs["video"]]):
+ if any(item is None for item in mm_kwargs["video"]):
Avoid repeated iteration (same below)
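The difference between the two forms can be seen with a small counter. This is a standalone sketch, not code from the PR: `count_consumed` and its sample items are hypothetical. With the list comprehension, every item is evaluated to build the list before `any` looks at it; with the generator expression, `any` stops at the first match.

```python
def count_consumed(use_generator: bool) -> int:
    """Return how many items are pulled from the source before any() answers."""
    consumed = 0

    def items():
        nonlocal consumed
        # Hypothetical stand-in for mm_kwargs["video"]; the first entry is None.
        for item in [None, "video-0", "video-1", "video-2"]:
            consumed += 1
            yield item

    if use_generator:
        any(item is None for item in items())    # short-circuits on first None
    else:
        any([item is None for item in items()])  # builds the full list first
    return consumed


generator_consumed = count_consumed(True)
list_consumed = count_consumed(False)
print(generator_consumed, list_consumed)  # prints "1 4"
```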
Purpose
apply_prompt_updates is incorrect for audio_in_video=True with cache. The second request in a multi-turn conversation then fails, because it falls into the use_audio_in_video=False code path:
... audio_in_video=True through override _cached_apply_hf_processor, because mm_cache will cause the request to fail when the request hits the cache.
Test Plan
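A regression test would need to pin down the cache-hit case described under Purpose. The sketch below only illustrates that failure mode with plain Python data. It assumes, based on the `item is None` check in the diff, that video entries served from the cache appear as None; the dict layout and both helper functions are hypothetical and do not reproduce vLLM's data structures or the PR's actual fix.

```python
from typing import Optional, Sequence


def naive_flag(video_items: Sequence[Optional[dict]]) -> bool:
    """Read the flag from the first entry; a cached (None) entry reads as False."""
    first = video_items[0] if video_items else None
    return bool(first and first.get("use_audio_in_video", False))


def cache_aware_flag(video_items: Sequence[Optional[dict]]) -> Optional[bool]:
    """Skip cached (None) entries; return None when nothing can be inspected."""
    for item in video_items:
        if item is not None:
            return bool(item.get("use_audio_in_video", False))
    return None  # every entry was cached, so the flag must come from elsewhere


first_turn = [{"use_audio_in_video": True}]  # fresh processor output
second_turn = [None]                         # same video, served from cache

assert naive_flag(first_turn) is True
assert naive_flag(second_turn) is False       # wrongly falls to the False path
assert cache_aware_flag(second_turn) is None  # signals "unknown", not False
```

Returning None rather than False lets the caller distinguish "the request does not use audio in video" from "the entries were cached and carry no flag", which is the distinction the second request in a multi-turn conversation depends on.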
Test Result
Tests should pass.