[VLM] Support Qwen3-VL model #3253

Merged
Wovchena merged 61 commits into openvinotoolkit:master from yatarkan:yt/qwen3-vl
Mar 12, 2026

Conversation

yatarkan (Contributor) commented Jan 30, 2026

Description

This PR enables the Qwen3-VL model in the GenAI VLM pipeline.
It supports the SDPA and PA attention backends in the VLM pipeline and in the Continuous Batching pipeline (both the generate() and add_request() APIs).

Exporting the model requires the latest Optimum Intel and transformers>=4.57.0.

CVS-175825

Resolves #2998

Checklist:

  • Tests have been updated or added to cover the new code.
  • This patch fully addresses the ticket.
  • I have made corresponding changes to the documentation.

Copilot AI review requested due to automatic review settings January 30, 2026 15:14
github-actions bot added the "category: visual language", "category: LLM", and "no-match-files" labels Jan 30, 2026
yatarkan changed the title from "[VLM] Add Qwen3-VL model support" to "[VLM] Support Qwen3-VL model" Jan 30, 2026

Copilot AI left a comment

Pull request overview

This PR adds support for the Qwen3-VL vision-language model to the GenAI VLM pipeline, enabling stateful inference for this model variant.

Changes:

  • Adds Qwen3-VL as a new model type with position embedding interpolation
  • Introduces additional language model inputs (deepstack_visual_embeds, visual_pos_masks) for Qwen3-VL
  • Extends JSON parameter reading to support nested keys with dot notation
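The dot-notation lookup in the last item can be illustrated with a minimal Python sketch. It is not the C++ code from json_utils.hpp: the helper name `read_nested`, the key names, and the sample values are assumptions chosen only for illustration.

```python
import json
from typing import Any


def read_nested(config: dict, dotted_key: str, default: Any = None) -> Any:
    """Resolve a key such as 'a.b.c' by walking nested dicts.

    Hypothetical helper: returns `default` as soon as a segment is missing
    or the current node is not a dict.
    """
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


# Illustrative config fragment; the key names and values are assumptions.
config = json.loads(
    '{"vision_config": {"num_position_embeddings": 2304, "spatial_merge_size": 2}}'
)

num_pos = read_nested(config, "vision_config.num_position_embeddings")
merge_size = read_nested(config, "vision_config.spatial_merge_size")
missing = read_nested(config, "vision_config.absent", default=-1)
```

Returning a default instead of raising keeps optional parameters optional, which suits configs where only some model types define a given nested field.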

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| vlm_config.hpp | Adds QWEN3_VL enum and configuration fields for position embeddings |
| vlm_config.cpp | Registers qwen3_vl model type and reads new config parameters |
| vision_encoder.cpp | Adds factory logic to instantiate Qwen3-VL vision encoder |
| qwen3_vl/classes.hpp | Defines Qwen3-VL specific encoder and embedder classes with position interpolation |
| qwen3_vl/classes.cpp | Implements position interpolation, spatial merging, and visual masking for Qwen3-VL |
| qwen2vl/classes.hpp | Makes get_rotary_pos_emb virtual and adds merge_text_and_video_image_embeddings utility |
| qwen2vl/classes.cpp | Adds const qualifier to get_rotary_pos_emb |
| processor_config.cpp | Handles Qwen3-VL's alternative config format (shortest_edge/longest_edge) |
| pipeline.cpp | Passes extra language model inputs during generation and includes Qwen3-VL in SDPA check |
| inputs_embedder.hpp | Adds get_lm_extra_inputs interface method |
| inputs_embedder.cpp | Implements factory logic for Qwen3-VL embedder |
| lm_encoding.hpp | Adds lm_extra_inputs parameter to function signature |
| lm_encoding.cpp | Sets extra inputs on language model inference requests |
| json_utils.hpp | Adds support for nested JSON keys with dot notation |
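As a rough illustration of the position interpolation listed for qwen3_vl/classes.cpp, the sketch below resamples a learned square grid of position embeddings to an arbitrary patch grid. It assumes bilinear interpolation with evenly spaced sample points from 0 to grid_side - 1; the PR text does not state the exact scheme, and the function name and data layout are hypothetical.

```python
def interpolate_pos_embed(table, grid_side, out_h, out_w):
    """Bilinearly resample a learned position-embedding grid.

    table: row-major list of grid_side * grid_side embedding vectors.
    Returns out_h * out_w vectors, also row-major.
    Assumption: sample points are spread evenly over [0, grid_side - 1].
    """
    def axis_coords(n):
        if n == 1:
            return [0.0]
        step = (grid_side - 1) / (n - 1)
        return [i * step for i in range(n)]

    result = []
    for y in axis_coords(out_h):
        y0 = int(y)                        # upper neighbour row
        y1 = min(y0 + 1, grid_side - 1)    # lower neighbour row, clamped
        wy = y - y0                        # vertical blend weight
        for x in axis_coords(out_w):
            x0 = int(x)
            x1 = min(x0 + 1, grid_side - 1)
            wx = x - x0
            tl = table[y0 * grid_side + x0]
            tr = table[y0 * grid_side + x1]
            bl = table[y1 * grid_side + x0]
            br = table[y1 * grid_side + x1]
            result.append([
                (1 - wy) * (1 - wx) * a + (1 - wy) * wx * b
                + wy * (1 - wx) * c + wy * wx * d
                for a, b, c, d in zip(tl, tr, bl, br)
            ])
    return result


# Toy 2x2 grid with one-dimensional embeddings, resampled to 3x3.
toy_table = [[0.0], [1.0], [2.0], [3.0]]
resampled = interpolate_pos_embed(toy_table, grid_side=2, out_h=3, out_w=3)
```

Resampling lets a single learned table serve images of varying resolution, since the patch grid no longer has to match the grid size used in training.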

yatarkan (Contributor, Author) commented Jan 30, 2026

WWB results

Image inputs (ATTENTION_BACKEND=SDPA)

GenAI vs HF
VISION_PREPROCESS=OV (default)
INFO:whowhatbench.wwb:Metrics for model: models/qwen3_vl_2b_instruct_fp16
INFO:whowhatbench.wwb:   similarity
0    0.984632
GenAI vs Optimum
VISION_PREPROCESS=OV (default)
INFO:whowhatbench.wwb:Metrics for model: models/qwen3_vl_2b_instruct_fp16
INFO:whowhatbench.wwb:   similarity
0    0.984632
GenAI vs Optimum
VISION_PREPROCESS=CPP
INFO:whowhatbench.wwb:Metrics for model: models/qwen3_vl_2b_instruct_fp16
INFO:whowhatbench.wwb:   similarity
0    0.993601

Video inputs (ATTENTION_BACKEND=SDPA)

GenAI vs Optimum
VISION_PREPROCESS=OV (default)
INFO:whowhatbench.wwb:Metrics for model: models/qwen3_vl_2b_instruct_fp16
INFO:whowhatbench.wwb:   similarity
0    0.99491
GenAI vs Optimum
VISION_PREPROCESS=CPP
INFO:whowhatbench.wwb:Metrics for model: models/qwen3_vl_2b_instruct_fp16
INFO:whowhatbench.wwb:   similarity
0    0.999843

popovaan (Contributor) commented Feb 2, 2026

Wovchena linked an issue Feb 3, 2026 that may be closed by this pull request
Wovchena mentioned this pull request Feb 3, 2026
Copilot AI review requested due to automatic review settings February 19, 2026 18:55
yatarkan requested a review from Wovchena March 10, 2026 14:49
Wovchena requested a review from Copilot March 10, 2026 15:35

Copilot AI left a comment

Pull request overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated 1 comment.


Copilot AI review requested due to automatic review settings March 11, 2026 11:53

Copilot AI left a comment

Pull request overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated 4 comments.

Wovchena requested a review from Copilot March 12, 2026 13:07

Copilot AI left a comment

Pull request overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated 3 comments.


Wovchena added this pull request to the merge queue Mar 12, 2026
Merged via the queue into openvinotoolkit:master with commit 995fa02 Mar 12, 2026
154 of 157 checks passed
Labels

  • category: continuous batching (Continuous batching)
  • category: GGUF (GGUF file reader)
  • category: GH Pages Docs (Github Pages documentation)
  • category: GHA (CI based on Github actions)
  • category: llm_bench (Label for tool/llm_bench folder)
  • category: LLM (LLM pipeline (stateful, static))
  • category: prompt lookup (Prompt look-up decoding)
  • category: speculative decoding (Speculative decoding)
  • category: visual language (Visual language pipeline)
  • Code Freeze
  • no-match-files

Development

Successfully merging this pull request may close these issues.

Unsupported qwen3_vl

5 participants