[VLM] Support Qwen3-VL model#3253
Conversation
Pull request overview
This PR adds support for the Qwen3-VL vision-language model to the GenAI VLM pipeline, enabling stateful inference for this model variant.
Changes:
- Adds Qwen3-VL as a new model type with position embedding interpolation
- Introduces additional language model inputs (deepstack_visual_embeds, visual_pos_masks) for Qwen3-VL
- Extends JSON parameter reading to support nested keys with dot notation
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| vlm_config.hpp | Adds QWEN3_VL enum and configuration fields for position embeddings |
| vlm_config.cpp | Registers qwen3_vl model type and reads new config parameters |
| vision_encoder.cpp | Adds factory logic to instantiate Qwen3-VL vision encoder |
| qwen3_vl/classes.hpp | Defines Qwen3-VL specific encoder and embedder classes with position interpolation |
| qwen3_vl/classes.cpp | Implements position interpolation, spatial merging, and visual masking for Qwen3-VL |
| qwen2vl/classes.hpp | Makes get_rotary_pos_emb virtual and adds merge_text_and_video_image_embeddings utility |
| qwen2vl/classes.cpp | Adds const qualifier to get_rotary_pos_emb |
| processor_config.cpp | Handles Qwen3-VL's alternative config format (shortest_edge/longest_edge) |
| pipeline.cpp | Passes extra language model inputs during generation and includes Qwen3-VL in SDPA check |
| inputs_embedder.hpp | Adds get_lm_extra_inputs interface method |
| inputs_embedder.cpp | Implements factory logic for Qwen3-VL embedder |
| lm_encoding.hpp | Adds lm_extra_inputs parameter to function signature |
| lm_encoding.cpp | Sets extra inputs on language model inference requests |
| json_utils.hpp | Adds support for nested JSON keys with dot notation |
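The dot-notation nested-key lookup added in `json_utils.hpp` can be sketched as follows. This is not the PR's implementation (which operates on the project's JSON types); it is a standard-library illustration of the traversal idea, with a hypothetical `Object` tree standing in for parsed JSON.

```cpp
#include <any>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical JSON-like tree: values are either std::string leaves or nested Objects.
using Object = std::map<std::string, std::any>;

// Resolve a dotted key such as "vision_config.spatial_merge_size" against the tree.
std::optional<std::string> read_nested(const Object& root, const std::string& dotted_key) {
    // Split the key on '.' into path segments.
    std::vector<std::string> parts;
    std::stringstream ss(dotted_key);
    std::string part;
    while (std::getline(ss, part, '.')) parts.push_back(part);

    const Object* current = &root;
    for (size_t i = 0; i < parts.size(); ++i) {
        auto it = current->find(parts[i]);
        if (it == current->end()) return std::nullopt;  // missing segment
        if (i + 1 == parts.size()) {
            // Last segment: expect a leaf value.
            if (auto* leaf = std::any_cast<std::string>(&it->second)) return *leaf;
            return std::nullopt;
        }
        // Intermediate segment: expect a nested object.
        auto* obj = std::any_cast<Object>(&it->second);
        if (!obj) return std::nullopt;
        current = obj;
    }
    return std::nullopt;
}
```

The same pattern lets config readers address fields like nested vision-encoder parameters with a single flat key string instead of manual step-by-step lookups.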
WWB results (image inputs):
@yatarkan Could you please also run WWB with video inputs? Here is the instruction: https://github.com/openvinotoolkit/openvino.genai/tree/master/tools/who_what_benchmark#compare-visual-language-models-with-video-inputs-vlms
…oat, fix review comments
Pull request overview
Copilot reviewed 36 out of 36 changed files in this pull request and generated 1 comment.
Pull request overview
Copilot reviewed 36 out of 36 changed files in this pull request and generated 3 comments.
995fa02
Description
This PR enables the Qwen3-VL model in the GenAI VLM pipeline.
Supports SDPA and PA backends in the VLM pipeline and the Continuous Batching pipeline (both `generate()` and `add_request()` APIs).
Depends on the latest Optimum Intel and `transformers>=4.57.0` for model exporting.
CVS-175825
Resolves #2998
Checklist: