Merged
61 commits
07f5441
Add qwen3_vl to vlm model type enum
yatarkan Jan 27, 2026
2732e8d
Expose merge_text_and_video_image_embeddings qwen2vl utils function
yatarkan Jan 30, 2026
a5b870b
Enable nested json param reading for vector type
yatarkan Jan 30, 2026
7e654b3
Add qwen3_vl specific vlm config
yatarkan Jan 30, 2026
75be9d2
Add qwen3_vl params to processor config
yatarkan Jan 30, 2026
7a4f8b9
Add classes for qwen3_vl
yatarkan Jan 30, 2026
f44f6c2
Add qwen3_vl classes to inputs embedder and vision encoder
yatarkan Jan 30, 2026
49e31ac
Enable extra inputs in lm_encoding
yatarkan Jan 30, 2026
f023416
Propagate lm extra inputs in stateful vlm pipeline
yatarkan Jan 30, 2026
b94d9fc
Merge branch 'master' into yt/qwen3-vl
yatarkan Feb 9, 2026
2dc6698
Fix signature after merge
yatarkan Feb 9, 2026
a7b8818
Add qwen3vl extra inputs to sequence group
yatarkan Feb 19, 2026
e974a35
Use qwen3vl extra inputs in CB model runner
yatarkan Feb 19, 2026
d116d1d
Propagate lm extra inputs to CB pipelines generate and add_request me…
yatarkan Feb 19, 2026
7bbe27c
Update generate and add_request signatures of CB inherited classes
yatarkan Feb 19, 2026
8a73df1
Add tiny-random-qwen3-vl model to python tests
yatarkan Feb 19, 2026
52bee9e
Add min tf version check for vlm pipeline tests
yatarkan Feb 20, 2026
eef0d57
Add steps for running vlm tests with qwen3-vl in CI
yatarkan Feb 20, 2026
01bb64b
Check sequence group type in forward for deepstack inputs aggregated …
yatarkan Feb 20, 2026
528d4ef
Merge branch 'master' into yt/qwen3-vl
yatarkan Feb 20, 2026
03dd46f
Add optimum intel installation from master
yatarkan Feb 20, 2026
6bba7f0
Update qwen3_vl default resolution in python tests, add skip for fail…
yatarkan Feb 25, 2026
196a135
Disable cleanup tokenization spaces for genai vs optimum python test
yatarkan Feb 25, 2026
a7b1597
Fix deepstack vision inputs for chunked prefill case in CB
yatarkan Feb 27, 2026
e84d588
Deep copy lm_extra_inputs tensors to avoid stale references in CB loop
yatarkan Feb 27, 2026
a3f767e
Merge branch 'master' into yt/qwen3-vl
yatarkan Feb 27, 2026
043fd7a
Add video processor config
yatarkan Mar 5, 2026
17af948
Add video processor config to vision encoder, extend encoded video st…
yatarkan Mar 5, 2026
c9331bf
Extract qwen2vl logic into methods for reuse
yatarkan Mar 5, 2026
e04f95a
Handle qwen3vl video processing with metadata, override and reuse qwe…
yatarkan Mar 5, 2026
0f6e269
Merge branch 'master' into yt/qwen3-vl
yatarkan Mar 5, 2026
7345d1a
Add Qwen3-VL to supported models in docs
yatarkan Mar 5, 2026
07b37c8
Add qwen3-vl to llm_bench visual_text_gen use case
yatarkan Mar 5, 2026
ff91f93
Fix trailing comma
yatarkan Mar 5, 2026
611033c
Fix typo in var name
yatarkan Mar 5, 2026
95d5e53
Fix copilot review comments
yatarkan Mar 5, 2026
d12544b
Pass video metadata struct to calculate_timestamps function, fix revi…
yatarkan Mar 5, 2026
fb28c75
Safe get prompt_ids optional tensor in CB
yatarkan Mar 6, 2026
2c1de78
Add lm extra inputs to add_request with video inputs
yatarkan Mar 6, 2026
3cc9ac0
Fix copyright year
yatarkan Mar 6, 2026
88d36a5
Move deep copy tensors map to utility function
yatarkan Mar 6, 2026
3f1f74b
Make variables const
yatarkan Mar 9, 2026
e0e1253
Remove unused var
yatarkan Mar 9, 2026
1393641
Move deepstack data aggregation and filling tensor to struct with fun…
yatarkan Mar 9, 2026
2b08b71
Add default initializers for DeepstackGroupData, remove unused var
yatarkan Mar 10, 2026
68a4973
Change return type to const ref
yatarkan Mar 10, 2026
a1202f9
Add qwen3-vl image and video tags to docs
yatarkan Mar 10, 2026
ee2ed7b
Fix name
yatarkan Mar 10, 2026
e389293
Pin optimum-intel commit
yatarkan Mar 10, 2026
acb70db
Fix docstring
yatarkan Mar 10, 2026
831244c
Fix review comment
yatarkan Mar 10, 2026
fe6ade2
Move qwen3vl utils to .cpp anonymous namespace, change fps type to fl…
yatarkan Mar 10, 2026
8a9d7b6
Fix default value
yatarkan Mar 10, 2026
0bfacd4
Add comment for video processor config fps
yatarkan Mar 11, 2026
3152801
Add qwen3-vl tests skip for resolutions corner cases
yatarkan Mar 11, 2026
05933bf
Merge branch 'master' into yt/qwen3-vl
yatarkan Mar 11, 2026
0caa12b
Remove unused original_frames_num from video metadata
yatarkan Mar 11, 2026
fc7bc6e
Merge branch 'master' into yt/qwen3-vl
yatarkan Mar 11, 2026
388e4f9
Merge branch 'master' into yt/qwen3-vl
yatarkan Mar 12, 2026
2f1a38b
Unskip qwen3-vl PA test cases
yatarkan Mar 12, 2026
daaa67a
Align lm extra inputs map copy for add_request API
yatarkan Mar 12, 2026
6 changes: 6 additions & 0 deletions .github/workflows/linux.yml
@@ -630,6 +630,12 @@ jobs:
python -m pytest -s -v tests/python_tests/test_vlm_pipeline.py --override-ini cache_dir=/mount/caches/pytest/ -k "MiniCPM-o-2_6"
run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).visual_language.test }}
timeout: 60
- name: 'VLM (qwen3-vl)'
cmd: |
python -m pip install transformers==4.57.0 git+https://github.com/huggingface/optimum-intel.git@0566b76f094d4c3084e06d29a248b39a1bff3fa4
python -m pytest -s -v tests/python_tests/test_vlm_pipeline.py --override-ini cache_dir=/mount/caches/pytest/ -k "qwen3-vl"
run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).visual_language.test }}
timeout: 60
defaults:
run:
shell: bash
6 changes: 6 additions & 0 deletions .github/workflows/manylinux_2_28.yml
@@ -553,6 +553,12 @@ jobs:
python -m pytest -s -v tests/python_tests/test_vlm_pipeline.py --override-ini cache_dir=/mount/caches/pytest/ -k "MiniCPM-o-2_6"
run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).visual_language.test }}
timeout: 60
- name: 'VLM (qwen3-vl)'
cmd: |
python -m pip install transformers==4.57.0 git+https://github.com/huggingface/optimum-intel.git@0566b76f094d4c3084e06d29a248b39a1bff3fa4
python -m pytest -s -v tests/python_tests/test_vlm_pipeline.py --override-ini cache_dir=/mount/caches/pytest/ -k "qwen3-vl"
run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).visual_language.test }}
timeout: 60
defaults:
run:
shell: bash
6 changes: 6 additions & 0 deletions .github/workflows/windows.yml
@@ -718,6 +718,12 @@ jobs:
python -m pytest -s -v tests/python_tests/test_vlm_pipeline.py --override-ini cache_dir=/mount/caches/pytest/ -k "MiniCPM-o-2_6"
run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).visual_language.test }}
timeout: 60
- name: 'VLM (qwen3-vl)'
cmd: |
python -m pip install transformers==4.57.0 git+https://github.com/huggingface/optimum-intel.git@0566b76f094d4c3084e06d29a248b39a1bff3fa4
python -m pytest -s -v tests/python_tests/test_vlm_pipeline.py --override-ini cache_dir=/mount/caches/pytest/ -k "qwen3-vl"
run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).visual_language.test }}
timeout: 60
defaults:
run:
shell: pwsh
@@ -47,6 +47,7 @@ export const VLM_MODELS: VLMModelType[] = [
{
name: 'nanoLLaVA',
links: ['https://huggingface.co/qnguyen3/nanoLLaVA'],
notesLink: '#nanollava-notes',
},
{
name: 'nanoLLaVA-1.5',
@@ -148,6 +149,25 @@ export const VLM_MODELS: VLMModelType[] = [
},
],
},
{
architecture: 'Qwen3-VL',
models: [
{
name: 'Qwen3-VL',
links: [
'https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct',
'https://huggingface.co/Qwen/Qwen3-VL-2B-Thinking',
'https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct',
'https://huggingface.co/Qwen/Qwen3-VL-4B-Thinking',
'https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct',
'https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking',
'https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct',
'https://huggingface.co/Qwen/Qwen3-VL-32B-Thinking',
],
notesLink: '#qwen3_vl-notes',
},
],
},
{
architecture: 'Gemma3ForConditionalGeneration',
models: [
8 changes: 8 additions & 0 deletions site/docs/supported-models/index.mdx
@@ -79,6 +79,14 @@ generation_config.set_eos_token_id(pipe.get_tokenizer().get_eos_token_id())
#### phi4mm {#phi4mm-notes}

Apply https://huggingface.co/microsoft/Phi-4-multimodal-instruct/discussions/78/files to fix the model export for `transformers>=4.50`

#### Qwen3-VL {#qwen3_vl-notes}

The model requires `transformers>=4.57` for the export with `optimum-cli`.

#### nanoLLaVA {#nanollava-notes}

The model requires `transformers>=4.48` for the export with `optimum-cli`.
:::
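The Qwen3-VL note above can be sketched as a shell session; the output directory name is arbitrary, and the invocation follows the usual `optimum-cli export openvino` pattern rather than anything specific to this PR:

```shell
# Pin transformers as the note above requires, with the OpenVINO extras of optimum.
python -m pip install "transformers>=4.57" "optimum[openvino]"

# Export the model to OpenVINO IR; "Qwen3-VL-2B-Instruct-ov" is an arbitrary output dir.
optimum-cli export openvino --model Qwen/Qwen3-VL-2B-Instruct Qwen3-VL-2B-Instruct-ov
```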

## Speech Recognition Models (Whisper-based)
@@ -15,7 +15,7 @@ The prompt can contain `<ov_genai_image_i>` with `i` replaced with an actual zer
1. InternVL2: `<image>\n`
2. llava-1.5-7b-hf: `<image>`
3. LLaVA-NeXT: `<image>`
4. LLaVa-NeXT-Video: `<image>`
4. LLaVA-NeXT-Video: `<image>`
5. nanoLLaVA: `<image>\n`
6. nanoLLaVA-1.5: `<image>\n`
7. MiniCPM-o-2_6: `<image>./</image>\n`
@@ -24,12 +24,14 @@ The prompt can contain `<ov_genai_image_i>` with `i` replaced with an actual zer
10. Phi-4-multimodal-instruct: `<|image_i|>\n` - the index starts with one
11. Qwen2-VL: `<|vision_start|><|image_pad|><|vision_end|>`
12. Qwen2.5-VL: `<|vision_start|><|image_pad|><|vision_end|>`
13. gemma-3-4b-it: `<start_of_image>`
13. Qwen3-VL: `<|vision_start|><|image_pad|><|vision_end|>`
14. gemma-3-4b-it: `<start_of_image>`

Model's native video tag can be used to refer to a video. These tags are:
1. LLaVa-NeXT-Video: `<video>`
1. LLaVA-NeXT-Video: `<video>`
2. Qwen2-VL: `<|vision_start|><|video_pad|><|vision_end|>`
2. Qwen2.5-VL: `<|vision_start|><|video_pad|><|vision_end|>`
3. Qwen2.5-VL: `<|vision_start|><|video_pad|><|vision_end|>`
4. Qwen3-VL: `<|vision_start|><|video_pad|><|vision_end|>`

If the prompt doesn't contain image or video tags, but images or videos are provided, the tags are prepended to the prompt.
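The fallback described in the sentence above can be sketched as a small helper; `prepend_missing_tags` and `QWEN_VL_IMAGE_TAG` are illustrative names for this sketch, not part of the OpenVINO GenAI API:

```python
# Native image tag shared by the Qwen2-VL, Qwen2.5-VL, and Qwen3-VL entries above.
QWEN_VL_IMAGE_TAG = "<|vision_start|><|image_pad|><|vision_end|>"

def prepend_missing_tags(prompt: str, num_images: int, tag: str = QWEN_VL_IMAGE_TAG) -> str:
    """Mimic the documented fallback: if the prompt contains no image tags
    but images are provided, prepend one tag per image to the prompt."""
    if num_images == 0 or tag in prompt:
        return prompt  # nothing to do, or the user already placed tags explicitly
    return tag * num_images + prompt
```

A prompt such as `"Describe both images."` with two images would thus become two `<|vision_start|><|image_pad|><|vision_end|>` tags followed by the original text, while a prompt that already carries a tag is left untouched.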
