Commit 9b3fc10 (2 parents: 9eb626e + 1d39d0f)

Merge remote-tracking branch 'origin/master_tr_module_genai' into xp/merge_master

Conflicts:
- src/cpp/CMakeLists.txt
- src/cpp/src/gguf_utils/rtn_quantize.hpp
- src/cpp/src/modeling/models/qwen3_vl/processing_qwen3_vl.cpp
- src/cpp/src/modeling/models/qwen3_vl/processing_qwen3_vl.hpp
- src/cpp/src/modeling/ops/ops.cpp
- src/cpp/src/modeling/weights/quantization_selector.cpp
- src/cpp/src/modeling/weights/quantization_selector.hpp
- src/cpp/src/visual_language/qwen2vl/classes.hpp
- src/cpp/src/visual_language/qwen3_vl/classes.cpp
- src/cpp/src/visual_language/qwen3_vl/classes.hpp
- src/cpp/src/visual_language/vlm_config.cpp
- src/cpp/src/visual_language/vlm_config.hpp

149 files changed: +25474 additions, -418 deletions


.github/workflows/linux.yml
Lines changed: 7 additions & 1 deletion

```diff
@@ -620,7 +620,7 @@ jobs:
           timeout: 90
         - name: 'WWB tests (nanollava)'
           cmd: |
-            python -m pip install transformers==4.48.0
+            python -m pip install transformers==4.48.0 diffusers==0.35.2
             python -m pytest -v ./tools/who_what_benchmark/tests -m nanollava
           run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).WWB.test }}
           timeout: 90
@@ -630,6 +630,12 @@ jobs:
             python -m pytest -s -v tests/python_tests/test_vlm_pipeline.py --override-ini cache_dir=/mount/caches/pytest/ -k "MiniCPM-o-2_6"
           run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).visual_language.test }}
           timeout: 60
+        - name: 'VLM (qwen3-vl)'
+          cmd: |
+            python -m pip install transformers==4.57.0 git+https://github.com/huggingface/optimum-intel.git@0566b76f094d4c3084e06d29a248b39a1bff3fa4
+            python -m pytest -s -v tests/python_tests/test_vlm_pipeline.py --override-ini cache_dir=/mount/caches/pytest/ -k "qwen3-vl"
+          run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).visual_language.test }}
+          timeout: 60
       defaults:
         run:
           shell: bash
```

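The added 'VLM (qwen3-vl)' step can be reproduced outside CI with essentially the same commands; a sketch, assuming a checkout of the repository root. The `--override-ini cache_dir=/mount/caches/pytest/` option is specific to the CI runner's mount and is dropped here:

```shell
# Pin the transformers release and the optimum-intel commit used by the CI step.
python -m pip install transformers==4.57.0 \
    "git+https://github.com/huggingface/optimum-intel.git@0566b76f094d4c3084e06d29a248b39a1bff3fa4"

# Run only the qwen3-vl cases of the VLM pipeline tests.
python -m pytest -s -v tests/python_tests/test_vlm_pipeline.py -k "qwen3-vl"
```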
.github/workflows/mac.yml
Lines changed: 1 addition & 1 deletion

```diff
@@ -473,7 +473,7 @@ jobs:
           timeout: 120
         - name: 'WWB tests (nanollava)'
           cmd: |
-            python -m pip install transformers==4.48.0
+            python -m pip install transformers==4.48.0 diffusers==0.35.2
             python -m pytest -v ./tools/who_what_benchmark/tests -m nanollava
           run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).WWB.test }}
           timeout: 90
```

.github/workflows/manylinux_2_28.yml
Lines changed: 7 additions & 1 deletion

```diff
@@ -543,7 +543,7 @@ jobs:
           timeout: 90
         - name: 'WWB tests (nanollava)'
           cmd: |
-            python -m pip install transformers==4.48.0
+            python -m pip install transformers==4.48.0 diffusers==0.35.2
             python -m pytest -v ./tools/who_what_benchmark/tests -m nanollava
           run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).WWB.test }}
           timeout: 90
@@ -553,6 +553,12 @@ jobs:
             python -m pytest -s -v tests/python_tests/test_vlm_pipeline.py --override-ini cache_dir=/mount/caches/pytest/ -k "MiniCPM-o-2_6"
           run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).visual_language.test }}
           timeout: 60
+        - name: 'VLM (qwen3-vl)'
+          cmd: |
+            python -m pip install transformers==4.57.0 git+https://github.com/huggingface/optimum-intel.git@0566b76f094d4c3084e06d29a248b39a1bff3fa4
+            python -m pytest -s -v tests/python_tests/test_vlm_pipeline.py --override-ini cache_dir=/mount/caches/pytest/ -k "qwen3-vl"
+          run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).visual_language.test }}
+          timeout: 60
       defaults:
         run:
           shell: bash
```

.github/workflows/windows.yml
Lines changed: 7 additions & 1 deletion

```diff
@@ -708,7 +708,7 @@ jobs:
           timeout: 90
         - name: 'WWB tests (nanollava)'
           cmd: |
-            python -m pip install transformers==4.48.0
+            python -m pip install transformers==4.48.0 diffusers==0.35.2
             python -m pytest -v ./tools/who_what_benchmark/tests -m nanollava
           run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).WWB.test }}
           timeout: 90
@@ -718,6 +718,12 @@ jobs:
             python -m pytest -s -v tests/python_tests/test_vlm_pipeline.py --override-ini cache_dir=/mount/caches/pytest/ -k "MiniCPM-o-2_6"
           run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).visual_language.test }}
           timeout: 60
+        - name: 'VLM (qwen3-vl)'
+          cmd: |
+            python -m pip install transformers==4.57.0 git+https://github.com/huggingface/optimum-intel.git@0566b76f094d4c3084e06d29a248b39a1bff3fa4
+            python -m pytest -s -v tests/python_tests/test_vlm_pipeline.py --override-ini cache_dir=/mount/caches/pytest/ -k "qwen3-vl"
+          run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).visual_language.test }}
+          timeout: 60
       defaults:
         run:
           shell: pwsh
```

README.md
Lines changed: 3 additions & 3 deletions

```diff
@@ -73,7 +73,7 @@ Library efficiently supports LoRA adapters for Text and Image generation scenari
 - Select active adapters for every generation
 - Mix multiple adapters with coefficients via alpha blending

-All scenarios are run on top of OpenVINO Runtime that supports inference on CPU, GPU and NPU. See [here](https://docs.openvino.ai/2025/about-openvino/release-notes-openvino/system-requirements.html) for platform support matrix.
+All scenarios are run on top of OpenVINO Runtime that supports inference on CPU, GPU and NPU. See [here](https://docs.openvino.ai/2026/about-openvino/release-notes-openvino/system-requirements.html) for platform support matrix.

 <a id="optimization-methods"></a>

@@ -87,12 +87,12 @@ OpenVINO™ GenAI library provides a transparent way to use state-of-the-art gen
 Additionally, OpenVINO™ GenAI library implements a continuous batching approach to use OpenVINO within LLM serving. The continuous batching library could be used in LLM serving frameworks and supports the following features:
 - Prefix caching that caches fragments of previous generation requests and corresponding KVCache entries internally and uses them in case of repeated query.

-Continuous batching functionality is used within OpenVINO Model Server (OVMS) to serve LLMs, see [here](https://docs.openvino.ai/2025/openvino-workflow/model-server/ovms_what_is_openvino_model_server.html) for more details.
+Continuous batching functionality is used within OpenVINO Model Server (OVMS) to serve LLMs, see [here](https://docs.openvino.ai/2026/model-server/ovms_what_is_openvino_model_server.html) for more details.


 ## Additional Resources

-- [OpenVINO Generative AI workflow](https://docs.openvino.ai/2025/openvino-workflow-generative.html)
+- [OpenVINO Generative AI workflow](https://docs.openvino.ai/2026/openvino-workflow-generative.html)
 - [Optimum Intel and OpenVINO](https://huggingface.co/docs/optimum/intel/openvino/export)
 - [OpenVINO Notebooks with GenAI](https://openvinotoolkit.github.io/openvino_notebooks/?libraries=OpenVINO+GenAI)
```
cmake/features.cmake
Lines changed: 1 addition & 0 deletions

```diff
@@ -9,6 +9,7 @@ option(ENABLE_SAMPLES "Enable samples build" ON)
 option(ENABLE_TESTS "Enable tests build" ON)
 option(ENABLE_TOOLS "Enable tools build" ON)
 option(ENABLE_GGUF "Enable support for GGUF format" ON)
+option(ENABLE_SAFETENSORS "Enable support for Safetensors format" ON)
 option(ENABLE_XGRAMMAR "Enable support for structured output generation with xgrammar backend" ON)
 option(ENABLE_DYNAMIC_WEIGHT_MANAGEMENT "Enable offloading model weights (load/release)" OFF)
 option(ENABLE_OPENVINO_NEW_ARCH "Enable OpenVINO new architecture for QWen3.5 etc models support" OFF)
```

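Like its neighbouring flags, the new ENABLE_SAFETENSORS option is an ordinary CMake cache variable set at configure time; a minimal sketch (the build directory name and the OFF value are illustrative — the option defaults to ON):

```shell
# Configure with Safetensors support explicitly disabled; every other
# feature flag keeps its default from cmake/features.cmake.
cmake -S . -B build -DENABLE_SAFETENSORS=OFF
cmake --build build --parallel
```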
samples/cpp/image_generation/README.md
Lines changed: 1 addition & 1 deletion

```diff
@@ -42,7 +42,7 @@ optimum-cli export openvino --model dreamlike-art/dreamlike-anime-1.0 --task sta

 ## Run text to image

-Follow [Get Started with Samples](https://docs.openvino.ai/2025/get-started/learn-openvino/openvino-samples/get-started-demos.html) to run the sample.
+Follow [Get Started with Samples](https://docs.openvino.ai/2026/get-started/learn-openvino/openvino-samples/get-started-demos.html) to run the sample.

 `stable_diffusion ./dreamlike_anime_1_0_ov/FP16 'cyberpunk cityscape like Tokyo New York with tall buildings at dusk golden hour cinematic lighting'`
```

samples/cpp/rag/README.md
Lines changed: 1 addition & 1 deletion

```diff
@@ -27,7 +27,7 @@ optimum-cli export openvino --task text-classification --model cross-encoder/ms-

 ## Run

-Follow [Get Started with Samples](https://docs.openvino.ai/2025/get-started/learn-openvino/openvino-samples/get-started-demos.html) to run the sample.
+Follow [Get Started with Samples](https://docs.openvino.ai/2026/get-started/learn-openvino/openvino-samples/get-started-demos.html) to run the sample.

 ### 1. Text Embedding Sample (`text_embeddings.cpp`)
 - **Description:**
```

samples/cpp/speech_generation/README.md
Lines changed: 1 addition & 1 deletion

```diff
@@ -38,7 +38,7 @@ python create_speaker_embedding.py

 ## Run Text-to-speech sample

-Follow [Get Started with Samples](https://docs.openvino.ai/2025/get-started/learn-openvino/openvino-samples/get-started-demos.html)
+Follow [Get Started with Samples](https://docs.openvino.ai/2026/get-started/learn-openvino/openvino-samples/get-started-demos.html)
 to run the sample.

 `text-to-speech speecht5_tts "Hello OpenVINO GenAI" speaker_embedding.bin`
```

samples/cpp/text_generation/README.md
Lines changed: 2 additions & 2 deletions

```diff
@@ -32,7 +32,7 @@ and architectures, we still recommend converting the model to the IR format usin

 ## Sample Descriptions
 ### Common information
-Follow [Get Started with Samples](https://docs.openvino.ai/2025/get-started/learn-openvino/openvino-samples/get-started-demos.html) to get common information about OpenVINO samples.
+Follow [Get Started with Samples](https://docs.openvino.ai/2026/get-started/learn-openvino/openvino-samples/get-started-demos.html) to get common information about OpenVINO samples.
 Follow [build instruction](../../../src/docs/BUILD.md) to build GenAI samples

 GPUs usually provide better performance compared to CPUs. Modify the source code to change the device for inference to the GPU.
@@ -64,7 +64,7 @@ The following template can be used as a default, but it may not work properly wi
 #### NPU support

 NPU device is supported with some limitations. See [NPU inference of
-LLMs](https://docs.openvino.ai/2025/openvino-workflow-generative/inference-with-genai/inference-with-genai-on-npu.html) documentation. In particular:
+LLMs](https://docs.openvino.ai/2026/openvino-workflow-generative/inference-with-genai/inference-with-genai-on-npu.html) documentation. In particular:

 - Models must be exported with symmetric INT4 quantization (`optimum-cli export openvino --weight-format int4 --sym --model <model> <output_folder>`).
   For models with more than 4B parameters, channel wise quantization should be used (`--group-size -1`).
```

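The two NPU export requirements quoted in the diff above combine into a single optimum-cli invocation; a sketch using a hypothetical model ID and output folder (substitute your own):

```shell
# Symmetric INT4 weights (--sym) as required for NPU; --group-size -1
# selects channel-wise quantization, recommended for models over 4B parameters.
# "Qwen/Qwen2-7B" and "qwen2_7b_int4_npu" are illustrative placeholders.
optimum-cli export openvino --model Qwen/Qwen2-7B \
    --weight-format int4 --sym --group-size -1 qwen2_7b_int4_npu
```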