[NPUW][VLM][AUTO] Using Auto plugin for VLM embeddings when running on NPU #3220
Conversation
Pull request overview
This PR optimizes VLM (Vision-Language Model) performance on NPU by switching embedding models from CPU to AUTO plugin (GPU/CPU fallback), reducing TTFT by 60-80%. It also addresses NPU's lack of RemoteTensor support and makes AUTO device properties user-configurable.
Changes:
- Changed embedder device from "CPU" to "AUTO" for NPU execution with GPU/CPU fallback priorities
- Added `use_intermediate_remote_tensor` parameter to disable RemoteTensor usage for NPU
- Refactored device properties handling to use user-configurable AUTO plugin settings instead of hardcoded values
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/python_tests/test_vlm_pipeline.py | Added NPU AUTO config test and formatting cleanup |
| src/cpp/src/visual_language/pipeline.cpp | Switched embedder device to AUTO for NPU, added AUTO properties configuration, and propagated use_intermediate_remote_tensor flag |
| src/cpp/src/lm_encoding.hpp | Added use_intermediate_remote_tensor parameter to function signature |
| src/cpp/src/lm_encoding.cpp | Replaced hardcoded return_remote_tensor with configurable use_intermediate_remote_tensor parameter |
```cpp
auto auto_propeties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
device_properties["AUTO"] = auto_propeties;
```
Corrected spelling of 'propeties' to 'properties'.
Suggested change:
```diff
-auto auto_propeties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
-auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
-auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
-device_properties["AUTO"] = auto_propeties;
+auto auto_properties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
+auto_properties.insert(ov::device::priorities("GPU", "CPU"));
+auto_properties.insert(ov::intel_auto::enable_startup_fallback(false));
+device_properties["AUTO"] = auto_properties;
```
```cpp
auto auto_propeties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
device_properties["AUTO"] = auto_propeties;
```
Corrected spelling of 'auto_propeties' to 'auto_properties'.
Suggested change:
```diff
-auto auto_propeties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
-auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
-auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
-device_properties["AUTO"] = auto_propeties;
+auto auto_properties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
+auto_properties.insert(ov::device::priorities("GPU", "CPU"));
+auto_properties.insert(ov::intel_auto::enable_startup_fallback(false));
+device_properties["AUTO"] = auto_properties;
```
```cpp
auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
```
The device priorities are hardcoded as 'GPU', 'CPU'. Consider making these configurable through parameters or allowing users to override them via device_properties to maintain consistency with the PR's goal of making properties user-configurable.
Suggested change:
```diff
-auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
-auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
+// Apply default device priorities only if the user did not specify them.
+const std::string priorities_name = ov::device::priorities.name();
+if (auto_propeties.find(priorities_name) == auto_propeties.end() &&
+    auto_propeties.find("DEVICE_PRIORITIES") == auto_propeties.end()) {
+    auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
+}
+// Apply default startup fallback only if the user did not specify it.
+const std::string startup_fallback_name = ov::intel_auto::enable_startup_fallback.name();
+if (auto_propeties.find(startup_fallback_name) == auto_propeties.end()) {
+    auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
+}
```
`insert` doesn't modify the value if the key is already present.
Please do not merge this PR yet; we need to assess how it affects energy consumption.
```cpp
void npu_auto_default_properties(ov::AnyMap& device_properties) {
    auto auto_propeties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
    auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
```
Let's keep CPU by default but provide (and document) a way to put GPU in the config.
The config isn't there yet. OVMS is working on fully exposing vision models.
b48f917
Description
- Use `AUTO:GPU,CPU` for embedding models instead of `CPU`.
- NPU doesn't support `RemoteTensor`, so avoid passing it.
- Set `AUTO` device properties with default values instead of hardcoding them. It makes device properties configurable by the user.

EISW-195191
Checklist: