[NPUW][VLM][AUTO] Using Auto plugin for VLMs embeddings when running on NPU #3220

Merged
AlexanderKalistratov merged 8 commits into openvinotoolkit:master from AlexanderKalistratov:npu_vlm_auto
Feb 6, 2026

@AlexanderKalistratov
Contributor

Description

  • Use AUTO:GPU,CPU for embedding models instead of CPU.
  • Reduces TTFT by 60-80%.
  • The NPU plugin doesn't support RemoteTensor, so avoid passing it.
  • Populate the AUTO device properties with default values instead of hardcoding them, so that the user can configure the device properties.
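
The last bullet can be pictured as "user settings first, defaults underneath". Below is a minimal standard-library sketch of that flow, not the PR's actual code: `Properties` and `DeviceProperties` are string-only stand-ins for `ov::AnyMap`, `pop_or_default` is a simplified, hypothetical analogue of the `utils::pop_or_default` helper, and the key strings and values are illustrative assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// String-only stand-ins for ov::AnyMap (assumption made for this sketch).
using Properties = std::map<std::string, std::string>;
using DeviceProperties = std::map<std::string, Properties>;

// Hypothetical analogue of utils::pop_or_default: removes the entry for `key`
// and returns it, or returns `fallback` when the key is absent.
Properties pop_or_default(DeviceProperties& all, const std::string& key, Properties fallback = {}) {
    auto it = all.find(key);
    if (it == all.end()) {
        return fallback;
    }
    Properties found = std::move(it->second);
    all.erase(it);
    return found;
}

// Fills in AUTO defaults without overriding anything the user already set.
void apply_auto_defaults(DeviceProperties& device_properties) {
    Properties auto_properties = pop_or_default(device_properties, "AUTO");
    // insert() keeps an existing value, so user-provided entries win.
    auto_properties.insert({"DEVICE_PRIORITIES", "GPU,CPU"});
    auto_properties.insert({"ENABLE_STARTUP_FALLBACK", "false"});
    device_properties["AUTO"] = std::move(auto_properties);
    assert(device_properties.count("AUTO") == 1);  // AUTO section always exists afterwards
}
```

With an empty configuration, `apply_auto_defaults` produces an "AUTO" section holding both defaults. If the caller has already put a priority list under "AUTO", that list is kept and only the missing entries are filled in.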

EISW-195191

Checklist:

  • Tests have been updated or added to cover the new code.
  • This patch fully addresses the ticket.
  • I have made corresponding changes to the documentation.

@github-actions github-actions bot added labels category: visual language [Visual language pipeline], category: LLM [LLM pipeline (stateful, static)], category: GGUF [GGUF file reader] Jan 23, 2026
@Wovchena Wovchena requested a review from Copilot January 26, 2026 08:40
Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes VLM (Vision-Language Model) performance on NPU by switching embedding models from CPU to AUTO plugin (GPU/CPU fallback), reducing TTFT by 60-80%. It also addresses NPU's lack of RemoteTensor support and makes AUTO device properties user-configurable.

Changes:

  • Changed embedder device from "CPU" to "AUTO" for NPU execution with GPU/CPU fallback priorities
  • Added use_intermediate_remote_tensor parameter to disable RemoteTensor usage for NPU
  • Refactored device properties handling to use user-configurable AUTO plugin settings instead of hardcoded values

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File | Description
tests/python_tests/test_vlm_pipeline.py | Added NPU AUTO config test and formatting cleanup
src/cpp/src/visual_language/pipeline.cpp | Switched embedder device to AUTO for NPU, added AUTO properties configuration, and propagated use_intermediate_remote_tensor flag
src/cpp/src/lm_encoding.hpp | Added use_intermediate_remote_tensor parameter to function signature
src/cpp/src/lm_encoding.cpp | Replaced hardcoded return_remote_tensor with configurable use_intermediate_remote_tensor parameter

Comment on lines +43 to +47
auto auto_propeties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));

device_properties["AUTO"] = auto_propeties;
Copilot AI Jan 26, 2026

Corrected spelling of 'propeties' to 'properties'.

Suggested change
- auto auto_propeties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
- auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
- auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
- device_properties["AUTO"] = auto_propeties;
+ auto auto_properties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
+ auto_properties.insert(ov::device::priorities("GPU", "CPU"));
+ auto_properties.insert(ov::intel_auto::enable_startup_fallback(false));
+ device_properties["AUTO"] = auto_properties;

Comment on lines +43 to +47
auto auto_propeties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));

device_properties["AUTO"] = auto_propeties;
Copilot AI Jan 26, 2026

Corrected spelling of 'auto_propeties' to 'auto_properties'.

Suggested change
- auto auto_propeties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
- auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
- auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
- device_properties["AUTO"] = auto_propeties;
+ auto auto_properties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
+ auto_properties.insert(ov::device::priorities("GPU", "CPU"));
+ auto_properties.insert(ov::intel_auto::enable_startup_fallback(false));
+ device_properties["AUTO"] = auto_properties;

Comment on lines +44 to +46
auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));

Copilot AI Jan 26, 2026

The device priorities are hardcoded as 'GPU', 'CPU'. Consider making these configurable through parameters or allowing users to override them via device_properties to maintain consistency with the PR's goal of making properties user-configurable.

Suggested change
- auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
- auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
+ // Apply default device priorities only if the user did not specify them.
+ const std::string priorities_name = ov::device::priorities.name();
+ if (auto_propeties.find(priorities_name) == auto_propeties.end() &&
+     auto_propeties.find("DEVICE_PRIORITIES") == auto_propeties.end()) {
+     auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
+ }
+ // Apply default startup fallback only if the user did not specify it.
+ const std::string startup_fallback_name = ov::intel_auto::enable_startup_fallback.name();
+ if (auto_propeties.find(startup_fallback_name) == auto_propeties.end()) {
+     auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
+ }

Contributor Author

insert doesn't modify the map if the key is already present.
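
The reply leans on standard associative-container behaviour: `std::map::insert` leaves an existing entry alone and reports whether anything was added. The sketch below illustrates this with a string-only `PropertyMap` standing in for `ov::AnyMap` (an assumption for the example). `add_default` and `demo_user_override` are hypothetical names, and the second key string is illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for ov::AnyMap; plain strings replace ov::Any values (assumption).
using PropertyMap = std::map<std::string, std::string>;

// Adds a default only if `key` is not set yet.
// Returns true when the default was inserted, false when an existing value was kept.
bool add_default(PropertyMap& props, const std::string& key, const std::string& value) {
    return props.insert({key, value}).second;
}

// Hypothetical walk-through of the case raised in the review comment.
PropertyMap demo_user_override() {
    PropertyMap props;
    props["DEVICE_PRIORITIES"] = "CPU";  // value chosen by the user

    const bool inserted = add_default(props, "DEVICE_PRIORITIES", "GPU,CPU");
    assert(!inserted);  // existing entry is left untouched
    (void)inserted;     // silence unused-variable warnings when NDEBUG is defined

    // Illustrative key name; it is absent, so the default is applied.
    add_default(props, "ENABLE_STARTUP_FALLBACK", "false");
    return props;
}
```

After `demo_user_override()` the map still holds the user's "CPU" priority alongside the newly added fallback default. Had overwriting been the goal, `insert_or_assign` or `operator[]` would replace the existing value instead.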

@AlexanderKalistratov AlexanderKalistratov changed the title [NPUW][VLM][AUTO] Using Auto plugin for VLMs embeddings when running on NPU [NPUW][VLM][AUTO][DO NIT MERGE] Using Auto plugin for VLMs embeddings when running on NPU Jan 26, 2026
@AlexanderKalistratov AlexanderKalistratov changed the title [NPUW][VLM][AUTO][DO NIT MERGE] Using Auto plugin for VLMs embeddings when running on NPU [DO NOT MERGE][NPUW][VLM][AUTO] Using Auto plugin for VLMs embeddings when running on NPU Jan 26, 2026
@AlexanderKalistratov
Contributor Author

Please do not merge this PR yet, as we need to assess how it affects energy consumption.

@Wovchena Wovchena marked this pull request as draft January 26, 2026 10:09
@dmatveev dmatveev added this to the 2026.1 milestone Feb 4, 2026

void npu_auto_default_properties(ov::AnyMap& device_properties) {
auto auto_propeties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
Contributor

Let's keep CPU by default, but provide (and document) a way to put GPU in the config.

Collaborator

The config isn't there yet. OVMS are working on fully exposing vision models.

@AlexanderKalistratov AlexanderKalistratov changed the title [DO NOT MERGE][NPUW][VLM][AUTO] Using Auto plugin for VLMs embeddings when running on NPU [NPUW][VLM][AUTO] Using Auto plugin for VLMs embeddings when running on NPU Feb 5, 2026
@AlexanderKalistratov AlexanderKalistratov marked this pull request as ready for review February 5, 2026 13:30
Contributor

@dmatveev dmatveev left a comment

idk lgtm imo thx!

@AlexanderKalistratov AlexanderKalistratov added this pull request to the merge queue Feb 6, 2026
Merged via the queue into openvinotoolkit:master with commit b48f917 Feb 6, 2026
153 of 155 checks passed

4 participants