[NPUW][VLM][AUTO] Using Auto plugin for VLM embeddings when running on NPU #3220
Conversation
Pull request overview
This PR optimizes VLM (Vision-Language Model) performance on NPU by switching embedding models from CPU to AUTO plugin (GPU/CPU fallback), reducing TTFT by 60-80%. It also addresses NPU's lack of RemoteTensor support and makes AUTO device properties user-configurable.
Changes:
- Changed embedder device from "CPU" to "AUTO" for NPU execution with GPU/CPU fallback priorities
- Added `use_intermediate_remote_tensor` parameter to disable RemoteTensor usage for NPU
- Refactored device properties handling to use user-configurable AUTO plugin settings instead of hardcoded values
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/python_tests/test_vlm_pipeline.py | Added NPU AUTO config test and formatting cleanup |
| src/cpp/src/visual_language/pipeline.cpp | Switched embedder device to AUTO for NPU, added AUTO properties configuration, and propagated use_intermediate_remote_tensor flag |
| src/cpp/src/lm_encoding.hpp | Added use_intermediate_remote_tensor parameter to function signature |
| src/cpp/src/lm_encoding.cpp | Replaced hardcoded return_remote_tensor with configurable use_intermediate_remote_tensor parameter |
```cpp
auto auto_propeties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
device_properties["AUTO"] = auto_propeties;
```
Corrected spelling of 'propeties' to 'properties'.
Suggested change:
```diff
-auto auto_propeties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
-auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
-auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
-device_properties["AUTO"] = auto_propeties;
+auto auto_properties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
+auto_properties.insert(ov::device::priorities("GPU", "CPU"));
+auto_properties.insert(ov::intel_auto::enable_startup_fallback(false));
+device_properties["AUTO"] = auto_properties;
```
```cpp
auto auto_propeties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
device_properties["AUTO"] = auto_propeties;
```
Corrected spelling of 'auto_propeties' to 'auto_properties'.
Suggested change:
```diff
-auto auto_propeties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
-auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
-auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
-device_properties["AUTO"] = auto_propeties;
+auto auto_properties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
+auto_properties.insert(ov::device::priorities("GPU", "CPU"));
+auto_properties.insert(ov::intel_auto::enable_startup_fallback(false));
+device_properties["AUTO"] = auto_properties;
```
```cpp
auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
```
The device priorities are hardcoded as 'GPU', 'CPU'. Consider making these configurable through parameters or allowing users to override them via device_properties to maintain consistency with the PR's goal of making properties user-configurable.
Suggested change:
```diff
-auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
-auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
+// Apply default device priorities only if the user did not specify them.
+const std::string priorities_name = ov::device::priorities.name();
+if (auto_propeties.find(priorities_name) == auto_propeties.end() &&
+    auto_propeties.find("DEVICE_PRIORITIES") == auto_propeties.end()) {
+    auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
+}
+// Apply default startup fallback only if the user did not specify it.
+const std::string startup_fallback_name = ov::intel_auto::enable_startup_fallback.name();
+if (auto_propeties.find(startup_fallback_name) == auto_propeties.end()) {
+    auto_propeties.insert(ov::intel_auto::enable_startup_fallback(false));
+}
```
`insert` doesn't modify the value if the key is already present.
Please do not merge this PR yet; we need to assess how it affects energy consumption.
```cpp
void npu_auto_default_properties(ov::AnyMap& device_properties) {
    auto auto_propeties = utils::pop_or_default<ov::AnyMap>(device_properties, "AUTO", {});
    auto_propeties.insert(ov::device::priorities("GPU", "CPU"));
```
Let's keep CPU by default but provide (and document) a way to put GPU in the config.
The config isn't there yet. OVMS is working on fully exposing vision models.
b48f917
Description
- Use `AUTO:GPU,CPU` for embedding models instead of `CPU`.
- NPU doesn't support `RemoteTensor`, so avoid passing it.
- Set `AUTO` device properties with default values instead of hardcoding them. It makes device properties configurable by the user.

EISW-195191
Checklist: