[NPU][ZeroInferRequest][set_tensor] Provide precision hint for allocation of ZeroTensor #34264
Force-pushed from 597b01e to 366e85f (compare)
Resolved (outdated) review thread on src/plugins/intel_npu/src/backend/include/zero_infer_request.hpp.
Pull request overview
Adds an optional precision hint when allocating ZeroTensor in the NPU Level Zero backend, allowing allocations to match the user-provided tensor element type (useful for cases like ov::element::boolean with older compiler/driver behaviors).
Changes:
- Extend `ZeroInferRequest::allocate_tensor` with an optional `ov::element::Type` precision parameter.
- Pass the user tensor element type into `allocate_tensor` in the `set_tensor`/`set_tensors` fallback allocation paths.
- Use the hinted precision for `check_network_precision()` and `ZeroTensor` construction.
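The hinting rule in the first bullet can be sketched as a simple resolution step: use the caller-provided hint when one is given, otherwise fall back to the precision reported by the network. The `ElementType` enum and function name below are illustrative stand-ins, not the actual OpenVINO types.

```cpp
#include <cassert>

// Stand-in for ov::element::Type; the names here are illustrative only.
enum class ElementType { dynamic_t, boolean_t, u8_t, f32_t };

// Sketch of the precision-resolution rule: an optional hint (defaulting to
// "dynamic", i.e. no hint) overrides the network-reported precision.
ElementType resolve_allocation_precision(ElementType networkPrecision,
                                         ElementType hint = ElementType::dynamic_t) {
    return hint == ElementType::dynamic_t ? networkPrecision : hint;
}
```

With this shape, existing call sites that pass no hint keep their current behavior, while `set_tensor`/`set_tensors` can forward the user tensor's element type.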
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/plugins/intel_npu/src/backend/src/zero_infer_request.cpp | Threads user tensor element type through fallback allocation to guide ZeroTensor precision. |
| src/plugins/intel_npu/src/backend/include/zero_infer_request.hpp | Updates allocate_tensor signature and documents the new optional precision parameter. |
Two further outdated, resolved review threads on src/plugins/intel_npu/src/backend/include/zero_infer_request.hpp.
Force-pushed from e09d662 to 653e172 (compare)
Three outdated, resolved review threads on src/plugins/intel_npu/tests/functional/behavior/ov_infer_request/infer_request_run.hpp.
pereanub left a comment:

What is happening if an internal tensor is received with U8 first and BOOL after that for the same infer?
@pereanub,
Oh, that test is very hard to follow; please create a separate functional test for each case.
@pereanub Refactored the test in 905ae73 to be more modular by defining lambda helpers:
and the actual tested lines to be:
Force-pushed from 2b7c51b to 14bbfc6 (compare)
- …g not existing batched tensor for PV driver
- … requests for PV drivers
- …f compiler support for `Boolean LessEq Op`
- …r(set_tensor/s)` methods
Force-pushed from a3560d0 to f40ebad (compare)
Force-pushed from c371687 to c804779 (compare)
```cpp
const bool isMutableCommandListSupported = _initStructs->getMutableCommandListExtVersion() >= ZE_MAKE_VERSION(1, 0);
if (isMutableCommandListSupported && batchSizeCandidate.has_value()) {
    get_level_zero_inputs(foundPort.idx).resize(tensors.size());

    for (size_t i = 0; i < tensors.size(); i++) {
        try {
            _logger.debug("ZeroInferRequest::set_tensors - create zero tensor");
            OV_ITT_TASK_NEXT(ZERO_SET_TENSORS, "create zero tensor");
            get_level_zero_input(foundPort.idx, i) = std::make_shared<ZeroTensor>(_initStructs, tensors.at(i));
        } catch (const ZeroMemException& exception) {
            _logger.debug(
                "ZeroInferRequest::set_tensors - exception caught while trying to create a Level Zero tensor "
                "from the user tensor: %s",
                exception.what());

            _logger.debug("ZeroInferRequest::set_tensors - allocate locally L0 tensor");
            OV_ITT_TASK_NEXT(ZERO_SET_TENSORS, "allocate tensor");
            get_level_zero_input(foundPort.idx, i) = allocate_tensor(foundPort.idx, INPUT, batchSizeCandidate);
        }
```
The batched set_tensors path is currently gated by isMutableCommandListSupported. When mutable command lists are not supported but batchSizeCandidate.has_value() (i.e., the caller passes multiple tensors), the code falls into the single-tensor branch and only allocates/uses a single levelZeroTensor (and only checks tensors.at(SINGLE_TENSOR)), effectively ignoring the rest of the provided tensors. This can lead to incorrect behavior (wrong backing allocations/copies) for PV-driver scenarios where set_tensors is still expected to work. Consider restructuring so that batchSizeCandidate.has_value() drives vector allocation/import/allocation of per-batch ZeroTensors regardless of mutable support; then only guard the _pipeline->update_graph_arguments(...) calls behind isMutableCommandListSupported.
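The restructuring suggested above can be sketched as two independent decisions. The struct and function names below are hypothetical, not the plugin's actual API:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical sketch of the suggested split: the presence of a batch-size
// candidate decides whether per-batch ZeroTensors are allocated, while
// mutable-command-list support only decides whether graph arguments may be
// patched in place (the update_graph_arguments(...) calls).
struct SetTensorsPlan {
    bool allocatePerBatchTensors;
    bool updateGraphArguments;
};

inline SetTensorsPlan plan_set_tensors(bool mutableCmdListSupported,
                                       std::optional<std::size_t> batchSizeCandidate) {
    const bool batched = batchSizeCandidate.has_value();
    return SetTensorsPlan{batched, batched && mutableCmdListSupported};
}
```

Under this split, a PV driver without mutable command lists would still allocate per-batch tensors; it would only skip the in-place graph-argument updates.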
There is no tensors.at(SINGLE_TENSOR) by these lines; check the reply below.
```cpp
} else {
    auto& levelZeroTensor = get_level_zero_input(foundPort.idx);
```
(Copilot posted the same batched set_tensors gating comment on this hunk as above.)
There is no tensors.at(SINGLE_TENSOR) by these lines; check the reply below.
```cpp
    const auto& userTensorElementType = tensors.at(SINGLE_TENSOR)->get_element_type();
    if (userTensorElementType == ov::element::boolean && levelZeroTensor->get_element_type() == ov::element::u8) {
        levelZeroTensor->set_element_type(userTensorElementType);
    }
}
```
(Copilot posted the same batched set_tensors gating comment on this hunk as above.)
Precision for the rest of the tensors is already asserted in src/plugins/intel_npu/src/common/src/sync_infer_request.cpp#L310 (see the SyncInferRequest::check_batched_tensors method).
```cpp
namespace {
constexpr uint32_t TARGET_ZE_DRIVER_NPU_EXT_VERSION = ZE_DRIVER_NPU_EXT_VERSION_1_0;
constexpr uint32_t TARGET_ZE_GRAPH_NPU_EXT_VERSION = ZE_GRAPH_EXT_VERSION_1_16;
constexpr uint32_t TARGET_ZE_COMMAND_QUEUE_NPU_EXT_VERSION = ZE_COMMAND_QUEUE_NPU_EXT_VERSION_1_1;
constexpr uint32_t TARGET_ZE_PROFILING_NPU_EXT_VERSION = ZE_PROFILING_DATA_EXT_VERSION_1_0;
constexpr uint32_t TARGET_ZE_CONTEXT_NPU_EXT_VERSION = ZE_CONTEXT_NPU_EXT_VERSION_1_0;
constexpr uint32_t TARGET_ZE_MUTABLE_COMMAND_LIST_EXT_VERSION = ZE_MUTABLE_COMMAND_LIST_EXP_VERSION_1_1;
constexpr uint32_t TARGET_ZE_EXTERNAL_MEMMAP_SYSMEM_EXT_VERSION = ZE_EXTERNAL_MEMMAP_SYSMEM_EXT_VERSION_1_0;
}  // namespace
```
Putting an anonymous namespace in a header creates a separate set of internal-linkage constants per translation unit, which is generally discouraged and can make reuse/visibility more confusing. Prefer inline constexpr constants in a named namespace (e.g., intel_npu::test or similar), or make them static constexpr members of ZeroInitStructsMock (or a dedicated traits struct) to keep the defaults well-scoped and consistently referenced.
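A minimal sketch of the suggested alternative, assuming a named namespace (the name `intel_npu::test_defaults` is illustrative) and the Level Zero `ZE_MAKE_VERSION` encoding of `(major << 16) | minor`:

```cpp
#include <cassert>
#include <cstdint>

// Suggested shape: inline constexpr constants in a named namespace give every
// translation unit the same entities, instead of a per-TU internal-linkage
// copy as with an anonymous namespace in a header.
namespace intel_npu::test_defaults {
// ZE_MAKE_VERSION(major, minor) expands to ((major << 16) | minor).
inline constexpr std::uint32_t TARGET_ZE_GRAPH_NPU_EXT_VERSION = (1u << 16) | 16u;
inline constexpr std::uint32_t TARGET_ZE_MUTABLE_COMMAND_LIST_EXT_VERSION = (1u << 16) | 1u;
}  // namespace intel_npu::test_defaults
```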
```diff
@@ -55,6 +55,8 @@ class ZeroTensor final : public ov::ITensor {

     const ov::element::Type& get_element_type() const override;
```
set_element_type reads like a general-purpose mutator, but the implementation asserts it is only valid for the boolean/u8 special-case. To avoid accidental misuse, consider either (a) renaming it to reflect the narrow intent (e.g., boolean/u8 override), and/or (b) documenting the strict preconditions in the header comment, and/or (c) limiting exposure (e.g., making it private and granting ZeroInferRequest access).
```cpp
/**
 * @brief Special-purpose override for the tensor element type.
 *
 * This API is intended only for the narrow boolean/u8 handling case used by
 * the Zero inference pipeline. It must not be used as a general-purpose
 * element type mutator. The implementation asserts that only the supported
 * conversion(s) are requested and may fail if used with other element types.
 */
```
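The documented precondition can be mirrored in a guarded mutator. This is a standalone sketch under that assumption, not the real ZeroTensor code:

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in element type; in this sketch only boolean and u8 share a
// byte-compatible layout.
enum class ElementType { boolean_t, u8_t, f32_t };

// Sketch of the narrow mutator: the only supported change is between boolean
// and u8 (a same-size reinterpretation); any other request throws, mirroring
// the assertion described for the real implementation.
class TensorSketch {
public:
    explicit TensorSketch(ElementType t) : type_(t) {}
    ElementType get_element_type() const { return type_; }
    void set_element_type(ElementType t) {
        const bool booleanU8Pair =
            (type_ == ElementType::u8_t && t == ElementType::boolean_t) ||
            (type_ == ElementType::boolean_t && t == ElementType::u8_t);
        if (t != type_ && !booleanU8Pair) {
            throw std::logic_error("only the boolean/u8 override is supported");
        }
        type_ = t;
    }
private:
    ElementType type_;
};
```

Encoding the precondition as a hard failure (rather than a silent no-op) makes accidental general-purpose use visible immediately, which is the concern raised in the comment above.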
```cpp
auto allocate_tensors() -> std::tuple</* importMemoryBatched */ ov::Tensor,
                                      /* importMemoryTensor_1 */ ov::Tensor,
                                      /* importMemoryTensor_2 */ ov::Tensor,
                                      /* unalignedBatchedTensor */ ov::Tensor,
                                      /* unalignedTensor_1 */ ov::Tensor,
                                      /* unalignedTensor_2 */ ov::Tensor> {
    auto model_shape = ov_model->get_parameters()[0]->get_shape();
    ov::Coordinate start_coordinate{model_shape};
    ov::Coordinate stop_coordinate{model_shape};
    start_coordinate[0] = 1;
    stop_coordinate[0] = 2;
    ov::Allocator alignedAllocator{::intel_npu::utils::AlignedAllocator{::intel_npu::utils::STANDARD_PAGE_SIZE}};
    ov::Tensor importMemoryBatchedTensor(ov::element::boolean, model_shape, alignedAllocator);
    ov::Tensor importMemoryTensor_1(importMemoryBatchedTensor, ov::Coordinate{0, 0, 0, 0}, start_coordinate);
    ov::Tensor importMemoryTensor_2(importMemoryBatchedTensor, ov::Coordinate{1, 0, 0, 0}, stop_coordinate);
    void* alignedAddr = ::operator new(ov::element::boolean.size() * ov::shape_size(model_shape) + 1,
                                       std::align_val_t(::intel_npu::utils::STANDARD_PAGE_SIZE));
    void* unalignedAddr = static_cast<uint8_t*>(alignedAddr) + 1;
    std::shared_ptr<void> deallocateAddressCallback(alignedAddr, [](void* ptr) {
        ::operator delete(ptr, std::align_val_t(::intel_npu::utils::STANDARD_PAGE_SIZE));
    });
```
The tensor-allocation and set_tensor/set_tensors helper logic appears duplicated between this new internal test suite and BooleanPrecisionInferRequestRunTests in infer_request_run.hpp. Consider extracting these helpers into a shared test utility (or a common base fixture) to reduce duplication and prevent future divergence (especially around the custom aligned/unaligned buffer lifetime handling).
The signatures differ for the set_tensor_and_infer/set_tensors_and_infer methods:
- BooleanPrecisionInferRequestRunTests's expect ov::InferRequest
- ZeroInferRequestTests's expect std::shared_ptr<intel_npu::ZeroInferRequest>
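The custom aligned/unaligned buffer lifetime handling that the review asks to share could live in one small utility. The page size, struct, and function names below are assumptions for illustration, not the existing test code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Sketch of a shared test helper: own one page-aligned allocation and expose
// a deliberately misaligned view (base + 1) for the "unaligned tensor" cases,
// so every fixture reuses the same lifetime handling.
constexpr std::size_t PAGE_SIZE = 4096;  // stand-in for utils::STANDARD_PAGE_SIZE

struct MisalignedBuffer {
    std::shared_ptr<void> owner;  // deleter frees the aligned base allocation
    void* unaligned;              // base + 1, guaranteed not page-aligned
};

inline MisalignedBuffer make_misaligned_buffer(std::size_t byteSize) {
    // Allocate one extra byte so the +1 view still covers byteSize bytes.
    void* base = ::operator new(byteSize + 1, std::align_val_t{PAGE_SIZE});
    std::shared_ptr<void> owner(base, [](void* p) {
        ::operator delete(p, std::align_val_t{PAGE_SIZE});
    });
    return MisalignedBuffer{owner, static_cast<std::uint8_t*>(base) + 1};
}
```

Holding the deleter in a `shared_ptr<void>` mirrors the `deallocateAddressCallback` pattern in the quoted test code, so the two suites could converge on one helper without changing lifetime semantics.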
Force-pushed from c804779 to a0c4a7e (compare)
Force-pushed from d57fecb to 9d2b1e3 (compare)
dfbb82a

Details:
- Add optional `ov::element::Type` parameter for the `ZeroInferRequest::allocate_tensor` method
- Add `ov::Tensor::set_element_type` method to cover boolean-u8 precision mismatches between user tensors and precisions from compiler descriptors
- `ZeroTensor`s can now be allocated during the `set_tensor`/`set_tensors` methods even with the PV driver (when `zeMutableCommandListExtVersion` is less than 1.0), having their precision updated accordingly if needed (special case for `ov::element::boolean` -> `ov::element::u8`)
- `BooleanPrecisionInferRequestRunTests` meant to be compatible with the PV driver, but skipped due to the ELF loader from the PV driver not being able to parse boolean inputs from blobs
- `ZeroInferRequestTests` that create `ZeroInferRequest` locally using different `ZeroInitStructs` (reinterpret-cast from `ZeroInitMock`)
- `ZeroInitMock` object to accept all of the extension parameters, respecting this order: `zeDriverNpuExtVersion`, `zeGraphNpuExtVersion` (in the past, `ZeroInitMock` permitted overwriting of only this param!), `zeCommandQueueNpuExtVersion`, `zeProfilingNpuExtVersion`, `zeContextNpuExtVersion`, `zeMutableCommandListExtVersion`, `zeExternalMemMapSysMemExtVersion`

Tickets:
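The extended `ZeroInitMock` parameter set from the last bullet can be sketched as an aggregate with defaulted fields in that exact order. The default values below are placeholders, not the plugin's real extension versions:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical parameter aggregate mirroring the order listed above. With
// defaulted members, callers override only the fields they care about
// (previously only zeGraphNpuExtVersion could be overwritten).
struct ZeroInitMockParams {
    std::uint32_t zeDriverNpuExtVersion = (1u << 16) | 0u;            // placeholder default
    std::uint32_t zeGraphNpuExtVersion = (1u << 16) | 16u;            // placeholder default
    std::uint32_t zeCommandQueueNpuExtVersion = (1u << 16) | 1u;      // placeholder default
    std::uint32_t zeProfilingNpuExtVersion = (1u << 16) | 0u;         // placeholder default
    std::uint32_t zeContextNpuExtVersion = (1u << 16) | 0u;           // placeholder default
    std::uint32_t zeMutableCommandListExtVersion = (1u << 16) | 1u;   // placeholder default
    std::uint32_t zeExternalMemMapSysMemExtVersion = (1u << 16) | 0u; // placeholder default
};
```

A PV-driver-like mock would then be expressed by overriding just `zeMutableCommandListExtVersion` (for example to a value below `ZE_MAKE_VERSION(1, 0)`) and leaving the other fields at their defaults.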
AI Assistance:
yes