
[NPU][ZeroInferRequest][set_tensor] Provide precision hint for allocation of ZeroTensor#34264

Merged
MirceaDan99 merged 23 commits into openvinotoolkit:master from
MirceaDan99:intel_npu/fix_boolean_datatype_for_set_tenor
Mar 11, 2026

Conversation


@MirceaDan99 MirceaDan99 commented Feb 23, 2026

Details:

  • Add optional ov::element::Type parameter for ZeroInferRequest::allocate_tensor method
  • Add new ov::Tensor::set_element_type method to cover boolean-u8 precision mismatches between user tensors and precisions from compiler descriptors
  • ZeroTensors can now be allocated during the set_tensor/set_tensors methods even with the PV driver (when zeMutableCommandListExtVersion is less than 1.0), with their precision updated accordingly if needed (special case for ov::element::boolean -> ov::element::u8)
  • For tests:
    • Added BooleanPrecisionInferRequestRunTests, meant to be compatible with the PV driver but skipped because the ELF loader from the PV driver cannot parse boolean inputs from blobs
    • Added ZeroInferRequestTests that create ZeroInferRequest locally using different ZeroInitStructs (reinterpret-cast from ZeroInitMock)
    • Changed ZeroInitMock object to accept all of the extension parameters respecting this order:
      • zeDriverNpuExtVersion
      • zeGraphNpuExtVersion - previously, ZeroInitMock permitted overwriting only this parameter!
      • zeCommandQueueNpuExtVersion
      • zeProfilingNpuExtVersion
      • zeContextNpuExtVersion
      • zeMutableCommandListExtVersion
      • zeExternalMemMapSysMemExtVersion

Tickets:

  • C181730

AI Assistance:

  • AI assistance used: no / yes
  • If yes, summarize how AI was used and what human validation was performed (build/tests/manual checks).

@MirceaDan99 MirceaDan99 requested review from a team as code owners February 23, 2026 14:34
@github-actions github-actions bot added the category: NPU OpenVINO NPU plugin label Feb 23, 2026
@MirceaDan99 MirceaDan99 force-pushed the intel_npu/fix_boolean_datatype_for_set_tenor branch 2 times, most recently from 597b01e to 366e85f Compare February 23, 2026 14:46
@pereanub pereanub left a comment

Please add tests.

Copilot AI left a comment

Pull request overview

Adds an optional precision hint when allocating ZeroTensor in the NPU Level Zero backend, allowing allocations to match the user-provided tensor element type (useful for cases like ov::element::boolean with older compiler/driver behaviors).

Changes:

  • Extend ZeroInferRequest::allocate_tensor with an optional ov::element::Type precision parameter.
  • Pass the user tensor element type into allocate_tensor in set_tensor / set_tensors fallback allocation paths.
  • Use the hinted precision for check_network_precision() and ZeroTensor construction.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/plugins/intel_npu/src/backend/src/zero_infer_request.cpp Threads user tensor element type through fallback allocation to guide ZeroTensor precision.
src/plugins/intel_npu/src/backend/include/zero_infer_request.hpp Updates allocate_tensor signature and documents the new optional precision parameter.

@MirceaDan99 MirceaDan99 force-pushed the intel_npu/fix_boolean_datatype_for_set_tenor branch 7 times, most recently from e09d662 to 653e172 Compare March 2, 2026 07:43
@pereanub pereanub requested a review from Copilot March 2, 2026 09:22
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

@pereanub pereanub left a comment

What is happening if receiving an internal tensor with U8 first and BOOL after that for the same infer?


MirceaDan99 commented Mar 2, 2026

What is happening if receiving an internal tensor with U8 first and BOOL after that for the same infer?

@pereanub,
Isn't the mentioned scenario already covered in tests?

input0 -> importMemoryBatchedTensorU8
input1 -> unalignedBatchedTensorU8U8
...
input0 -> importMemoryBatchedTensorBoolean
input1 -> unalignedBatchedTensorBoolean


pereanub commented Mar 2, 2026

Oh, very hard to follow that test, please create different func test for each case.


MirceaDan99 commented Mar 2, 2026

Oh, very hard to follow that test, please create different func test for each case.

@pereanub
I believe separating these tests would not significantly improve readability, as the test preparation takes more lines of code than the actual set_tensor/set_tensors -> infer calls under test.

Refactored the test in 905ae73 to be more modular by defining lambda helpers:

  • allocate_tensors
  • set_tensor_and_infer
  • set_tensors_and_infer
  • deallocate_addresses (for unaligned memory scenarios)

so that the lines actually under test become:

        OV_ASSERT_NO_THROW(set_tensors_and_infer(infer_request_u8, compiled_model_u8, /* should_infer = */ false, compiled_model_u8.input(0), std::vector<ov::Tensor>{importMemoryTensorU8_1, unalignedTensorU8_2}));
        OV_ASSERT_NO_THROW(set_tensors_and_infer(infer_request_u8, compiled_model_u8, /* should_infer = */ true, compiled_model_u8.input(1), std::vector<ov::Tensor>{unalignedTensorBoolean_2, importMemoryTensorBoolean_1}));

        OV_ASSERT_NO_THROW(set_tensor_and_infer(infer_request_u8, compiled_model_u8, /* should_infer = */ false, compiled_model_u8.input(0), importMemoryBatchedTensorU8));
        OV_ASSERT_NO_THROW(set_tensor_and_infer(infer_request_u8, compiled_model_u8, /* should_infer = */ true, compiled_model_u8.input(1), unalignedBatchedTensorU8));

        OV_ASSERT_NO_THROW(set_tensor_and_infer(infer_request_u8, compiled_model_u8, /* should_infer = */ false, compiled_model_u8.input(0), importMemoryBatchedTensorBoolean));
        OV_ASSERT_NO_THROW(set_tensor_and_infer(infer_request_u8, compiled_model_u8, /* should_infer = */ true, compiled_model_u8.input(1), unalignedBatchedTensorBoolean));

        OV_ASSERT_NO_THROW(set_tensors_and_infer(infer_request_u8, compiled_model_u8, /* should_infer = */ false, compiled_model_u8.input(0), std::vector<ov::Tensor>{unalignedTensorU8_1, importMemoryTensorU8_2}));
        OV_ASSERT_NO_THROW(set_tensors_and_infer(infer_request_u8, compiled_model_u8, /* should_infer = */ true, compiled_model_u8.input(1), std::vector<ov::Tensor>{importMemoryTensorBoolean_2, unalignedTensorBoolean_1}));

@MirceaDan99 MirceaDan99 force-pushed the intel_npu/fix_boolean_datatype_for_set_tenor branch 7 times, most recently from 2b7c51b to 14bbfc6 Compare March 4, 2026 12:50
@MirceaDan99 MirceaDan99 force-pushed the intel_npu/fix_boolean_datatype_for_set_tenor branch from a3560d0 to f40ebad Compare March 11, 2026 07:24
@MirceaDan99 MirceaDan99 force-pushed the intel_npu/fix_boolean_datatype_for_set_tenor branch 3 times, most recently from c371687 to c804779 Compare March 11, 2026 08:20
@pereanub pereanub requested a review from Copilot March 11, 2026 08:24
Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

Comment on lines +462 to +480
const bool isMutableCommandListSupported = _initStructs->getMutableCommandListExtVersion() >= ZE_MAKE_VERSION(1, 0);
if (isMutableCommandListSupported && batchSizeCandidate.has_value()) {
get_level_zero_inputs(foundPort.idx).resize(tensors.size());

for (size_t i = 0; i < tensors.size(); i++) {
try {
_logger.debug("ZeroInferRequest::set_tensors - create zero tensor");
OV_ITT_TASK_NEXT(ZERO_SET_TENSORS, "create zero tensor");
get_level_zero_input(foundPort.idx, i) = std::make_shared<ZeroTensor>(_initStructs, tensors.at(i));
} catch (const ZeroMemException& exception) {
_logger.debug(
"ZeroInferRequest::set_tensors - exception caught while trying to create a Level Zero tensor "
"from the user tensor: %s",
exception.what());

_logger.debug("ZeroInferRequest::set_tensors - allocate locally L0 tensor");
OV_ITT_TASK_NEXT(ZERO_SET_TENSORS, "allocate tensor");
get_level_zero_input(foundPort.idx, i) = allocate_tensor(foundPort.idx, INPUT, batchSizeCandidate);
}
Copilot AI Mar 11, 2026

The batched set_tensors path is currently gated by isMutableCommandListSupported. When mutable command lists are not supported but batchSizeCandidate.has_value() (i.e., the caller passes multiple tensors), the code falls into the single-tensor branch and only allocates/uses a single levelZeroTensor (and only checks tensors.at(SINGLE_TENSOR)), effectively ignoring the rest of the provided tensors. This can lead to incorrect behavior (wrong backing allocations/copies) for PV-driver scenarios where set_tensors is still expected to work. Consider restructuring so that batchSizeCandidate.has_value() drives vector allocation/import/allocation of per-batch ZeroTensors regardless of mutable support; then only guard the _pipeline->update_graph_arguments(...) calls behind isMutableCommandListSupported.

MirceaDan99 (Contributor Author) replied:

There is no tensors.at(SINGLE_TENSOR) at these lines; see the reply below.

Comment on lines +499 to +500
} else {
auto& levelZeroTensor = get_level_zero_input(foundPort.idx);
Copilot AI Mar 11, 2026

(Duplicate of the Copilot comment above.)

MirceaDan99 (Contributor Author) replied:

There is no tensors.at(SINGLE_TENSOR) at these lines; see the reply below.

Comment on lines +522 to 526
const auto& userTensorElementType = tensors.at(SINGLE_TENSOR)->get_element_type();
if (userTensorElementType == ov::element::boolean && levelZeroTensor->get_element_type() == ov::element::u8) {
levelZeroTensor->set_element_type(userTensorElementType);
}
}
Copilot AI Mar 11, 2026

(Duplicate of the Copilot comment above.)

@MirceaDan99 MirceaDan99 Mar 11, 2026

Precision for the rest of the tensors is already asserted in src/plugins/intel_npu/src/common/src/sync_infer_request.cpp#L310 (check @SyncInferRequest::check_batched_tensors method)

Comment on lines +17 to +25
namespace {
constexpr uint32_t TARGET_ZE_DRIVER_NPU_EXT_VERSION = ZE_DRIVER_NPU_EXT_VERSION_1_0;
constexpr uint32_t TARGET_ZE_GRAPH_NPU_EXT_VERSION = ZE_GRAPH_EXT_VERSION_1_16;
constexpr uint32_t TARGET_ZE_COMMAND_QUEUE_NPU_EXT_VERSION = ZE_COMMAND_QUEUE_NPU_EXT_VERSION_1_1;
constexpr uint32_t TARGET_ZE_PROFILING_NPU_EXT_VERSION = ZE_PROFILING_DATA_EXT_VERSION_1_0;
constexpr uint32_t TARGET_ZE_CONTEXT_NPU_EXT_VERSION = ZE_CONTEXT_NPU_EXT_VERSION_1_0;
constexpr uint32_t TARGET_ZE_MUTABLE_COMMAND_LIST_EXT_VERSION = ZE_MUTABLE_COMMAND_LIST_EXP_VERSION_1_1;
constexpr uint32_t TARGET_ZE_EXTERNAL_MEMMAP_SYSMEM_EXT_VERSION = ZE_EXTERNAL_MEMMAP_SYSMEM_EXT_VERSION_1_0;
} // namespace
Copilot AI Mar 11, 2026

Putting an anonymous namespace in a header creates a separate set of internal-linkage constants per translation unit, which is generally discouraged and can make reuse/visibility more confusing. Prefer inline constexpr constants in a named namespace (e.g., intel_npu::test or similar), or make them static constexpr members of ZeroInitStructsMock (or a dedicated traits struct) to keep the defaults well-scoped and consistently referenced.

@@ -55,6 +55,8 @@ class ZeroTensor final : public ov::ITensor {

const ov::element::Type& get_element_type() const override;

Copilot AI Mar 11, 2026

set_element_type reads like a general-purpose mutator, but the implementation asserts it is only valid for the boolean/u8 special-case. To avoid accidental misuse, consider either (a) renaming it to reflect the narrow intent (e.g., boolean/u8 override), and/or (b) documenting the strict preconditions in the header comment, and/or (c) limiting exposure (e.g., making it private and granting ZeroInferRequest access).

Suggested change
/**
* @brief Special-purpose override for the tensor element type.
*
* This API is intended only for the narrow boolean/u8 handling case used by
* the Zero inference pipeline. It must not be used as a general-purpose
* element type mutator. The implementation asserts that only the supported
* conversion(s) are requested and may fail if used with other element types.
*/

Comment on lines +99 to +119
auto allocate_tensors() -> std::tuple</* importMemoryBatched */ ov::Tensor,
/* importMemoryTensor_1 */ ov::Tensor,
/* importMemoryTensor_2 */ ov::Tensor,
/* unalignedBatchedTensor */ ov::Tensor,
/* unalignedTensor_1 */ ov::Tensor,
/* unalignedTensor_2 */ ov::Tensor> {
auto model_shape = ov_model->get_parameters()[0]->get_shape();
ov::Coordinate start_coordinate{model_shape};
ov::Coordinate stop_coordinate{model_shape};
start_coordinate[0] = 1;
stop_coordinate[0] = 2;
ov::Allocator alignedAllocator{::intel_npu::utils::AlignedAllocator{::intel_npu::utils::STANDARD_PAGE_SIZE}};
ov::Tensor importMemoryBatchedTensor(ov::element::boolean, model_shape, alignedAllocator);
ov::Tensor importMemoryTensor_1(importMemoryBatchedTensor, ov::Coordinate{0, 0, 0, 0}, start_coordinate);
ov::Tensor importMemoryTensor_2(importMemoryBatchedTensor, ov::Coordinate{1, 0, 0, 0}, stop_coordinate);
void* alignedAddr = ::operator new(ov::element::boolean.size() * ov::shape_size(model_shape) + 1,
std::align_val_t(::intel_npu::utils::STANDARD_PAGE_SIZE));
void* unalignedAddr = static_cast<uint8_t*>(alignedAddr) + 1;
std::shared_ptr<void> deallocateAddressCallback(alignedAddr, [](void* ptr) {
::operator delete(ptr, std::align_val_t(::intel_npu::utils::STANDARD_PAGE_SIZE));
});
Copilot AI Mar 11, 2026

The tensor-allocation and set_tensor/set_tensors helper logic appears duplicated between this new internal test suite and BooleanPrecisionInferRequestRunTests in infer_request_run.hpp. Consider extracting these helpers into a shared test utility (or a common base fixture) to reduce duplication and prevent future divergence (especially around the custom aligned/unaligned buffer lifetime handling).

MirceaDan99 (Contributor Author) replied:

Signature differs for set_tensor_and_infer/set_tensors_and_infer methods:

  • BooleanPrecisionInferRequestRunTests's expect ov::InferRequest
  • ZeroInferRequestTests's expect std::shared_ptr<intel_npu::ZeroInferRequest>

@MirceaDan99 MirceaDan99 force-pushed the intel_npu/fix_boolean_datatype_for_set_tenor branch from c804779 to a0c4a7e Compare March 11, 2026 08:38
@MirceaDan99 MirceaDan99 force-pushed the intel_npu/fix_boolean_datatype_for_set_tenor branch from d57fecb to 9d2b1e3 Compare March 11, 2026 13:40
@MirceaDan99 MirceaDan99 added this pull request to the merge queue Mar 11, 2026
Merged via the queue into openvinotoolkit:master with commit dfbb82a Mar 11, 2026
216 of 220 checks passed
@MirceaDan99 MirceaDan99 deleted the intel_npu/fix_boolean_datatype_for_set_tenor branch March 11, 2026 20:26

Labels

category: build OpenVINO cmake script / infra category: NPU OpenVINO NPU plugin
