[NPU][ZeroInferRequest][set_tensor] Provide precision hint for allocation of ZeroTensor #34264
Force-pushed from 597b01e to 366e85f (compare)
Resolved (outdated) review thread on src/plugins/intel_npu/src/backend/include/zero_infer_request.hpp.
Pull request overview
Adds an optional precision hint when allocating ZeroTensor in the NPU Level Zero backend, allowing allocations to match the user-provided tensor element type (useful for cases like ov::element::boolean with older compiler/driver behaviors).
Changes:
- Extend `ZeroInferRequest::allocate_tensor` with an optional `ov::element::Type` precision parameter.
- Pass the user tensor element type into `allocate_tensor` in the `set_tensor`/`set_tensors` fallback allocation paths.
- Use the hinted precision for `check_network_precision()` and `ZeroTensor` construction.
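The hinting rule in the first bullet can be sketched as a simple resolution step: use the caller-provided hint when one is given, otherwise fall back to the precision reported by the network. The `ElementType` enum and function name below are illustrative stand-ins, not the actual OpenVINO types.

```cpp
#include <cassert>

// Stand-in for ov::element::Type; the names here are illustrative only.
enum class ElementType { dynamic_t, boolean_t, u8_t, f32_t };

// Sketch of the precision-resolution rule: an optional hint (defaulting to
// "dynamic", i.e. no hint) overrides the network-reported precision.
ElementType resolve_allocation_precision(ElementType networkPrecision,
                                         ElementType hint = ElementType::dynamic_t) {
    return hint == ElementType::dynamic_t ? networkPrecision : hint;
}
```

With this shape, existing call sites that pass no hint keep their current behavior, while `set_tensor`/`set_tensors` can forward the user tensor's element type.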
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/plugins/intel_npu/src/backend/src/zero_infer_request.cpp | Threads user tensor element type through fallback allocation to guide ZeroTensor precision. |
| src/plugins/intel_npu/src/backend/include/zero_infer_request.hpp | Updates allocate_tensor signature and documents the new optional precision parameter. |
Two further outdated, resolved review threads on src/plugins/intel_npu/src/backend/include/zero_infer_request.hpp.
Force-pushed from e09d662 to 653e172 (compare)
Three outdated, resolved review threads on src/plugins/intel_npu/tests/functional/behavior/ov_infer_request/infer_request_run.hpp.
pereanub left a comment:

What is happening if an internal tensor is received with U8 first and BOOL after that for the same infer?
@pereanub,
Oh, that test is very hard to follow; please create a separate functional test for each case.
@pereanub Refactored the test in 905ae73 to be more modular by defining lambda helpers:
and the actual tested lines to be:
Force-pushed from 2b7c51b to 14bbfc6 (compare)
- …g not existing batched tensor for PV driver
- … requests for PV drivers
- …f compiler support for `Boolean LessEq Op`
- …r(set_tensor/s)` methods
Force-pushed from a3560d0 to f40ebad (compare)
Force-pushed from c371687 to c804779 (compare)
```cpp
const bool isMutableCommandListSupported = _initStructs->getMutableCommandListExtVersion() >= ZE_MAKE_VERSION(1, 0);
if (isMutableCommandListSupported && batchSizeCandidate.has_value()) {
    get_level_zero_inputs(foundPort.idx).resize(tensors.size());

    for (size_t i = 0; i < tensors.size(); i++) {
        try {
            _logger.debug("ZeroInferRequest::set_tensors - create zero tensor");
            OV_ITT_TASK_NEXT(ZERO_SET_TENSORS, "create zero tensor");
            get_level_zero_input(foundPort.idx, i) = std::make_shared<ZeroTensor>(_initStructs, tensors.at(i));
        } catch (const ZeroMemException& exception) {
            _logger.debug(
                "ZeroInferRequest::set_tensors - exception caught while trying to create a Level Zero tensor "
                "from the user tensor: %s",
                exception.what());

            _logger.debug("ZeroInferRequest::set_tensors - allocate locally L0 tensor");
            OV_ITT_TASK_NEXT(ZERO_SET_TENSORS, "allocate tensor");
            get_level_zero_input(foundPort.idx, i) = allocate_tensor(foundPort.idx, INPUT, batchSizeCandidate);
        }
```
The batched set_tensors path is currently gated by isMutableCommandListSupported. When mutable command lists are not supported but batchSizeCandidate.has_value() (i.e., the caller passes multiple tensors), the code falls into the single-tensor branch and only allocates/uses a single levelZeroTensor (and only checks tensors.at(SINGLE_TENSOR)), effectively ignoring the rest of the provided tensors. This can lead to incorrect behavior (wrong backing allocations/copies) for PV-driver scenarios where set_tensors is still expected to work. Consider restructuring so that batchSizeCandidate.has_value() drives vector allocation/import/allocation of per-batch ZeroTensors regardless of mutable support; then only guard the _pipeline->update_graph_arguments(...) calls behind isMutableCommandListSupported.
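The restructuring suggested above can be sketched as two independent decisions. The struct and function names below are hypothetical, not the plugin's actual API:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical sketch of the suggested split: the presence of a batch-size
// candidate decides whether per-batch ZeroTensors are allocated, while
// mutable-command-list support only decides whether graph arguments may be
// patched in place (the update_graph_arguments(...) calls).
struct SetTensorsPlan {
    bool allocatePerBatchTensors;
    bool updateGraphArguments;
};

inline SetTensorsPlan plan_set_tensors(bool mutableCmdListSupported,
                                       std::optional<std::size_t> batchSizeCandidate) {
    const bool batched = batchSizeCandidate.has_value();
    return SetTensorsPlan{batched, batched && mutableCmdListSupported};
}
```

Under this split, a PV driver without mutable command lists would still allocate per-batch tensors; it would only skip the in-place graph-argument updates.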
There is no tensors.at(SINGLE_TENSOR) by these lines; check the reply below.
```cpp
} else {
    auto& levelZeroTensor = get_level_zero_input(foundPort.idx);
```
(Copilot posted the same batched set_tensors gating comment on this hunk as above.)
There is no tensors.at(SINGLE_TENSOR) by these lines; check the reply below.
```cpp
    const auto& userTensorElementType = tensors.at(SINGLE_TENSOR)->get_element_type();
    if (userTensorElementType == ov::element::boolean && levelZeroTensor->get_element_type() == ov::element::u8) {
        levelZeroTensor->set_element_type(userTensorElementType);
    }
}
```
(Copilot posted the same batched set_tensors gating comment on this hunk as above.)
Precision for the rest of the tensors is already asserted in src/plugins/intel_npu/src/common/src/sync_infer_request.cpp#L310 (see the SyncInferRequest::check_batched_tensors method).
```cpp
namespace {
constexpr uint32_t TARGET_ZE_DRIVER_NPU_EXT_VERSION = ZE_DRIVER_NPU_EXT_VERSION_1_0;
constexpr uint32_t TARGET_ZE_GRAPH_NPU_EXT_VERSION = ZE_GRAPH_EXT_VERSION_1_16;
constexpr uint32_t TARGET_ZE_COMMAND_QUEUE_NPU_EXT_VERSION = ZE_COMMAND_QUEUE_NPU_EXT_VERSION_1_1;
constexpr uint32_t TARGET_ZE_PROFILING_NPU_EXT_VERSION = ZE_PROFILING_DATA_EXT_VERSION_1_0;
constexpr uint32_t TARGET_ZE_CONTEXT_NPU_EXT_VERSION = ZE_CONTEXT_NPU_EXT_VERSION_1_0;
constexpr uint32_t TARGET_ZE_MUTABLE_COMMAND_LIST_EXT_VERSION = ZE_MUTABLE_COMMAND_LIST_EXP_VERSION_1_1;
constexpr uint32_t TARGET_ZE_EXTERNAL_MEMMAP_SYSMEM_EXT_VERSION = ZE_EXTERNAL_MEMMAP_SYSMEM_EXT_VERSION_1_0;
}  // namespace
```
Putting an anonymous namespace in a header creates a separate set of internal-linkage constants per translation unit, which is generally discouraged and can make reuse/visibility more confusing. Prefer inline constexpr constants in a named namespace (e.g., intel_npu::test or similar), or make them static constexpr members of ZeroInitStructsMock (or a dedicated traits struct) to keep the defaults well-scoped and consistently referenced.
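A minimal sketch of the suggested alternative, assuming a named namespace (the name `intel_npu::test_defaults` is illustrative) and the Level Zero `ZE_MAKE_VERSION` encoding of `(major << 16) | minor`:

```cpp
#include <cassert>
#include <cstdint>

// Suggested shape: inline constexpr constants in a named namespace give every
// translation unit the same entities, instead of a per-TU internal-linkage
// copy as with an anonymous namespace in a header.
namespace intel_npu::test_defaults {
// ZE_MAKE_VERSION(major, minor) expands to ((major << 16) | minor).
inline constexpr std::uint32_t TARGET_ZE_GRAPH_NPU_EXT_VERSION = (1u << 16) | 16u;
inline constexpr std::uint32_t TARGET_ZE_MUTABLE_COMMAND_LIST_EXT_VERSION = (1u << 16) | 1u;
}  // namespace intel_npu::test_defaults
```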
```diff
@@ -55,6 +55,8 @@ class ZeroTensor final : public ov::ITensor {

     const ov::element::Type& get_element_type() const override;
```
set_element_type reads like a general-purpose mutator, but the implementation asserts it is only valid for the boolean/u8 special-case. To avoid accidental misuse, consider either (a) renaming it to reflect the narrow intent (e.g., boolean/u8 override), and/or (b) documenting the strict preconditions in the header comment, and/or (c) limiting exposure (e.g., making it private and granting ZeroInferRequest access).
```cpp
/**
 * @brief Special-purpose override for the tensor element type.
 *
 * This API is intended only for the narrow boolean/u8 handling case used by
 * the Zero inference pipeline. It must not be used as a general-purpose
 * element type mutator. The implementation asserts that only the supported
 * conversion(s) are requested and may fail if used with other element types.
 */
```
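The documented precondition can be mirrored in a guarded mutator. This is a standalone sketch under that assumption, not the real ZeroTensor code:

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in element type; in this sketch only boolean and u8 share a
// byte-compatible layout.
enum class ElementType { boolean_t, u8_t, f32_t };

// Sketch of the narrow mutator: the only supported change is between boolean
// and u8 (a same-size reinterpretation); any other request throws, mirroring
// the assertion described for the real implementation.
class TensorSketch {
public:
    explicit TensorSketch(ElementType t) : type_(t) {}
    ElementType get_element_type() const { return type_; }
    void set_element_type(ElementType t) {
        const bool booleanU8Pair =
            (type_ == ElementType::u8_t && t == ElementType::boolean_t) ||
            (type_ == ElementType::boolean_t && t == ElementType::u8_t);
        if (t != type_ && !booleanU8Pair) {
            throw std::logic_error("only the boolean/u8 override is supported");
        }
        type_ = t;
    }
private:
    ElementType type_;
};
```

Encoding the precondition as a hard failure (rather than a silent no-op) makes accidental general-purpose use visible immediately, which is the concern raised in the comment above.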
```cpp
auto allocate_tensors() -> std::tuple</* importMemoryBatched */ ov::Tensor,
                                      /* importMemoryTensor_1 */ ov::Tensor,
                                      /* importMemoryTensor_2 */ ov::Tensor,
                                      /* unalignedBatchedTensor */ ov::Tensor,
                                      /* unalignedTensor_1 */ ov::Tensor,
                                      /* unalignedTensor_2 */ ov::Tensor> {
    auto model_shape = ov_model->get_parameters()[0]->get_shape();
    ov::Coordinate start_coordinate{model_shape};
    ov::Coordinate stop_coordinate{model_shape};
    start_coordinate[0] = 1;
    stop_coordinate[0] = 2;
    ov::Allocator alignedAllocator{::intel_npu::utils::AlignedAllocator{::intel_npu::utils::STANDARD_PAGE_SIZE}};
    ov::Tensor importMemoryBatchedTensor(ov::element::boolean, model_shape, alignedAllocator);
    ov::Tensor importMemoryTensor_1(importMemoryBatchedTensor, ov::Coordinate{0, 0, 0, 0}, start_coordinate);
    ov::Tensor importMemoryTensor_2(importMemoryBatchedTensor, ov::Coordinate{1, 0, 0, 0}, stop_coordinate);
    void* alignedAddr = ::operator new(ov::element::boolean.size() * ov::shape_size(model_shape) + 1,
                                       std::align_val_t(::intel_npu::utils::STANDARD_PAGE_SIZE));
    void* unalignedAddr = static_cast<uint8_t*>(alignedAddr) + 1;
    std::shared_ptr<void> deallocateAddressCallback(alignedAddr, [](void* ptr) {
        ::operator delete(ptr, std::align_val_t(::intel_npu::utils::STANDARD_PAGE_SIZE));
    });
```
The tensor-allocation and set_tensor/set_tensors helper logic appears duplicated between this new internal test suite and BooleanPrecisionInferRequestRunTests in infer_request_run.hpp. Consider extracting these helpers into a shared test utility (or a common base fixture) to reduce duplication and prevent future divergence (especially around the custom aligned/unaligned buffer lifetime handling).
The signatures differ for the set_tensor_and_infer/set_tensors_and_infer methods:
- BooleanPrecisionInferRequestRunTests's expect ov::InferRequest
- ZeroInferRequestTests's expect std::shared_ptr<intel_npu::ZeroInferRequest>
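The custom aligned/unaligned buffer lifetime handling that the review asks to share could live in one small utility. The page size, struct, and function names below are assumptions for illustration, not the existing test code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Sketch of a shared test helper: own one page-aligned allocation and expose
// a deliberately misaligned view (base + 1) for the "unaligned tensor" cases,
// so every fixture reuses the same lifetime handling.
constexpr std::size_t PAGE_SIZE = 4096;  // stand-in for utils::STANDARD_PAGE_SIZE

struct MisalignedBuffer {
    std::shared_ptr<void> owner;  // deleter frees the aligned base allocation
    void* unaligned;              // base + 1, guaranteed not page-aligned
};

inline MisalignedBuffer make_misaligned_buffer(std::size_t byteSize) {
    // Allocate one extra byte so the +1 view still covers byteSize bytes.
    void* base = ::operator new(byteSize + 1, std::align_val_t{PAGE_SIZE});
    std::shared_ptr<void> owner(base, [](void* p) {
        ::operator delete(p, std::align_val_t{PAGE_SIZE});
    });
    return MisalignedBuffer{owner, static_cast<std::uint8_t*>(base) + 1};
}
```

Holding the deleter in a `shared_ptr<void>` mirrors the `deallocateAddressCallback` pattern in the quoted test code, so the two suites could converge on one helper without changing lifetime semantics.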
Force-pushed from c804779 to a0c4a7e (compare)
Force-pushed from d57fecb to 9d2b1e3 (compare)
dfbb82a

Details:
- Add optional `ov::element::Type` parameter for the `ZeroInferRequest::allocate_tensor` method
- Add `ov::Tensor::set_element_type` method to cover boolean-u8 precision mismatches between user tensors and precisions from compiler descriptors
- `ZeroTensor`s can now be allocated during the `set_tensor`/`set_tensors` methods even with the PV driver (when `zeMutableCommandListExtVersion` is less than 1.0), having their precision updated accordingly if needed (special case for `ov::element::boolean` -> `ov::element::u8`)
- `BooleanPrecisionInferRequestRunTests` meant to be compatible with the PV driver, but skipped due to the ELF loader from the PV driver not being able to parse boolean inputs from blobs
- `ZeroInferRequestTests` that create `ZeroInferRequest` locally using different `ZeroInitStructs` (reinterpret-cast from `ZeroInitMock`)
- `ZeroInitMock` object to accept all of the extension parameters, respecting this order: `zeDriverNpuExtVersion`, `zeGraphNpuExtVersion` (in the past, `ZeroInitMock` permitted overwriting of only this param!), `zeCommandQueueNpuExtVersion`, `zeProfilingNpuExtVersion`, `zeContextNpuExtVersion`, `zeMutableCommandListExtVersion`, `zeExternalMemMapSysMemExtVersion`

Tickets:
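The extended `ZeroInitMock` parameter set from the last bullet can be sketched as an aggregate with defaulted fields in that exact order. The default values below are placeholders, not the plugin's real extension versions:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical parameter aggregate mirroring the order listed above. With
// defaulted members, callers override only the fields they care about
// (previously only zeGraphNpuExtVersion could be overwritten).
struct ZeroInitMockParams {
    std::uint32_t zeDriverNpuExtVersion = (1u << 16) | 0u;            // placeholder default
    std::uint32_t zeGraphNpuExtVersion = (1u << 16) | 16u;            // placeholder default
    std::uint32_t zeCommandQueueNpuExtVersion = (1u << 16) | 1u;      // placeholder default
    std::uint32_t zeProfilingNpuExtVersion = (1u << 16) | 0u;         // placeholder default
    std::uint32_t zeContextNpuExtVersion = (1u << 16) | 0u;           // placeholder default
    std::uint32_t zeMutableCommandListExtVersion = (1u << 16) | 1u;   // placeholder default
    std::uint32_t zeExternalMemMapSysMemExtVersion = (1u << 16) | 0u; // placeholder default
};
```

A PV-driver-like mock would then be expressed by overriding just `zeMutableCommandListExtVersion` (for example to a value below `ZE_MAKE_VERSION(1, 0)`) and leaving the other fields at their defaults.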
AI Assistance:
yes