Description
Environment
- ONNX Runtime: 1.23.0
- Execution Provider: OpenVINOExecutionProvider
- Platform: Windows
- Hardware: Intel Arc integrated GPU (unified memory)
- OpenVINO devices: CPU, GPU, NPU
Issue Summary
When using IO binding with the OpenVINO Execution Provider, `OrtValue.device_name()` in the Python API always returns `'cpu'`, regardless of the device requested. In the Rust/C API, using device-specific allocators fails with "No requested allocator available".
Observed Behavior
Python API
```python
import onnxruntime as ort

# Create session targeting the GPU device
providers = [("OpenVINOExecutionProvider", {"device_type": "GPU"})]
session = ort.InferenceSession("model.onnx", providers=providers)

io_binding = session.io_binding()
io_binding.bind_cpu_input("input", input_array)
io_binding.bind_output("output", "gpu")  # Request GPU output
session.run_with_iobinding(io_binding)

outputs = io_binding.get_outputs()
print(outputs[0].device_name())  # Returns: 'cpu' (expected: 'gpu')
```

Result: `device_name()` returns `'cpu'` instead of `'gpu'`. The device request appears to be silently ignored; inference nevertheless runs and produces correct results.
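One way to make the intended placement explicit, rather than relying on the `bind_output(name, "gpu")` string form, is to pre-allocate the output `OrtValue` on the target device and bind it with `bind_ortvalue_output`. A minimal sketch (the helper name and the one-input/one-output assumption are mine; on this integrated-GPU hardware the returned value may still report `'cpu'`):

```python
def run_with_bound_output(session, input_name, output_name,
                          input_array, out_shape,
                          device="gpu", device_id=0):
    # Deferred import so the sketch stands alone
    import onnxruntime as ort

    binding = session.io_binding()
    binding.bind_cpu_input(input_name, input_array)
    # Pre-allocate the output on the requested device and bind it directly
    out_value = ort.OrtValue.ortvalue_from_shape_and_type(
        out_shape, input_array.dtype, device, device_id)
    binding.bind_ortvalue_output(output_name, out_value)
    session.run_with_iobinding(binding)
    # out_value.device_name() reports where ORT actually placed the buffer
    return out_value
```

If `device_name()` still comes back `'cpu'` with an explicitly pre-allocated device buffer, that would help narrow down whether the string-based binding path or the allocator itself is at fault.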
Rust/C API
Related: pykeio/ort#513
The unofficial Rust bindings (linked above, provided as additional context) still call the legacy `CreateMemoryInfo` API rather than `CreateMemoryInfo_V2`, passing the device strings defined in `allocator.h`.

Errors observed:
- `"OpenVINO_GPU"`, `"OpenVINO_RT_NPU"` → "No requested allocator available"
- `"OpenVINO_CPU"`, `"OpenVINO_RT"` → "Specified device is not supported"
- `"DML"` (DirectML build) → "No requested allocator available"
onnxruntime/include/onnxruntime/core/framework/allocator.h
Lines 73 to 89 in e5e9174
```cpp
namespace onnxruntime {
constexpr const char* CPU = "Cpu";
constexpr const char* CPU_ALIGNED_4K = "CpuAligned4K";
constexpr const char* CUDA = "Cuda";
constexpr const char* CUDA_PINNED = "CudaPinned";
constexpr const char* CANN = "Cann";
constexpr const char* CANN_PINNED = "CannPinned";
constexpr const char* DML = "DML";
constexpr const char* HIP = "Hip";
constexpr const char* HIP_PINNED = "HipPinned";
constexpr const char* OpenVINO_CPU = "OpenVINO_CPU";
constexpr const char* OpenVINO_GPU = "OpenVINO_GPU";
constexpr const char* OpenVINO_RT = "OpenVINO_RT";
constexpr const char* OpenVINO_RT_NPU = "OpenVINO_RT_NPU";
constexpr const char* QNN_HTP_SHARED = "QnnHtpShared";
constexpr const char* WEBGPU_BUFFER = "WebGPU_Buffer";
constexpr const char* WEBNN_TENSOR = "WebNN_Tensor";
```
onnxruntime/onnxruntime/core/framework/allocator.cc
Lines 209 to 282 in e5e9174
```cpp
ORT_API_STATUS_IMPL(OrtApis::CreateMemoryInfo, _In_ const char* name1, enum OrtAllocatorType type, int id1,
                    enum OrtMemType mem_type1, _Outptr_ OrtMemoryInfo** out) {
  API_IMPL_BEGIN
  if (name1 == nullptr) {
    return OrtApis::CreateStatus(ORT_INVALID_ARGUMENT, "MemoryInfo name cannot be null.");
  }
  if (out == nullptr) {
    return OrtApis::CreateStatus(ORT_INVALID_ARGUMENT, "Output memory info cannot be null.");
  }
  auto device_id = static_cast<OrtDevice::DeviceId>(id1);
  if (strcmp(name1, onnxruntime::CPU) == 0) {
    *out = new OrtMemoryInfo(onnxruntime::CPU, type, OrtDevice(), mem_type1);
  } else if (strcmp(name1, onnxruntime::CUDA) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::CUDA, type,
        OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DEFAULT, OrtDevice::VendorIds::NVIDIA, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::OpenVINO_GPU) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::OpenVINO_GPU, type,
        OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DEFAULT, OrtDevice::VendorIds::INTEL, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::HIP) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::HIP, type,
        OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DEFAULT, OrtDevice::VendorIds::AMD, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::WEBGPU_BUFFER) == 0 ||
             strcmp(name1, onnxruntime::WEBNN_TENSOR) == 0) {
    *out = new OrtMemoryInfo(
        name1, type,
        OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DEFAULT, OrtDevice::VendorIds::NONE, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::DML) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::DML, type,
        OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DEFAULT, OrtDevice::VendorIds::MICROSOFT, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::OpenVINO_RT_NPU) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::OpenVINO_RT_NPU, type,
        OrtDevice(OrtDevice::NPU, OrtDevice::MemType::DEFAULT, OrtDevice::VendorIds::INTEL, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::CUDA_PINNED) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::CUDA_PINNED, type,
        OrtDevice(OrtDevice::GPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::NVIDIA, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::HIP_PINNED) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::HIP_PINNED, type,
        OrtDevice(OrtDevice::GPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::AMD, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::QNN_HTP_SHARED) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::QNN_HTP_SHARED, type,
        OrtDevice(OrtDevice::CPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::QUALCOMM, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::CPU_ALIGNED_4K) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::CPU_ALIGNED_4K, type,
        OrtDevice(OrtDevice::CPU, OrtDevice::MemType::DEFAULT, OrtDevice::VendorIds::NONE, device_id,
                  onnxruntime::kAlloc4KAlignment),
        mem_type1);
  } else {
    return OrtApis::CreateStatus(ORT_INVALID_ARGUMENT, "Specified device is not supported. Try CreateMemoryInfo_V2.");
  }
  API_IMPL_END
  return nullptr;
}
```
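To summarize the dispatch chain above: the legacy `CreateMemoryInfo` maps a fixed set of allocator-name strings to devices, and everything else falls through to the "Specified device is not supported" error. A rough Python mirror as a reading aid (not ONNX Runtime code; the vendor labels are taken from the `VendorIds` used in each branch):

```python
# Legacy allocator name -> (device kind, vendor), mirroring CreateMemoryInfo
LEGACY_NAME_TO_DEVICE = {
    "Cpu": ("CPU", "NONE"),
    "CpuAligned4K": ("CPU", "NONE"),
    "Cuda": ("GPU", "NVIDIA"),
    "CudaPinned": ("GPU", "NVIDIA"),
    "Hip": ("GPU", "AMD"),
    "HipPinned": ("GPU", "AMD"),
    "DML": ("GPU", "MICROSOFT"),
    "OpenVINO_GPU": ("GPU", "INTEL"),
    "OpenVINO_RT_NPU": ("NPU", "INTEL"),
    "QnnHtpShared": ("CPU", "QUALCOMM"),
    "WebGPU_Buffer": ("GPU", "NONE"),
    "WebNN_Tensor": ("GPU", "NONE"),
}

def legacy_device(name):
    try:
        return LEGACY_NAME_TO_DEVICE[name]
    except KeyError:
        # Mirrors the final else branch of CreateMemoryInfo
        raise ValueError(
            "Specified device is not supported. Try CreateMemoryInfo_V2.")
```

Note that `"OpenVINO_CPU"` and `"OpenVINO_RT"` are declared in `allocator.h` but have no branch in `CreateMemoryInfo`, which matches the "Specified device is not supported" errors observed for those two names.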
Questions
1. Is this related to the integrated GPU's unified memory architecture?
   For integrated GPUs where CPU and GPU share physical memory, are outputs placed in shared memory but reported as `'cpu'`? Would this behave differently on discrete GPUs?
2. Can documentation or error messages be improved?
   If this is expected behavior for unified memory, could the documentation clarify it? The "No requested allocator available" error message could explain that this is expected for integrated GPUs.
3. What is the recommended zero-copy pattern for the OpenVINO EP?
   Does the `bind_cpu_input()` + `bind_output()` pattern already achieve zero-copy on unified memory, or should we simply use `session.run()`?
Summary
- Python: IO binding with device strings works but appears ignored (`device_name()` always returns `'cpu'`)
- Rust: IO binding with device allocators fails ("No requested allocator available")
- Question: Is this expected for unified memory? If so, the documentation should clarify it.
Urgency
No response
Platform
Windows
OS Version
Windows 11 Home 25H2 (Build 26200.7462)
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.23.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
OpenVINO
Execution Provider Library Version
2025.4.1