
IO Binding with OpenVINO EP: device_name() always returns 'cpu' #26989

@Stanley5249

Description


Environment

  • ONNX Runtime: 1.23.0
  • Execution Provider: OpenVINOExecutionProvider
  • Platform: Windows
  • Hardware: Intel Arc integrated GPU (unified memory)
  • OpenVINO devices: CPU, GPU, NPU

Issue Summary

When using IO binding with the OpenVINO Execution Provider, OrtValue.device_name() always returns 'cpu' in the Python API, regardless of which device is specified. In the Rust/C API, using device-specific allocators fails with "No requested allocator available".

Observed Behavior

Python API

import numpy as np
import onnxruntime as ort

# Example input; actual shape and dtype depend on the model
input_array = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Create session with GPU device
providers = [("OpenVINOExecutionProvider", {"device_type": "GPU"})]
session = ort.InferenceSession("model.onnx", providers=providers)

io_binding = session.io_binding()
io_binding.bind_cpu_input("input", input_array)
io_binding.bind_output("output", "gpu")  # Request GPU output

session.run_with_iobinding(io_binding)
outputs = io_binding.get_outputs()

print(outputs[0].device_name())  # Returns: 'cpu' (expected: 'gpu')

Result: device_name() returns 'cpu' instead of 'gpu'. The device requested in bind_output() appears to be ignored, yet inference completes without any error or warning.
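For reference, here is a sketch of the same reproduction using a pre-allocated output OrtValue instead of a device string (not part of the original report; output_shape is a placeholder, and the "gpu" allocation itself may fail on an OpenVINO-only build, which would be informative on its own):

import numpy as np
import onnxruntime as ort

providers = [("OpenVINOExecutionProvider", {"device_type": "GPU"})]
session = ort.InferenceSession("model.onnx", providers=providers)

# Example input; actual shape and dtype depend on the model
input_array = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Placeholder output shape; must match the model's actual output
output_shape = [1, 1000]
output_value = ort.OrtValue.ortvalue_from_shape_and_type(output_shape, np.float32, "gpu", 0)

io_binding = session.io_binding()
io_binding.bind_cpu_input("input", input_array)
io_binding.bind_ortvalue_output("output", output_value)  # bind the pre-allocated OrtValue

session.run_with_iobinding(io_binding)

# The pre-allocated OrtValue reports the device it was created on,
# which can be compared with what get_outputs() reports above.
print(output_value.device_name())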

Rust/C API

Related: pykeio/ort#513

When using the Rust bindings (unofficial, included as additional context), IO binding still goes through the legacy CreateMemoryInfo API instead of CreateMemoryInfo_V2. That API only recognizes the device strings defined in allocator.h (see the excerpt after the error list).

Errors observed:

  • "OpenVINO_GPU", "OpenVINO_RT_NPU""No requested allocator available"
  • "OpenVINO_CPU", "OpenVINO_RT""Specified device is not supported"
  • "DML" (DirectML build) → "No requested allocator available"

namespace onnxruntime {
constexpr const char* CPU = "Cpu";
constexpr const char* CPU_ALIGNED_4K = "CpuAligned4K";
constexpr const char* CUDA = "Cuda";
constexpr const char* CUDA_PINNED = "CudaPinned";
constexpr const char* CANN = "Cann";
constexpr const char* CANN_PINNED = "CannPinned";
constexpr const char* DML = "DML";
constexpr const char* HIP = "Hip";
constexpr const char* HIP_PINNED = "HipPinned";
constexpr const char* OpenVINO_CPU = "OpenVINO_CPU";
constexpr const char* OpenVINO_GPU = "OpenVINO_GPU";
constexpr const char* OpenVINO_RT = "OpenVINO_RT";
constexpr const char* OpenVINO_RT_NPU = "OpenVINO_RT_NPU";
constexpr const char* QNN_HTP_SHARED = "QnnHtpShared";
constexpr const char* WEBGPU_BUFFER = "WebGPU_Buffer";
constexpr const char* WEBNN_TENSOR = "WebNN_Tensor";
}  // namespace onnxruntime

ORT_API_STATUS_IMPL(OrtApis::CreateMemoryInfo, _In_ const char* name1, enum OrtAllocatorType type, int id1,
                    enum OrtMemType mem_type1, _Outptr_ OrtMemoryInfo** out) {
  API_IMPL_BEGIN
  if (name1 == nullptr) {
    return OrtApis::CreateStatus(ORT_INVALID_ARGUMENT, "MemoryInfo name cannot be null.");
  }
  if (out == nullptr) {
    return OrtApis::CreateStatus(ORT_INVALID_ARGUMENT, "Output memory info cannot be null.");
  }
  auto device_id = static_cast<OrtDevice::DeviceId>(id1);
  if (strcmp(name1, onnxruntime::CPU) == 0) {
    *out = new OrtMemoryInfo(onnxruntime::CPU, type, OrtDevice(), mem_type1);
  } else if (strcmp(name1, onnxruntime::CUDA) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::CUDA, type,
        OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DEFAULT, OrtDevice::VendorIds::NVIDIA, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::OpenVINO_GPU) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::OpenVINO_GPU, type,
        OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DEFAULT, OrtDevice::VendorIds::INTEL, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::HIP) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::HIP, type,
        OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DEFAULT, OrtDevice::VendorIds::AMD, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::WEBGPU_BUFFER) == 0 ||
             strcmp(name1, onnxruntime::WEBNN_TENSOR) == 0) {
    *out = new OrtMemoryInfo(
        name1, type,
        OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DEFAULT, OrtDevice::VendorIds::NONE, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::DML) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::DML, type,
        OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DEFAULT, OrtDevice::VendorIds::MICROSOFT, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::OpenVINO_RT_NPU) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::OpenVINO_RT_NPU, type,
        OrtDevice(OrtDevice::NPU, OrtDevice::MemType::DEFAULT, OrtDevice::VendorIds::INTEL, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::CUDA_PINNED) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::CUDA_PINNED, type,
        OrtDevice(OrtDevice::GPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::NVIDIA, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::HIP_PINNED) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::HIP_PINNED, type,
        OrtDevice(OrtDevice::GPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::AMD, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::QNN_HTP_SHARED) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::QNN_HTP_SHARED, type,
        OrtDevice(OrtDevice::CPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::QUALCOMM, device_id),
        mem_type1);
  } else if (strcmp(name1, onnxruntime::CPU_ALIGNED_4K) == 0) {
    *out = new OrtMemoryInfo(
        onnxruntime::CPU_ALIGNED_4K, type,
        OrtDevice(OrtDevice::CPU, OrtDevice::MemType::DEFAULT, OrtDevice::VendorIds::NONE, device_id,
                  onnxruntime::kAlloc4KAlignment),
        mem_type1);
  } else {
    return OrtApis::CreateStatus(ORT_INVALID_ARGUMENT, "Specified device is not supported. Try CreateMemoryInfo_V2.");
  }
  API_IMPL_END
  return nullptr;
}

Questions

  1. Is this related to integrated GPU unified memory architecture?
    For integrated GPUs where CPU and GPU share physical memory, are outputs in shared memory but reported as 'cpu'? Would this differ on discrete GPUs?

  2. Can documentation or error messages be improved?
    If this is expected behavior for unified memory, could the documentation clarify this? The "No requested allocator available" error message could explain this is expected for integrated GPUs.

  3. What is the recommended zero-copy pattern for OpenVINO EP?
    Does the pattern (bind_cpu_input() + bind_output()) already achieve zero-copy on unified memory, or should we just use session.run()?
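For concreteness, here are the two patterns being weighed in question 3, written out side by side (sketch only; model.onnx and input_array are placeholders as in the reproduction above):

import numpy as np
import onnxruntime as ort

providers = [("OpenVINOExecutionProvider", {"device_type": "GPU"})]
session = ort.InferenceSession("model.onnx", providers=providers)
input_array = np.random.rand(1, 3, 224, 224).astype(np.float32)  # example input

# Pattern A: plain run(); ORT manages input/output placement internally.
outputs_a = session.run(None, {"input": input_array})

# Pattern B: IO binding with a device string for the output.
io_binding = session.io_binding()
io_binding.bind_cpu_input("input", input_array)
io_binding.bind_output("output", "gpu")
session.run_with_iobinding(io_binding)
outputs_b = io_binding.copy_outputs_to_cpu()  # explicit copy back to host

The question is whether pattern B actually avoids a copy on unified memory that pattern A would otherwise incur.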

Summary

  • Python: IO binding with device strings works but appears ignored (device_name() always returns 'cpu')
  • Rust: IO binding with device allocators fails ("No requested allocator available")
  • Question: Is this expected for unified memory? If so, documentation should clarify this.

Urgency

No response

Platform

Windows

OS Version

Windows 11 Home 25H2 (Build 26200.7462)

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.23.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

OpenVINO

Execution Provider Library Version

2025.4.1

Labels

  • ep:DML (issues related to the DirectML execution provider)
  • ep:OpenVINO (issues related to the OpenVINO execution provider)
  • stale (issues that have not been addressed in a while; categorized by a bot)
