95 changes: 54 additions & 41 deletions docs/execution-providers/OpenVINO-ExecutionProvider.md
@@ -30,9 +30,9 @@ ONNX Runtime OpenVINO™ Execution Provider is compatible with the three latest releases of OpenVINO.

|ONNX Runtime|OpenVINO™|Notes|
|---|---|---|
|1.24.0|2025.4.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.9)|
|1.23.0|2025.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.8)|
|1.22.0|2025.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.7)|

## Build

@@ -79,15 +79,15 @@ Runtime parameters set during OpenVINO Execution Provider initialization to control inference behavior:
| [**num_of_threads**](#num_of_threads--num_streams) | string | Any positive integer > 0 | size_t | Control number of inference threads |
| [**num_streams**](#num_of_threads--num_streams) | string | Any positive integer > 0 | size_t | Set parallel execution streams for throughput |
| [**cache_dir**](#cache_dir) | string | Valid filesystem path | string | Enable OpenVINO model caching for improved latency |
| [**load_config**](#load_config) | string | JSON string | string | Load and set custom/HW specific OpenVINO properties from JSON |
| [**enable_qdq_optimizer**](#enable_qdq_optimizer) | string | True/False | boolean | Enable QDQ optimization for NPU |
| [**disable_dynamic_shapes**](#disable_dynamic_shapes) | string | True/False | boolean | Convert dynamic models to static shapes |
| [**reshape_input**](#reshape_input) | string | input_name[shape_bounds] | string | Specify upper and lower bound for dynamic shaped inputs for improved performance with NPU |
| [**layout**](#layout) | string | input_name[layout_format] | string | Specify input/output tensor layout format |
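
For orientation, provider options are passed to the session as string key/value pairs. A minimal sketch (the model path and option values below are placeholders; fuller `load_config` examples appear in the Examples section):

```python
import onnxruntime as ort

# Illustrative values only; every provider option is supplied as a string
options = {
    "device_type": "GPU",              # target device
    "disable_dynamic_shapes": "True",  # convert dynamic models to static shapes
}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```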

**Deprecation Notice**

The following provider options are **deprecated since ORT 1.23** and should be migrated to `load_config` for better compatibility with future releases.

| Deprecated Provider Option | `load_config` Equivalent | Recommended Migration |
|---------------------------|------------------------|----------------------|
@@ -147,7 +147,7 @@ Runs the same model on multiple devices in parallel to improve device utilization.
---

### `precision`
**DEPRECATED:** This option is deprecated since OpenVINO 2025.3/ORT 1.23 and can be set via `load_config` using the `INFERENCE_PRECISION_HINT` property.
- Controls numerical precision during inference, balancing **performance** and **accuracy**.
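
A hedged sketch of the `load_config` equivalent (the device and precision value are illustrative):

```python
import onnxruntime as ort
import json

# Illustrative: request FP16 inference precision on GPU through load_config
config = {"GPU": {"INFERENCE_PRECISION_HINT": "f16"}}
options = {"device_type": "GPU", "load_config": json.dumps(config)}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```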

**Precision Support on Devices:**
@@ -167,7 +167,7 @@
---
### `num_of_threads` & `num_streams`

**DEPRECATED:** These options are deprecated since OpenVINO 2025.3/ORT 1.23 and can be set via `load_config` using the `INFERENCE_NUM_THREADS` and `NUM_STREAMS` properties respectively.
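
A minimal sketch of the `load_config` migration (thread and stream counts are illustrative):

```python
import onnxruntime as ort
import json

# Illustrative values: 8 inference threads and 2 execution streams on CPU
config = {"CPU": {"INFERENCE_NUM_THREADS": "8", "NUM_STREAMS": "2"}}
options = {"device_type": "CPU", "load_config": json.dumps(config)}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```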

**Multi-Threading**

@@ -185,31 +185,33 @@ Manages parallel inference streams for throughput optimization (default: `1`).

### `cache_dir`

**DEPRECATED:** This option is deprecated since OpenVINO 2025.3/ORT 1.23 and can be set via `load_config` using the `CACHE_DIR` property.

Enables model caching to significantly reduce subsequent load times. Supports CPU, NPU, and GPU devices with kernel caching on iGPU/dGPU.

**Benefits**
- Saves compiled models for faster subsequent loading
- Eliminates recompilation overhead on subsequent runs
- Particularly useful for optimizing application startup latencies, especially for complex models
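
A minimal sketch of enabling caching through `load_config` (the cache directory path is a placeholder):

```python
import onnxruntime as ort
import json

# Placeholder path; compiled blobs are written here and reused on later runs
config = {"GPU": {"CACHE_DIR": "./model_cache"}}
options = {"device_type": "GPU", "load_config": json.dumps(config)}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```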

---

### `load_config`

**Recommended Configuration Method** for setting OpenVINO runtime properties. Provides direct access to OpenVINO properties through a JSON string at runtime.

#### Overview

`load_config` enables fine-grained control over OpenVINO inference behavior by loading properties from a JSON string. This is the **preferred method** for configuring advanced OpenVINO features, offering:

- Direct access to OpenVINO runtime properties
- Device-specific configuration
- Better compatibility with future OpenVINO releases
- No property name translation required



#### JSON Configuration Format
```json
{
  "DEVICE_NAME": {
    "PROPERTY_KEY": "PROPERTY_VALUE"
  }
}
```

`load_config` supports nested JSON objects up to **8 levels deep** for complex device configurations.

**Example: Multi-Level Nested Configuration**
```python
import onnxruntime as ort
import json

# Complex nested configuration for AUTO device
config = {
    "AUTO": {
        "PERFORMANCE_HINT": "THROUGHPUT",
        "DEVICE_PROPERTIES": {
            "CPU": {
                "PERFORMANCE_HINT": "LATENCY",
                "NUM_STREAMS": "3"
            },
            "GPU": {
                "EXECUTION_MODE_HINT": "ACCURACY",
                "PERFORMANCE_HINT": "LATENCY"
            }
        }
    }
}

# Pass the nested configuration to the provider as a JSON string
options = {"device_type": "AUTO", "load_config": json.dumps(config)}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```

**Supported Device Names:**
- `"CPU"` - Intel CPU
- `"GPU"` - Intel integrated/discrete GPU
@@ -327,7 +356,7 @@ Property keys used in `load_config` JSON must match the string literals defined in OpenVINO.

### `enable_qdq_optimizer`

**DEPRECATED:** This option is deprecated since OpenVINO 2025.3/ORT 1.23 and can be set via `load_config` using the `NPU_QDQ_OPTIMIZATION` property.

NPU-specific optimization for Quantize-Dequantize (QDQ) operations in the inference graph. This optimizer enhances ORT quantized models by:

@@ -362,7 +391,7 @@ This configuration is required for optimal NPU memory allocation and management.

### `model_priority`

**DEPRECATED:** This option is deprecated since OpenVINO 2025.3/ORT 1.23 and can be set via `load_config` using the `MODEL_PRIORITY` property.

Configures resource allocation priority for multi-model deployment scenarios.
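
A hedged sketch of setting it through `load_config` (the `HIGH` value is illustrative):

```python
import onnxruntime as ort
import json

# Illustrative: give this model's inference requests higher resource priority
config = {"GPU": {"MODEL_PRIORITY": "HIGH"}}
options = {"device_type": "GPU", "load_config": json.dumps(config)}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```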

@@ -401,31 +430,25 @@

`input_image[NCHW],output_tensor[NC]`
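
As a sketch, the same format string is passed directly as the `layout` provider option (the tensor names come from the format example above and are illustrative):

```python
import onnxruntime as ort

# Illustrative tensor names following the input_name[layout_format] syntax
options = {
    "device_type": "CPU",
    "layout": "input_image[NCHW],output_tensor[NC]",
}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```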


---

## Examples

### Python

#### Using load_config with JSON string
```python
import onnxruntime as ort
import json

# Create config
config = {
    "AUTO": {
        "PERFORMANCE_HINT": "THROUGHPUT",
        "PERF_COUNT": "NO",
        "DEVICE_PROPERTIES": "{GPU:{EXECUTION_MODE_HINT:ACCURACY,PERFORMANCE_HINT:LATENCY}}"
    }
}

# Use config with session
options = {"device_type": "AUTO", "load_config": json.dumps(config)}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```

#### Using load_config for CPU
```python
import onnxruntime as ort
import json

# Create CPU config
config = {
    "CPU": {
        "PERFORMANCE_HINT": "LATENCY",
        "NUM_STREAMS": "1"
    }
}

options = {"device_type": "CPU", "load_config": json.dumps(config)}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```

#### Using load_config for GPU
```python
import onnxruntime as ort
import json

# Create GPU config with caching
config = {
    "GPU": {
        "EXECUTION_MODE_HINT": "ACCURACY",
        "CACHE_DIR": "./model_cache",
        "PERFORMANCE_HINT": "LATENCY"
    }
}

options = {"device_type": "GPU", "load_config": json.dumps(config)}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```


---
### Python API