95 changes: 54 additions & 41 deletions docs/execution-providers/OpenVINO-ExecutionProvider.md
@@ -30,9 +30,9 @@ ONNX Runtime OpenVINO™ Execution Provider is compatible with the three latest releases of OpenVINO.

|ONNX Runtime|OpenVINO™|Notes|
|---|---|---|
|1.24.0|2025.4.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.9)|
|1.23.0|2025.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.8)|
|1.22.0|2025.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.7)|

## Build

@@ -79,15 +79,15 @@ Runtime parameters set during OpenVINO Execution Provider initialization to control inference behavior:
| [**num_of_threads**](#num_of_threads--num_streams) | string | Any positive integer > 0 | size_t | Control number of inference threads |
| [**num_streams**](#num_of_threads--num_streams) | string | Any positive integer > 0 | size_t | Set parallel execution streams for throughput |
| [**cache_dir**](#cache_dir) | string | Valid filesystem path | string | Enable OpenVINO model caching for improved latency |
| [**load_config**](#load_config) | string | JSON string | string | Load and set custom/HW specific OpenVINO properties from JSON |
| [**enable_qdq_optimizer**](#enable_qdq_optimizer) | string | True/False | boolean | Enable QDQ optimization for NPU |
| [**disable_dynamic_shapes**](#disable_dynamic_shapes) | string | True/False | boolean | Convert dynamic models to static shapes |
| [**reshape_input**](#reshape_input) | string | input_name[shape_bounds] | string | Specify upper and lower bound for dynamic shaped inputs for improved performance with NPU |
| [**layout**](#layout) | string | input_name[layout_format] | string | Specify input/output tensor layout format |
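
For orientation, provider options are passed to the session as string key/value pairs. A minimal sketch (the model path and option values below are placeholders; fuller `load_config` examples appear in the Examples section):

```python
import onnxruntime as ort

# Illustrative values only; every provider option is supplied as a string
options = {
    "device_type": "GPU",              # target device
    "disable_dynamic_shapes": "True",  # convert dynamic models to static shapes
}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```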

**Deprecation Notice**

The following provider options are **deprecated since ORT 1.23** and should be migrated to `load_config` for better compatibility with future releases.

| Deprecated Provider Option | `load_config` Equivalent | Recommended Migration |
|---------------------------|------------------------|----------------------|
@@ -147,7 +147,7 @@ Runs the same model on multiple devices in parallel to improve device utilization.
---

### `precision`
**DEPRECATED:** This option is deprecated since OpenVINO 2025.3/ORT 1.23 and can be set via `load_config` using the `INFERENCE_PRECISION_HINT` property.
- Controls numerical precision during inference, balancing **performance** and **accuracy**.
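
A hedged sketch of the `load_config` equivalent (the device and precision value are illustrative):

```python
import onnxruntime as ort
import json

# Illustrative: request FP16 inference precision on GPU through load_config
config = {"GPU": {"INFERENCE_PRECISION_HINT": "f16"}}
options = {"device_type": "GPU", "load_config": json.dumps(config)}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```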

**Precision Support on Devices:**
@@ -167,7 +167,7 @@
---
### `num_of_threads` & `num_streams`

**DEPRECATED:** These options are deprecated since OpenVINO 2025.3/ORT 1.23 and can be set via `load_config` using the `INFERENCE_NUM_THREADS` and `NUM_STREAMS` properties respectively.
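
A minimal sketch of the `load_config` migration (thread and stream counts are illustrative):

```python
import onnxruntime as ort
import json

# Illustrative values: 8 inference threads and 2 execution streams on CPU
config = {"CPU": {"INFERENCE_NUM_THREADS": "8", "NUM_STREAMS": "2"}}
options = {"device_type": "CPU", "load_config": json.dumps(config)}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```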

**Multi-Threading**

@@ -185,31 +185,33 @@ Manages parallel inference streams for throughput optimization (default: `1`).

### `cache_dir`

**DEPRECATED:** This option is deprecated since OpenVINO 2025.3/ORT 1.23 and can be set via `load_config` using the `CACHE_DIR` property.

Enables model caching to significantly reduce subsequent load times. Supports CPU, NPU, and GPU devices with kernel caching on iGPU/dGPU.

**Benefits**
- Saves compiled models for faster subsequent loading
- Eliminates recompilation overhead on subsequent runs
- Particularly useful for optimizing application startup latencies, especially for complex models
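
A minimal sketch of enabling caching through `load_config` (the cache directory path is a placeholder):

```python
import onnxruntime as ort
import json

# Placeholder path; compiled blobs are written here and reused on later runs
config = {"GPU": {"CACHE_DIR": "./model_cache"}}
options = {"device_type": "GPU", "load_config": json.dumps(config)}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```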

---

### `load_config`

**Recommended Configuration Method** for setting OpenVINO runtime properties. Provides direct access to OpenVINO properties through a JSON string at runtime.

#### Overview

`load_config` enables fine-grained control over OpenVINO inference behavior by loading properties from a JSON string. This is the **preferred method** for configuring advanced OpenVINO features, offering:

- Direct access to OpenVINO runtime properties
- Device-specific configuration
- Better compatibility with future OpenVINO releases
- No property name translation required



#### JSON Configuration Format
```json
{
  "DEVICE_NAME": {
    "PROPERTY_KEY": "PROPERTY_VALUE"
  }
}
```

`load_config` supports nested JSON objects up to **8 levels deep** for complex device configurations.

**Example: Multi-Level Nested Configuration**
```python
import onnxruntime as ort
import json

# Complex nested configuration for AUTO device
config = {
    "AUTO": {
        "PERFORMANCE_HINT": "THROUGHPUT",
        "DEVICE_PROPERTIES": {
            "CPU": {
                "PERFORMANCE_HINT": "LATENCY",
                "NUM_STREAMS": "3"
            },
            "GPU": {
                "EXECUTION_MODE_HINT": "ACCURACY",
                "PERFORMANCE_HINT": "LATENCY"
            }
        }
    }
}

# Pass the nested configuration to the provider as a JSON string
options = {"device_type": "AUTO", "load_config": json.dumps(config)}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```

**Supported Device Names:**
- `"CPU"` - Intel CPU
- `"GPU"` - Intel integrated/discrete GPU
@@ -327,7 +356,7 @@ Property keys used in `load_config` JSON must match the string literals defined in OpenVINO.

### `enable_qdq_optimizer`

**DEPRECATED:** This option is deprecated since OpenVINO 2025.3/ORT 1.23 and can be set via `load_config` using the `NPU_QDQ_OPTIMIZATION` property.

NPU-specific optimization for Quantize-Dequantize (QDQ) operations in the inference graph. This optimizer enhances ORT quantized models by:

@@ -362,7 +391,7 @@ This configuration is required for optimal NPU memory allocation and management.

### `model_priority`

**DEPRECATED:** This option is deprecated since OpenVINO 2025.3/ORT 1.23 and can be set via `load_config` using the `MODEL_PRIORITY` property.

Configures resource allocation priority for multi-model deployment scenarios.
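
A hedged sketch of setting it through `load_config` (the `HIGH` value is illustrative):

```python
import onnxruntime as ort
import json

# Illustrative: give this model's inference requests higher resource priority
config = {"GPU": {"MODEL_PRIORITY": "HIGH"}}
options = {"device_type": "GPU", "load_config": json.dumps(config)}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```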

@@ -401,31 +430,25 @@

`input_image[NCHW],output_tensor[NC]`
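
As a sketch, the same format string is passed directly as the `layout` provider option (the tensor names come from the format example above and are illustrative):

```python
import onnxruntime as ort

# Illustrative tensor names following the input_name[layout_format] syntax
options = {
    "device_type": "CPU",
    "layout": "input_image[NCHW],output_tensor[NC]",
}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```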


---

## Examples

### Python

#### Using load_config with JSON string
```python
import onnxruntime as ort
import json

# Create config
config = {
    "AUTO": {
        "PERFORMANCE_HINT": "THROUGHPUT",
        "PERF_COUNT": "NO",
        "DEVICE_PROPERTIES": "{GPU:{EXECUTION_MODE_HINT:ACCURACY,PERFORMANCE_HINT:LATENCY}}"
    }
}

# Use config with session
options = {"device_type": "AUTO", "load_config": json.dumps(config)}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```

#### Using load_config for CPU
```python
import onnxruntime as ort
import json

# Create CPU config
config = {
    "CPU": {
        "PERFORMANCE_HINT": "LATENCY",
        "NUM_STREAMS": "1"
    }
}

options = {"device_type": "CPU", "load_config": json.dumps(config)}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```

#### Using load_config for GPU
```python
import onnxruntime as ort
import json

# Create GPU config with caching
config = {
    "GPU": {
        "EXECUTION_MODE_HINT": "ACCURACY",
        "CACHE_DIR": "./model_cache",
        "PERFORMANCE_HINT": "LATENCY"
    }
}

options = {"device_type": "GPU", "load_config": json.dumps(config)}
session = ort.InferenceSession("model.onnx",
                               providers=[("OpenVINOExecutionProvider", options)])
```


---
### Python API