Description
In ONNXRuntime, the OpenVINO EP accepts configuration options to set the number of threads and the number of streams (documented in the ONNXRuntime OpenVINO EP docs), but these are ignored when passed to the EP in the Triton model config, for example:
optimization { execution_accelerators {
  cpu_execution_accelerator : [ {
    name : "openvino"
    parameters { key: "num_of_threads" value: "4" }
    parameters { key: "num_streams" value: "4" }
  } ]
}}
The threading configuration for the ONNXRuntime backend itself is also ignored once OpenVINO is enabled, which is expected:
parameters { key: "intra_op_thread_count" value: { string_value: "4" } }
parameters { key: "inter_op_thread_count" value: { string_value: "2" } }
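For reference, a full model config combining both parameter styles might look like the sketch below. The model name and parameter values are placeholders; the OpenVINO accelerator parameters are the ones this issue reports as having no effect:

```
name: "my_onnx_model"   # placeholder
backend: "onnxruntime"

# ONNXRuntime's own thread pools (honored when OpenVINO is disabled)
parameters { key: "intra_op_thread_count" value: { string_value: "4" } }
parameters { key: "inter_op_thread_count" value: { string_value: "2" } }

# OpenVINO EP options (reported here as ignored)
optimization { execution_accelerators {
  cpu_execution_accelerator : [ {
    name : "openvino"
    parameters { key: "num_of_threads" value: "4" }
    parameters { key: "num_streams" value: "4" }
  } ]
}}
```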
Triton Information
Last tested with the Triton container 24.05.
To Reproduce
Serving an ONNX model we observe:
- The intra_op_thread_count/inter_op_thread_count parameters affect the number of inference threads used when OpenVINO is disabled
- Enabling OpenVINO optimizations, CPU usage jumps to the default/max number of CPU threads
- Attempting to set num_of_threads or num_streams has no effect
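One way to observe this while reproducing is to count the OS threads of the server process. A Linux-only sketch (assumes procps is installed; the process name "tritonserver" is the default binary name):

```shell
# Count OS threads of a process (Linux, procps). Replace "tritonserver"
# with the target process name; falls back to this shell's PID so the
# snippet is runnable anywhere.
PID=$(pgrep -o tritonserver || echo $$)
ps -o nlwp= -p "$PID"   # "nlwp" = number of lightweight processes (threads)
```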
Expected behavior
Expected behaviour would be that the OpenVINO EP ignores intra_op_thread_count and inter_op_thread_count but obeys num_of_threads and num_streams.
Unless I missed something and the ORT backend with OpenVINO optimizations reads the OpenVINO backend parameters?