OpenVINO EP doesn't respect threading parameters #260

Open
@mbahri

Description
In ONNX Runtime, the OpenVINO EP accepts configuration options that set the number of threads and the number of streams (documented here). These options are ignored when they are passed to the EP through the Triton model config, for example:

optimization { execution_accelerators {
  cpu_execution_accelerator : [ {
    name : "openvino"
    parameters { key: "num_of_threads" value: "4" }
    parameters { key: "num_streams" value: "4" }
  } ]
}}
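
For comparison, here is a minimal sketch of passing the same two options to ONNX Runtime directly from Python, outside Triton, which may help tell whether the EP itself honors them. It assumes an ONNX Runtime build with the OpenVINO EP is installed; the helper names and the model path are hypothetical, and the option keys are simply the ones from the config above.

```python
def openvino_provider_options(num_threads: int, num_streams: int) -> dict:
    """Build OpenVINO EP options as strings, using the same keys as the Triton config."""
    return {
        "num_of_threads": str(num_threads),
        "num_streams": str(num_streams),
    }


def make_session(model_path: str, num_threads: int = 4, num_streams: int = 4):
    """Create a session that requests the OpenVINO EP with explicit threading options."""
    # Imported lazily so the helper above works without ONNX Runtime installed.
    import onnxruntime as ort

    return ort.InferenceSession(
        model_path,
        providers=["OpenVINOExecutionProvider"],
        provider_options=[openvino_provider_options(num_threads, num_streams)],
    )


# Usage (hypothetical path):
#   session = make_session("model.onnx", num_threads=4, num_streams=4)
```

If thread usage is limited as requested in this standalone setup but not under Triton, that would suggest the options are being dropped somewhere between the model config and the EP.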

The threading configuration for the ONNX Runtime backend is also ignored once OpenVINO is enabled, which is expected:

parameters { key: "intra_op_thread_count" value: { string_value: "4" } }
parameters { key: "inter_op_thread_count" value: { string_value: "2" } }

Triton Information
Last tested with the Triton container 24.05.

To Reproduce
When serving an ONNX model, we observe:

  • intra_op_thread_count and inter_op_thread_count affect the number of inference threads used when OpenVINO is disabled
  • When OpenVINO optimizations are enabled, CPU usage jumps to the default/max number of CPU threads
  • Setting num_of_threads or num_streams has no effect
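
One rough way to put numbers on these observations is to count the OS threads of the server process before and after changing the config. The sketch below is Linux-only and reads the `Threads:` field from `/proc/<pid>/status`; the function names are hypothetical. Thread count is only a proxy for CPU usage, so treat the result as indicative.

```python
def parse_thread_count(status_text: str) -> int:
    """Extract the value of the 'Threads:' field from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' field found")


def count_threads(pid: int) -> int:
    """Return the number of OS threads of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_thread_count(f.read())


# Usage: pass the PID of the tritonserver process, e.g.
#   print(count_threads(12345))
```

Comparing the count with OpenVINO disabled, enabled, and enabled with num_of_threads set should show whether the setting changes anything.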

Expected behavior
The OpenVINO EP should ignore intra_op_thread_count and inter_op_thread_count but obey num_of_threads and num_streams.

Or did I miss something, and the ORT backend with OpenVINO optimizations reads the OpenVINO backend parameters instead?
