Description
When deploying an ONNX model with the Triton Inference Server ONNX Runtime backend, CPU inference is noticeably slower than running the same model directly through the ONNX Runtime Python API. The discrepancy was observed under identical conditions: same hardware, same model, and same input data.
Triton Information
TRITON_VERSION <= 24.09
To Reproduce
Model used:

```bash
wget -O model.onnx https://github.com/onnx/models/raw/refs/heads/main/validated/vision/classification/densenet-121/model/densenet-12.onnx
```
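The report does not show the model repository layout or the server launch command. A minimal sketch of the standard setup, assuming Docker, the 24.09 container, and a mapping of Triton's default gRPC port (8001) to the 9178 used by the client below; these specifics are assumptions, not from the original report:

```bash
# Standard Triton model repository layout: <repo>/<model_name>/<version>/model.onnx
mkdir -p model_repository/test_densenet/1
mv model.onnx model_repository/test_densenet/1/model.onnx
# The config.pbtxt from the next section goes next to the version directory:
#   model_repository/test_densenet/config.pbtxt

# Image tag and port mapping are illustrative.
docker run --rm -p 9178:8001 \
  -v "$PWD/model_repository:/models" \
  nvcr.io/nvidia/tritonserver:24.09-py3 \
  tritonserver --model-repository=/models
```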
Triton server (ONNX Runtime backend)

config.pbtxt:

```
name: "test_densenet"
platform: "onnxruntime_onnx"
```
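Not part of the original report: the minimal config above leaves the backend's CPU threading at its defaults, which is one plausible source of the gap. A sketch of a config.pbtxt that makes those settings explicit, assuming the `intra_op_thread_count` / `inter_op_thread_count` parameters exposed by the ONNX Runtime backend (values here are illustrative):

```
name: "test_densenet"
platform: "onnxruntime_onnx"

# Keep a single CPU model instance so threads are not oversubscribed.
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]

# ONNX Runtime backend threading parameters (values must be strings).
parameters { key: "intra_op_thread_count" value: { string_value: "4" } }
parameters { key: "inter_op_thread_count" value: { string_value: "1" } }
```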
Python clients
Triton client (timed in a Jupyter cell):

```python
import numpy as np
import tritonclient.grpc as grpcclient

# Connect to the Triton server's gRPC endpoint.
triton_client = grpcclient.InferenceServerClient(url="localhost:9178")

# Use the public InferInput class rather than the private
# tritonclient.grpc._infer_input module, and don't shadow the module name.
infer_input = grpcclient.InferInput("data_0", [1, 3, 224, 224], "FP32")
infer_input.set_data_from_numpy(np.zeros((1, 3, 224, 224), dtype=np.float32))
```

```python
%%timeit
res = triton_client.infer(model_name="test_densenet", inputs=[infer_input])
```

Results: 473 ms ± 87.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
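To help localize the difference (not from the original report), Triton's per-model statistics separate server-side queue and compute time from the client round trip, so gRPC/serialization overhead can be ruled in or out. A minimal sketch reusing the `triton_client` above; the field layout follows the statistics message returned with `as_json=True` and may need adjusting:

```python
# Cumulative per-model statistics; durations are reported in nanoseconds.
stats = triton_client.get_inference_statistics(
    model_name="test_densenet", as_json=True
)

infer_stats = stats["model_stats"][0]["inference_stats"]
count = int(infer_stats["success"]["count"])

queue_ms = int(infer_stats["queue"]["ns"]) / count / 1e6
compute_ms = int(infer_stats["compute_infer"]["ns"]) / count / 1e6
print(f"avg queue: {queue_ms:.2f} ms, avg compute: {compute_ms:.2f} ms")
```

If the average compute time is close to the standalone ONNX Runtime number, the gap is mostly transport and serialization; if it is close to 473 ms, the backend session itself is slower.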
ONNX Runtime (timed in a Jupyter cell):

```python
import numpy as np
import onnxruntime as ort

# Default session: CPUExecutionProvider with ONNX Runtime's default thread settings.
ort_sess = ort.InferenceSession("model.onnx")
test_inputs = {"data_0": np.zeros((1, 3, 224, 224), dtype=np.float32)}
```

```python
%%timeit
ort_sess.run(["fc6_1"], test_inputs)
```

Results: 159 ms ± 23.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
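For an apples-to-apples comparison (again, not from the original report), the standalone session's threading and optimization settings can be pinned explicitly with `ort.SessionOptions`, since a default `InferenceSession` may choose different intra-op thread counts than the session created inside the Triton backend. A sketch with illustrative values:

```python
import numpy as np
import onnxruntime as ort

# Pin threading and graph optimization so both setups can be configured identically.
so = ort.SessionOptions()
so.intra_op_num_threads = 4   # illustrative; match whatever the Triton backend uses
so.inter_op_num_threads = 1
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

ort_sess = ort.InferenceSession(
    "model.onnx", sess_options=so, providers=["CPUExecutionProvider"]
)

test_inputs = {"data_0": np.zeros((1, 3, 224, 224), dtype=np.float32)}
outputs = ort_sess.run(["fc6_1"], test_inputs)
```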