[Performance] 40% slowdown in ONNX Resize Operator on CPU #23391
Open
Description
Describe the issue
We observed a significant performance regression (~40% slowdown) in the Resize operator with Float32 and Int64 inputs on the CPU execution provider. The slowdown impacts workloads that rely heavily on Resize, particularly image-processing pipelines.
Bisecting the history shows that commit 6cc06ad introduces the regression.
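For context, Resize in `linear` mode with the default `half_pixel` coordinate transform computes a standard bilinear interpolation. A pure-NumPy sketch of the 2-D case (an illustration of what the spec defines, not the vectorized ORT kernel):

```python
import numpy as np

def resize_linear_2d(x, out_h, out_w):
    """Bilinear resize with the half_pixel coordinate mapping
    (the ONNX Resize default). Illustrative only, not the ORT kernel."""
    in_h, in_w = x.shape
    out = np.empty((out_h, out_w), dtype=x.dtype)
    scale_h = in_h / out_h
    scale_w = in_w / out_w
    for oy in range(out_h):
        # half_pixel: map the output pixel centre back into input coordinates
        iy = (oy + 0.5) * scale_h - 0.5
        y0 = int(np.floor(iy))
        dy = iy - y0
        y0c = min(max(y0, 0), in_h - 1)
        y1c = min(max(y0 + 1, 0), in_h - 1)
        for ox in range(out_w):
            ix = (ox + 0.5) * scale_w - 0.5
            x0 = int(np.floor(ix))
            dx = ix - x0
            x0c = min(max(x0, 0), in_w - 1)
            x1c = min(max(x0 + 1, 0), in_w - 1)
            # Interpolate along x on the two neighbouring rows, then along y
            top = x[y0c, x0c] * (1 - dx) + x[y0c, x1c] * dx
            bot = x[y1c, x0c] * (1 - dx) + x[y1c, x1c] * dx
            out[oy, ox] = top * (1 - dy) + bot * dy
    return out
```

With identical input and output sizes this mapping is the identity, which makes it easy to sanity-check against the kernel's output.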
model:
analysis (per-node profiler timings; each line shows regressed / baseline durations and their ratio):
```
[name: model_loading_uri Op: Unknown]: 622 / 662 : 93.95770392749245%
[name: session_initialization Op: Unknown]: 3737 / 4004 : 93.33166833166833%
[name: /Resize_fence_before Op: Resize]: 48 / 1 : 4800.0%
[name: /Resize_kernel_time Op: Resize]: 84187 / 84921 : 99.13566726722483%
[name: /Resize_fence_after Op: Resize]: 0 / 12 : 0.0%
[name: /Resize_1_fence_before Op: Resize]: 4 / 22 : 18.181818181818183%
[name: /Resize_1_kernel_time Op: Resize]: 634602 / 444980 : 142.61360061126342%
[name: /Resize_1_fence_after Op: Resize]: 0 / 0 : 0%
[name: /Ceil_fence_before Op: Ceil]: 0 / 4 : 0.0%
[name: /Ceil_kernel_time Op: Ceil]: 259949 / 262544 : 99.01159424705955%
[name: /Ceil_fence_after Op: Ceil]: 10 / 0 : 0%
[name: SequentialExecutor::Execute Op: Unknown]: 1118971 / 924960 : 120.97506919218128%
[name: model_run Op: Unknown]: 1141098 / 950941 : 119.99671903935155%
```
To reproduce
- Download and unzip "model.zip".
- Run the following script.
```python
import time
import onnxruntime
import numpy as np

# Set the random seed
np.random.seed(0)

onnx_model_path = 'model.onnx'

# Load the ONNX model with the CPUExecutionProvider
ort_session = onnxruntime.InferenceSession(onnx_model_path, providers=['CPUExecutionProvider'])
ort_session.get_modelmeta()
inputs = ort_session.get_inputs()

nth = 100000

# Warm-up inference to cache optimizations
input_data = np.load("input.npy", allow_pickle=True).item()
ort_session.run(None, input_data)

# Measure inference time excluding input creation
total_time_ns = 0
for _ in range(nth):
    start_ns = time.perf_counter_ns()
    ort_session.run(None, input_data)
    end_ns = time.perf_counter_ns()
    total_time_ns += end_ns - start_ns

avg_time_ns = total_time_ns / nth
avg_time_ms = avg_time_ns / 1e6
print(f'[{onnxruntime.__version__}] Average inference time: {avg_time_ms:.5f} ms')
```
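On noisy hosts the median per-run latency is more stable than the mean. A stdlib-only variant of the timing loop (the lambda is a hypothetical stand-in for `ort_session.run(None, input_data)`):

```python
import time
import statistics

def bench(run, n=1000):
    """Time n calls to run() and return the median latency in ms.
    The median is less sensitive to scheduler noise than the mean."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter_ns()
        run()
        t1 = time.perf_counter_ns()
        samples.append(t1 - t0)
    return statistics.median(samples) / 1e6

# Stand-in workload; replace with lambda: ort_session.run(None, input_data)
print(f'median: {bench(lambda: sum(range(1000))):.5f} ms')
```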
Urgency
No response
Platform
Linux
OS Version
6.8.0
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.20.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
Is this a quantized model?
Yes