[Performance] 40% slowdown in ONNX Resize Operator on CPU #23391

Open
@SuhwanSong

Description

Describe the issue

We observed a significant performance regression (~40% slowdown) in the Resize operator with Float32 and Int64 inputs on CPU.
This slowdown impacts workloads that rely heavily on the Resize operator, particularly image processing tasks.

After bisecting, we found that commit 6cc06ad introduces the slowdown.

range: 5fa4505..6cc06ad

model:

(model graph screenshot attached)

analysis:

[name: model_loading_uri Op: Unknown]: 622 / 662 : 93.95770392749245%
[name: session_initialization Op: Unknown]: 3737 / 4004 : 93.33166833166833%
[name: /Resize_fence_before Op: Resize]: 48 / 1 : 4800.0%
[name: /Resize_kernel_time Op: Resize]: 84187 / 84921 : 99.13566726722483%
[name: /Resize_fence_after Op: Resize]: 0 / 12 : 0.0%
[name: /Resize_1_fence_before Op: Resize]: 4 / 22 : 18.181818181818183%
[name: /Resize_1_kernel_time Op: Resize]: 634602 / 444980 : 142.61360061126342%
[name: /Resize_1_fence_after Op: Resize]: 0 / 0 : 0%
[name: /Ceil_fence_before Op: Ceil]: 0 / 4 : 0.0%
[name: /Ceil_kernel_time Op: Ceil]: 259949 / 262544 : 99.01159424705955%
[name: /Ceil_fence_after Op: Ceil]: 10 / 0 : 0%
[name: SequentialExecutor::Execute Op: Unknown]: 1118971 / 924960 : 120.97506919218128%
[name: model_run Op: Unknown]: 1141098 / 950941 : 119.99671903935155%

To reproduce

  1. Download and unzip "model.zip".
  2. Run the following script.
import time
import onnxruntime
import numpy as np

# Set the random seed
np.random.seed(0)

onnx_model_path = 'model.onnx'

# Load the ONNX model with the CPUExecutionProvider
ort_session = onnxruntime.InferenceSession(onnx_model_path, providers=['CPUExecutionProvider'])
ort_session.get_modelmeta()
inputs = ort_session.get_inputs()

nth = 100000

# Warm-up inference to cache optimizations

input_data = np.load("input.npy", allow_pickle=True).item()
ort_session.run(None, input_data)

# Measure inference time excluding input creation
total_time_ns = 0
for _ in range(nth):
    start_ns = time.perf_counter_ns()
    ort_session.run(None, input_data)
    end_ns = time.perf_counter_ns()
    total_time_ns += end_ns - start_ns

avg_time_ns = total_time_ns / nth
avg_time_ms = avg_time_ns / 1e6

print(f'[{onnxruntime.__version__}] Average inference time: {avg_time_ms:.5f} ms')
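Averaging over many runs can be skewed by OS jitter and outliers; a median-based variant of the timing loop is more robust. A minimal sketch (the `bench` helper and its parameters are illustrative, not part of the script above):

```python
import statistics
import time

def bench(run, n=1000):
    """Time a zero-arg callable n times; return the median latency in ms."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter_ns()
        run()
        t1 = time.perf_counter_ns()
        samples.append(t1 - t0)
    return statistics.median(samples) / 1e6

# Usage with the session and input from the script above (names assumed):
# print(f"median: {bench(lambda: ort_session.run(None, input_data)):.5f} ms")
```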

Urgency

No response

Platform

Linux

OS Version

6.8.0

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.20.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

model.zip

Is this a quantized model?

Yes
