Description
Describe the issue
Hello.
I'm trying to use a Kokoro ONNX model and I see a large performance difference between PyTorch CPU and onnxruntime CPU (no special execution provider specified).
I've seen that https://www.ui.perfetto.dev/ is useful for investigating performance issues.
I'm attaching a trace of 3 inferences in a row; I don't have the skills to understand what the problem is.
I've used the CPU execution provider without any other SessionOptions specified, in C# with onnxruntime.
onnxruntime_profile__2025-01-15_15-22-14.zip
Using the query `select name, (dur/1000000) as ms, ts from slice where parent_id = 3 AND category = 'Node' order by dur desc`
where 3 is the slice id of the SequentialExecutor slice of a single inference run, which I used to filter the info for a specific run (I don't know if there was a better way to get it), I was able to sort the node executions by time spent. But I can't go further, because I don't know how to compare these timings against expected ones.
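As an alternative to the Perfetto SQL query, the profile file that onnxruntime writes (the JSON inside the attached zip) can be aggregated directly, since it is plain chrome-trace JSON. A minimal sketch in Python, assuming the event layout onnxruntime emits (`cat == "Node"` for per-node kernel events, `dur` in microseconds); the function name is mine, not an onnxruntime API:

```python
import json
from collections import defaultdict

def top_nodes(profile_path, n=10):
    """Rank model nodes by total time from an onnxruntime profile JSON (chrome-trace format)."""
    with open(profile_path) as f:
        events = json.load(f)
    totals = defaultdict(float)
    for ev in events:
        # Per-node kernel events; 'dur' is in microseconds in the chrome-trace format.
        if ev.get("cat") == "Node" and "dur" in ev:
            totals[ev.get("name", "?")] += ev["dur"] / 1000.0  # convert to milliseconds
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Summing over all three runs like this also shows whether one node dominates consistently or only in the first (warm-up) inference.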
I'd like someone to point me to a resource about how to detect bottlenecks in a model, or someone who has the skills to help with this issue.
To reproduce
Thanks @thewh1teagle
"""
pip install kokoro-onnx soundfile
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json
ONNX_PROVIDER=CoreMLExecutionProvider LOG_LEVEL=DEBUG uv run main.py
ONNX_PROVIDER=CPUExecutionProvider LOG_LEVEL=DEBUG uv run main.py
"""
import soundfile as sf
from kokoro_onnx import Kokoro
kokoro = Kokoro("kokoro-v0_19.onnx", "voices.json")
samples, sample_rate = kokoro.create(
"Hello. This audio generated by kokoro!", voice="af_sarah", speed=1.0, lang="en-us"
)
sf.write("audio.wav", samples, sample_rate)
print("Created audio.wav")
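When comparing PyTorch CPU against onnxruntime CPU with a repro like the one above, it helps to discard the first call (session creation, graph optimization, and memory-arena growth all land there) and report steady-state latency. A small generic harness for that; `bench` is a hypothetical helper of mine, not part of kokoro-onnx or onnxruntime:

```python
import time

def bench(fn, warmup=2, runs=10):
    """Time a zero-argument callable, discarding warm-up runs.

    Returns (mean_ms, min_ms) over the measured runs.
    """
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1000.0)  # milliseconds
    return sum(times) / len(times), min(times)
```

Wrapping each backend's inference in a lambda (e.g. `bench(lambda: kokoro.create(...))`) then gives comparable numbers; a large gap between mean and min usually points at thread-scheduling or allocator noise rather than the model itself.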
Urgency
Could you please suggest how to properly understand the performance problems of a model?
It's hard for me to find proper documentation that guides me toward a deep understanding of how things work; any links to docs/tutorials are appreciated (remember, I'm mainly a C# guy).
Thanks
Platform
Windows
OS Version
Windows 11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
nuget package 1.20.1
ONNX Runtime API
C#
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No