[Performance] Performance regression in Einsum operator for int64 data type between v1.22.0 and v1.23.0

### Describe the issue

## Description

We observed a performance regression in the **Einsum** operator with **int64** inputs between ONNX Runtime v1.22.0 and v1.23.0. The regression is **specific to int64**; other data types (e.g., int32) are not affected.

## Affected Operator

### Einsum
- **Opset Version**: 12
- **Data Type**: int64 (regressed)
- **Regression**: +14.4% kernel slowdown (26.62 ms in v1.22.0 vs. 30.45 ms in v1.23.0)

## Test Case Details

### Test Case: `einsum_12_v2_einsum_elementwise_multiplication_broadcast_int64`

**Inputs:**
- **input_0** tensor:
  - Data type: **int64** (type=7)
  - Shape: [2, 64, 56, 56]

- **input_1** tensor:
  - Data type: **int64** (type=7)
  - Shape: [64]

**Attributes:**
- **equation**: "nchw,c->nchw" (elementwise multiplication with broadcast)

**Output:**
- Data type: int64
- Shape: [2, 64, 56, 56]
- Elementwise multiplication with channel broadcast
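
For reference, the semantics of the equation above can be sketched in NumPy. This is only an illustration of what `nchw,c->nchw` computes, not the ONNX Runtime kernel; the helper names and the random input values are assumptions made for the example.

```python
import numpy as np


def einsum_reference(x: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Evaluate "nchw,c->nchw" with NumPy's einsum (illustrative helper)."""
    return np.einsum("nchw,c->nchw", x, scale)


def broadcast_reference(x: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Same result written as an explicit per-channel broadcast multiply."""
    return x * scale.reshape(1, -1, 1, 1)


# Shapes and dtype match the test case; the values themselves are arbitrary.
rng = np.random.default_rng(0)
x = rng.integers(-10, 10, size=(2, 64, 56, 56), dtype=np.int64)
scale = rng.integers(-10, 10, size=(64,), dtype=np.int64)

out = einsum_reference(x, scale)
print(out.shape, out.dtype)
```

Each output element is `x[n, c, h, w] * scale[c]`, so no reduction takes place and the output keeps the shape of `input_0`.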

**Performance:**
- v1.22.0: 26.62 ms (kernel time)
- v1.23.0: 30.45 ms (kernel time)
- **Kernel regression: +14.4% slowdown**
- **Total time regression: +23.9% slowdown**
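
The kernel figure follows directly from the two timings above. A small sketch of the arithmetic (the function name is illustrative):

```python
def slowdown_percent(baseline_ms: float, candidate_ms: float) -> float:
    """Relative slowdown of candidate vs. baseline, in percent."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


# Kernel times reported above for v1.22.0 and v1.23.0.
kernel_slowdown = slowdown_percent(26.62, 30.45)
print(f"{kernel_slowdown:.1f}%")  # 14.4%
```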


### To reproduce

1. Download and extract the attached zip file:

[Archive.zip](https://github.com/user-attachments/files/24860073/Archive.zip)

2. Run benchmark using the provided script:
   ```bash
   python script_profiling.py einsum_12_v2_einsum_elementwise_multiplication_broadcast_int64 1.22.0 1.23.0
   ```

### Urgency

_No response_

### Platform

Linux

### OS Version

Ubuntu 24.04.3 LTS

### ONNX Runtime Installation

Released Package

### ONNX Runtime Version or Commit ID

1.21

### ONNX Runtime API

Python

### Architecture

X64

### Execution Provider

Default CPU

### Execution Provider Library Version

_No response_

### Model File

_No response_

### Is this a quantized model?

Yes

[Performance] Performance regression in Einsum operator for int64 data type between v1.22.0 and v1.23.0 #27154
