[Performance] Performance regression in Where operator with broadcast between v1.20.0 and v1.21.0

### Describe the issue

## Summary
- Operator: Where (opset 16, float32)
- Scope: Broadcast inputs only (input shapes differ)
- Impact: Kernel time is ~4–11% slower in v1.21.0 than in v1.20.0 (validated; the regression persists through v1.23.0)
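
For readers who want to check the percentages, here is a minimal sketch of how a relative kernel-time regression can be computed. It assumes the figure is the percent change relative to the v1.20.0 baseline; the function name `regression_pct` and the example timings are hypothetical and are not taken from the profiling script.

```python
def regression_pct(baseline_us: float, candidate_us: float) -> float:
    """Percent change in kernel time relative to the baseline.

    Positive values mean the candidate version is slower.
    """
    if baseline_us <= 0:
        raise ValueError("baseline time must be positive")
    return (candidate_us - baseline_us) / baseline_us * 100.0


if __name__ == "__main__":
    # Hypothetical timings, chosen only to illustrate the arithmetic.
    # They are NOT measurements from this issue.
    baseline_us = 100.0   # e.g. kernel time under v1.20.0
    candidate_us = 104.0  # e.g. kernel time under v1.21.0
    print(f"{regression_pct(baseline_us, candidate_us):.2f}% slower")
```

A negative result would indicate an improvement, which is how the same-shape case would show up under this definition.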

## What regresses
- Case: condition [2,64,56,56], X/Y [1,64,1,1] → kernel time +3.97%
- Validation data: `kernel_regression_pct` ≈ 8.9% (regression confirmed and persists)
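
To make the regressing shape combination concrete, the sketch below works out the output shape under NumPy-style (multidirectional) broadcasting, which is what ONNX `Where` uses. The helper `broadcast_shape` is written for illustration only and is not part of ONNX Runtime.

```python
from itertools import zip_longest
from math import prod


def broadcast_shape(*shapes):
    """Return the shape produced by NumPy-style broadcasting of `shapes`."""
    result = []
    # Align shapes from the trailing dimension; missing leading dims count as 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError(f"shapes {shapes} are not broadcastable")
        result.append(sizes.pop() if sizes else 1)
    return tuple(reversed(result))


if __name__ == "__main__":
    condition = (2, 64, 56, 56)
    x = y = (1, 64, 1, 1)
    out = broadcast_shape(condition, x, y)
    print(out)                 # (2, 64, 56, 56)
    print(prod(x), prod(out))  # 64 401408
```

In this case each of the 64 X/Y values is reused across all batch and spatial positions of the output, so the kernel's broadcast path does most of the work.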

## What does not regress
- Same-shape inputs ([2,64,56,56] for condition/X/Y) improved in v1.21.0
- Mixed broadcast case (condition small, X/Y large) did not regress in our run

## Takeaway
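- The regression is specific to the case where small X/Y tensors are broadcast against a large condition tensor; same-shape inputs improved, and the mixed broadcast case (small condition, large X/Y) did not regress.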


### To reproduce


1. Download and extract the zip file:

[Archive.zip](https://github.com/user-attachments/files/24821741/Archive.zip)

2. Run benchmark using the provided script:
   ```bash
   python script_profiling.py where_16_v2_where_float32_4d_broadcast_condition 1.20.0 1.21.0
   ```

### Urgency

_No response_

### Platform

Linux

### OS Version

Ubuntu 24.04.3 LTS

### ONNX Runtime Installation

Built from Source

### ONNX Runtime Version or Commit ID

1.21

### ONNX Runtime API

Python

### Architecture

X64

### Execution Provider

Default CPU

### Execution Provider Library Version

_No response_

### Model File

_No response_

### Is this a quantized model?

Yes

[Performance] Performance regression in Where operator with broadcast between v1.20.0 and v1.21.0 #27116
