
ORT inference fails after upgrading model's opset from 20 to 21 #27102

@gcunhase


Describe the issue

Overview

ONNX Runtime fails with the following error after upgrading the model's opset from 20 to 21:

RuntimeError: Error in execution: Non-zero status code returned while running Reshape node. 
Name:'/layers.0/self_attn/Reshape_5' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\cpu\tensor\reshape_helper.h:41 
onnxruntime::ReshapeHelper::ReshapeHelper size != 0 && (input_shape_size % size) == 0 was false. The input tensor cannot be 
reshaped to the requested shape. Input shape:{1,6,1500,64}, requested shape:{124647109376,-1,64}

Full log:

Traceback (most recent call last):
  File "C:\ModelOpt_Test\Model-Optimizer\examples\windows\onnx_ptq\whisper\whisper_optimum_ort_inference.py", line 178, in <module>
    main(args)
  File "C:\ModelOpt_Test\Model-Optimizer\examples\windows\onnx_ptq\whisper\whisper_optimum_ort_inference.py", line 81, in main  
    predicted_ids = model.generate(inp)[0]
                    ^^^^^^^^^^^^^^^^^^^
  File "C:\ModelOpt_Test\venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 704, in generate       
    init_tokens = self._retrieve_init_tokens(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ModelOpt_Test\venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 1572, in _retrieve_init_tokens
    lang_ids = self.detect_language(
               ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ModelOpt_Test\venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 1676, in detect_language
    logits = self(**inputs, decoder_input_ids=decoder_input_ids, use_cache=False).logits[:, -1]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ModelOpt_Test\venv\Lib\site-packages\optimum\modeling_base.py", line 98, in __call__
    return self.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ModelOpt_Test\venv\Lib\site-packages\optimum\onnxruntime\modeling_seq2seq.py", line 1306, in forward
    encoder_outputs = self.encoder(input_features=input_features, attention_mask=attention_mask)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ModelOpt_Test\venv\Lib\site-packages\optimum\onnxruntime\base.py", line 98, in __call__
    return self.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ModelOpt_Test\venv\Lib\site-packages\optimum\onnxruntime\modeling_seq2seq.py", line 377, in forward
    self.session.run_with_iobinding(io_binding)
  File "C:\ModelOpt_Test\venv\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 398, in run_with_iobinding
    self._sess.run_with_iobinding(iobinding._iobinding, run_options)
RuntimeError: Error in execution: Non-zero status code returned while running Reshape node. Name:'/layers.0/self_attn/Reshape_5' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\cpu\tensor\reshape_helper.h:41 onnxruntime::ReshapeHelper::ReshapeHelper size != 0 && (input_shape_size % size) == 0 was false. The input tensor cannot be reshaped to the requested shape. Input shape:{1,6,1500,64}, requested shape:{124647109376,-1,64}

Additional info

  1. This issue is only observed on Windows, not on Linux;
  2. The error reports the Reshape node, but the shape computation is actually corrupted three nodes earlier, in the Mul node (see the inspection sketch after this list):
/layers.0/self_attn/Mul_1         ← computes the first dimension dynamically; ROOT CAUSE (produces 124647109376 instead of 6)
    ↓
/layers.0/self_attn/Unsqueeze_6   ← adds a dimension to the Mul_1 output
    ↓
/layers.0/self_attn/Concat_5      ← concatenates its inputs, producing the shape [124647109376, -1, 64]
    ↓
/layers.0/self_attn/Reshape_5     ← uses the Concat_5 output as its shape input; ERROR REPORTED HERE
  3. System info:
  • GPU: GeForce RTX 5080 16 GiB
    • Also reproducible on an RTX 4090
  • CPU: AMD Ryzen 7 7800X3D 8-Core Processor (x86_64)
    • Also reproducible on a Ryzen 9 CPU
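For reference, the corrupted value can be observed directly by extracting only the subgraph that feeds the suspect Mul output and running that piece, so the failing Reshape never executes. This is a hedged debugging sketch, not part of the original report: the node name comes from the error log, while the file path, input name (input_features), and input shape [1, 80, 3000] are assumptions based on the Whisper-tiny fp16 export and the conversion step below.

import numpy as np
import onnx
import onnx.utils
import onnxruntime as ort

# Converted encoder (path assumed from the reproduction steps below).
src = r".\whisper-tiny\converted_opset21\encoder_model_opset21.onnx"
model = onnx.load(src)

# Output tensor of the node suspected of producing 124647109376 instead of 6.
mul_out = next(n.output[0] for n in model.graph.node
               if n.name == "/layers.0/self_attn/Mul_1")

# Extract only the subgraph feeding that tensor, so the failing Reshape is excluded.
onnx.utils.extract_model(src, "mul_debug.onnx",
                         input_names=["input_features"],
                         output_names=[mul_out])

sess = ort.InferenceSession("mul_debug.onnx",
                            providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
feats = np.zeros((1, 80, 3000), dtype=np.float16)  # [batch, n_mels, n_frames], fp16 export
print("Mul_1 output:", sess.run(None, {"input_features": feats})[0])  # expected: 6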

To reproduce

1. Export model

A. Install requirements:

cd C:\ModelOpt-Test
.\venv\Scripts\Activate.ps1
pip install torch==2.7.0 torch-geometric==2.7.0 torchaudio==2.7.0 torchprofile==0.0.4 torchvision==0.22.0 --index-url https://download.pytorch.org/whl/cu128
git clone https://github.com/NVIDIA/Model-Optimizer
cd .\Model-Optimizer\
pip install -r examples\windows\onnx_ptq\whisper\requirements.txt
pip uninstall torchvision
pip install transformers==4.49

B. Export model:

optimum-cli export onnx -m openai/whisper-tiny --dtype fp16 --library-name transformers --device cuda --opset 20 ./whisper-tiny
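As an optional sanity check (not part of the original steps), the declared opset of the exported files can be verified before converting; the file names follow the optimum-cli export above.

import onnx

for name in ("encoder_model", "decoder_model", "decoder_with_past_model"):
    m = onnx.load(rf".\whisper-tiny\{name}.onnx")
    # Expect (ai.onnx, 20) for the default domain at this point.
    print(name, [(imp.domain or "ai.onnx", imp.version) for imp in m.opset_import])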

C. Run ORT inference on model (SUCCESS):

pip install transformers==4.57.3
python .\examples\windows\onnx_ptq\whisper\whisper_optimum_ort_inference.py `
    --model_name openai/whisper-tiny `
    --onnx_model_dir "C:\ModelOpt-Test\Model-Optimizer\whisper-tiny" `
    --audio_file_path "C:\ModelOpt-Test\Model-Optimizer\examples\windows\onnx_ptq\whisper\demo.wav" `
    --run_wer_test `
    --dtype fp16

2. Upgrade the model's opset

A. Use convert_opset.py:

mkdir .\whisper-tiny\converted_opset21\
cp .\whisper-tiny\*.json .\whisper-tiny\converted_opset21\
python .\convert_opset.py .\whisper-tiny\encoder_model.onnx .\whisper-tiny\converted_opset21\encoder_model_opset21.onnx --opset 21
python .\convert_opset.py .\whisper-tiny\decoder_model.onnx .\whisper-tiny\converted_opset21\decoder_model_opset21.onnx --opset 21
python .\convert_opset.py .\whisper-tiny\decoder_with_past_model.onnx .\whisper-tiny\converted_opset21\decoder_with_past_model_opset21.onnx --opset 21
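convert_opset.py is not included here; for context, a minimal sketch of what such a script is assumed to do (a thin wrapper around onnx.version_converter) follows. The actual script may differ.

import argparse
import onnx
from onnx import version_converter

parser = argparse.ArgumentParser()
parser.add_argument("input_model")
parser.add_argument("output_model")
parser.add_argument("--opset", type=int, default=21)
args = parser.parse_args()

# Load, bump the default-domain opset, validate, and save.
model = onnx.load(args.input_model)
converted = version_converter.convert_version(model, args.opset)
onnx.checker.check_model(converted)
onnx.save(converted, args.output_model)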

B. Run ORT inference on the upgraded model (FAILS):

python .\examples\windows\onnx_ptq\whisper\whisper_optimum_ort_inference.py `
    --model_name openai/whisper-tiny `
    --onnx_model_dir "C:\ModelOpt-Test\Model-Optimizer\whisper-tiny\converted_opset21" `
    --audio_file_path "C:\ModelOpt-Test\Model-Optimizer\examples\windows\onnx_ptq\whisper\demo.wav" `
    --run_wer_test `
    --dtype fp16
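The failure can also be reproduced without optimum by running the converted encoder alone. This is a hedged, minimal sketch using a dummy input; real log-mel features from demo.wav would exercise the same shape computation.

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    r"C:\ModelOpt-Test\Model-Optimizer\whisper-tiny\converted_opset21\encoder_model_opset21.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
feats = np.zeros((1, 80, 3000), dtype=np.float16)  # [batch, n_mels, n_frames]
out = sess.run(None, {"input_features": feats})  # raises the Reshape error on the opset-21 model
print(out[0].shape)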

Urgency

Yes, this is blocking the Windows workflow in NVIDIA's ModelOpt (Model-Optimizer) toolkit.

Platform

Windows

OS Version

11 (x86_64-standard-uefi)

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.23.2

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

No response

Labels

model:transformer (issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.), regression (issues that demonstrate a regression in ORT functionality and need to be addressed immediately)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions