
ORT inference fails after upgrading model's opset from 20 to 21 #27102

@gcunhase


Describe the issue

Overview

ONNX Runtime fails with the following error after upgrading the model's opset from 20 to 21:

RuntimeError: Error in execution: Non-zero status code returned while running Reshape node. 
Name:'/layers.0/self_attn/Reshape_5' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\cpu\tensor\reshape_helper.h:41 
onnxruntime::ReshapeHelper::ReshapeHelper size != 0 && (input_shape_size % size) == 0 was false. The input tensor cannot be 
reshaped to the requested shape. Input shape:{1,6,1500,64}, requested shape:{124647109376,-1,64}

Full log:

Traceback (most recent call last):
  File "C:\ModelOpt_Test\Model-Optimizer\examples\windows\onnx_ptq\whisper\whisper_optimum_ort_inference.py", line 178, in <module>
    main(args)
  File "C:\ModelOpt_Test\Model-Optimizer\examples\windows\onnx_ptq\whisper\whisper_optimum_ort_inference.py", line 81, in main  
    predicted_ids = model.generate(inp)[0]
                    ^^^^^^^^^^^^^^^^^^^
  File "C:\ModelOpt_Test\venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 704, in generate       
    init_tokens = self._retrieve_init_tokens(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ModelOpt_Test\venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 1572, in _retrieve_init_tokens
    lang_ids = self.detect_language(
               ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ModelOpt_Test\venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 1676, in detect_language
    logits = self(**inputs, decoder_input_ids=decoder_input_ids, use_cache=False).logits[:, -1]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ModelOpt_Test\venv\Lib\site-packages\optimum\modeling_base.py", line 98, in __call__
    return self.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ModelOpt_Test\venv\Lib\site-packages\optimum\onnxruntime\modeling_seq2seq.py", line 1306, in forward
    encoder_outputs = self.encoder(input_features=input_features, attention_mask=attention_mask)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ModelOpt_Test\venv\Lib\site-packages\optimum\onnxruntime\base.py", line 98, in __call__
    return self.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ModelOpt_Test\venv\Lib\site-packages\optimum\onnxruntime\modeling_seq2seq.py", line 377, in forward
    self.session.run_with_iobinding(io_binding)
  File "C:\ModelOpt_Test\venv\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 398, in run_with_iobinding
    self._sess.run_with_iobinding(iobinding._iobinding, run_options)
RuntimeError: Error in execution: Non-zero status code returned while running Reshape node. Name:'/layers.0/self_attn/Reshape_5' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\cpu\tensor\reshape_helper.h:41 onnxruntime::ReshapeHelper::ReshapeHelper size != 0 && (input_shape_size % size) == 0 was false. The input tensor cannot be reshaped to the requested shape. Input shape:{1,6,1500,64}, requested shape:{124647109376,-1,64}

Additional info

  1. This issue is only observed on Windows, not on Linux;
  2. The error reports the Reshape node, but the shape computation is actually corrupted three nodes earlier, in the Mul node (see the inspection sketch after this list):
/layers.0/self_attn/Mul_1         ← computes the first dimension dynamically; ROOT CAUSE (produces 124647109376 instead of 6)
    ↓
/layers.0/self_attn/Unsqueeze_6   ← adds a dimension to the Mul_1 output
    ↓
/layers.0/self_attn/Concat_5      ← concatenates its inputs, producing the shape [124647109376, -1, 64]
    ↓
/layers.0/self_attn/Reshape_5     ← uses the Concat_5 output as its shape input; ERROR REPORTED HERE
  3. System info:
  • GPU: GeForce RTX 5080 16 GiB
    • Also reproducible on an RTX 4090
  • CPU: AMD Ryzen 7 7800X3D 8-Core Processor (x86_64)
    • Also reproducible on a Ryzen 9 CPU
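For reference, the corrupted value can be observed directly by extracting only the subgraph that feeds the suspect Mul output and running that piece, so the failing Reshape never executes. This is a hedged debugging sketch, not part of the original report: the node name comes from the error log, while the file path, input name (input_features), and input shape [1, 80, 3000] are assumptions based on the Whisper-tiny fp16 export and the conversion step below.

import numpy as np
import onnx
import onnx.utils
import onnxruntime as ort

# Converted encoder (path assumed from the reproduction steps below).
src = r".\whisper-tiny\converted_opset21\encoder_model_opset21.onnx"
model = onnx.load(src)

# Output tensor of the node suspected of producing 124647109376 instead of 6.
mul_out = next(n.output[0] for n in model.graph.node
               if n.name == "/layers.0/self_attn/Mul_1")

# Extract only the subgraph feeding that tensor, so the failing Reshape is excluded.
onnx.utils.extract_model(src, "mul_debug.onnx",
                         input_names=["input_features"],
                         output_names=[mul_out])

sess = ort.InferenceSession("mul_debug.onnx",
                            providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
feats = np.zeros((1, 80, 3000), dtype=np.float16)  # [batch, n_mels, n_frames], fp16 export
print("Mul_1 output:", sess.run(None, {"input_features": feats})[0])  # expected: 6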

To reproduce

1. Export model

A. Install requirements:

cd C:\ModelOpt-Test
.\venv\Scripts\Activate.ps1
pip install torch==2.7.0 torch-geometric==2.7.0 torchaudio==2.7.0 torchprofile==0.0.4 torchvision==0.22.0 --index-url https://download.pytorch.org/whl/cu128
git clone https://github.com/NVIDIA/Model-Optimizer
cd .\Model-Optimizer\
pip install -r examples\windows\onnx_ptq\whisper\requirements.txt
pip uninstall torchvision
pip install transformers==4.49

B. Export model:

optimum-cli export onnx -m openai/whisper-tiny --dtype fp16 --library-name transformers --device cuda --opset 20 ./whisper-tiny
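As an optional sanity check (not part of the original steps), the declared opset of the exported files can be verified before converting; the file names follow the optimum-cli export above.

import onnx

for name in ("encoder_model", "decoder_model", "decoder_with_past_model"):
    m = onnx.load(rf".\whisper-tiny\{name}.onnx")
    # Expect (ai.onnx, 20) for the default domain at this point.
    print(name, [(imp.domain or "ai.onnx", imp.version) for imp in m.opset_import])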

C. Run ORT inference on model (SUCCESS):

pip install transformers==4.57.3
python .\examples\windows\onnx_ptq\whisper\whisper_optimum_ort_inference.py `
    --model_name openai/whisper-tiny `
    --onnx_model_dir "C:\ModelOpt-Test\Model-Optimizer\whisper-tiny" `
    --audio_file_path "C:\ModelOpt-Test\Model-Optimizer\examples\windows\onnx_ptq\whisper\demo.wav" `
    --run_wer_test `
    --dtype fp16

2. Upgrade the model's opset

A. Use convert_opset.py:

mkdir .\whisper-tiny\converted_opset21\
cp .\whisper-tiny\*.json .\whisper-tiny\converted_opset21\
python .\convert_opset.py .\whisper-tiny\encoder_model.onnx .\whisper-tiny\converted_opset21\encoder_model_opset21.onnx --opset 21
python .\convert_opset.py .\whisper-tiny\decoder_model.onnx .\whisper-tiny\converted_opset21\decoder_model_opset21.onnx --opset 21
python .\convert_opset.py .\whisper-tiny\decoder_with_past_model.onnx .\whisper-tiny\converted_opset21\decoder_with_past_model_opset21.onnx --opset 21
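convert_opset.py is not included here; for context, a minimal sketch of what such a script is assumed to do (a thin wrapper around onnx.version_converter) follows. The actual script may differ.

import argparse
import onnx
from onnx import version_converter

parser = argparse.ArgumentParser()
parser.add_argument("input_model")
parser.add_argument("output_model")
parser.add_argument("--opset", type=int, default=21)
args = parser.parse_args()

# Load, bump the default-domain opset, validate, and save.
model = onnx.load(args.input_model)
converted = version_converter.convert_version(model, args.opset)
onnx.checker.check_model(converted)
onnx.save(converted, args.output_model)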

B. Run ORT inference on the upgraded model (FAILS):

python .\examples\windows\onnx_ptq\whisper\whisper_optimum_ort_inference.py `
    --model_name openai/whisper-tiny `
    --onnx_model_dir "C:\ModelOpt-Test\Model-Optimizer\whisper-tiny\converted_opset21" `
    --audio_file_path "C:\ModelOpt-Test\Model-Optimizer\examples\windows\onnx_ptq\whisper\demo.wav" `
    --run_wer_test `
    --dtype fp16
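The failure can also be reproduced without optimum by running the converted encoder alone. This is a hedged, minimal sketch using a dummy input; real log-mel features from demo.wav would exercise the same shape computation.

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    r"C:\ModelOpt-Test\Model-Optimizer\whisper-tiny\converted_opset21\encoder_model_opset21.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
feats = np.zeros((1, 80, 3000), dtype=np.float16)  # [batch, n_mels, n_frames]
out = sess.run(None, {"input_features": feats})  # raises the Reshape error on the opset-21 model
print(out[0].shape)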

Urgency

Yes, this is blocking the Windows workflow in NVIDIA's ModelOpt (Model-Optimizer) toolkit.

Platform

Windows

OS Version

11 (x86_64-standard-uefi)

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.23.2

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

No response

Labels

model:transformer (issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.), regression (issues that demonstrate a regression in ORT functionality and need to be addressed immediately)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions