Labels
model:transformer (issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.), regression (issues that demonstrate a regression in ORT functionality and need to be addressed immediately)
Description
Describe the issue
Overview
ONNX Runtime fails with the following error after upgrading the model opset from 20 to 21:
RuntimeError: Error in execution: Non-zero status code returned while running Reshape node.
Name:'/layers.0/self_attn/Reshape_5' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\cpu\tensor\reshape_helper.h:41
onnxruntime::ReshapeHelper::ReshapeHelper size != 0 && (input_shape_size % size) == 0 was false. The input tensor cannot be
reshaped to the requested shape. Input shape:{1,6,1500,64}, requested shape:{124647109376,-1,64}
Full log:
Traceback (most recent call last):
File "C:\ModelOpt_Test\Model-Optimizer\examples\windows\onnx_ptq\whisper\whisper_optimum_ort_inference.py", line 178, in <module>
main(args)
File "C:\ModelOpt_Test\Model-Optimizer\examples\windows\onnx_ptq\whisper\whisper_optimum_ort_inference.py", line 81, in main
predicted_ids = model.generate(inp)[0]
^^^^^^^^^^^^^^^^^^^
File "C:\ModelOpt_Test\venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 704, in generate
init_tokens = self._retrieve_init_tokens(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ModelOpt_Test\venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 1572, in _retrieve_init_tokens
lang_ids = self.detect_language(
^^^^^^^^^^^^^^^^^^^^^
File "C:\ModelOpt_Test\venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 1676, in detect_language
logits = self(**inputs, decoder_input_ids=decoder_input_ids, use_cache=False).logits[:, -1]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ModelOpt_Test\venv\Lib\site-packages\optimum\modeling_base.py", line 98, in __call__
return self.forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ModelOpt_Test\venv\Lib\site-packages\optimum\onnxruntime\modeling_seq2seq.py", line 1306, in forward
encoder_outputs = self.encoder(input_features=input_features, attention_mask=attention_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ModelOpt_Test\venv\Lib\site-packages\optimum\onnxruntime\base.py", line 98, in __call__
return self.forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ModelOpt_Test\venv\Lib\site-packages\optimum\onnxruntime\modeling_seq2seq.py", line 377, in forward
self.session.run_with_iobinding(io_binding)
File "C:\ModelOpt_Test\venv\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 398, in run_with_iobinding
self._sess.run_with_iobinding(iobinding._iobinding, run_options)
RuntimeError: Error in execution: Non-zero status code returned while running Reshape node. Name:'/layers.0/self_attn/Reshape_5' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\cpu\tensor\reshape_helper.h:41 onnxruntime::ReshapeHelper::ReshapeHelper size != 0 && (input_shape_size % size) == 0 was false. The input tensor cannot be reshaped to the requested shape. Input shape:{1,6,1500,64}, requested shape:{124647109376,-1,64}
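For context, the check that fails in reshape_helper.h is a plain divisibility test: the product of the known target dimensions must evenly divide the input's element count so that the -1 dimension can be inferred. Reproducing the arithmetic from the error message (a quick standalone check, not part of the original report):

```python
import math

input_shape = [1, 6, 1500, 64]        # input shape from the error message
requested = [124647109376, -1, 64]    # corrupted target shape from the error message

input_size = math.prod(input_shape)                     # 576000 elements
known = math.prod(d for d in requested if d != -1)      # 124647109376 * 64
print(input_size % known)  # non-zero, so ReshapeHelper raises;
# with the correct value 6 instead, 576000 % (6 * 64) == 0 and -1 resolves to 1500
```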
Additional info
- This issue is observed only on Windows, not on Linux;
- The error reports `Reshape` as the failing node, but the shape computation actually gets corrupted three nodes upstream, in the `Mul` node (see the inspection sketch after this list):
/layers.0/self_attn/Mul_1 ← Computes first dimension dynamically, ROOT CAUSE (produces 124647109376 instead of 6)
↓
/layers.0/self_attn/Unsqueeze_6 ← Adds dimension to Mul_1 output
↓
/layers.0/self_attn/Concat_5 ← Concatenates inputs, creating shape: [124647109376, -1, 64]
↓
/layers.0/self_attn/Reshape_5 ← Uses Concat_5 output as shape parameter, ERROR REPORTED HERE
- System info:
  - GPU: GeForce RTX 5080 16 GiB (also reproduced on an RTX 4090)
  - CPU: AMD Ryzen 7 7800X3D 8-Core Processor (x86_64) (also reproduced on a Ryzen 9)
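One way to confirm the corrupted value (not shown in the report; a minimal sketch that assumes the node names from the error message, ORT's usual `<node_name>_output_0` tensor-naming convention, and an int64 dtype for the shape-component tensor) is to expose the suspect `Mul` output as an extra graph output and fetch only that tensor, so the failing downstream `Reshape` never executes:

```python
import numpy as np
import onnx
import onnxruntime as ort

model = onnx.load("whisper-tiny/converted_opset21/encoder_model_opset21.onnx")

# Expose the suspect Mul node's output as a graph output so ORT can return it.
suspect = "/layers.0/self_attn/Mul_1_output_0"  # assumed tensor name
model.graph.output.append(
    onnx.helper.make_tensor_value_info(suspect, onnx.TensorProto.INT64, None)
)
onnx.save(model, "encoder_debug.onnx")

# Use the same providers as the failing run; the corruption was only seen on Windows.
sess = ort.InferenceSession(
    "encoder_debug.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)
feeds = {"input_features": np.zeros((1, 80, 3000), dtype=np.float16)}  # whisper-tiny mel input

# Fetching only the suspect tensor prunes the graph, so the failing Reshape is skipped.
outs = sess.run([suspect], feeds)
print(outs[0])  # should contain 6; the report observed 124647109376
```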
To reproduce
1. Export model
A. Install requirements:
cd C:\ModelOpt-Test
.\venv\Scripts\Activate.ps1
pip install torch==2.7.0 torch-geometric==2.7.0 torchaudio==2.7.0 torchprofile==0.0.4 torchvision==0.22.0 --index-url https://download.pytorch.org/whl/cu128
git clone https://github.com/NVIDIA/Model-Optimizer
cd .\Model-Optimizer\
pip install -r examples\windows\onnx_ptq\whisper\requirements.txt
pip uninstall torchvision
pip install transformers==4.49
B. Export model:
optimum-cli export onnx -m openai/whisper-tiny --dtype fp16 --library-name transformers --device cuda --opset 20 ./whisper-tiny
C. Run ORT inference on model (SUCCESS):
pip install transformers==4.57.3
python .\examples\windows\onnx_ptq\whisper\whisper_optimum_ort_inference.py \
--model_name openai/whisper-tiny \
--onnx_model_dir "C:\ModelOpt-Test\Model-Optimizer\whisper-tiny" \
--audio_file_path "C:\ModelOpt-Test\Model-Optimizer\examples\windows\onnx_ptq\whisper\demo.wav" \
--run_wer_test \
--dtype fp16
2. Upgrade model's opset
A. Use convert_opset.py:
mkdir .\whisper-tiny\converted_opset21\
cp .\whisper-tiny\*.json .\whisper-tiny\converted_opset21\
python .\convert_opset.py .\whisper-tiny\encoder_model.onnx .\whisper-tiny\converted_opset21\encoder_model_opset21.onnx --opset 21
python .\convert_opset.py .\whisper-tiny\decoder_model.onnx .\whisper-tiny\converted_opset21\decoder_model_opset21.onnx --opset 21
python .\convert_opset.py .\whisper-tiny\decoder_with_past_model.onnx .\whisper-tiny\converted_opset21\decoder_with_past_model_opset21.onnx --opset 21
B. Run ORT inference on upgraded model (FAILS):
python .\examples\windows\onnx_ptq\whisper\whisper_optimum_ort_inference.py \
--model_name openai/whisper-tiny \
--onnx_model_dir "C:\ModelOpt-Test\Model-Optimizer\whisper-tiny\converted_opset21" \
--audio_file_path "C:\ModelOpt-Test\Model-Optimizer\examples\windows\onnx_ptq\whisper\demo.wav" \
--run_wer_test \
--dtype fp16
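convert_opset.py comes from the ModelOpt example tree and is not reproduced in this report; as a reference point, a minimal equivalent built on onnx's built-in version converter (an assumption about what the script does, not its actual contents) looks like:

```python
import argparse

import onnx
from onnx import version_converter

# Minimal stand-in for convert_opset.py: load a model, bump its default
# opset to the requested version, validate, and save.
parser = argparse.ArgumentParser()
parser.add_argument("input")
parser.add_argument("output")
parser.add_argument("--opset", type=int, required=True)
args = parser.parse_args()

model = onnx.load(args.input)
converted = version_converter.convert_version(model, args.opset)
onnx.checker.check_model(converted)
onnx.save(converted, args.output)
```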
Urgency
Yes, this is blocking the Windows path in NVIDIA's ModelOpt toolkit.
Platform
Windows
OS Version
11 (x86_64-standard-uefi)
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.23.2
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
No response