Describe the issue
I am encountering an issue where, after optimizing a model with ONNX Runtime, the optimized model's outputs are inconsistent with the original model's outputs.
- Actual Behavior:
AssertionError:
Not equal to tolerance rtol=0.001, atol=0.001
Mismatched elements: 12 / 1056 (1.14%)
Max absolute difference: 0.00350001
Max relative difference: 0.07692306
x: array([0.42 , 0.42 , 0.42 , ..., 0.1155, 0.126 , 0.084 ], dtype=float32)
y: array([0.42 , 0.42 , 0.42 , ..., 0.1155, 0.126 , 0.084 ], dtype=float32)
- Expected Behavior:
The optimized model should produce the same results as the original model for all outputs, within the specified tolerance.
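A quick way to separate a graph-rewrite bug from CUDA fused-kernel numerics (my suggestion, not part of the original report) is to run the same comparison on the CPU execution provider. A minimal sketch, assuming the model file and input shape from the repro below:

import numpy as np
import onnxruntime as ort

# If the CPU outputs of the original and optimized models agree within
# tolerance while the CUDA outputs do not, the difference is likely numerical
# (different kernels after fusion) rather than an incorrect graph rewrite.
input_data = {"v5_0": np.random.rand(55, 7, 1, 40).astype(np.float32)}

original = ort.InferenceSession("20730.onnx", providers=["CPUExecutionProvider"])
optimized = ort.InferenceSession("./opt.onnx", providers=["CPUExecutionProvider"])

names = [o.name for o in original.get_outputs()]
for a, b in zip(original.run(names, input_data), optimized.run(names, input_data)):
    np.testing.assert_allclose(a, b, rtol=1e-3, atol=1e-3)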
To reproduce
- Download the model
- Run the following script:
import numpy as np
import onnxruntime as ort
from onnxruntime.transformers import optimizer

model_path = "20730.onnx"
optimized_model_path = "./opt.onnx"

# Run the original model twice with all graph optimizations disabled, to
# confirm its outputs are deterministic on CUDA before blaming the optimizer.
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
original_session = ort.InferenceSession(model_path, sess_options, providers=["CUDAExecutionProvider"])

input_data = {"v5_0": np.random.rand(55, 7, 1, 40).astype(np.float32)}
original_output_names = [output.name for output in original_session.get_outputs()]
original_result = original_session.run(original_output_names, input_data)
original_result2 = original_session.run(original_output_names, input_data)
for r1, r2 in zip(original_result, original_result2):
    np.testing.assert_allclose(r1, r2, rtol=1e-3, atol=1e-3)

# Optimize with the transformers optimizer at the highest level and compare
# its outputs against the unoptimized baseline.
optimized_model = optimizer.optimize_model(model_path, opt_level=99)
optimized_model.save_model_to_file(optimized_model_path)
optimized_session = ort.InferenceSession(optimized_model_path, providers=["CUDAExecutionProvider"])
optimized_output_names = [output.name for output in optimized_session.get_outputs()]
optimized_result = optimized_session.run(optimized_output_names, input_data)
for r1, r2 in zip(original_result, optimized_result):
    np.testing.assert_allclose(r1, r2, rtol=1e-3, atol=1e-3)
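If the mismatch reproduces, bisecting the optimizer's opt_level can narrow down which group of graph passes introduces the divergence. A minimal sketch of that loop (my own suggestion, not part of the original report; it assumes the same model file and input shape, and that optimize_model accepts the intermediate levels 1 and 2 as well as 99):

import numpy as np
import onnxruntime as ort
from onnxruntime.transformers import optimizer

model_path = "20730.onnx"
input_data = {"v5_0": np.random.rand(55, 7, 1, 40).astype(np.float32)}

# Baseline: the original model with all graph optimizations disabled.
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
baseline = ort.InferenceSession(model_path, so, providers=["CUDAExecutionProvider"])
names = [o.name for o in baseline.get_outputs()]
reference = baseline.run(names, input_data)

# The lowest level at which a mismatch appears points at the offending passes.
for level in (1, 2, 99):
    m = optimizer.optimize_model(model_path, opt_level=level)
    path = f"./opt_{level}.onnx"  # hypothetical per-level output file
    m.save_model_to_file(path)
    session = ort.InferenceSession(path, providers=["CUDAExecutionProvider"])
    result = session.run(names, input_data)
    try:
        for r1, r2 in zip(reference, result):
            np.testing.assert_allclose(r1, r2, rtol=1e-3, atol=1e-3)
        print(f"opt_level={level}: outputs match within tolerance")
    except AssertionError as exc:
        print(f"opt_level={level}: mismatch\n{exc}")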
Urgency
No response
Platform
Linux
OS Version
Ubuntu 20.04
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
No response