Describe the issue
I want to implement inference for an ONNX model in my own C code, but in some layers the result from my C implementation and the result from ONNX Runtime differ by 1, e.g. my C code gives 40 while onnxruntime gives 41.
I want to know why numpy's result is -87 but onnxruntime's is -88?
In quantized model inference an off-by-one error is fatal: accumulated over many layers it can reach 4-5 (in 8-bit integers).
Thank you :>
The test code to reproduce is below.
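For reference, with these scales and zero points the requantized sum lands almost exactly on the 40.5 rounding boundary (about 40.500005 in float64), so the rounded result is very sensitive to the floating-point precision and rounding mode used. Below is a quick check of my own dequantize/add/requantize arithmetic in float64 and float32; this is only my math, not the actual onnxruntime kernel:
import numpy as np
# Same constants as in the reproduction script below.
A_scale, A_zero_point = 0.008010663092136383, 7
B_scale, B_zero_point = 0.00622713053599, -128
C_scale, C_zero_point = 0.006873490754514933, -128
a, b = -8, -64  # int8 input values
for dtype in (np.float64, np.float32):
    # Dequantize A and B, add, then requantize with the scales cast to this precision.
    real_sum = dtype(A_scale) * (a - A_zero_point) + dtype(B_scale) * (b - B_zero_point)
    requant = real_sum / dtype(C_scale)
    print(dtype.__name__, requant, np.round(requant) + C_zero_point)
(Note that np.round rounds exact halves to the nearest even integer, which also matters this close to the boundary.)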
To reproduce
import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np
import onnxruntime as ort
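# Tensor names and fixed quantization parameters (scale / zero point) for A, B and the output C.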
A = 'A'
B = 'B'
C = 'C'
A_scale = 0.008010663092136383
A_zero_point = 7
B_scale = 0.00622713053599
B_zero_point = -128
C_scale = 0.006873490754514933
C_zero_point = -128
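# Build a one-node model containing the com.microsoft QLinearAdd operator.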
input_A = helper.make_tensor_value_info(A, TensorProto.INT8, [1, 1, 1, 1])
input_B = helper.make_tensor_value_info(B, TensorProto.INT8, [1, 1, 1, 1])
output = helper.make_tensor_value_info(C, TensorProto.INT8, [1, 1, 1, 1])
initializer_A_scale = numpy_helper.from_array(np.array(A_scale, dtype=np.float32), name='A_scale')
initializer_A_zero_point = numpy_helper.from_array(np.array(A_zero_point, dtype=np.int8), name='A_zero_point')
initializer_B_scale = numpy_helper.from_array(np.array(B_scale, dtype=np.float32), name='B_scale')
initializer_B_zero_point = numpy_helper.from_array(np.array(B_zero_point, dtype=np.int8), name='B_zero_point')
initializer_C_scale = numpy_helper.from_array(np.array(C_scale, dtype=np.float32), name='C_scale')
initializer_C_zero_point = numpy_helper.from_array(np.array(C_zero_point, dtype=np.int8), name='C_zero_point')
qlinear_add_node = helper.make_node(
    'QLinearAdd',
    inputs=[A, 'A_scale', 'A_zero_point', B, 'B_scale', 'B_zero_point', 'C_scale', 'C_zero_point'],
    outputs=[C],
    name='QLinearAdd',
    domain='com.microsoft'
)
opset_version_ai_onnx = 13
opset_version_com_microsoft = 1
graph = helper.make_graph(
    nodes=[qlinear_add_node],
    name='QLinearAdd_Graph',
    inputs=[input_A, input_B],
    outputs=[output],
    initializer=[
        initializer_A_scale,
        initializer_A_zero_point,
        initializer_B_scale,
        initializer_B_zero_point,
        initializer_C_scale,
        initializer_C_zero_point
    ]
)
model = helper.make_model(
    graph,
    producer_name='onnx-qlinearadd-fixed-params',
    opset_imports=[
        helper.make_opsetid(domain='ai.onnx', version=opset_version_ai_onnx),
        helper.make_opsetid(domain='com.microsoft', version=opset_version_com_microsoft)
    ]
)
onnx.save(model, 'qlinearadd_fixed_params_model.onnx')
print("ONNX MODEL save 'qlinearadd_fixed_params_model.onnx'")
A_int8 = np.array([-8], dtype=np.int8)
B_int8 = np.array([-64], dtype=np.int8)
A_real = A_scale * (A_int8.astype(np.int32) - A_zero_point)
B_real = B_scale * (B_int8.astype(np.int32) - B_zero_point)
C_real = A_real + B_real
A1 = A_scale * (A_int8 - A_zero_point)
B1 = B_scale * (B_int8 - B_zero_point)
print((A1 + B1) / C_scale + C_zero_point)
C_int32 = np.round(C_real / C_scale) + C_zero_point
C_int8 = C_int32.astype(np.int8)
print(C_int8)
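# Run the same int8 inputs through onnxruntime for comparison.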
session = ort.InferenceSession('qlinearadd_fixed_params_model.onnx')
output_name = session.get_outputs()[0].name
A_data = np.array([-8], dtype=np.int8).reshape([1, 1, 1, 1])
B_data = np.array([-64], dtype=np.int8).reshape([1, 1, 1, 1])
input_dict = {
    'A': A_data,
    'B': B_data
}
outputs = session.run([output_name], input_dict)
C_output = outputs[0]
print("output C:", C_output)
Urgency
No response
Platform
Windows
OS Version
11
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
onnxruntime==1.19.2 (Python package)
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response