I am trying to run my model with ONNX Runtime. The model itself has been int8 quantized. At runtime, the following error occurs:
NOT_IMPLEMENTED : Could not find an implementation for Add(14) node with name '/self_attn/q_proj/Add'

While debugging, I found that after quantization each linear layer became a QuantLinear layer, in which qweight is int32 and bias is float16. Perhaps the error is caused by the mismatch between these two types.
The model passes the model check when I load it, as shown in the following code; no error is reported.
Does ONNX not support converting a quantized PyTorch model? If it is supported, what do I need to do?