
Error at runtime when converting an int8-quantized model to ONNX #23879

Closed
@jungyin

Description


I am trying to convert my model to ONNX and run it with onnxruntime; the model itself has been int8 quantized. At runtime, the following error occurred:
NOT_IMPLEMENTED : Could not find an implementation for Add(14) node with name '/self_attn/q_proj/Add'

(Screenshot of the full error message attached.)
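For reference, this is roughly how the session is created and run; the file path, input name handling, and dummy input shape/dtype below are placeholders rather than my actual code:

```python
import numpy as np
import onnxruntime as ort

# Load the exported int8-quantized model; the NOT_IMPLEMENTED error appears
# while the session is being set up / the graph is being partitioned.
session = ort.InferenceSession("model_int8.onnx", providers=["CPUExecutionProvider"])

# Dummy input just to trigger execution; shape and dtype are placeholders.
input_name = session.get_inputs()[0].name
dummy = np.zeros((1, 128), dtype=np.int64)

outputs = session.run(None, {input_name: dummy})
```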

While debugging, I found that after quantization the linear layers become QuantLinear layers, whose qweight is int32 while the bias is float16. Perhaps the error is caused by the mismatch between these two types.
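For illustration, here is a minimal sketch of how the tensor types feeding the failing node can be inspected with the onnx package (the node name is taken from the error above; the model path is a placeholder):

```python
import onnx
from onnx import TensorProto

model = onnx.load("model_int8.onnx")  # placeholder path

# Map each initializer (weight/bias) name to its element type.
init_types = {
    init.name: TensorProto.DataType.Name(init.data_type)
    for init in model.graph.initializer
}

# Print the inputs of the Add node named in the error and their dtypes.
for node in model.graph.node:
    if node.name == "/self_attn/q_proj/Add":
        print(node.op_type, node.input)
        for inp in node.input:
            print(" ", inp, "->", init_types.get(inp, "(graph value)"))
```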

The model passes the ONNX checker when I load it, as shown in the following code; it does not report any error:

onnx.checker.check_model(self.model_path)

Does ONNX not support converting PyTorch quantized models? If it is supported, what do I need to do?

Labels: quantization (issues related to quantization)