Checklist
- I have searched related issues but cannot get the expected help.
- I have read related documents and don't know what to do.
Describe the question you meet
Hi, I'm trying to export a quantized model after a QAT experiment. I have implemented the following hook to export it to ONNX format using the get_deploy_model method of MMArchitectureQuant.
# Method of a custom hook class; assumes torch, check_torch_version and the
# dataset names compared below are already in scope.
def after_train(self, runner) -> None:
    """Export the quantized model to ONNX format.

    Args:
        runner (Runner): The runner of the training, validation or testing
            process.
    """
    try:
        check_torch_version()
    except AssertionError as err:
        print(repr(err))
        return

    # Unwrap the DistributedDataParallel wrapper when training is distributed.
    if runner.distributed:
        quantized_model = runner.model.module.get_deploy_model(self.mode)
    else:
        quantized_model = runner.model.get_deploy_model(self.mode)
    quantized_model.eval()

    # Pick a dummy input whose shape matches the dataset.
    dataset_type = runner.cfg.get("dataset_type")
    if dataset_type == CityscapesDataset:
        dummy_input = torch.randn(1, 3, 512, 1024)
    elif dataset_type == CocoDataset:
        dummy_input = torch.randn(1, 3, 800, 1333)
    elif dataset_type == ImageNet:
        dummy_input = torch.randn(1, 3, 224, 224)
    else:
        raise TypeError(f"Dataset type {dataset_type} is not supported yet. You can add it in the above code lines.")

    output_path = runner.work_dir + '/quantized_model.onnx'
    torch.onnx.export(
        quantized_model, dummy_input, output_path,
        input_names=["images"], output_names=["output"],
        operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK,
        verbose=True,
        do_constant_folding=True,
        opset_version=17,
        dynamic_axes=None)
    print(f"Export of quantized model to ONNX completed, saved to {output_path}")
In this built-in method, post_process_for_deploy from NativeQuantizer is applied and seems to process the weight FakeQuantize modules specifically. In addition, the end of the get_deploy_model method of MMArchitectureQuant seems to apply a separate post-processing step to the activation FakeQuantize modules, copying each one and changing the Torch class used (?).
Then, when I visualize my ONNX file in Netron, the activation and weight FakeQuantize modules are translated into different ONNX operators. This is a problem for execution on specific hardware, because the translation used for the weight FakeQuantize modules is not recognized.
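One way to check the suspicion above before exporting is to list which fake-quantize classes the deploy model actually contains. The following is only a minimal sketch: summarize_fake_quant_modules is a hypothetical helper, not part of MMRazor or PyTorch, and it assumes the fake-quantize classes have "FakeQuant" in their class name (as torch.ao.quantization.FakeQuantize does). It relies only on the named_modules() method that every torch.nn.Module provides.

```python
from collections import defaultdict


def summarize_fake_quant_modules(model, keyword="fakequant"):
    """Group module names by fully qualified class name.

    Hypothetical helper: keeps only modules whose class name contains
    `keyword` (case-insensitive). `model` can be any object exposing
    named_modules(), such as a torch.nn.Module.
    """
    groups = defaultdict(list)
    for name, module in model.named_modules():
        cls = type(module)
        if keyword in cls.__name__.lower():
            groups[f"{cls.__module__}.{cls.__qualname__}"].append(name)
    return dict(groups)


def print_summary(groups):
    """Print one line per class, followed by the module names using it."""
    for cls_name, names in groups.items():
        print(f"{cls_name}: {len(names)} module(s)")
        for name in names:
            print(f"  {name}")
```

Calling print_summary(summarize_fake_quant_modules(quantized_model)) right before torch.onnx.export would show whether the weight and activation fake-quantize modules end up as different classes, which might explain why they are exported as different ONNX operators.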

How can I get the same ONNX QuantizeLinear+DequantizeLinear layers for both activation and weight FakeQuantize modules?
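For reference, the arithmetic that a QuantizeLinear+DequantizeLinear pair performs on a single value can be sketched in plain Python. This follows the ONNX operator definitions (divide by the scale, round to nearest with ties to even, add the zero point, saturate, then map back); it is an illustration of the expected numerics, not the exporter's code. The function names and the ranges used in the example calls are illustrative assumptions.

```python
def quantize_linear(x, scale, zero_point, qmin, qmax):
    """Scalar QuantizeLinear: saturate(round(x / scale) + zero_point)."""
    # Python's round() rounds ties to even, matching the ONNX definition.
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize_linear(q, scale, zero_point):
    """Scalar DequantizeLinear: (q - zero_point) * scale."""
    return (q - zero_point) * scale


def fake_quantize(x, scale, zero_point, qmin, qmax):
    """Quantize then dequantize, i.e. what a Q/DQ pair does to one value."""
    q = quantize_linear(x, scale, zero_point, qmin, qmax)
    return dequantize_linear(q, scale, zero_point)


# Illustrative values only: a signed range of [-127, 127] for weights and an
# unsigned range of [0, 255] for activations are assumptions for this example.
weight_fq = fake_quantize(0.26, scale=0.1, zero_point=0, qmin=-127, qmax=127)
act_q = quantize_linear(0.26, scale=0.1, zero_point=128, qmin=0, qmax=255)
```

If both weight and activation fake-quantize modules were exported as Q/DQ pairs, each would reduce to this computation with its own scale, zero point and integer range.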
Post related information
Here is my quantization configuration used with OpenVINOQuantizer:
global_qconfig = dict(
    w_observer=dict(type='PerChannelMinMaxObserver'),
    a_observer=dict(type='MovingAverageMinMaxObserver'),
    w_fake_quant=dict(type='FakeQuantize'),
    a_fake_quant=dict(type='FakeQuantize'),
    w_qscheme=dict(
        qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True),
    a_qscheme=dict(qdtype='quint8', bit=8, is_symmetry=True),
)