Problematic difference between FakeQuantize for activations and weights #651

### Checklist

- I have searched related issues but cannot get the expected help.
- I have read related documents and don't know what to do.

### Describe the question you meet

Hi, I'm trying to export a quantized model after a QAT experiment. I implemented the following hook to export it to ONNX format using the [`get_deploy_model`](https://github.com/open-mmlab/mmrazor/blob/90c7af1fdf35a606a2fceaaeb4b6be2f0dac4eb7/mmrazor/models/algorithms/quantization/mm_architecture.py#L347) method of `MMArchitectureQuant`.

```Python
def after_train(self, runner) -> None:
    """Export the quantized model to ONNX format.

    Args:
        runner (Runner): The runner of the training, validation or testing
            process.
    """
    # Skip the export if the PyTorch version check fails.
    try:
        check_torch_version()
    except AssertionError as err:
        print(repr(err))
        return

    # Unwrap the distributed wrapper before building the deploy model.
    model = runner.model.module if runner.distributed else runner.model
    quantized_model = model.get_deploy_model(self.mode)
    quantized_model.eval()

    # Dummy input whose shape depends on the dataset.
    dataset_type = runner.cfg.get("dataset_type")
    if dataset_type == CityscapesDataset:
        dummy_input = torch.randn(1, 3, 512, 1024)
    elif dataset_type == CocoDataset:
        dummy_input = torch.randn(1, 3, 800, 1333)
    elif dataset_type == ImageNet:
        dummy_input = torch.randn(1, 3, 224, 224)
    else:
        raise TypeError(f"Dataset type {dataset_type} is not supported yet. You can add it in the above code lines.")

    onnx_path = f"{runner.work_dir}/quantized_model.onnx"
    torch.onnx.export(
        quantized_model,
        dummy_input,
        onnx_path,
        input_names=["images"],
        output_names=["output"],
        operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK,
        verbose=True,
        do_constant_folding=True,
        opset_version=17,
        dynamic_axes=None,
    )

    print(f"Export of quantized model to ONNX completed, saved to {onnx_path}")
```

This built-in method applies [`post_process_for_deploy`](https://github.com/open-mmlab/mmrazor/blob/90c7af1fdf35a606a2fceaaeb4b6be2f0dac4eb7/mmrazor/models/quantizers/native_quantizer.py#L253) from `NativeQuantizer`, which seems to process the weight `FakeQuantize` modules specifically. Moreover, the end of the [`get_deploy_model`](https://github.com/open-mmlab/mmrazor/blob/90c7af1fdf35a606a2fceaaeb4b6be2f0dac4eb7/mmrazor/models/algorithms/quantization/mm_architecture.py#L347) method of `MMArchitectureQuant` seems to post-process the activation `FakeQuantize` modules separately, by copying each one and changing the Torch class used (?).

As a result, when I visualize the exported ONNX model in Netron, the activation and weight `FakeQuantize` modules are translated into different operators. This is a problem for execution on specific hardware, because the translation used for the weight `FakeQuantize` modules is not recognized.

![image](https://github.com/user-attachments/assets/1eca87b5-63ac-4e69-84c5-8825a25d12c6)

How can I get the same ONNX QuantizeLinear+DequantizeLinear layers for both activation and weight `FakeQuantize` modules?
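To check which operators each `FakeQuantize` module was exported as without opening Netron, one option is to count the node types in the exported graph. The following is only a minimal sketch: `count_op_types` is a hypothetical helper (not part of mmrazor), and it assumes only that each node exposes an `op_type` attribute, as the nodes in `onnx.load(path).graph.node` do. The stand-in nodes are illustrative and are not taken from my model.

```python
from collections import Counter
from types import SimpleNamespace


def count_op_types(nodes):
    """Count how often each operator type occurs in an iterable of graph nodes."""
    return Counter(node.op_type for node in nodes)


# Stand-in nodes so the sketch runs without the `onnx` package installed.
# The operator names below are illustrative only.
demo_nodes = [
    SimpleNamespace(op_type="QuantizeLinear"),
    SimpleNamespace(op_type="DequantizeLinear"),
    SimpleNamespace(op_type="Conv"),
    SimpleNamespace(op_type="QuantizeLinear"),
]
demo_counts = count_op_types(demo_nodes)
print(demo_counts.most_common())
```

With the real file, the same helper could be called as `count_op_types(onnx.load(onnx_path).graph.node)`. Any operator type other than `QuantizeLinear` / `DequantizeLinear` that appears once per quantized weight would be the candidate to investigate.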

### Post related information

Here is my quantization configuration used with [`OpenVINOQuantizer`](https://github.com/open-mmlab/mmrazor/blob/main/mmrazor/models/quantizers/openvino_quantizer.py):

```python
global_qconfig = dict(
    w_observer=dict(type='PerChannelMinMaxObserver'),
    a_observer=dict(type='MovingAverageMinMaxObserver'),
    w_fake_quant=dict(type='FakeQuantize'),
    a_fake_quant=dict(type='FakeQuantize'),
    w_qscheme=dict(
        qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True),
    a_qscheme=dict(qdtype='quint8', bit=8, is_symmetry=True),
)
```
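For reference, the arithmetic performed by a `QuantizeLinear` + `DequantizeLinear` pair can be sketched in plain Python. This is an illustration under assumptions, not mmrazor or PyTorch code: the function names are hypothetical, the scales are made up, and the [-127, 127] range is my assumed reading of `is_symmetric_range=True` for `qint8`. The per-channel variant reflects what `PerChannelMinMaxObserver` implies for weights (one scale per output channel), whereas the activations use a single per-tensor scale. That difference may be related to the different export paths, but I have not confirmed it.

```python
def quantize_linear(x, scale, zero_point, qmin, qmax):
    """Quantize one float: round(x / scale) + zero_point, saturated to [qmin, qmax]."""
    q = round(x / scale) + zero_point  # Python's round() rounds halves to even
    return max(qmin, min(qmax, q))


def dequantize_linear(q, scale, zero_point):
    """Map a quantized integer back to a float."""
    return (q - zero_point) * scale


def qdq_per_tensor(values, scale, zero_point=0, qmin=-127, qmax=127):
    """Quantize then dequantize every value with one shared scale."""
    return [
        dequantize_linear(
            quantize_linear(v, scale, zero_point, qmin, qmax), scale, zero_point
        )
        for v in values
    ]


def qdq_per_channel(rows, scales, zero_point=0, qmin=-127, qmax=127):
    """Quantize then dequantize each row (channel) with its own scale."""
    return [
        qdq_per_tensor(row, s, zero_point, qmin, qmax)
        for row, s in zip(rows, scales)
    ]


# Made-up values: one activation tensor, and a weight with two output channels.
activations = qdq_per_tensor([0.3, 1.0, 100.0], scale=0.5)
weights = qdq_per_channel([[0.3, 1.0], [0.3, 1.0]], scales=[0.5, 0.25])
print(activations)
print(weights)
```

The only structural difference between the two cases is whether the scale is a scalar or a vector indexed by channel, so in principle both can be expressed with the same pair of ONNX operators.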

