Quantizer logging and summary info #7718

Open
@GregoryComer

Description

🚀 The feature, motivation and pitch

When quantizing models, I'd like to be able to easily see which operators were actually quantized, as well as how many. This information is visible in the exported_program, but that output can be very long and difficult to parse at a glance. When changing quantization scheme or doing selective quantization, it's easy to accidentally not quantize anything (or quantize less than expected) due to opaque feature/operator support in the quantization flow. Quantization can "succeed" but do nothing. Users also often just run the exact quantization scheme from the docs and don't have a good way to know what it did.

There are a few ways to solve this. The easiest might be to just add logging to the quantizer, perhaps printing a summary at info level and warning if nothing is quantized. It could look something like this:

Quantization Summary (XNNPACK Quantizer):
 torch.nn.Linear (static, 4-bit symmetric, groupwise, gs=32): 10 instances
 torch.nn.Conv2d (static, 8-bit symmetric, per-tensor): 12 instances
 ...

Quantization parameter information should ideally include key parameters for the quantization scheme, such as static vs dynamic activations or weight only, weight nbits, symmetric vs asymmetric, and per-tensor / per-channel / groupwise (with size). It could also include activation quantization parameters, but these tend to vary less, and it may be better to keep the summary brief.
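As a rough illustration of how such a summary could be assembled, here is a minimal sketch in plain Python. It assumes the quantizer can produce one record per quantized operator. The `QuantRecord` class, its fields, and `format_summary` are hypothetical names invented for this example and are not part of any existing quantizer API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class QuantRecord:
    """Hypothetical record describing one quantized operator."""
    op_type: str         # e.g. "torch.nn.Linear"
    activations: str     # "static", "dynamic", or "weight-only"
    weight_bits: int     # number of bits used for weights
    symmetric: bool      # symmetric vs asymmetric weight quantization
    granularity: str     # "per-tensor", "per-channel", or "groupwise"
    group_size: int = 0  # only meaningful when granularity is "groupwise"

    def describe(self) -> str:
        """Render the key parameters as a single summary label."""
        sym = "symmetric" if self.symmetric else "asymmetric"
        parts = [self.activations, f"{self.weight_bits}-bit {sym}", self.granularity]
        if self.granularity == "groupwise":
            parts.append(f"gs={self.group_size}")
        return f"{self.op_type} ({', '.join(parts)})"


def format_summary(quantizer_name: str, records: Iterable[QuantRecord]) -> List[str]:
    """Group records by their description and return the summary lines."""
    counts = Counter(r.describe() for r in records)
    lines = [f"Quantization Summary ({quantizer_name}):"]
    for description, n in counts.items():  # insertion order is preserved
        noun = "instance" if n == 1 else "instances"
        lines.append(f" {description}: {n} {noun}")
    return lines
```

The returned lines could then be emitted with `logging.info`. Grouping on the full description string means two operators of the same type with different schemes appear as separate rows, which matches the example output above.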

In the event that nothing is quantized, we should print a warning. Ideally this would have module-level granularity: if you specify module-level qparams for nn.Linear and nn.Conv2d, but only linears are quantized, it would warn that no Conv2ds were quantized.

[Warning] XNNPACK quantizer did not find any operators to quantize.
or
[Warning] XNNPACK quantizer did not find any Conv2d operators to quantize.
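The warning logic could be sketched as follows, assuming the quantizer knows which module types the user requested and how many of each it actually quantized. The function name and its arguments are hypothetical and only illustrate the check.

```python
from typing import Dict, Iterable, List


def missing_quantization_warnings(
    quantizer_name: str,
    requested_types: Iterable[str],
    quantized_counts: Dict[str, int],
) -> List[str]:
    """Return warning messages for requested types that were never quantized."""
    # Nothing was quantized at all: a single general warning is enough.
    if sum(quantized_counts.values()) == 0:
        return [f"{quantizer_name} did not find any operators to quantize."]
    # Otherwise warn once per requested type with zero quantized instances.
    return [
        f"{quantizer_name} did not find any {t} operators to quantize."
        for t in requested_types
        if quantized_counts.get(t, 0) == 0
    ]
```

Each returned message could be passed to `logging.warning`. For example, requesting `Linear` and `Conv2d` with counts `{"Linear": 10}` yields only the Conv2d warning.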

We should additionally include debug-level logging to note each quantized operator, as well as cases when an operator isn't quantized due to some constraint.
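A minimal sketch of the debug-level logging, using Python's standard `logging` module. It assumes the quantizer can report, for each candidate node, either that it was quantized or the reason it was skipped. The logger name, the `log_decisions` helper, and the shape of its input are all hypothetical.

```python
import logging
from typing import Iterable, Optional, Tuple

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("quantizer_sketch")


def log_decisions(decisions: Iterable[Tuple[str, Optional[str]]]) -> int:
    """Log one debug line per node and return how many nodes were quantized.

    Each decision is (node_name, skip_reason); skip_reason is None when
    the node was quantized.
    """
    quantized = 0
    for node_name, skip_reason in decisions:
        if skip_reason is None:
            quantized += 1
            logger.debug("Quantized %s", node_name)
        else:
            logger.debug("Skipped %s: %s", node_name, skip_reason)
    return quantized
```

Because these messages are at debug level, they stay hidden unless the user opts in, for example with `logging.basicConfig(level=logging.DEBUG)`, so the default output remains limited to the summary and warnings.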

Alternatives

Users can dump the exported_program after conversion. However, the signal-to-noise ratio is low if they just want quantization info.

We could also provide a dedicated call to print the quantization summary, but I think logging by default is a better option, as users don't need to explicitly call it.

We could go further and provide a visual graph-level dump of the quantization, though starting with logging offers most of the benefit for much less development effort.

cc @kimishpatel @jerryzh168
