Description
I have a model that I want to statically quantize for all the benefits that brings. However, the quality suffered when I did so. First I excluded sensitive layers, but that wasn't enough. Then it was suggested that I keep float input/output and exclude the first and last x layers.
I have been trying to do this, but I cannot get rid of a Quantize op at the end of the graph, which then causes a crash at runtime when allocating output buffers.
My recipe excludes all of the final layers (plus more, not shown):
```python
# Exclude output/final layers (exact tensor names)
rp_manager.add_quantization_config(
    regex='.*Linear_projector;1',
    operation_name=qtyping.TFLOperationName.ALL_SUPPORTED,
    algorithm_key=algorithm_manager.AlgorithmName.NO_QUANTIZE,
)
rp_manager.add_quantization_config(
    regex='.*WavLMForSequenceClassification;1',
    operation_name=qtyping.TFLOperationName.ALL_SUPPORTED,
    algorithm_key=algorithm_manager.AlgorithmName.NO_QUANTIZE,
)
rp_manager.add_quantization_config(
    regex='.StatefulPartitionedCall.',
    operation_name=qtyping.TFLOperationName.ALL_SUPPORTED,
    algorithm_key=algorithm_manager.AlgorithmName.NO_QUANTIZE,
)
rp_manager.add_quantization_config(
    regex='.logits.',
    operation_name=qtyping.TFLOperationName.ALL_SUPPORTED,
    algorithm_key=algorithm_manager.AlgorithmName.NO_QUANTIZE,
)
```
But somehow this Quantize op is still there. What am I missing?
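As a sanity check, it may be worth confirming that these patterns actually match the tensor names in the model, and whether the quantizer matches them with `re.search` or `re.fullmatch` (an assumption worth verifying against the library's source). A leading or trailing `.` in a pattern requires one arbitrary character in that position, so `'.logits.'` does not match a tensor named exactly `logits`. A minimal sketch with hypothetical tensor names:

```python
import re

# Hypothetical tensor names, for illustration only -- substitute the
# real names dumped from your model.
names = [
    'StatefulPartitionedCall:0',
    'model/WavLMForSequenceClassification;1',
    'logits',
]

patterns = [
    '.StatefulPartitionedCall.',
    '.logits.',
    '.*WavLMForSequenceClassification;1',
]

for pattern in patterns:
    for name in names:
        # re.search finds the pattern anywhere inside the name;
        # re.fullmatch requires the pattern to cover the whole name.
        print(f'{pattern!r} vs {name!r}: '
              f'search={bool(re.search(pattern, name))} '
              f'fullmatch={bool(re.fullmatch(pattern, name))}')
```

Note that `'.StatefulPartitionedCall.'` fails even `re.search` against `'StatefulPartitionedCall:0'`, because the leading `.` demands a character before the literal text and the name starts with it.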
Before all my exclusions, my recipe starts with this:
```python
# First: Quantize ALL supported ops to static int8 by default
rp_manager.add_static_config(
    regex='.*',
    operation_name=qtyping.TFLOperationName.ALL_SUPPORTED,
    activation_num_bits=8,
    weight_num_bits=8,
    algorithm_key=algorithm_manager.AlgorithmName.MIN_MAX_UNIFORM_QUANT,
)
```
What am I doing wrong, or is my aim not possible?