
Activation Equalization co-optimize flag #887

Open
@Giuseppe5

Description


When performing activation equalization, we have a co-optimize flag that is enabled for the LLM examples.

The idea is the following:
During activation equalization with FX, the equalization scaling factors are merged into the previous and subsequent layers.

Although the subsequent layer contributes to the computation of the scale factors, the previous layer has no impact on their final value. As a result, the previous layer's weights can be heavily disrupted by the equalization process.

To alleviate this, the co-optimize flag allows the previous layer's weight ranges to be factored into the SmoothQuant scale computation.

This is very experimental and might require a more in-depth analysis.
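
For context, a minimal sketch of the scale computation, assuming a SmoothQuant-style formulation. The function name, the geometric-mean combination of the two weight ranges, and the default `alpha` are illustrative assumptions, not the actual Brevitas implementation:

```python
import torch


def equalization_scales(act_max, next_w_max, prev_w_max=None, alpha=0.5):
    # Per-channel SmoothQuant-style scales. The standard computation balances
    # the activation range against the input-channel weight range of the
    # *subsequent* layer only:
    #     s = act_max**alpha / next_w_max**(1 - alpha)
    # The previous layer's weights are divided by s when the scales are
    # merged, but they play no role in choosing s, so they can be disrupted.
    # With co-optimization (illustrative choice here), the previous layer's
    # output-channel weight range is folded into the denominator as well.
    eps = torch.finfo(act_max.dtype).tiny
    sink = next_w_max if prev_w_max is None else (next_w_max * prev_w_max).sqrt()
    return act_max.clamp(min=eps).pow(alpha) / sink.clamp(min=eps).pow(1.0 - alpha)


# Toy example: channel 0 has a large activation outlier while the previous
# layer's weights on that channel are already small.
act_max = torch.tensor([8.0, 1.0])
next_w_max = torch.tensor([0.5, 0.5])
prev_w_max = torch.tensor([0.1, 1.0])

print(equalization_scales(act_max, next_w_max))              # plain SmoothQuant
print(equalization_scales(act_max, next_w_max, prev_w_max))  # co-optimized
```
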

To reproduce, quantize OPT-125m with activation equalization (FX), with/without the co-optimize flag.


Labels: enhancement (New feature or request)
