Description
When performing activation equalization, we have a co-optimize
flag that is enabled in the LLM examples.
The idea is the following:
During activation equalization with FX, the equalization scaling factors are merged into the previous and subsequent layers.
Although the subsequent layer contributes to the computation of the scale factors, the previous one has no impact on their final value. As a result, the previous layer can be heavily disrupted by the equalization process.
To alleviate this, the co-optimize flag allows the previous layer's contribution to be weighted into the SmoothQuant scale computation.
This is very experimental and might require a more in-depth analysis.
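Below is a minimal PyTorch sketch of the idea, written against a generic SmoothQuant-style formula rather than the actual implementation: the function names, the `beta` blending parameter, and the geometric-mean rule for mixing in the previous layer's weight ranges are assumptions for illustration only.

```python
import torch

def smoothquant_scales(act_absmax, w_next, alpha=0.5):
    # Standard SmoothQuant-style scales: only the activation statistics and the
    # subsequent layer's weights enter the formula; the previous layer merely
    # absorbs 1/s on its output channels without influencing s.
    w_next_absmax = w_next.abs().amax(dim=0)      # per input channel of next layer
    return act_absmax.pow(alpha) / w_next_absmax.pow(1.0 - alpha)

def co_optimized_scales(act_absmax, w_next, w_prev, alpha=0.5, beta=0.5):
    # Illustrative co-optimized variant: the previous layer's per-output-channel
    # weight ranges are blended into the denominator, so dividing its outputs by
    # s cannot blow up (or crush) its weight ranges unchecked. The geometric-mean
    # blend controlled by `beta` is an assumption, not the rule used by the flag.
    w_next_absmax = w_next.abs().amax(dim=0)      # next layer, per input channel
    w_prev_absmax = w_prev.abs().amax(dim=1)      # previous layer, per output channel
    blended = w_next_absmax.pow(beta) * w_prev_absmax.pow(1.0 - beta)
    return act_absmax.pow(alpha) / blended.pow(1.0 - alpha)

def fold_scales(prev, nxt, s):
    # Fold the scales into both layers: the previous layer divides its output
    # channels by s, the subsequent layer multiplies its input channels by s.
    with torch.no_grad():
        prev.weight.div_(s.unsqueeze(1))
        if prev.bias is not None:
            prev.bias.div_(s)
        nxt.weight.mul_(s.unsqueeze(0))

# Toy usage: two linear layers sharing a 16-channel activation.
prev, nxt = torch.nn.Linear(8, 16), torch.nn.Linear(16, 32)
act_absmax = torch.rand(16) + 0.1                 # stand-in for calibration stats
fold_scales(prev, nxt, co_optimized_scales(act_absmax, nxt.weight, prev.weight))
```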
To reproduce, quantize OPT-125m with activation equalization (FX), with and without the co-optimize flag.
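A rough sketch of the comparison, assuming a CLI entrypoint for the LLM examples; the script path and flag names below are placeholders and may differ from the actual example's argument parser.

```python
import subprocess

# Baseline: activation equalization with FX graph mode, co-optimization off.
# "llm_examples/main.py", "--act-equalization" and "--co-optimize" are
# placeholder names; check the LLM example's argument parser for the real ones.
base_cmd = [
    "python", "llm_examples/main.py",
    "--model", "facebook/opt-125m",
    "--act-equalization", "fx",
]
subprocess.run(base_cmd, check=True)

# Same quantization run with the co-optimize behaviour enabled.
subprocess.run(base_cmd + ["--co-optimize"], check=True)
```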