Commit 148a178


[modifier/awq] add output_mse_shrinkage: per-group clipping optimization via activation projection

AWQ's grid search optimizes scale (alpha) against output MSE, but the quantization clipping range (shrinkage) is determined independently by the observer using weight MSE. These objectives are misaligned.

This commit adds output_mse_shrinkage: per-group shrinkage optimization using the same output MSE objective as the scale search.

For each scale candidate, the best clipping factor p is selected per quantization group by minimizing the activation-projected weight quantization error:

    w_err   = W_quant - W_scaled                        (out_ch, G, group_size)
    out_err = einsum('ogs,ngs->ogn', w_err, X_grouped)  (out_ch, G, n_tokens)
    err     = out_err^2.sum(n)                          (out_ch, G)

X is collected from real calibration samples via a forward hook on the balance layer, so the error reflects actual token distributions rather than a proxy. Each group independently selects its optimal p, allowing aggressive clipping where activations are small and conservative clipping where they are large.

New parameters:

    n_shrink_grid: int = 1     (1 = disabled, backward compatible)
    maxshrink: float = 0.20    (search range: p in [1-maxshrink, 1.0])

Benchmarked on Llama-3.1-8B-Instruct W4A16 ASYM group=128, open-platypus calibration, WikiText-2 eval:

    Baseline (n_shrink_grid=1):     PPL 9.995
    output_mse_shrinkage (n=10):    PPL 9.890 (-0.105)  ← best
    output_mse_shrinkage (n=50):    PPL 9.953 (-0.042)
    output_mse_shrinkage (n=100):   PPL 9.941 (-0.054)

All n values improve over baseline. n=10 gives best result; improvement is not strictly monotonic, suggesting diminishing returns from finer shrinkage resolution on calibration data.

Implementation notes:
- Chunked einsum over out_ch to bound peak memory (~256MB per chunk)
- Activation samples capped at 2048 tokens to prevent OOM on large models
- 5 new unit tests

Part of #2479

Signed-off-by: David Zheng <dqzheng1996@gmail.com>
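The per-group selection described in the commit message can be illustrated with a minimal sketch. It is not the code from base.py: it uses NumPy rather than PyTorch, the helper names `fake_quantize_asym` and `select_shrinkage` are hypothetical, and the linearly spaced grid and the zero-including min/max range are assumptions made for illustration. It omits the chunking over out_ch and the 2048-token cap mentioned in the implementation notes.

```python
import numpy as np


def fake_quantize_asym(w, n_bits=4, p=1.0):
    """Asymmetric fake-quantization along the last axis (hypothetical helper).

    The per-group [min, max] range is forced to include zero (an assumption)
    and is then shrunk by the clipping factor p.
    """
    w_min = np.minimum(w.min(axis=-1, keepdims=True), 0.0) * p
    w_max = np.maximum(w.max(axis=-1, keepdims=True), 0.0) * p
    q_max = 2 ** n_bits - 1
    scale = np.maximum(w_max - w_min, 1e-8) / q_max
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero, 0, q_max)
    return (q - zero) * scale


def select_shrinkage(w, x, group_size, n_shrink_grid=10, maxshrink=0.20, n_bits=4):
    """Pick the clipping factor p per (out_ch, group) by output-space error.

    w: (out_ch, in_ch) already-scaled weights; x: (n_tokens, in_ch) activations.
    Returns (best_p, best_err), both of shape (out_ch, G).
    """
    out_ch, in_ch = w.shape
    n_groups = in_ch // group_size
    w_g = w.reshape(out_ch, n_groups, group_size)
    x_g = x.reshape(-1, n_groups, group_size)

    # Assumed grid: linear from 1.0 down to 1 - maxshrink.
    # With n_shrink_grid=1 this is just [1.0], i.e. no shrinkage.
    grid = np.linspace(1.0, 1.0 - maxshrink, n_shrink_grid)

    best_err = np.full((out_ch, n_groups), np.inf)
    best_p = np.ones((out_ch, n_groups))
    for p in grid:
        w_err = fake_quantize_asym(w_g, n_bits, p) - w_g      # (out_ch, G, group_size)
        out_err = np.einsum("ogs,ngs->ogn", w_err, x_g)       # (out_ch, G, n_tokens)
        err = (out_err ** 2).sum(axis=-1)                     # (out_ch, G)
        better = err < best_err
        best_err = np.where(better, err, best_err)
        best_p = np.where(better, p, best_p)
    return best_p, best_err


# Usage on random data (shapes only; not representative of real layers).
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 256))
x = rng.normal(size=(64, 256))
p1, e1 = select_shrinkage(w, x, group_size=128, n_shrink_grid=1)
p10, e10 = select_shrinkage(w, x, group_size=128, n_shrink_grid=10)
```

Because the assumed grid always contains p = 1.0, the selected error for any group in this sketch can never exceed the no-shrinkage error on the same activations. That guarantee holds only on the calibration data, which is consistent with the non-monotonic perplexity results reported above.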

1 parent 2bbf6e4 commit 148a178

2 files changed (+310, -44 lines)

- src/llmcompressor/modifiers/awq/base.py
- tests/llmcompressor/modifiers/awq/test_base.py
