Commit 148a178
[modifier/awq] add output_mse_shrinkage: per-group clipping optimization via activation projection
AWQ's grid search optimizes scale (alpha) against output MSE, but the
quantization clipping range (shrinkage) is determined independently by
the observer using weight MSE. These objectives are misaligned.
This commit adds output_mse_shrinkage: per-group shrinkage optimization
using the same output MSE objective as the scale search. For each scale
candidate, the best clipping factor p is selected per quantization group
by minimizing the activation-projected weight quantization error:
    w_err   = W_quant - W_scaled                        (out_ch, G, group_size)
    out_err = einsum('ogs,ngs->ogn', w_err, X_grouped)  (out_ch, G, n_tokens)
    err     = out_err^2.sum(n)                          (out_ch, G)
X is collected from real calibration samples via a forward hook on the
balance layer, so the error reflects actual token distributions rather
than a proxy. Each group independently selects its optimal p, allowing
aggressive clipping where activations are small and conservative clipping
where they are large.
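
The per-group selection described above can be sketched in NumPy as follows. This is an illustrative sketch, not the code from this commit: the helper names (fake_quant_asym, select_shrinkage), the min/max asymmetric quantizer, and the choice to return p = 1.0 when n_shrink_grid is 1 are all assumptions made for the example. Only the einsum pattern, the tensor shapes, and the search range [1-maxshrink, 1.0] come from the message.

```python
import numpy as np


def fake_quant_asym(w, p, n_bits=4):
    """Hypothetical asymmetric min/max fake-quantizer over the last axis.

    The clipping range of each group is shrunk by the factor p (p = 1.0
    means no shrinkage).
    """
    w_min = np.minimum(w.min(axis=-1, keepdims=True), 0.0) * p
    w_max = np.maximum(w.max(axis=-1, keepdims=True), 0.0) * p
    qmax = 2 ** n_bits - 1
    scale = np.maximum(w_max - w_min, 1e-8) / qmax
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero, 0, qmax)
    return (q - zero) * scale


def select_shrinkage(w_scaled, x_grouped, n_shrink_grid=10, maxshrink=0.20):
    """Pick the clipping factor p per (out_ch, group) by output MSE.

    w_scaled:  (out_ch, G, group_size) weights after the AWQ scale is applied
    x_grouped: (n_tokens, G, group_size) calibration activations
    """
    if n_shrink_grid <= 1:
        grid = np.array([1.0])  # assumed meaning of "disabled"
    else:
        grid = np.linspace(1.0 - maxshrink, 1.0, n_shrink_grid)

    out_ch, n_groups, _ = w_scaled.shape
    best_err = np.full((out_ch, n_groups), np.inf)
    best_p = np.ones((out_ch, n_groups))

    for p in grid:
        w_err = fake_quant_asym(w_scaled, p) - w_scaled
        # Project the weight error onto the activations of each group.
        out_err = np.einsum("ogs,ngs->ogn", w_err, x_grouped)
        err = (out_err ** 2).sum(axis=-1)  # (out_ch, G)
        better = err < best_err
        best_err = np.where(better, err, best_err)
        best_p = np.where(better, p, best_p)

    return best_p, best_err
```

Because the grid in this sketch always contains p = 1.0, the selected error for any group is never worse than the unshrunk error on the calibration activations.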
New parameters:
    n_shrink_grid: int = 1     (1 = disabled, backward compatible)
    maxshrink: float = 0.20    (search range: p in [1-maxshrink, 1.0])
Benchmarked on Llama-3.1-8B-Instruct W4A16 ASYM group=128,
open-platypus calibration, WikiText-2 eval:
    Baseline (n_shrink_grid=1):     PPL 9.995
    output_mse_shrinkage (n=10):    PPL 9.890 (-0.105) ← best
    output_mse_shrinkage (n=50):    PPL 9.953 (-0.042)
    output_mse_shrinkage (n=100):   PPL 9.941 (-0.054)
All tested n values improve over the baseline, and n=10 gives the best
result. The improvement is not monotonic in n (n=50 trails both n=10 and
n=100), suggesting diminishing returns from finer shrinkage resolution
on calibration data.
Implementation notes:
- Chunked einsum over out_ch to bound peak memory (~256MB per chunk)
- Activation samples capped at 2048 tokens to prevent OOM on large models
- 5 new unit tests
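
The first two notes above can be illustrated with a small NumPy sketch. It is a sketch under stated assumptions, not the committed implementation: the function name chunked_group_error is hypothetical, the token cap is assumed to keep the first max_tokens samples (the message does not say how the 2048 tokens are chosen), and the chunk size is derived from a byte budget for the intermediate (rows, G, n_tokens) tensor.

```python
import numpy as np


def chunked_group_error(w_err, x_grouped, max_tokens=2048,
                        chunk_bytes=256 * 2**20):
    """Compute err[o, g] = sum_n (sum_s w_err[o,g,s] * x[n,g,s])^2 in chunks.

    w_err:     (out_ch, G, group_size) quantization error of the weights
    x_grouped: (n_tokens, G, group_size) calibration activations
    """
    # Assumption: cap the sample count by keeping the first max_tokens rows.
    x = x_grouped[:max_tokens]
    n_tokens, n_groups, _ = x.shape
    out_ch = w_err.shape[0]

    # Size of one out_ch row of the intermediate (rows, G, n_tokens) tensor.
    out_dtype = np.result_type(w_err, x)
    bytes_per_row = n_groups * n_tokens * out_dtype.itemsize
    rows = max(1, chunk_bytes // bytes_per_row)

    err = np.empty((out_ch, n_groups), dtype=out_dtype)
    for start in range(0, out_ch, rows):
        stop = start + rows
        out_err = np.einsum("ogs,ngs->ogn", w_err[start:stop], x)
        err[start:stop] = (out_err ** 2).sum(axis=-1)
    return err
```

Chunking over out_ch leaves the result unchanged, since each output channel's error is computed independently of the others.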
Part of #2479
Signed-off-by: David Zheng <dqzheng1996@gmail.com>
Parent: 2bbf6e4
2 files changed, +310 -44 lines:
- src/llmcompressor/modifiers/awq
- tests/llmcompressor/modifiers/awq