Commit 148a178

[modifier/awq] add output_mse_shrinkage: per-group clipping optimization via activation projection
AWQ's grid search optimizes scale (alpha) against output MSE, but the quantization clipping range (shrinkage) is determined independently by the observer using weight MSE. These objectives are misaligned.

This commit adds output_mse_shrinkage: per-group shrinkage optimization using the same output MSE objective as the scale search. For each scale candidate, the best clipping factor p is selected per quantization group by minimizing the activation-projected weight quantization error:

    w_err   = W_quant - W_scaled                         (out_ch, G, group_size)
    out_err = einsum('ogs,ngs->ogn', w_err, X_grouped)   (out_ch, G, n_tokens)
    err     = out_err^2.sum(n)                           (out_ch, G)

X is collected from real calibration samples via a forward hook on the balance layer, so the error reflects actual token distributions rather than a proxy. Each group independently selects its optimal p, allowing aggressive clipping where activations are small and conservative clipping where they are large.

New parameters:

    n_shrink_grid: int = 1     (1 = disabled, backward compatible)
    maxshrink: float = 0.20    (search range: p in [1-maxshrink, 1.0])

Benchmarked on Llama-3.1-8B-Instruct W4A16 ASYM group=128, open-platypus calibration, WikiText-2 eval:

    Baseline (n_shrink_grid=1):      PPL 9.995
    output_mse_shrinkage (n=10):     PPL 9.890 (-0.105) ← best
    output_mse_shrinkage (n=50):     PPL 9.953 (-0.042)
    output_mse_shrinkage (n=100):    PPL 9.941 (-0.054)

All n values improve over baseline. n=10 gives best result; improvement is not strictly monotonic, suggesting diminishing returns from finer shrinkage resolution on calibration data.

Implementation notes:
- Chunked einsum over out_ch to bound peak memory (~256MB per chunk)
- Activation samples capped at 2048 tokens to prevent OOM on large models
- 5 new unit tests

Part of #2479

Signed-off-by: David Zheng <dqzheng1996@gmail.com>
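
The per-group selection described in the message can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the code from this commit: the helper names `fake_quantize_asym` and `select_shrinkage`, the min/max asymmetric quantizer (with zero included in the range), and the evenly spaced grid over [1-maxshrink, 1.0] are all assumptions made for the example. Only the three error formulas and the two parameter names come from the message above. Chunking over out_ch and the 2048-token cap are omitted for brevity.

```python
import numpy as np

def fake_quantize_asym(w_g, p, n_bits=4):
    """Hypothetical helper: asymmetric min/max fake-quantization per group,
    with the clipping range shrunk by factor p. w_g: (out_ch, G, group_size)."""
    qmax = 2 ** n_bits - 1
    # Assumption: the range always contains zero, then is scaled by p.
    w_min = np.minimum(w_g.min(axis=-1, keepdims=True), 0.0) * p
    w_max = np.maximum(w_g.max(axis=-1, keepdims=True), 0.0) * p
    scale = np.maximum(w_max - w_min, 1e-8) / qmax
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w_g / scale) + zero, 0, qmax)
    return (q - zero) * scale

def select_shrinkage(w, x, group_size, n_shrink_grid=1, maxshrink=0.20, n_bits=4):
    """Hypothetical helper: pick the best clipping factor p per (out_ch, group).
    w: (out_ch, in_ch) scaled weights, x: (n_tokens, in_ch) activations."""
    out_ch, in_ch = w.shape
    G = in_ch // group_size
    w_g = w.reshape(out_ch, G, group_size)
    x_g = x.reshape(-1, G, group_size)
    # Grid starts at 1.0, so n_shrink_grid=1 evaluates only p=1.0 (disabled).
    candidates = np.linspace(1.0, 1.0 - maxshrink, n_shrink_grid)
    best_err = np.full((out_ch, G), np.inf)
    best_p = np.ones((out_ch, G))
    for p in candidates:
        w_err = fake_quantize_asym(w_g, p, n_bits) - w_g     # (out_ch, G, group_size)
        out_err = np.einsum('ogs,ngs->ogn', w_err, x_g)      # (out_ch, G, n_tokens)
        err = (out_err ** 2).sum(axis=-1)                    # (out_ch, G)
        better = err < best_err
        best_err = np.where(better, err, best_err)
        best_p = np.where(better, p, best_p)
    return best_p, best_err
```

Because p=1.0 is always among the candidates in this sketch, the selected error for any group can never exceed the unshrunk error on the calibration activations; whether that carries over to held-out data is what the perplexity numbers above measure.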
1 parent 2bbf6e4 commit 148a178

File tree

2 files changed

+310
-44
lines changed