[quantization] Microscaling (MX) Quantization for LayerNorm in Qwen3-vl by Torrero · Pull Request #723 · Samsung/TICO

Torrero · 2026-05-22T18:15:19Z

What

Evaluate microscaling (MX) quantization for LayerNorm in the Qwen3-VL vision model.

Why

Microscaling quantization can improve LayerNorm quantization accuracy when it is applied selectively, to the right observers and along an appropriate axis. The best results were achieved with:

Observers: act_in, centered, square, inv_std, norm, act_outs
Axis: 1 (channel dimension)
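
For readers unfamiliar with MX, the idea can be sketched as follows. This is a minimal illustration and not the TICO implementation: the block size of 32, the 8-bit signed integer element format, the `eps` value, and the helper names `mx_fake_quant_row` and `layernorm_mx_row` are all assumptions. The mapping of observer names to LayerNorm stages is inferred from the names only. Each row stands for one token of a (tokens, channels) matrix, so quantizing along the row corresponds to axis 1.

```python
import math

BLOCK_SIZE = 32  # assumed MX block size
ELEM_BITS = 8    # assumed signed-integer element width


def mx_fake_quant_row(row, block_size=BLOCK_SIZE, elem_bits=ELEM_BITS):
    """Quantize and dequantize one row with one power-of-two scale per block."""
    qmax = 2 ** (elem_bits - 1) - 1
    out = []
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        amax = max(abs(v) for v in block)
        if amax == 0.0:
            out.extend(0.0 for _ in block)
            continue
        # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) = e - 1.
        shared_exp = math.frexp(amax)[1] - 1
        # Elements keep (elem_bits - 2) fractional bits relative to the shared exponent.
        scale = 2.0 ** (shared_exp - (elem_bits - 2))
        for v in block:
            q = max(-qmax, min(qmax, round(v / scale)))
            out.append(q * scale)
    return out


def layernorm_mx_row(row, weight, bias, eps=1e-6):
    """LayerNorm over one token, fake-quantizing each assumed observer point."""
    fq = mx_fake_quant_row
    act_in = fq(row)
    mean = sum(act_in) / len(act_in)
    centered = fq([v - mean for v in act_in])
    square = fq([c * c for c in centered])
    var = sum(square) / len(square)
    inv_std = fq([1.0 / math.sqrt(var + eps)])[0]
    norm = fq([c * inv_std for c in centered])
    # Final affine step, quantized at the output observer.
    return fq([n * w + b for n, w, b in zip(norm, weight, bias)])
```

Because the scale is shared only within a block of neighbouring channels, an outlier channel affects the precision of its own block rather than the whole tensor.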

| Mode | MX Observers | MX Axes | PPL | VQA2 | MMLU | COCO (CIDEr/Bleu_4) | MMMU_pro (vision) |
|---|---|---|---|---|---|---|---|
| original | - | - | 10.54 | 0.895 | 0.735 | 0.361/0.025 | 0.286 |
| GPTQ_MSE_w4A16; token embedding, lm_head: 4bit; patch embedding (Conv3D): 4bit | act_in, centered, square, inv_std, norm, act_outs | 1 | 14.59 | 0.837 | - | - | - |
| degradation % | - | - | 38% | 6% | - | - | - |
| GPTQ_MSE_w4A16; token embedding, lm_head: 8bit; patch embedding (Conv3D): 8bit | act_in, centered, square, inv_std, norm, act_outs | 1 | 13.68 | 0.829 | - | - | - |
| degradation % | - | - | 29% | 6% | - | - | - |
| GPTQ_MSE_spinquant_w4A16; token embedding, lm_head: 8bit; patch embedding (Conv3D): 8bit | act_in, centered, square, inv_std, norm, act_outs | 1 | 12.13 | 0.878 | 0.702 | 0.328/0.024 | 0.249 |
| degradation % | - | - | 15% | 2% | 3% | 9%/4% | 4% |
| GPTQ_MSE_spinquant_smootquant_vision_w4A16; token embedding, lm_head: 8bit; patch embedding (Conv3D): 8bit | act_in, centered, square, inv_std, norm, act_outs | 1 | 12.00 | 0.888 | 0.706 | 0.317/0.021 | 0.262 |
| degradation % | - | - | 14% | 1% | 3% | 12%/16% | 4% |
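
The PR does not state how the "degradation %" rows are computed. As a hedged sketch, assuming the PPL entries are the relative increase over the `original` row, the helper below (the name `relative_degradation` is hypothetical) reproduces the PPL percentages to within about one percentage point. Only the PPL column is used here.

```python
def relative_degradation(baseline, value):
    """Relative change of value versus baseline, in percent."""
    return (value - baseline) / baseline * 100.0


BASELINE_PPL = 10.54  # PPL of the original model in the table

# PPL values copied from the table, keyed by mode name.
ppl_by_mode = {
    "GPTQ_MSE_w4A16 (4bit embeddings)": 14.59,
    "GPTQ_MSE_w4A16 (8bit embeddings)": 13.68,
    "GPTQ_MSE_spinquant_w4A16": 12.13,
    "GPTQ_MSE_spinquant_smootquant_vision_w4A16": 12.00,
}

for mode, ppl in ppl_by_mode.items():
    print(f"{mode}: {relative_degradation(BASELINE_PPL, ppl):.1f}%")
```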

Note: Axis 1 may incur additional computational cost.

Run commands:

#GPTQ_MSE_spinquant_w4A16
python tico/quantization/wrapq/examples/quantize_qwen3_vl_with_gptq.py --model=Qwen/Qwen3-VL-4B-Instruct  --trust-remote-code --calib_seq_len=2048 --max_seq_len=2048 --eval_tasks=vqav2,coco --gptq_mse=mse --nsamples_for_evaluation=1000 --nsamples_for_qcalibration=128 --embedding_weight_bits=8 --vision_patch_embed_weight_bits=8 --linear_weight_bits=4 --lm_head_weight_bits=8 --spinquant --spinquant_init_method=random --ppl_dataset=wikitext2 --ppl_stride=2048 --mmmu_dataset=MMMU/MMMU_Pro --mmmu_subjects=vision --mmmu_n_shots=0  --mmmu_n_samples=-1 --mmlu_subjects=mmlu --mmlu_n_samples=1000

#GPTQ_MSE_spinquant_smootquant_vision_w4A16 
python tico/quantization/wrapq/examples/quantize_qwen3_vl_with_gptq.py --model=Qwen/Qwen3-VL-4B-Instruct  --trust-remote-code --calib_seq_len=2048 --max_seq_len=2048 --eval_tasks=vqav2,coco --gptq_mse=mse --nsamples_for_evaluation=1000 --nsamples_for_qcalibration=128 --embedding_weight_bits=8 --vision_patch_embed_weight_bits=8 --linear_weight_bits=4 --lm_head_weight_bits=8 --spinquant --spinquant_init_method=random --ppl_dataset=wikitext2 --ppl_stride=2048 --mmmu_dataset=MMMU/MMMU_Pro --mmmu_subjects=vision --mmmu_n_shots=0  --mmmu_n_samples=-1 --mmlu_subjects=mmlu --mmlu_n_samples=1000

TICO-DCO-1.0-Signed-off-by: Evgenii Maltsev e.maltsev@samsung.com

Torrero force-pushed the mx_for_layernorm_qwen branch from 28c7da3 to 51d366a Compare May 22, 2026 18:20

Torrero wants to merge 1 commit into Samsung:main from Torrero:mx_for_layernorm_qwen
