v0.4.7
Highlights
Support W4AFP8 for HPU; please refer to Intel Neural Compressor for guidance on running these models, by @yiliu30 in #467
Support packing immediately in the new quantization API to reduce RAM usage by @wenhuach21 in #466
20x AWQ and 4x GPTQ packing speedup on CUDA by @wenhuach21 in #459
Support auto-round-light to speed up the tuning process by @WeiweiZhang1 in #454
Fix critical bug of MXFP4 in tuning by @wenhuach21 in #451
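The immediate-packing highlight (#466) refers to packing each layer's quantized weights into their low-bit storage format as soon as that layer is quantized, rather than keeping every layer's unpacked integer tensor in memory until the end. A minimal pure-Python sketch of the idea follows; every name in it is illustrative only and does not come from the auto-round API:

```python
# Illustrative sketch of "pack immediately" to save RAM; not auto-round code.

def quantize_layer(weights, bits=4):
    """Toy symmetric quantization: map floats to small signed ints."""
    scale = max(abs(w) for w in weights) / (2 ** (bits - 1) - 1) or 1.0
    return [round(w / scale) for w in weights], scale

def pack_layer(qweights, bits=4):
    """Pack two 4-bit values per byte (signed values offset to unsigned)."""
    offset = 2 ** (bits - 1)
    u = [q + offset for q in qweights]
    return bytes((u[i] << 4) | u[i + 1] for i in range(0, len(u) - 1, 2))

def quantize_model_pack_immediately(model):
    packed = {}
    for name, weights in model.items():
        q, scale = quantize_layer(weights)
        # Pack right away: the unpacked int list for this layer becomes
        # garbage before the next layer is processed, so peak memory is
        # one layer's unpacked ints plus the (much smaller) packed bytes.
        packed[name] = (pack_layer(q), scale)
    return packed

model = {"layer0": [0.1, -0.2, 0.3, -0.4], "layer1": [1.0, -1.0, 0.5, -0.5]}
packed = quantize_model_pack_immediately(model)
```

The alternative (quantize everything, then pack everything) holds all unpacked tensors simultaneously, which is where the RAM saving of the immediate approach comes from.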
What's Changed
- step-1 support naive double quant in tuning by @wenhuach21 in #442
- fix critical bug of MXFP4 by @wenhuach21 in #451
- update readme by @wenhuach21 in #455
- update eval by @n1ck-guo in #450
- awq exporting bugfix by @WeiweiZhang1 in #456
- Support force loading into AutoRound format by @WeiweiZhang1 in #453
- 20x AWQ and 4x GPTQ packing speedup by @wenhuach21 in #459
- fix eval bug by @n1ck-guo in #461
- [STEP-1] W4AFP8 export by @wenhuach21 in #378
- [HPU] Update W4A8 for HPU by @yiliu30 in #467
- support for gemma3 by @n1ck-guo in #468
- upload auto-round-light results by @WeiweiZhang1 in #454
- GGUF support step2: add naive Q2_KS and Q4_KS by @n1ck-guo in #448
- fix incorrect recipe data by @WeiweiZhang1 in #471
- support for mistral3 by @n1ck-guo in #472
- support to export gemma3 gguf format by @n1ck-guo in #470
- Increase unit test timeout from 120 to 240 minutes by @XuehaoSun in #474
- support packing immediately in the new quantization API to reduce RAM usage by @wenhuach21 in #466
- rm redundant line break by @WeiweiZhang1 in #475
- Temporarily close the qxk API for the new release by @n1ck-guo in #478
- add restriction for exporting act-quant models by @n1ck-guo in #480
Full Changelog: v0.4.6...v0.4.7