v0.4.7
Highlights
Support W4AFP8 for HPU; please refer to Intel Neural Compressor for guidance on running these models, by @yiliu30 in #467
Support packing immediately in the new quantization API to reduce RAM usage by @wenhuach21 in #466
20x AWQ and 4x GPTQ packing speedup on CUDA by @wenhuach21 in #459
Support auto-round-light to speed up the tuning process by @WeiweiZhang1 in #454
Fix critical bug of MXFP4 in tuning by @wenhuach21 in #451
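The immediate-packing highlight (#466) refers to packing each layer's quantized weights into their low-bit storage format as soon as that layer is quantized, rather than keeping every layer's unpacked integer tensor in memory until the end. A minimal pure-Python sketch of the idea follows; every name in it is illustrative only and does not come from the auto-round API:

```python
# Illustrative sketch of "pack immediately" to save RAM; not auto-round code.

def quantize_layer(weights, bits=4):
    """Toy symmetric quantization: map floats to small signed ints."""
    scale = max(abs(w) for w in weights) / (2 ** (bits - 1) - 1) or 1.0
    return [round(w / scale) for w in weights], scale

def pack_layer(qweights, bits=4):
    """Pack two 4-bit values per byte (signed values offset to unsigned)."""
    offset = 2 ** (bits - 1)
    u = [q + offset for q in qweights]
    return bytes((u[i] << 4) | u[i + 1] for i in range(0, len(u) - 1, 2))

def quantize_model_pack_immediately(model):
    packed = {}
    for name, weights in model.items():
        q, scale = quantize_layer(weights)
        # Pack right away: the unpacked int list for this layer becomes
        # garbage before the next layer is processed, so peak memory is
        # one layer's unpacked ints plus the (much smaller) packed bytes.
        packed[name] = (pack_layer(q), scale)
    return packed

model = {"layer0": [0.1, -0.2, 0.3, -0.4], "layer1": [1.0, -1.0, 0.5, -0.5]}
packed = quantize_model_pack_immediately(model)
```

The alternative (quantize everything, then pack everything) holds all unpacked tensors simultaneously, which is where the RAM saving of the immediate approach comes from.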
What's Changed
- step-1 support naive double quant in tuning by @wenhuach21 in #442
- fix critical bug of MXFP4 by @wenhuach21 in #451
- update readme by @wenhuach21 in #455
- update eval by @n1ck-guo in #450
- awq exporting bugfix by @WeiweiZhang1 in #456
- Support force loading into AutoRound format by @WeiweiZhang1 in #453
- 20x AWQ and 4x GPTQ packing speedup by @wenhuach21 in #459
- fix eval bug by @n1ck-guo in #461
- [STEP-1] W4AFP8 export by @wenhuach21 in #378
- [HPU] Update W4A8 for HPU by @yiliu30 in #467
- support for gemma3 by @n1ck-guo in #468
- upload auto-round-light results by @WeiweiZhang1 in #454
- GGUF support step2: add naive Q2_KS and Q4_KS by @n1ck-guo in #448
- fix incorrect recipe data by @WeiweiZhang1 in #471
- support for mistral3 by @n1ck-guo in #472
- support to export gemma3 gguf format by @n1ck-guo in #470
- Increase unit test timeout from 120 to 240 minutes by @XuehaoSun in #474
- support packing immediately in the new quantization API to reduce RAM usage by @wenhuach21 in #466
- rm redundant line break by @WeiweiZhang1 in #475
- Temporarily close the qxk API for the new release by @n1ck-guo in #478
- add restriction for exporting act-quant models by @n1ck-guo in #480
Full Changelog: v0.4.6...v0.4.7