Skip to content

v0.4.7

Compare
Choose a tag to compare
@wenhuach21 wenhuach21 released this 01 Apr 09:50
· 58 commits to main since this release

Highlights

Support W4AFP8 for HPU. Please refer to Intel Neural Compressor for guidance on running these models. by @yiliu30 in #467

Support packing immediately in new quantization api to save ram usage by @wenhuach21 in #466

20x for awq and 4x for gptq packing speedup on cuda by @wenhuach21 in #459

Support auto-round-light to speed up the tuning process @WeiweiZhang1 in #454

Fix critic bug of mxfp4 in tuningby @wenhuach21 in #451

What's Changed

Full Changelog: v0.4.6...v0.4.7