v0.5.0
Highlights
- refine AutoRound format inference: support 2-, 3-, 4-, and 8-bit quantization and the Marlin kernel, and fix several bugs in the auto-round format
- support xpu in tuning and inference by @wenhuach21 in #481
- support for more VLMs by @n1ck-guo in #390
- change quantization method name and make several refinements by @wenhuach21 in #500
- support rtn via iters==0 by @wenhuach21 in #510
- fix bug of mix calib dataset by @n1ck-guo in #492
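With iters==0, the tuner skips its iterative optimization and falls back to plain round-to-nearest (RTN) quantization. The idea can be sketched as per-group asymmetric RTN in pure Python; the helper name and simplified flat-list layout below are illustrative only, not the auto-round API:

```python
def rtn_quantize(weights, bits=4, group_size=128):
    """Round-to-nearest asymmetric quantization with one scale/zero-point
    per group, returning the dequantized values. Illustrative sketch only."""
    qmax = (1 << bits) - 1
    out = []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / qmax or 1.0  # guard against a constant group
        zero = round(-lo / scale)
        # quantize: scale, shift by zero-point, clamp to [0, qmax]
        q = [max(0, min(qmax, round(w / scale) + zero)) for w in group]
        # dequantize back to floats for comparison against the originals
        out.extend((v - zero) * scale for v in q)
    return out
```

Because there is no tuning loop, this path is much faster than the default signed-gradient optimization, at some cost in accuracy.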
What's Changed
- support xpu in tuning and inference by @wenhuach21 in #481
- add light UT, fix typos by @WeiweiZhang1 in #483
- bump version to v0.4.7 by @XuehaoSun in #487
- fix dataset combine bug by @wenhuach21 in #489
- fix llama 8b time cost by @WeiweiZhang1 in #490
- update 2bits acc results by @WeiweiZhang1 in #491
- fix bug of mix calib dataset by @n1ck-guo in #492
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #494
- [GGUF support step 3] patch for double quant by @n1ck-guo in #473
- refine inference backend/code step 1 by @wenhuach21 in #486
- refine inference step 2 by @wenhuach21 in #498
- change quantization method name and make several refinements by @wenhuach21 in #500
- fix bug of awq/gptq modules_to_not_convert by @n1ck-guo in #501
- use --tasks to control evaluation enabling by @wenhuach21 in #505
- fix gguf eval regression bug by @n1ck-guo in #506
- change to new api in readme by @wenhuach21 in #507
- fix setup issue on cuda machine by @wenhuach21 in #511
- support rtn via iters==0 by @wenhuach21 in #510
- fix critical bug of get_multimodal_block_names by @n1ck-guo in #509
- Update requirements-lib.txt by @yiliu30 in #513
- add group_size divisible check in backend by @wenhuach21 in #512
- support for more VLMs by @n1ck-guo in #390
- move gguf-dq test to cuda by @n1ck-guo in #520
- fix bs!=1 for gemma and MiniMax-Text-01 by @wenhuach21 in #515
- add regex support in layer_config setting by @wenhuach21 in #519
- patch for vlm by @n1ck-guo in #518
- rename backend to packing_format in config.json by @wenhuach21 in #521
- fix example's model_dtype by @WeiweiZhang1 in #523
- rm fp16 export in autoround format by @wenhuach21 in #525
- update convert_hf_to_gguf to support more models by @n1ck-guo in #524
- fix light config by @WeiweiZhang1 in #526
- fix typos, add model card link for VLMs by @WeiweiZhang1 in #527
- add backend readme by @wenhuach21 in #528
- update mllm readme by @WeiweiZhang1 in #530
- fix bug of cuda ut by @n1ck-guo in #532
- fix inference issue by @wenhuach21 in #529
- update readme by @wenhuach21 in #531
- refine readme by @WeiweiZhang1 in #536
- fix cuda ut by @n1ck-guo in #537
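The regex support added to layer_config setting (#519) lets one pattern target a whole family of layers instead of listing each name. A hypothetical sketch of the matching idea follows; the function name and config layout are illustrative, not the library's actual API:

```python
import re

def resolve_layer_config(layer_names, layer_config):
    """Map each concrete layer name to the first regex entry that matches it.
    Illustrative sketch only -- not the auto-round implementation."""
    resolved = {}
    for name in layer_names:
        for pattern, cfg in layer_config.items():
            if re.search(pattern, name):
                resolved[name] = cfg
                break  # first matching pattern wins
    return resolved
```

For example, a config such as `{r"mlp\.down_proj": {"bits": 8}, r"^lm_head$": {"bits": 16}}` would keep the head in higher precision while applying 8-bit settings to every down-projection layer.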
Full Changelog: v0.4.7...v0.5.0