0.49.2

Latest


matthewdouglas released this 16 Feb 21:29

f0e6ca3

Highlights

4-bit quantization with a blocksize of 64 is now supported on ROCm, and 64 is now the default there. The ROCm default was previously 128, which did not match the default on other devices.
A ROCm 7.2 build is now included.
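To show what the blocksize change means, here is a simplified sketch of blockwise absmax quantization. In this scheme every block of `blocksize` consecutive elements shares one scale, which is the block's absolute maximum. Going from 128 to 64 therefore doubles the number of stored scales in exchange for finer-grained scaling. The sketch rests on a few assumptions. Scales are stored as fp32 with no double quantization, and a toy symmetric integer code stands in for the NF4/FP4 data types that bitsandbytes actually uses. All function names are hypothetical and are not bitsandbytes APIs.

```python
import math


def absmax_overhead(n_elements: int, blocksize: int, scale_bits: int = 32):
    """Return (num_blocks, extra_bits_per_element) for blockwise absmax scaling.

    Assumption: one scale per block, stored at `scale_bits` precision (fp32 by
    default), with no double quantization of the scales.
    """
    num_blocks = math.ceil(n_elements / blocksize)
    return num_blocks, num_blocks * scale_bits / n_elements


def quantize_blockwise_4bit(values, blocksize=64):
    """Toy symmetric 4-bit quantizer: integer codes in [-7, 7], one absmax per block.

    Hypothetical helper for illustration only; NOT the NF4/FP4 code books
    used by bitsandbytes.
    """
    codes, scales = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        # Fall back to 1.0 so an all-zero block does not divide by zero.
        absmax = max(abs(v) for v in block) or 1.0
        scales.append(absmax)
        codes.extend(round(v / absmax * 7) for v in block)
    return codes, scales


def dequantize_blockwise_4bit(codes, scales, blocksize=64):
    """Invert the toy quantizer: rescale each code by its block's absmax."""
    return [c / 7 * scales[i // blocksize] for i, c in enumerate(codes)]


if __name__ == "__main__":
    # Scale overhead for a hypothetical 4096 x 4096 weight under both block sizes.
    for bs in (64, 128):
        blocks, extra = absmax_overhead(4096 * 4096, bs)
        print(f"blocksize={bs}: {blocks} scales, {extra} extra bits/element")
```

Under these assumptions the script prints 262144 scales (0.5 extra bits per element) for blocksize=64 and 131072 scales (0.25 extra bits per element) for blocksize=128. That difference is the memory cost of the finer scaling.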

What's Changed

Fix 8-bit optimizer support with FSDP by @ved1beta in #1840
Fix XPU 4-bit kernel by @jiqing-feng in #1839
ROCm 7.2 build and doc changes by @sstamenk in #1845
Add CUDA kernel support for 4-bit quantization with blocksize=32 by @Abdennacer-Badaoui in #1854
Add blocksize=64 4-bit quantization support for ROCm CDNA (warp64) GPUs by @Abdennacer-Badaoui in #1856
[Docs Update] QLoRA 4-bit Support on ROCm by @Abdennacer-Badaoui in #1857
[ROCm] Make blocksize=64 default for 4bit by @matthewdouglas in #1873
Handle non-contiguous tensors in quantize/dequantize ops by @TimDettmers in #1859
Fix AdEMAMix scheduler guard and add state_dict round-trip test by @TimDettmers in #1861

New Contributors

@Abdennacer-Badaoui made their first contribution in #1854

Full Changelog: 0.49.1...0.49.2

Contributors

TimDettmers, matthewdouglas, and 4 other contributors
