Release 0.49.0 · bitsandbytes-foundation/bitsandbytes

Highlights

x86-64 CPU Improvements

CPU performance for 4bit is significantly improved on x86-64, with optimized kernel paths for CPUs that have AVX512 or AVX512BF16 support.
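To see whether a Linux machine advertises these instruction sets, one option is to read the CPU feature flags from `/proc/cpuinfo`. The sketch below is not part of bitsandbytes: the helper names are hypothetical, and it assumes the Linux kernel flag names `avx512f` and `avx512_bf16`. Which flags the library itself checks when choosing a kernel path is not stated in these notes.

```python
def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the feature flags listed in /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" rows carry the instruction-set feature list.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def avx512_summary(cpuinfo_text: str) -> dict[str, bool]:
    """Report whether the AVX512 foundation and BF16 flags are present."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {
        "avx512f": "avx512f" in flags,
        "avx512_bf16": "avx512_bf16" in flags,
    }


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as fh:
            print(avx512_summary(fh.read()))
    except OSError:
        # /proc/cpuinfo only exists on Linux.
        print("/proc/cpuinfo is not available on this platform")
```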

AMD ROCm Experimental Wheels

Experimental support for AMD devices is now included in our PyPI wheels on Linux x86-64.
We've added more GPU target devices, as outlined in our docs.
Support for using the default blocksize of 64 for 4bit was added for RDNA GPUs in #1748.
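For readers unfamiliar with what the blocksize controls, here is a minimal pure-Python sketch of blockwise absmax quantization: each run of `blocksize` values shares one scale factor. It is an illustration only, not the library's kernel. It assumes simple linear signed codes in [-7, 7] in place of the 4bit codebooks bitsandbytes uses, and the function names are hypothetical.

```python
def quantize_blockwise(values, blocksize=64):
    """Quantize to signed 4-bit-range codes with one absmax scale per block."""
    codes, absmax = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        # The largest magnitude in the block becomes its scale;
        # fall back to 1.0 so an all-zero block does not divide by zero.
        scale = max(abs(v) for v in block) or 1.0
        absmax.append(scale)
        codes.extend(round(v / scale * 7) for v in block)
    return codes, absmax


def dequantize_blockwise(codes, absmax, blocksize=64):
    """Map codes back to floats using the scale of the block they belong to."""
    return [c / 7 * absmax[i // blocksize] for i, c in enumerate(codes)]
```

A smaller blocksize stores more scale factors but limits how far a single outlier can degrade the precision of its neighbours, which is the usual trade-off behind choosing a value such as 64.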

macOS 14+ Wheels

We're now publishing wheels for macOS 14+!
The 4bit and 8bit quantization features work on MPS, but currently rely on slow implementations. We plan to enable Metal kernels with improved performance in the future.

🚨 Breaking Changes

Dropped support for Python 3.9.
Dropped compilation support for Maxwell GPUs in the CUDA backend.
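A quick way to check whether an environment is affected by these changes is sketched below. The helper names are hypothetical and not part of bitsandbytes. The minimum version of 3.10 is an assumption inferred from 3.9 being dropped, and the Maxwell check relies on Maxwell GPUs reporting CUDA compute capability 5.x.

```python
import sys

# Assumption: with 3.9 dropped, 3.10 is the oldest supported interpreter.
MIN_PYTHON = (3, 10)


def python_supported(version=None) -> bool:
    """Return True if the (major, minor) version meets the assumed minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def is_maxwell(major: int) -> bool:
    """Maxwell GPUs report CUDA compute capability 5.x (5.0, 5.2, 5.3)."""
    return major == 5


if __name__ == "__main__":
    print("Python supported:", python_supported())
```

With PyTorch installed and a CUDA device present, the major version could be taken from `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple.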

What's Changed

[ROCm] Update build targets by @matthewdouglas in #1788
Drop Python 3.9 support by @matthewdouglas in #1795
Fix indexing overflow issue for blockwise quantization on AMD by @sstamenk in #1796
Tests: Run CPU tests against PyTorch 2.9 by @matthewdouglas in #1797
Remove deprecated code by @matthewdouglas in #1798
Cpu C++ kernel by @jiqing-feng in #1789
fix build error: "no case matching constant switch condition" by @yuguo68 in #1802
CI: skip rebuilding CPU lib when building/installing wheels by @matthewdouglas in #1803
add support for 64 block size on 32 warp size supported amd gpus by @electron271 in #1748
Enable more tests on AMD for warp size 32 by @sstamenk in #1805
CUDA: Drop compilation compatibility with Maxwell by @matthewdouglas in #1806
ROCm: Add build for ROCm 7.1 by @matthewdouglas in #1807
CI: Enable tests on Linux x86-64 with CUDA 13 by @matthewdouglas in #1808
Replace NULL with nullptr in pythonInterface.cpp by @yuguo68 in #1809
CI: Run tests on PRs, refactor nightly test workflow by @matthewdouglas in #1811
Remove old nightly workflow by @matthewdouglas in #1812
Cpu fused kernel by @jiqing-feng in #1804
Update README by @matthewdouglas in #1816
Cleanup: remove FastBinarySearch by @matthewdouglas in #1817
Enable publishing of macOS wheel by @matthewdouglas in #1818
ROCm: reduce size of builds by @matthewdouglas in #1819
CUDA 13: aggressive compression of binary size by @matthewdouglas in #1820
ROCm: Add gfx1150/gfx1151 to build targets by @matthewdouglas in #1822
Update workflow dependencies by @matthewdouglas in #1824
Hf kernel by @jiqing-feng in #1814
CUDA/ROCm: Remove dead code by @matthewdouglas in #1827
CPU: workaround avx512 4bit dequantize accuracy issue for large blocksize by @matthewdouglas in #1828
Update installation doc by @matthewdouglas in #1830
Add release for DGX Spark cuda121 by @mfuntowicz in #1829
Fix: Python 3.14 compatibility with PyTorch 2.9 by @matthewdouglas in #1831

New Contributors

@sstamenk made their first contribution in #1796
@yuguo68 made their first contribution in #1802
@electron271 made their first contribution in #1748
@mfuntowicz made their first contribution in #1829

Full Changelog: 0.48.2...0.49.0
