Releases: bitsandbytes-foundation/bitsandbytes

Latest `main` wheel

12 Dec 18:07
e6ccde2

Pre-release

Latest main pre-release wheel

This pre-release contains the latest development wheels for all supported platforms, rebuilt automatically on every commit to the main branch.

How to install:
Pick the correct command for your platform and run it in your terminal:

macOS 14+ (arm64)

pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-macosx_14_0_arm64.whl

Linux (aarch64)

pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_aarch64.whl

Linux (x86_64)

pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl

Windows (x86_64)

pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-win_amd64.whl
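If you script installs across machines, the right wheel can be selected automatically. A minimal shell sketch, using the base URL and filenames from the commands above (Windows is omitted, since `uname` is typically unavailable there):

```shell
# Sketch: pick the matching nightly wheel for the current machine.
# Base URL and filenames are taken from the install commands above.
base="https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main"
case "$(uname -s)-$(uname -m)" in
  Darwin-arm64)  wheel="bitsandbytes-1.33.7.preview-py3-none-macosx_14_0_arm64.whl" ;;
  Linux-aarch64) wheel="bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_aarch64.whl" ;;
  Linux-x86_64)  wheel="bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl" ;;
  *) echo "unsupported platform: $(uname -s)-$(uname -m)" >&2; exit 1 ;;
esac
echo "pip install --force-reinstall ${base}/${wheel}"
```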

Note:
These wheels are updated automatically with every commit to main and become available as soon as the python-package.yml workflow finishes.

The version number in the wheel filename is pinned to 1.33.7-preview so that the download links stay stable. This does not affect the version that actually gets installed:

> pip install https://.../bitsandbytes-1.33.7-preview-py3-none-manylinux_2_24_x86_64.whl
Collecting bitsandbytes==1.33.7rc0
...
Successfully installed bitsandbytes-0.49.0.dev0
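This works because pip reads the installed version from the wheel's internal metadata, not from its filename. You can confirm the version of any installed distribution with the standard library (shown here with `pip` itself as a stand-in for bitsandbytes):

```python
from importlib.metadata import version

# The version reported here comes from the package's METADATA file,
# which is independent of the wheel filename it was installed from.
print(version("pip"))
```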

0.49.0

11 Dec 20:51

Highlights

x86-64 CPU Improvements

CPU performance for 4bit is significantly improved on x86-64, with optimized kernel paths for CPUs that have AVX512 or AVX512BF16 support.
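For intuition, this is the general scheme those kernels accelerate: blockwise absmax quantization to a 4-bit code range, with one scale stored per block. A minimal pure-Python sketch, not the library's actual implementation (the real kernels use NF4/FP4 codebooks and vectorized instructions):

```python
# Illustrative sketch of blockwise absmax 4-bit quantization.
# Codes are signed integers in [-7, 7]; one scale is kept per block.

def quantize_4bit(values, blocksize=64):
    """Quantize floats to 4-bit codes with one absmax scale per block."""
    quantized, scales = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        absmax = max(abs(v) for v in block) or 1.0
        scales.append(absmax)
        # Map each value into [-7, 7] and round to the nearest code.
        quantized.extend(round(v / absmax * 7) for v in block)
    return quantized, scales

def dequantize_4bit(quantized, scales, blocksize=64):
    """Recover approximate floats from codes and per-block scales."""
    return [q / 7 * scales[i // blocksize] for i, q in enumerate(quantized)]

data = [0.1, -0.5, 0.25, 1.0, -1.0, 0.0]
q, s = quantize_4bit(data, blocksize=4)
approx = dequantize_4bit(q, s, blocksize=4)
```

Smaller block sizes give each scale less data to cover and thus lower quantization error, at the cost of more scale storage.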

AMD ROCm Experimental Wheels

  • Experimental support for AMD devices is now included in our PyPI wheels on Linux x86-64.
  • We've added additional GPU target devices as outlined in our docs.
  • Support for using the default blocksize of 64 for 4bit was added for RDNA GPUs in #1748.

macOS 14+ Wheels

  • We're now publishing wheels for macOS 14+!
  • The 4bit and 8bit quantization features are supported on MPS through slow fallback implementations. We plan to enable faster Metal kernels in the future.

🚨 Breaking Changes

  • Dropped support for Python 3.9.
  • Dropped compilation support for Maxwell GPUs in the CUDA backend.

What's Changed

New Contributors

Full Changelog: 0.48.2...0.49.0

0.48.2

29 Oct 21:48

What's Changed

Full Changelog: 0.48.1...0.48.2

0.48.1

02 Oct 17:47

This release fixes a regression introduced in 0.48.0 related to LLM.int8(). This issue caused poor inference results with pre-quantized checkpoints in HF transformers.

What's Changed

Full Changelog: 0.48.0...0.48.1

0.48.0: Intel GPU & Gaudi support, CUDA 13, performance improvements, and more!

30 Sep 21:48

Highlights

🎉 Intel GPU Support

We now officially support Intel GPUs on Linux and Windows! Support is included for all major features (LLM.int8(), QLoRA, 8bit optimizers) with the exception of the paged optimizer feature.

This support includes the following hardware:

  • Intel® Arc™ B-Series Graphics
  • Intel® Arc™ A-Series Graphics
  • Intel® Data Center GPU Max Series

A compatible PyTorch version with Intel XPU support is required. The current minimum is PyTorch 2.6.0. It is recommended to use the latest stable release. See Getting Started on Intel GPU for guidance.
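As background on the first of those features: LLM.int8() builds on row-wise absmax int8 quantization (with outlier columns kept in higher precision, which is omitted here). A minimal pure-Python sketch of the quantization step only, purely illustrative and not the library's code:

```python
# Illustrative sketch of row-wise absmax int8 quantization, the basic
# building block of LLM.int8(). Outlier decomposition and the actual
# matmul kernels are omitted.

def quantize_rowwise_int8(matrix):
    """Quantize each row to int8 codes in [-127, 127], one scale per row."""
    q_rows, scales = [], []
    for row in matrix:
        absmax = max(abs(v) for v in row) or 1.0
        scales.append(absmax / 127.0)
        q_rows.append([round(v * 127.0 / absmax) for v in row])
    return q_rows, scales

def dequantize_rowwise_int8(q_rows, scales):
    """Recover approximate floats from int8 codes and per-row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

w = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
q, s = quantize_rowwise_int8(w)
```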

🎉 Intel Gaudi Support

We now officially support Intel Gaudi2 and Gaudi3 accelerators. This support includes LLM.int8() and QLoRA with the NF4 data type. At this time optimizers are not implemented.

A compatible PyTorch version with Intel Gaudi support is required. The current minimum is Gaudi v1.21 with PyTorch 2.6.0. It is recommended to use the latest stable release. See the Gaudi software installation guide for guidance.

NVIDIA CUDA

  • The 4bit dequantization kernel was improved by @Mhmd-Hisham in #1746. This change brings noticeable speed improvements for prefill, batch token generation, and training. The improvement is particularly prominent on A100, H100, and B200.
  • We've added CUDA 13.0 compatibility across Linux x86-64, Linux aarch64, and Windows x86-64 platforms.
    • Hardware support for CUDA 13.0 is limited to Turing generation and newer.
    • Support for Thor (SM110) is available in the Linux aarch64 build.

🚨 Breaking Changes

  • Dropped support for PyTorch 2.2. The new minimum requirement is 2.3.0.
  • Removed Maxwell GPU support for all CUDA builds.

What's Changed

New Contributors

Full Changelog: 0.47.0...0.48.0

0.47.0

11 Aug 18:59

Highlights:

  • FSDP2 compatibility for Params4bit (#1719)
  • Bugfix for 4bit quantization with large block sizes (#1721)
  • Further removal of previously deprecated code (#1669)
  • Improved CPU coverage (#1628)
  • Include NVIDIA Volta support in CUDA 12.8 and 12.9 builds (#1715)

What's Changed

New Contributors

Full Changelog: 0.46.0...0.47.0

0.46.1

02 Jul 19:45

What's Changed

New Contributors

Full Changelog: 0.46.0...0.46.1

0.46.0: torch.compile() support; custom ops refactor; Linux aarch64 wheels

27 May 21:27

Highlights

  • Support for torch.compile without graph breaks for LLM.int8().
    • Compatible with PyTorch 2.4+, but PyTorch 2.6+ is recommended.
    • Experimental CPU support is included.
  • Support torch.compile without graph breaks for 4bit.
    • Compatible with PyTorch 2.4+ for fullgraph=False.
    • Requires PyTorch 2.8 nightly for fullgraph=True.
  • We are now publishing wheels for CUDA Linux aarch64 (sbsa)!
    • Targets are Turing generation and newer: sm75, sm80, sm90, and sm100.
  • PyTorch Custom Operators refactoring and integration:
    • We have refactored most of the library code to integrate better with PyTorch via the torch.library and custom ops APIs. This helps enable our torch.compile and additional hardware compatibility efforts.
    • End-users do not need to change the way they are using bitsandbytes.
  • Unit tests have been cleaned up for increased determinism and most are now device-agnostic.
    • A new nightly CI runs unit tests for CPU (Windows x86-64, Linux x86-64/aarch64) and CUDA (Linux/Windows x86-64).

Compatibility Changes

  • Support for Python 3.8 is dropped.
  • Support for PyTorch < 2.2.0 is dropped.
  • CUDA 12.6 and 12.8 builds now target manylinux_2_24 (previously manylinux_2_34), broadening glibc compatibility.
  • Many APIs that were previously marked as deprecated have now been removed.
  • New deprecations:
    • bnb.autograd.get_inverse_transform_indices()
    • bnb.autograd.undo_layout()
    • bnb.functional.create_quantile_map()
    • bnb.functional.estimate_quantiles()
    • bnb.functional.get_colrow_absmax()
    • bnb.functional.get_row_absmax()
    • bnb.functional.histogram_scatter_add_2d()
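Code that still calls the newly deprecated functions should surface the warnings before the APIs are removed. A generic stdlib pattern for catching them in tests (the helper below is a hypothetical stand-in, not a bitsandbytes function):

```python
import warnings

def deprecated_helper():
    # Hypothetical stand-in for one of the deprecated functions above.
    warnings.warn("deprecated_helper() is deprecated",
                  DeprecationWarning, stacklevel=2)
    return 42

# Record warnings so a test suite can assert on them explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", DeprecationWarning)
    result = deprecated_helper()
```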

What's Changed

New Contributors

Full Changelog: 0.45.4...0.46.0

Multi-Backend Preview

19 May 13:24
5e267f5

Pre-release
continuous-release_multi-backend-refactor

update compute_type_is_set attr (#1623)

0.45.5

07 Apr 13:37

This is a minor release that affects CPU-only usage of bitsandbytes. The CPU build of the library was inadvertently omitted from the v0.45.4 wheels.

Full Changelog: 0.45.4...0.45.5