Releases: bitsandbytes-foundation/bitsandbytes

Int8 Matmul backward for all GPUs

02 Feb 14:51

This release changes the default bitsandbytes matrix multiplication (bnb.matmul) to support memory-efficient backward by default. Additionally, matrix multiplication with 8-bit weights is now supported on all GPUs.

During backprop, the Int8 weights are converted back to a row-major layout through an inverse index. The general matmul for all GPUs works by casting the Int8 weights to the input's data type (TF32/FP32/BF16/FP16) and then performing a standard matrix multiplication. As such, matrix multiplication during backprop and on non-tensor-core devices is memory efficient, but slow.
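
To illustrate the fallback path (a minimal sketch, not the bitsandbytes internals), this is the idea behind matmul with Int8 weights on devices without Int8 tensor cores: dequantize the weights into the input's data type and run a standard matmul. Names, shapes, and the per-row scaling scheme here are illustrative.

import torch

def matmul_int8_fallback(x, weight_int8, scale):
    # x: activations (fp16/bf16/fp32); weight_int8: quantized weights; scale: per-row
    # absmax scales from quantization time (hypothetical helper, for illustration only)
    w = weight_int8.to(x.dtype) * scale  # cast Int8 -> input dtype and rescale
    return x @ w.t()                     # standard matmul: memory efficient, but slow

x = torch.randn(4, 8, dtype=torch.float16, device="cuda")
w_fp = torch.randn(16, 8, dtype=torch.float16, device="cuda")
scale = w_fp.abs().amax(dim=1, keepdim=True) / 127
w_int8 = (w_fp / scale).round().clamp(-127, 127).to(torch.int8)
out = matmul_int8_fallback(x, w_int8, scale)  # shape (4, 16)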

These contributions were the work of Alexander Borzunov and Yozh, thank you!

Features:

  • Int8 MatmulLt now supports backward through inversion of the ColTuring/ColAmpere format. Slow, but memory efficient. Big thanks to @borzunov
  • Int8 matmul is now supported on all GPUs. On devices with compute capability < 7.5, the Int8 weights are cast to 16/32-bit for the matrix multiplication. Contributed by @borzunov

Improvements:

  • Improved logging for the CUDA detection mechanism.

Ada/Hopper+fake k-bit quantization

04 Jan 11:57

The 0.36.0 release brings a lot of bug fixes, improvements, and new features:

  • better automatic CUDA detection & setup
  • better automatic compilation instruction generation in the case of failures
  • CUDA 11.8 and 12.0 support
  • Ada (RTX 40s series) and Hopper (H100) support
  • Added fake k-bit float, int, and quantile quantization (2 <= k <= 8, Int8 storage)

Additional features include fake k-bit quantization and smaller block sizes for block-wise quantization, which are used in our k-bit Inference Scaling Laws work. Fake k-bit quantization is useful for simulating k-bit data types, but it does not provide memory or runtime benefits. Here is how to use these features.

Faster block-wise quantization, which now allows very small block sizes down to 64:

from bitsandbytes import functional as F

# X is the tensor to quantize, e.g. an fp16/fp32 CUDA tensor
q, state = F.quantize_blockwise(X, blocksize=64)    # 8-bit codes plus quantization state
X = F.dequantize_blockwise(q, state, blocksize=64)  # reconstruct X from codes and state

k-bit fake quantization via block-wise quantization:

# 4-bit float (1 sign, 2 exponent, 1 mantissa bit) fake quantization stored as Int8
from bitsandbytes import functional as F

code = F.create_fp8_map(signed=True, exponent_bits=2, precision_bits=1, total_bits=4).cuda()
q, state = F.quantize_blockwise(X, code=code)  # q holds 4-bit indices into the codebook
X = F.dequantize_blockwise(q, state)           # map indices back through the codebook

0.36.0: Improvements, Ada/Hopper support, fake k-bit quantization.

Features:

  • CUDA 11.8 and 12.0 support added
  • support for Ada and Hopper GPUs added (compute capability 8.9 and 9.0)
  • support for fake k-bit block-wise quantization for Int, Float, quantile quantization, and dynamic exponent data types added
  • Added CUDA instruction generator to fix some installations.
  • Added additional block sizes for quantization {64, 128, 256, 512, 1024}
  • Added SRAM Quantile algorithm to quickly estimate fewer than 256 quantiles (see the sketch after this list)
  • Added option to suppress the bitsandbytes welcome message (@Cyberes)
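
As a hedged sketch of the quantile feature above (it assumes the num_quantiles keyword of F.estimate_quantiles; the tensor shape is illustrative):

import torch
from bitsandbytes import functional as F

# estimate a small number of quantiles of a CUDA tensor's value distribution
A = torch.randn(1024, 1024, device="cuda")
q16 = F.estimate_quantiles(A, num_quantiles=16)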

Regression:

  • Compute capability 3.0 removed: the GTX 600 and 700 series are no longer supported (except the GTX 780 and GTX 780 Ti)

Bug fixes:

  • fixed a bug where too long directory names would crash the CUDA SETUP #35 (@tomaarsen)
  • fixed a bug where CPU installations on Colab would run into an error #34 (@tomaarsen)
  • fixed an issue where the default CUDA version with fast-DreamBooth was not supported #52
  • fixed a bug where the CUDA setup failed due to a wrong function call.
  • fixed a bug in the CUDA Setup which led to an incomprehensible error if no GPU was detected.
  • fixed a bug where the CUDA setup failed when the CUDA runtime was found but not the CUDA library.
  • fixed a bug where not finding the cuda runtime led to an incomprehensible error.
  • fixed a bug where a missing CUDA installation raised an error instead of falling back to loading the CPU library
  • fixed a bug where the CC version of the GPU was not detected appropriately (@BlackHC)
  • fixed a bug in CPU quantization which led to errors when the input buffer exceeded 2^31 elements

Improvements:

  • multiple improvements in formatting, removal of unused imports, and slight performance improvements (@tomaarsen)
  • StableEmbedding layer now has device and dtype parameters to make it 1:1 replaceable with regular Embedding layers (@lostmsu); see the sketch after this list
  • runtime performance of block-wise quantization slightly improved
  • added an error message for the case where multiple libcudart.so libraries are installed and bitsandbytes picks the wrong one
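
A quick sketch of the StableEmbedding change above (vocabulary size, dimensions, and dtype are illustrative):

import torch
import bitsandbytes as bnb

# drop-in replacement for torch.nn.Embedding, with device/dtype passed through
emb = bnb.nn.StableEmbedding(30000, 768, device="cuda", dtype=torch.float32)
ids = torch.randint(0, 30000, (4, 16), device="cuda")
vectors = emb(ids)  # shape (4, 16, 768)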

CUDA 11.8 Support for Dreambooth finetuning

10 Oct 03:16

0.35.0

CUDA 11.8 support and bug fixes

Features:

  • CUDA 11.8 support added, with binaries included in the PyPI release.

Bug fixes:

  • fixed a bug where too long directory names would crash the CUDA SETUP #35 (thank you @tomaarsen)
  • fixed a bug where CPU installations on Colab would run into an error #34 (thank you @tomaarsen)
  • fixed an issue where the default CUDA version with fast-DreamBooth was not supported #52

Memory efficient backprop

20 Sep 04:54

This release introduces memory-efficient backprop through frozen weights, where the gradient is calculated from the 8-bit weights but the computation is carried out in fp16. This is useful for creating Low-Rank Adapters (LoRA) for fine-tuning large models.

This is a feature contributed by @dbaranchuk and @justheuristic.
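
As a rough usage sketch (it assumes a CUDA GPU; layer sizes and the outlier threshold are illustrative), a frozen Int8 linear layer can let gradients flow back to its fp16 input, the pattern used for adapter-style fine-tuning:

import torch
import bitsandbytes as bnb

layer = bnb.nn.Linear8bitLt(
    1024, 1024,
    has_fp16_weights=False,          # keep the frozen weights in Int8
    memory_efficient_backward=True,  # backprop through the frozen Int8 weights
    threshold=6.0,
).cuda()

x = torch.randn(4, 1024, dtype=torch.float16, device="cuda", requires_grad=True)
out = layer(x)
out.sum().backward()  # the gradient reaches x; the Int8 weights stay frozen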

0.34.0

Bug fixes and memory-efficient backprop

Features:

  • Linear8bitLt layer now supports memory_efficient_backward=True which enables backprop of gradients through frozen weights.

Bug fixes:

  • fixed an issue where too many threads were created in blockwise quantization on the CPU for large tensors

0.33.0: Various bug fixes

11 Sep 23:15

0.33.0

Various bug fixes

Features:

  • CPU quantization now supports a variable blocksize to trade off quantization speed and precision. 19a7adc

Bug fixes:

  • fixed an issue in CPU quantization where tensors with more than 2^31 elements would fail 19a7adc
  • fixed a bug where CPU binaries would fail if no GPU was detected eab4d82
  • fixed an issue where CPU binaries caused additional stdout messages 92a3363
  • fixed a broken import of bnb.utils 2e630b5

We thank @mryab, @mbrukman, @chessgecko, and @dbaranchuk for pull requests with bug fixes and new features.