
Commit 8cd7793

Release v0.45.1
1 parent d6781bc commit 8cd7793

3 files changed (+75 −4)

CHANGELOG.md (+73 −2)
### v0.45.1

#### Improvements:

* Compatibility for `triton>=3.2.0`.
* Moved package configuration to `pyproject.toml`.
* Build system: initial support for NVIDIA Blackwell B100 GPUs, RTX 50 Blackwell series GPUs, and Jetson Thor Blackwell.
  * Note: binaries built for these platforms are not included in this release. They will be included in future releases once the upcoming CUDA Toolkit 12.7 and 12.8 become available.

#### Bug Fixes:
* Packaging: wheels no longer include unit tests. (#1478)

#### Dependencies:
* Sets the minimum PyTorch version to 2.0.0.

### 0.45.0

This is a significant release, bringing support for LLM.int8() to NVIDIA Hopper GPUs such as the H100.

As part of the compatibility enhancements, we've rebuilt much of the LLM.int8() code to simplify future compatibility and maintenance. We no longer use the col32 or other architecture-specific tensor layout formats, while maintaining backwards compatibility. We also bring performance improvements targeted at inference scenarios.
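For context, the user-facing API is unchanged by the refactor. A minimal sketch of an LLM.int8() linear layer, assuming a CUDA device and `bitsandbytes` installed; the layer sizes and threshold value here are illustrative, not prescriptive:

```python
import torch
import bitsandbytes as bnb

# LLM.int8() linear layer: weights are quantized to int8 when moved to GPU.
# threshold=6.0 enables mixed-precision decomposition for outlier features.
linear = bnb.nn.Linear8bitLt(
    4096, 4096, bias=True, has_fp16_weights=False, threshold=6.0
).cuda()

x = torch.randn(1, 4096, dtype=torch.float16, device="cuda")
with torch.no_grad():
    y = linear(x)  # int8 matmul with fp16 handling of outlier columns
```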
#### Performance Improvements

This release includes broad performance improvements for a wide variety of inference scenarios. See [this X thread](https://x.com/Tim_Dettmers/status/1864706051171287069) for a detailed explanation.

#### Breaking Changes

🤗 [PEFT](https://github.com/huggingface/peft) users wishing to merge adapters with 8-bit weights will need to upgrade to `peft>=0.14.0`.
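For reference, a minimal sketch of the affected merge workflow, assuming `transformers`, `accelerate`, and `peft>=0.14.0` are installed; the model and adapter identifiers are placeholders:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Base model loaded with LLM.int8() quantization via bitsandbytes.
base = AutoModelForCausalLM.from_pretrained(
    "your-base-model",  # placeholder model id
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Attach a LoRA adapter and merge it into the 8-bit weights.
model = PeftModel.from_pretrained(base, "your-adapter")  # placeholder adapter id
model = model.merge_and_unload()  # merging 8-bit weights requires peft>=0.14.0
```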
#### Packaging Improvements

* The size of our wheel has been reduced by ~43.5%, from 122.4 MB to 69.1 MB! This results in an on-disk size decrease from ~396 MB to ~224 MB.
* Binaries built with CUDA Toolkit 12.6.2 are now included in the PyPI distribution.
* The CUDA 12.5.0 build has been updated to CUDA Toolkit 12.5.1.

#### Deprecations

* A number of public API functions have been marked for deprecation and will emit `FutureWarning` when used. These functions will become unavailable in future releases. This should have minimal impact on most end users; a way to surface these warnings ahead of time is sketched after this list.
* The k-bit quantization features are deprecated in favor of blockwise quantization. For all optimizers, using `block_wise=False` is not recommended, and support will be removed in a future release.
* As part of the refactoring process, we've implemented many new 8-bit operations. These operations no longer use specialized data layouts.
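To audit a codebase for these deprecations ahead of removal, one option is to escalate `FutureWarning` to an error during testing. A minimal stdlib-only sketch; the `bitsandbytes` import simply stands in for whatever code path you want to exercise:

```python
import warnings

# Turn FutureWarning into an exception so deprecated bitsandbytes API
# usage fails loudly in tests instead of passing silently.
warnings.simplefilter("error", FutureWarning)

import bitsandbytes as bnb  # noqa: E402  (import after the filter is set)

# ... exercise your code here; any call into a deprecated function
# will now raise instead of warning.
```

The same effect is available from the command line with `python -W error::FutureWarning`.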
#### Full Changelog

* refine docs for multi-backend alpha release by @Titus-von-Koeller in #1380
* README: Replace special Unicode text symbols with regular characters by @akx in #1385
* Update CI tools & fix typos by @akx in #1386
* Fix invalid escape sequence warning in Python 3.12 by @oshiteku in #1420
* [Build] Add CUDA 12.6.2 build; update 12.5.0 to 12.5.1 by @matthewdouglas in #1431
* LLM.int8() Refactoring: Part 1 by @matthewdouglas in #1401

### 0.44.1

#### Bug fixes:
* Fix optimizer support for Python <= 3.9 by @matthewdouglas in #1379

### 0.44.0

#### New: AdEMAMix Optimizer

The [AdEMAMix](https://hf.co/papers/2409.03137) optimizer is a modification to AdamW which proposes tracking two EMAs to better leverage past gradients. This allows for faster convergence with less training data and improved resistance to forgetting.

We've implemented 8-bit and paged variations: `AdEMAMix`, `AdEMAMix8bit`, `PagedAdEMAMix`, and `PagedAdEMAMix8bit`. These can be used with a similar API to existing optimizers.
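For example, a minimal training-step sketch assuming a CUDA device; the toy model and hyperparameters are arbitrary:

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(128, 128).cuda()

# Drop-in replacement for an AdamW-style optimizer; the 8-bit variant
# keeps optimizer state quantized to reduce memory use.
opt = bnb.optim.AdEMAMix8bit(model.parameters(), lr=1e-3)

x = torch.randn(16, 128, device="cuda")
loss = model(x).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()
```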
#### Improvements:
* **8-bit Optimizers**: The block size for all 8-bit optimizers has been reduced from 2048 to 256 in this release. This is a change from the original implementation proposed in [the paper](https://hf.co/papers/2110.02861), and it improves accuracy.
* **CUDA Graphs support**: A fix to enable [CUDA Graphs](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/) capture of kernel functions was made in #1330. This allows for performance improvements with inference frameworks like vLLM. Thanks @jeejeelee!

#### Full Changelog:
* Embedding4bit and Embedding8bit implementation by @galqiwi in #1292
* Bugfix: Load correct nocublaslt library variant when BNB_CUDA_VERSION override is set by @matthewdouglas in #1318
* Enable certain CUDA kernels to accept specified cuda stream by @jeejeelee in #1330
* Initial support for ppc64le by @mgiessing in #1316
* CUDA source cleanup, refactor and fixes by @abhilash1910 in #1328
* Update for VS2022 17.11 compatibility with CUDA < 12.4 by @matthewdouglas in #1341
* Bump the minor-patch group with 3 updates by @dependabot in #1362
* Update matplotlib requirement from ~=3.9.1 to ~=3.9.2 in the major group by @dependabot in #1361
* docs: add internal reference to multi-backend guide by @Titus-von-Koeller in #1352
* Add move_to_device kwarg to the optimizer's load_state_dict by @koute in #1344
* Add AdEMAMix optimizer by @matthewdouglas in #1360
* Change 8bit optimizer blocksize 2048->256; additional bf16 support by @matthewdouglas in #1365

### 0.43.3

bitsandbytes/__init__.py (+1 −1)

@@ -21,4 +21,4 @@
     "optim.optimizer.MockArgs": False,
 }
 
-__version__ = "0.45.1.dev0"
+__version__ = "0.45.1"

setup.py (+1 −1)

@@ -12,4 +12,4 @@ def has_ext_modules(self):
         return True
 
 
-setup(version="0.45.1.dev0", packages=find_packages(), distclass=BinaryDistribution)
+setup(version="0.45.1", packages=find_packages(), distclass=BinaryDistribution)
