(WIP) Multi backend refactor -> main (full diff of all already merged PRs) #1220

Open: wants to merge 275 commits into base: main
275 commits
b2a4d54
add quant to device when init weight paam
jianan-gu Dec 4, 2023
c44cf06
minor fix
jianan-gu Dec 4, 2023
365491a
mv cuda to common backends
jianan-gu Dec 4, 2023
4050fe3
format fix
jianan-gu Dec 4, 2023
30175d1
format fix
jianan-gu Dec 4, 2023
e17549e
use device.type
jianan-gu Dec 4, 2023
a53bc31
minor fix
jianan-gu Dec 4, 2023
80c598c
backend refinement
jianan-gu Dec 4, 2023
59facc8
minor fix
jianan-gu Dec 5, 2023
066d0dc
final refinement
jianan-gu Dec 5, 2023
657ca4b
Enable col to row transformation
pnunna93 Jan 12, 2024
a390e0c
Add make functions for row to col transformation
pnunna93 Jan 12, 2024
99ad6b5
Update get_transform_buffer for row to col in HIP
pnunna93 Jan 12, 2024
039b808
Update igemmlt for col format
pnunna93 Jan 12, 2024
1a052ee
Unskip test_igemmlt_int on ROCm
pnunna93 Jan 12, 2024
b7ca5cf
Update igemmlt_int test for col inputs
pnunna93 Jan 12, 2024
a2cd90d
Skip transpose igemmlt test on ROCm
pnunna93 Jan 12, 2024
5b6c5ac
Revert "Update igemmlt_int test for col inputs"
pnunna93 Jan 12, 2024
218bf66
Return nvidia_transform from transform for HIP
pnunna93 Jan 12, 2024
8bb5c2f
Fix syntax error
pnunna93 Jan 12, 2024
eb2edf7
Add comment for shape change
pnunna93 Jan 16, 2024
a38ea0f
Enable nvidia_transform tests
pnunna93 Jan 16, 2024
fbacd7a
Merge branch 'fix_igemmlt_int' of https://github.com/pnunna93/bitsand…
pnunna93 Jan 16, 2024
67c383b
Enable igemmlt_half tests
pnunna93 Jan 16, 2024
42b860f
Revert col32 check in nvidia_transform test
pnunna93 Jan 16, 2024
7198d6b
Merge pull request #3 from pnunna93/fix_igemmlt_int
amathews-amd Jan 17, 2024
b1d484a
Merge remote-tracking branch 'upstream/main' into IFU-master-2024-01-24
pnunna93 Jan 26, 2024
c36085d
Update README.md
Lzy17 Jan 26, 2024
0e91e48
Update hip files with upstream changes
pnunna93 Jan 26, 2024
1295d53
Skip failing tests for now
pnunna93 Jan 27, 2024
48b7fa9
Merge pull request #4 from ROCm/IFU-master-2024-01-24
amathews-amd Jan 30, 2024
f1a0b8b
ops.hip: adapt to enum naming changes in ROCm/hipBLASLt@95131d6 and R…
iiisak Feb 2, 2024
e34c30e
Merge remote-tracking branch 'main/main' into upstream_device_abstrac…
jianan-gu Feb 5, 2024
cebd83c
refine backend register with base-backend
jianan-gu Feb 6, 2024
e0f2e18
Merge remote-tracking branch 'main/main' into upstream_device_abstrac…
jianan-gu Feb 6, 2024
d20c017
minor clean format
jianan-gu Feb 6, 2024
a84c369
fix wmma api parity
Lzy17 Feb 6, 2024
b044010
hipify wmma datatype
Lzy17 Feb 7, 2024
9f23308
Merge remote-tracking branch 'main/main' into upstream_device_abstrac…
jianan-gu Feb 7, 2024
b41c1c4
format in CI
jianan-gu Feb 7, 2024
1ab611e
minor fix for format
jianan-gu Feb 7, 2024
b933f9f
refactor base backend registering
jianan-gu Feb 7, 2024
8b4baaa
refine structures of backends
jianan-gu Feb 7, 2024
0905ad7
fix import issue
jianan-gu Feb 8, 2024
145a835
minor clean
jianan-gu Feb 8, 2024
7aa42be
Enable estimate quantile tests
pnunna93 Feb 12, 2024
d270832
fix CI python format
jianan-gu Feb 13, 2024
85377e1
Merge pull request #5 from iiisak/rocm_enabled
pnunna93 Feb 13, 2024
ffb0c5d
Merge pull request #7 from ROCm/fix_estimate_quantiles
amathews-amd Feb 13, 2024
68e7859
fix py38 vers incompatibility from other PR
Titus-von-Koeller Feb 15, 2024
012b565
update pre-commit
Titus-von-Koeller Feb 16, 2024
8fa27f6
cuda.py: harmonize whitespace
Titus-von-Koeller Feb 16, 2024
2c04d48
delete dead code
Titus-von-Koeller Feb 16, 2024
c184655
fix whitespace
Titus-von-Koeller Feb 16, 2024
03b53d7
fix typo
Titus-von-Koeller Feb 16, 2024
ba7a162
remove exstraneous import
Titus-von-Koeller Feb 16, 2024
d162998
factor out ensure_backend_is_available, exc instead of assert
Titus-von-Koeller Feb 17, 2024
2b77380
Merge pull request #6 from ROCm/rocwmma_merge
Lzy17 Feb 19, 2024
fad7918
Enable transpose flag for row to col transform
pnunna93 Feb 20, 2024
e3021ee
Update descriptors for transpose flag
pnunna93 Feb 20, 2024
8c3476f
revert nvidia_transform to transform
pnunna93 Feb 20, 2024
5e1b152
update changes
Feb 20, 2024
2cd9718
Remove minor device filter to avoid confusion
jianan-gu Feb 21, 2024
386e16c
Merge pull request #8 from ROCm/enable_transform_with_transpose
pnunna93 Feb 23, 2024
389bb7d
fixed minor mistakes
Feb 23, 2024
b6770bf
Merge pull request #9 from ROCm/rocm_enabled_fix_bfloat16
pnunna93 Feb 23, 2024
fa28828
remove blocksize 64 on rocm
pnunna93 Mar 6, 2024
d86d24c
remove block size 64 and enable remaining tests
pnunna93 Mar 6, 2024
cf4a506
Fix cuda build errors
pnunna93 Mar 6, 2024
7077195
remove workspace in igemmlt
pnunna93 Mar 12, 2024
ec32fc1
Enabled igemmlt in matmul
pnunna93 Mar 12, 2024
4536b25
Fix shape issue in transform function
pnunna93 Mar 12, 2024
66e34c1
Enable igemmlt int8 output
pnunna93 Mar 12, 2024
7e5e223
Add col format for extract outliers
pnunna93 Mar 12, 2024
2e42adb
Enable dequant_mm
pnunna93 Mar 12, 2024
e32d277
Enable matmullt tests
pnunna93 Mar 12, 2024
8206bd1
Enabled linear_serialization tests
pnunna93 Mar 12, 2024
973a9f8
fix error with dequant_mm change
pnunna93 Mar 12, 2024
387a9b7
Enable extract outliers test
pnunna93 Mar 12, 2024
93dfb51
Enable test overflow
pnunna93 Mar 12, 2024
90bbdc6
Skip overflow and linear serialization for now
pnunna93 Mar 12, 2024
9890d5d
Merge pull request #10 from ROCm/remove_blocksize_64
pnunna93 Mar 12, 2024
1b6dd48
Merge pull request #11 from ROCm/fix_cuda_build_errs
pnunna93 Mar 12, 2024
fc9bf4d
Merge pull request #12 from ROCm/igemm_workspace
pnunna93 Mar 12, 2024
f30dc38
Merge pull request #13 from ROCm/enable_matmul
pnunna93 Mar 12, 2024
3dc14e8
improve the gemv 4bit accuracy by forcing the hipcub to 32
Mar 18, 2024
f4ac9ac
Merge pull request #14 from ROCm/fix_gemv_4bit
Lzy17 Mar 19, 2024
485ba8f
Update skip comment
pnunna93 Mar 19, 2024
a36bd1d
Merge pull request #15 from ROCm/gemv_skip_comment
pnunna93 Mar 19, 2024
f26a4e6
Merge remote-tracking branch 'tim/multi-backend-refactor' into upstre…
jianan-gu Mar 28, 2024
adfb5e2
clean up device setup
jianan-gu Mar 28, 2024
6f08879
clean
jianan-gu Mar 28, 2024
a9e4548
fix utils
jianan-gu Mar 28, 2024
84f67d2
link QuantState in F.
jianan-gu Mar 28, 2024
9ff6c63
pre-commit run --all-files
Titus-von-Koeller Apr 3, 2024
2ffa367
Merge pull request #898 from jianan-gu/upstream_device_abstraction
Titus-von-Koeller Apr 3, 2024
a551c16
Merge remote-tracking branch 'upstream/main' into IFU-master-2024-03-28
pnunna93 Apr 4, 2024
a267221
update instructions
Apr 9, 2024
bcdcc0b
Merge pull request #19 from ROCm/updated_readme
amathews-amd Apr 9, 2024
ff33371
Update README.md
pnunna93 Apr 9, 2024
1157e73
Merge branch 'rocm_enabled' into IFU-master-2024-03-28
pnunna93 Apr 9, 2024
702ca1a
fix PEP errors
pnunna93 Apr 9, 2024
8c23dc0
Fix typos
pnunna93 Apr 9, 2024
971f4b1
Merge branch 'IFU-master-2024-03-28' of https://github.com/ROCm/bitsa…
pnunna93 Apr 9, 2024
4d6408a
Fix formatting in README file
pnunna93 Apr 10, 2024
d62516f
(backends) Stub out additional backends; move more functions to backe…
matthewdouglas Apr 11, 2024
13ad630
Add int8 ops for Intel CPU & XPU
Xia-Weiwen Apr 11, 2024
77be40b
Remove XPU code; remove cpu example; add UT
Xia-Weiwen Apr 15, 2024
8d0b695
Fix igemmlt correctness issue
Xia-Weiwen Apr 15, 2024
67d8661
Bug fix for double_quant
Xia-Weiwen Apr 18, 2024
92900f6
Remove torch.compile for double_quant
Xia-Weiwen Apr 18, 2024
79cb554
Update gpu arch setting
pnunna93 Apr 18, 2024
5c0414e
Add ROCM_PATH variable
pnunna93 Apr 18, 2024
47795f5
Add HIP_VERSION variable
pnunna93 Apr 18, 2024
6d90452
Add BNB_HIP_VERSION variable
pnunna93 Apr 18, 2024
049a2dc
Update supports igemmlt based on HIP version
pnunna93 Apr 18, 2024
47a0bc3
Skip failing tests based on HIP version
pnunna93 Apr 18, 2024
1b2a095
pre-commit fixes
pnunna93 Apr 18, 2024
4515a21
Update README file
pnunna93 Apr 18, 2024
717245d
refine pytest.skip message
Xia-Weiwen Apr 19, 2024
e7ef75f
Update default arch list
pnunna93 Apr 19, 2024
c0d244c
update readme
pnunna93 Apr 19, 2024
c037a30
Merge pull request #17 from ROCm/IFU-master-2024-03-28
lcskrishna Apr 19, 2024
73f4f05
Merge remote-tracking branch 'TD_BnB/multi-backend-refactor' into dev…
pnunna93 Apr 22, 2024
79652a5
update igemmlt for hip
pnunna93 Apr 22, 2024
aedfa8f
Update mm_dequant for hip
pnunna93 Apr 22, 2024
7835282
Update transform function for hip
pnunna93 Apr 22, 2024
93e04b5
Fix lint issues
Xia-Weiwen Apr 25, 2024
e1b60d3
Fix backward
Xia-Weiwen Apr 26, 2024
60d7560
adding arch detection for test_gemv_eye_4bit
Apr 26, 2024
cae33c3
implement get_rocm_gpu_arch
Apr 29, 2024
da53f39
fixing lint
Apr 30, 2024
ae4dcec
fixing lint
Apr 30, 2024
21d5ff6
correct lint error
Apr 30, 2024
5bada9b
Merge pull request #21 from ROCm/rocm_enabled_arch_detect
pnunna93 Apr 30, 2024
7f13c8f
merge changes from main
Titus-von-Koeller May 3, 2024
95c29a6
Fix lint issue
Xia-Weiwen May 6, 2024
749e06f
Merge pull request #1173 from matthewdouglas/backend-stubs
Titus-von-Koeller May 6, 2024
01abfde
Merge branch 'rocm_enabled' into device_abstraction
pnunna93 May 6, 2024
765bfc8
update extract_outliers, quantize_4bit, dequantize_4bit
lcskrishna May 6, 2024
d00c026
minor fixes for extract_outliers
lcskrishna May 6, 2024
e5574bd
update blocksizes for quantize and dequantize
lcskrishna May 6, 2024
b0dec0a
Update bitsandbytes/backends/cpu_xpu_common.py
Xia-Weiwen May 7, 2024
97e41b8
Merge remote-tracking branch 'upstream/multi-backend-refactor' into m…
Xia-Weiwen May 7, 2024
295bb97
Fix lint issue
Xia-Weiwen May 7, 2024
a00bd1f
Merge branch 'rocm_enabled' of https://github.com/ROCm/bitsandbytes i…
May 7, 2024
7ab3a05
update reg expression for detecting arch
lcskrishna May 7, 2024
9cd1d8c
linter updates
lcskrishna May 7, 2024
62f8ed9
Merge branch 'device_abstraction' into cl/update-device-abs
lcskrishna May 7, 2024
37b0582
Fix lint issue
Xia-Weiwen May 7, 2024
8561f09
Merge pull request #1178 from Xia-Weiwen/multi-backend-refactor-cpu-x…
Titus-von-Koeller May 7, 2024
09cc153
Support NF4 on CPU backend
Xia-Weiwen May 8, 2024
d9e4803
Merge pull request #23 from ROCm/cl/update-device-abs
pnunna93 May 8, 2024
2af8568
Merge remote-tracking branch 'upstream/multi-backend-refactor' into d…
pnunna93 May 9, 2024
06f6b25
skip linear no igemmlt test
pnunna93 May 9, 2024
2359452
Remove archive functional file
pnunna93 May 9, 2024
f76d6ab
Sync README with upstream
pnunna93 May 9, 2024
576b62c
Remove bnb_accuracy file
pnunna93 May 9, 2024
dfb531b
Remove cuda_setup
pnunna93 May 9, 2024
31b1cbc
Remove test_delete_later.c
pnunna93 May 9, 2024
ed77476
Sync with upstream
pnunna93 May 9, 2024
943c57a
Sync files with upstream
pnunna93 May 9, 2024
71d1702
Fix lint errors
pnunna93 May 10, 2024
6886bc8
Exclude hip files from typo checks
pnunna93 May 8, 2024
0d445f4
update ops.hip
pnunna93 May 10, 2024
bc6d0b7
Merge pull request #27 from ROCm/dev_abs_IFU
lcskrishna May 10, 2024
177bd39
Minor improvements
Xia-Weiwen May 10, 2024
15c7f77
Add install steps for ROCm
pnunna93 May 10, 2024
d62c835
Fix lint error
pnunna93 May 10, 2024
8aae7c9
Merge pull request #28 from ROCm/dev_abs_add_install_steps
lcskrishna May 10, 2024
881b5fc
Add fp4 support; add UT; fix lint issues
Xia-Weiwen May 11, 2024
dd15734
Reduce memory usage
Xia-Weiwen May 11, 2024
85a01b0
Fix UT
Xia-Weiwen May 11, 2024
2c489f8
reduce memory usage for nf4
Xia-Weiwen May 11, 2024
410f499
Add comments for HIP changes
pnunna93 May 15, 2024
701c5aa
Merge pull request #1206 from Xia-Weiwen/multi-backend-refactor-cpu-4bit
Titus-von-Koeller May 24, 2024
eb3b816
Merge pull request #1207 from ROCm/device_abstraction
Titus-von-Koeller May 24, 2024
ccee5d8
Add empty stubs for Ascend NPU
ji-huazhong May 27, 2024
09c314a
Merge pull request #1223 from statelesshz/backend-npu
Titus-von-Koeller May 28, 2024
2dbf876
Merge branch 'main' into multi-backend-refactor
Titus-von-Koeller May 28, 2024
36fe1a0
fix blocksize
jiqing-feng May 29, 2024
dba8376
Merge pull request #1228 from jiqing-feng/4bit
Titus-von-Koeller May 30, 2024
517eaf2
CPU: add torch.compile for F.double_quant and F.quantize_4bit (#1238)
Xia-Weiwen Jun 6, 2024
193120d
cleanup docs-build breaking install instructs (#1244)
Titus-von-Koeller Jun 21, 2024
c79b1e9
provide temp flag for outside libs to detect multi-backend preview (#…
Titus-von-Koeller Jun 21, 2024
1bfecc8
CPU/XPU: disable torch.compile if g++ is not available (#1251)
Xia-Weiwen Jul 10, 2024
0859784
Create build job for ROCm (#1255)
pnunna93 Jul 12, 2024
9b72679
Changelog: add explanation r. QLoRA mem savings
Titus-von-Koeller Jul 23, 2024
056011a
merge `main` into `multi-backend-refactor`
Titus-von-Koeller Jul 26, 2024
81375f8
docs: add more details to Intel install
Titus-von-Koeller Jul 27, 2024
24f7b65
docs: cleanup of compilation instructions
Titus-von-Koeller Jul 27, 2024
e3b2780
docs: CHANGELOG.md fix
Titus-von-Koeller Jul 27, 2024
0b53d31
Merge remote-tracking branch 'upstream/main' into multi-backend-refactor
Titus-von-Koeller Jul 27, 2024
c8b4b33
fix dtype mismatch (#1285)
jiqing-feng Jul 27, 2024
d385aea
allow features flags on bnb
Titus-von-Koeller Jul 30, 2024
452749a
Fix dequant 4bit (#1300)
jiqing-feng Aug 1, 2024
a142f1e
fix loading int8 model in CPU (#1303)
jiqing-feng Aug 2, 2024
1775035
fix transpose 4bit (#1301)
jiqing-feng Aug 2, 2024
6d9b69b
Enable bitsandbytes packaging for ROCm (#1299)
pnunna93 Aug 2, 2024
bb43857
add bnb attribute to expose supported devices
Titus-von-Koeller Aug 14, 2024
18668d2
fix 4bit dtype (#1325)
jiqing-feng Aug 20, 2024
2bfa347
docs: tweaks for multi-backend preview release prep
Titus-von-Koeller Aug 26, 2024
c8383fb
docs: get started on detailed multi-backend guide
Titus-von-Koeller Aug 29, 2024
3b94d62
rm warn for multi backend (#1336)
jiqing-feng Aug 29, 2024
39097a6
actions: update permissions for pr docs publishing
Titus-von-Koeller Aug 30, 2024
2784653
fix nf4 memory issue by init op_context in forward (#1349)
jiqing-feng Sep 13, 2024
45b7d14
AMD: Clarify diagnostic messages; free up disk space for CI build
pnunna93 Sep 16, 2024
a23984f
check grad before using ipex (#1358)
jiqing-feng Sep 19, 2024
e8881be
Enable packaging for ROCm 6.2 (#1367)
pnunna93 Sep 20, 2024
0d3d977
Update for VS2022 17.11 compatibility with CUDA < 12.4 (#1341)
matthewdouglas Sep 9, 2024
e72637c
Enable continuous releases for multi-backend-refactor branch
matthewdouglas Sep 26, 2024
662dc60
Update release workflow
matthewdouglas Sep 26, 2024
3227cdd
Publish continuous release for multi-backend
matthewdouglas Sep 26, 2024
0a2b539
continuous release: revert wheel renaming due to install err
Titus-von-Koeller Sep 27, 2024
8c5499e
Revert "continuous release: revert wheel renaming due to install err"
Titus-von-Koeller Sep 27, 2024
02d5b42
add dynamic tag-based versioning + git hash for dev vers
Titus-von-Koeller Sep 27, 2024
6927dcc
docs: update w/ changes from `main`
Titus-von-Koeller Sep 27, 2024
8dcd971
get tags for dynamic versioning
Titus-von-Koeller Sep 27, 2024
09ac7ec
fine-tune continuous release params
Titus-von-Koeller Sep 30, 2024
cc56a30
reduce the pkg size + build times for the preview release
Titus-von-Koeller Sep 30, 2024
5225ebe
refine docs for multi-backend alpha release (#1380)
Titus-von-Koeller Sep 30, 2024
e6cc109
docs: remove 2 obsolete lines
Titus-von-Koeller Oct 1, 2024
cd3cb68
Remove depth option in installation steps (#1395)
pnunna93 Oct 16, 2024
cd73601
Fix issue that no valid semantic version tag found when installing bi…
ji-huazhong Nov 20, 2024
b2ac423
Enable XPU and optimize cpu/xpu op (#1418)
jiqing-feng Nov 29, 2024
9315692
fix cpu nf4 (#1432)
jiqing-feng Dec 2, 2024
9948333
Add Ascend NPU support for nf4 quant (#1422)
ji-huazhong Dec 6, 2024
7e6f865
fix device check (#1453)
jiqing-feng Dec 17, 2024
f6025bc
Enable double quant on Intel CPU and XPU (#1472)
jiqing-feng Jan 22, 2025
307fbd5
Enable dequant+matmul 8bit path for Intel CPU and XPU (#1484)
jiqing-feng Jan 28, 2025
a0a95fd
add device index (#1489)
faaany Jan 28, 2025
ca29936
Sync branch with main; resolve conflicts.
matthewdouglas Feb 7, 2025
ed2a58d
Update base backend docstrings
matthewdouglas Feb 7, 2025
07c23de
Update NPU backend with new spec
matthewdouglas Feb 7, 2025
94d6027
Update CPU tests
matthewdouglas Feb 10, 2025
3fabd1a
ROCm: Fix compilation.
matthewdouglas Feb 10, 2025
d3ead1e
Fix
matthewdouglas Feb 10, 2025
6c4d878
Build: use setuptools_scm for dynamic versioning compatibility with p…
matthewdouglas Feb 10, 2025
2d06869
Update wheel build
matthewdouglas Feb 10, 2025
7c917b0
Add rocm6.3.2
matthewdouglas Feb 10, 2025
fdbbfb6
setuptools_scm update
matthewdouglas Feb 10, 2025
89373b8
fix xpu woq linear dtype (#1506)
jiqing-feng Feb 11, 2025
2640753
fix version (#1532)
jiqing-feng Feb 20, 2025
c66e137
enable benchmark script (#1554)
jiqing-feng Mar 4, 2025
83c147d
update comments (#1562)
jiqing-feng Mar 13, 2025
0cd87aa
enable quant storage (#1563)
jiqing-feng Mar 13, 2025
2354bdd
fix meta device dispatch (#1564)
jiqing-feng Mar 13, 2025
249a3cd
Enable XPU int matmul (#1547)
jiqing-feng Mar 13, 2025
8fe6325
Fix XPU 4bit (#1567)
jiqing-feng Mar 18, 2025
d3658c5
Fix xpu to cpu (#1570)
jiqing-feng Mar 24, 2025
21 changes: 21 additions & 0 deletions .github/scripts/build-rocm.sh
@@ -0,0 +1,21 @@
#!/bin/bash
declare build_arch
declare build_os
declare rocm_version

set -xeuo pipefail
bnb_rocm_arch="gfx90a;gfx942;gfx1100"
if [ "${build_os:0:6}" == ubuntu ]; then
image=rocm/dev-ubuntu-22.04:${rocm_version}-complete
echo "Using image $image"
docker run --rm --platform "linux/$build_arch" -i \
-w /src -v "$PWD:/src" "$image" sh -c \
"apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends cmake \
&& cmake -DCOMPUTE_BACKEND=hip -DBNB_ROCM_ARCH=\"${bnb_rocm_arch}\" . \
&& cmake --build ."
fi

output_dir="output/${build_os}/${build_arch}"
mkdir -p "${output_dir}"
(shopt -s nullglob && cp bitsandbytes/*.{so,dylib,dll} "${output_dir}")
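The script above derives two strings from its environment: the Docker image tag (from `build_os` and `rocm_version`) and the artifact directory (from `build_os` and `build_arch`). A minimal sketch of that string handling follows; the helper names `rocm_image_for` and `output_dir_for` are hypothetical and are not part of the PR.

```shell
# Hypothetical helpers (not part of the PR) restating the script's string handling.

# Print the ROCm dev image for an Ubuntu runner; print nothing for other OS labels.
rocm_image_for() {
  # $1: build_os (e.g. ubuntu-latest), $2: rocm_version (e.g. 6.2.4)
  case "$1" in
    ubuntu*) printf '%s\n' "rocm/dev-ubuntu-22.04:$2-complete" ;;
  esac
}

# Print the directory the script copies the built libraries into.
output_dir_for() {
  # $1: build_os, $2: build_arch
  printf '%s\n' "output/$1/$2"
}

rocm_image_for ubuntu-latest 6.2.4
output_dir_for ubuntu-latest x86_64
```

Assuming Docker is available locally, the script itself should be runnable outside CI by supplying the same three variables the workflow passes through `env:`, for example `build_os=ubuntu-latest build_arch=x86_64 rocm_version=6.2.4 bash .github/scripts/build-rocm.sh`.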
68 changes: 61 additions & 7 deletions .github/workflows/python-package.yml
@@ -58,6 +58,7 @@ jobs:
# This job matrix builds the CUDA versions of the libraries for platforms that support CUDA (Linux x64/aarch64 + Windows x64)
##
build-shared-libs-cuda:
if: github.ref_name != 'multi-backend-refactor'
strategy:
fail-fast: false
matrix:
@@ -102,11 +103,55 @@ jobs:
name: shared_library_cuda_${{ matrix.os }}_${{ matrix.arch }}_${{ matrix.cuda_version }}
path: output/*
retention-days: 7

build-shared-libs-rocm:
strategy:
matrix:
os: [ubuntu-latest]
arch: [x86_64]
rocm_version:
["6.1.2", "6.2.4", "6.3.2"]
runs-on: ${{ matrix.os }} # One day, we could run them on native agents. Azure supports this now but it's planned only for Q3 2023 for hosted agents
steps:
- uses: actions/checkout@v4
- name: Set up Docker multiarch
if: startsWith(matrix.os, 'ubuntu')
uses: docker/setup-qemu-action@v2
- name: Clean up disk space
run: |
sudo rm -rf \
/usr/share/dotnet \
/opt/ghc \
"/usr/local/share/boost" \
"$AGENT_TOOLSDIRECTORY" \
/opt/hostedtoolcache \
/opt/google/chrome \
/opt/microsoft/msedge \
/opt/microsoft/powershell \
/opt/pipx \
/usr/lib/mono \
/usr/local/julia* \
/usr/local/lib/android \
/usr/local/lib/node_modules \
/usr/local/share/chromium \
/usr/local/share/powershell \
/usr/share/swift
- name: Build C++
run: bash .github/scripts/build-rocm.sh
env:
build_os: ${{ matrix.os }}
build_arch: ${{ matrix.arch }}
rocm_version: ${{ matrix.rocm_version }}
- name: Upload build artifact
uses: actions/upload-artifact@v4
with:
name: shared_library_rocm_${{ matrix.os }}_${{ matrix.arch }}_${{ matrix.rocm_version }}
path: output/*
retention-days: 7
build-wheels:
needs:
- build-shared-libs
- build-shared-libs-cuda
# - build-shared-libs-cuda reduce the pkg size + build times for the preview release
- build-shared-libs-rocm
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
@@ -123,7 +168,16 @@ jobs:
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v4
- name: Download build artifacts
with:
fetch-depth: 0 # Needed for setuptools_scm.
# with:
# fetch-depth: 1 # shallow clone
# - name: Fetch tags for dynamic versioning in setup.py
# run: |
# git fetch --depth=1 origin --tags
# echo "Available Git tags:"
# git tag -n
- name: Download build artifact
uses: actions/download-artifact@v4
with:
merge-multiple: true
@@ -140,7 +194,7 @@ jobs:
python-version: ${{ matrix.python-version }}
cache: pip
- run: pip install build wheel
- run: python -m build .
- run: python -m build . -w
- name: Determine and Set Platform Tag, then Tag Wheel
shell: bash
run: |
@@ -157,7 +211,7 @@ jobs:
upload-pre-release-wheels:
name: Create release and upload artifacts
runs-on: ubuntu-latest
if: github.ref_name == 'main'
if: github.ref_name == 'multi-backend-refactor'
permissions:
contents: write
needs:
@@ -188,8 +242,8 @@ jobs:
with:
files: wheels/*.whl
prerelease: true
name: Latest `main` wheel
tag_name: continuous-release_main
name: Multi-Backend Preview
tag_name: continuous-release_multi-backend-refactor
make_latest: false
draft: false
target_commitish: ${{ github.sha }}
2 changes: 2 additions & 0 deletions .gitignore
@@ -153,6 +153,8 @@ dmypy.json
# vim
*.swp

# BNB-specific stuff
dependencies
cuda_build
output/
bitsandbytes/_version.py
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -21,3 +21,4 @@ repos:
rev: v1.26.0
hooks:
- id: typos
exclude: ^.*\.hip$
127 changes: 123 additions & 4 deletions CMakeLists.txt
@@ -3,7 +3,7 @@
# For GCC: `cmake -B build . && cmake --build build`
# For MSVC: `cmake -B build . && cmake --build build --config Release`
# You can also use the following options and variables
# - COMPUTE_BACKEND: Set to `cpu`, `cuda`, or `mps` to select the backend
# - COMPUTE_BACKEND: Set to `cpu`, `cuda`, `hip`, `mps` or `npu` to select the backend
# - CUDA_VERSION: The expected CUDA version, for sanity checking. The actual version
# is whatever CMake finds on your path.
# - COMPUTE_CAPABILITY: Which GPU Arch/Compute codes to provide to NVCC.
@@ -25,13 +25,15 @@ endif()
# Define included source files
set(CPP_FILES csrc/common.cpp csrc/cpu_ops.cpp csrc/pythonInterface.cpp)
set(CUDA_FILES csrc/ops.cu csrc/kernels.cu)
set(HIP_FILES csrc/ops.hip csrc/kernels.hip)
set(MPS_FILES csrc/mps_ops.mm)
set(METAL_FILES csrc/mps_kernels.metal)
set(NPU_FILES csrc/npu_ops.cpp)
# C++ sources are always included
list(APPEND SRC_FILES ${CPP_FILES})

set(COMPUTE_BACKEND "cpu" CACHE STRING "The compute backend to use (cpu, cuda, mps)")
set_property(CACHE COMPUTE_BACKEND PROPERTY STRINGS cpu cuda mps)
set(COMPUTE_BACKEND "cpu" CACHE STRING "The compute backend to use (cpu, cuda, hip, mps, npu)")
set_property(CACHE COMPUTE_BACKEND PROPERTY STRINGS cpu cuda hip mps npu)
option(PTXAS_VERBOSE "Pass through -v flag to PTX Assembler" OFF)

if(APPLE)
@@ -47,15 +49,32 @@ if(${COMPUTE_BACKEND} STREQUAL "cuda")
message(FATAL_ERROR "CUDA is not supported on macOS" )
endif()
set(BUILD_CUDA ON)
set(BUILD_HIP OFF)
set(BUILD_MPS OFF)
message(STATUS "NO_CUBLASLT := ${NO_CUBLASLT}")
elseif(${COMPUTE_BACKEND} STREQUAL "hip")
if(APPLE)
message(FATAL_ERROR "HIP is not supported on macOS" )
endif()
option(NO_CUBLASLT "Disable HIPBLASLT" OFF)
set(BUILD_CUDA OFF)
set(BUILD_HIP ON)
set(BUILD_MPS OFF)
elseif(${COMPUTE_BACKEND} STREQUAL "mps")
if(NOT APPLE)
message(FATAL_ERROR "MPS is only supported on macOS" )
endif()
set(BUILD_CUDA OFF)
set(BUILD_HIP OFF)
set(BUILD_MPS ON)
elseif(${COMPUTE_BACKEND} STREQUAL "npu")
set(BUILD_CUDA OFF)
set(BUILD_HIP OFF)
set(BUILD_MPS OFF)
set(BUILD_NPU ON)
else()
set(BUILD_CUDA OFF)
set(BUILD_HIP OFF)
set(BUILD_MPS OFF)
endif()

@@ -175,6 +194,36 @@ if(BUILD_CUDA)

string(APPEND BNB_OUTPUT_NAME "_cuda${CUDA_VERSION_SHORT}")
add_compile_definitions(BUILD_CUDA)
elseif(BUILD_HIP)
enable_language(HIP)
message(STATUS "HIP Compiler: ${CMAKE_HIP_COMPILER}")
if(DEFINED BNB_ROCM_ARCH)
set(CMAKE_HIP_ARCHITECTURES ${BNB_ROCM_ARCH})
else()
if (NOT AMDGPU_TARGETS AND NOT CMAKE_HIP_ARCHITECTURES)
set(CMAKE_HIP_ARCHITECTURES "gfx90a;gfx942;gfx1100")
elseif (AMDGPU_TARGETS AND NOT CMAKE_HIP_ARCHITECTURES)
set(CMAKE_HIP_ARCHITECTURES ${AMDGPU_TARGETS})
endif()
endif()
message(STATUS "HIP Targets: ${CMAKE_HIP_ARCHITECTURES}")

list(APPEND SRC_FILES ${HIP_FILES})

string(APPEND BNB_OUTPUT_NAME "_rocm")

# get hip version
execute_process(COMMAND hipconfig --version OUTPUT_VARIABLE HIP_CONFIG_VERSION)
string(REGEX MATCH "[0-9]+\\.[0-9]+" HIP_VERSION "${HIP_CONFIG_VERSION}")
string(REPLACE "." "" HIP_VERSION_SHORT "${HIP_VERSION}")

string(APPEND BNB_OUTPUT_NAME "${HIP_VERSION_SHORT}")
if(NO_CUBLASLT OR HIP_VERSION VERSION_LESS "6.1")
string(APPEND BNB_OUTPUT_NAME "_nohipblaslt")
endif()
add_compile_definitions(__HIP_PLATFORM_AMD__)
add_compile_definitions(__HIP_PLATFORM_HCC__)
add_compile_definitions(BUILD_HIP)
elseif(BUILD_MPS)
if(NOT APPLE)
message(FATAL_ERROR "MPS is only supported on macOS" )
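The `BUILD_HIP` branch in the hunk above picks the GPU targets by precedence: `BNB_ROCM_ARCH` first, then an existing `CMAKE_HIP_ARCHITECTURES`, then `AMDGPU_TARGETS`, and finally the built-in default list. The sketch below restates that order as a shell function. The name `resolve_hip_archs` is hypothetical, and the sketch makes two simplifying assumptions: an empty argument stands in for "not defined", and any default that CMake itself may assign to `CMAKE_HIP_ARCHITECTURES` during `enable_language(HIP)` is ignored.

```shell
# Hypothetical helper (not part of the PR) mirroring the precedence in CMakeLists.txt.
# Assumption: an empty string means "variable not set".
resolve_hip_archs() {
  # $1: BNB_ROCM_ARCH, $2: CMAKE_HIP_ARCHITECTURES, $3: AMDGPU_TARGETS
  if [ -n "$1" ]; then
    printf '%s\n' "$1"                      # explicit override always wins
  elif [ -n "$2" ]; then
    printf '%s\n' "$2"                      # an existing CMake setting is left untouched
  elif [ -n "$3" ]; then
    printf '%s\n' "$3"                      # fall back to AMDGPU_TARGETS
  else
    printf '%s\n' "gfx90a;gfx942;gfx1100"   # default list from the diff
  fi
}

resolve_hip_archs "" "" ""
resolve_hip_archs "gfx1100" "" "gfx90a"
```

This matches how `.github/scripts/build-rocm.sh` drives the build: it passes `-DBNB_ROCM_ARCH` explicitly, so the other variables never come into play in CI.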
@@ -194,6 +243,33 @@ elseif(BUILD_MPS)
COMMENT "Compiling Metal kernels"
VERBATIM)
add_custom_target(metallib DEPENDS "bitsandbytes/bitsandbytes.metallib")
elseif(BUILD_NPU)
list(APPEND SRC_FILES ${NPU_FILES})

set(SOC_VERSION "Ascend910B4" CACHE STRING "system on chip type")
set(ASCEND_CANN_PACKAGE_PATH $ENV{ASCEND_HOME_PATH} CACHE
STRING "ASCEND CAN package installation directory"
)

# ${KERNEL_FILES} are used to compile library, push files written by ascendc in ${KERNEL_FILES}.
# ref to cmake/npu.cmake ascendc_library, cmake/cpu.cmake add_library
# file(GLOB KERNEL_FILES ${CMAKE_CURRENT_SOURCE_DIR}/csrc/npu_kernels.cpp)
file(GLOB KERNEL_FILES csrc/npu_kernels.cpp)

if(EXISTS ${ASCEND_CANN_PACKAGE_PATH}/compiler/tikcpp/ascendc_kernel_cmake)
set(ASCENDC_CMAKE_DIR ${ASCEND_CANN_PACKAGE_PATH}/compiler/tikcpp/ascendc_kernel_cmake)
elseif(EXISTS ${ASCEND_CANN_PACKAGE_PATH}/tools/tikcpp/ascendc_kernel_cmake)
set(ASCENDC_CMAKE_DIR ${ASCEND_CANN_PACKAGE_PATH}/tools/tikcpp/ascendc_kernel_cmake)
else()
message(FATAL_ERROR "ascendc_kernel_cmake does not exist ,please check whether the can package is installed")
endif()
include(${ASCENDC_CMAKE_DIR}/ascendc.cmake)

# ascendc_library use to add kernel file to generate ascendc library
ascendc_library(ascendc_kernels_npu STATIC ${KERNEL_FILES})

string(APPEND BNB_OUTPUT_NAME "_npu")
add_compile_definitions(BUILD_NPU)
else()
string(APPEND BNB_OUTPUT_NAME "_cpu")
set(GPU_SOURCES)
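The `BUILD_NPU` branch in the hunk above looks for the `ascendc_kernel_cmake` directory in two places under the CANN install root (`compiler/tikcpp` first, then `tools/tikcpp`) and aborts if neither exists. A rough shell equivalent of that lookup, useful for checking an environment before configuring, might look like this. The function name `find_ascendc_cmake_dir` is hypothetical; only the two relative paths come from the diff.

```shell
# Hypothetical pre-flight check (not part of the PR) mirroring the CMake lookup order.
find_ascendc_cmake_dir() {
  # $1: CANN install root (CMakeLists.txt reads it from ASCEND_HOME_PATH)
  for sub in compiler/tikcpp/ascendc_kernel_cmake tools/tikcpp/ascendc_kernel_cmake; do
    if [ -e "$1/$sub" ]; then
      printf '%s\n' "$1/$sub"
      return 0
    fi
  done
  echo "ascendc_kernel_cmake not found under '$1'" >&2
  return 1
}

# Example call (assumes ASCEND_HOME_PATH is exported by the CANN environment setup):
#   find_ascendc_cmake_dir "$ASCEND_HOME_PATH"
```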
@@ -211,7 +287,11 @@ endif()

set_source_files_properties(${CPP_FILES} PROPERTIES LANGUAGE CXX)
add_library(bitsandbytes SHARED ${SRC_FILES})
target_compile_features(bitsandbytes PUBLIC cxx_std_14)
if(BUILD_NPU)
target_compile_features(bitsandbytes PUBLIC cxx_std_17)
else()
target_compile_features(bitsandbytes PUBLIC cxx_std_14)
endif()
target_include_directories(bitsandbytes PUBLIC csrc include)


@@ -223,10 +303,49 @@ if(BUILD_CUDA)
CUDA_SEPARABLE_COMPILATION ON
)
endif()
if(BUILD_HIP)
if(NOT DEFINED ENV{ROCM_PATH})
set(ROCM_PATH /opt/rocm)
else()
set(ROCM_PATH $ENV{ROCM_PATH})
endif()
list(APPEND CMAKE_PREFIX_PATH ${ROCM_PATH})
macro(find_package_and_print_version PACKAGE_NAME)
find_package("${PACKAGE_NAME}" ${ARGN})
message("${PACKAGE_NAME} VERSION: ${${PACKAGE_NAME}_VERSION}")
endmacro()
find_package_and_print_version(hipblas REQUIRED)
find_package_and_print_version(hiprand REQUIRED)
find_package_and_print_version(hipsparse REQUIRED)

## hacky way of excluding hip::amdhip64 (with it linked many tests unexpectedly fail e.g. adam8bit because of inaccuracies)
set_target_properties(hip::host PROPERTIES INTERFACE_LINK_LIBRARIES "")
set_target_properties(hip-lang::host PROPERTIES INTERFACE_LINK_LIBRARIES "")
set(CMAKE_HIP_IMPLICIT_LINK_LIBRARIES "")

target_include_directories(bitsandbytes PRIVATE ${CMAKE_SOURCE_DIR} ${CMAKE_SOURCE_DIR}/include ${ROCM_PATH}/include /include)
target_link_directories(bitsandbytes PRIVATE ${ROCM_PATH}/lib /lib)
target_link_libraries(bitsandbytes PUBLIC roc::hipblas hip::hiprand roc::hipsparse)

target_compile_definitions(bitsandbytes PUBLIC BNB_USE_HIP)
set_source_files_properties(${HIP_FILES} PROPERTIES LANGUAGE HIP)
set_target_properties(bitsandbytes PROPERTIES LINKER_LANGUAGE CXX)

if(NO_CUBLASLT OR HIP_VERSION VERSION_LESS "6.1")
target_compile_definitions(bitsandbytes PUBLIC NO_HIPBLASLT)
else()
find_package(hipblaslt)
target_link_libraries(bitsandbytes PUBLIC roc::hipblaslt)
endif()
endif()
if(BUILD_MPS)
add_dependencies(bitsandbytes metallib)
target_link_libraries(bitsandbytes objc "-framework Foundation" "-framework Metal" "-framework MetalPerformanceShaders" "-framework MetalPerformanceShadersGraph")
endif()
if(BUILD_NPU)
target_compile_options(bitsandbytes PRIVATE -O2 -std=c++17)
target_link_libraries(bitsandbytes PRIVATE $<BUILD_INTERFACE:host_intf_pub> ascendc_kernels_npu)
endif()

if(WIN32)
set_target_properties(bitsandbytes PROPERTIES PREFIX "lib")
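For HIP builds, the CMake changes above append a suffix to `BNB_OUTPUT_NAME`: `_rocm`, then the major and minor HIP version with the dot removed, then `_nohipblaslt` when `NO_CUBLASLT` is set or the HIP version is below 6.1. The sketch below reproduces that naming rule in shell so the expected library name can be predicted for a given toolchain. The function `bnb_rocm_suffix` is hypothetical, and the sample version strings are illustrative stand-ins for `hipconfig --version` output rather than values taken from the PR.

```shell
# Hypothetical helper (not part of the PR) mirroring the suffix logic in CMakeLists.txt.
bnb_rocm_suffix() {
  # $1: raw version string, as `hipconfig --version` would print it
  # $2: "ON" to mimic -DNO_CUBLASLT=ON (defaults to OFF)
  version=$(printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+' | head -n 1)
  [ -n "$version" ] || return 1             # no major.minor found
  major=${version%%.*}
  minor=${version#*.}
  suffix="_rocm${major}${minor}"
  # Same condition as `NO_CUBLASLT OR HIP_VERSION VERSION_LESS "6.1"`
  if [ "${2:-OFF}" = "ON" ] || [ "$major" -lt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -lt 1 ]; }; then
    suffix="${suffix}_nohipblaslt"
  fi
  printf '%s\n' "$suffix"
}

bnb_rocm_suffix "6.2.41134-65d174c3e"       # prints _rocm62
bnb_rocm_suffix "6.0.32830-d62f6a171"       # prints _rocm60_nohipblaslt
```

On a machine with ROCm installed, `bnb_rocm_suffix "$(hipconfig --version)"` would give the suffix to expect on the built shared library.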
3 changes: 3 additions & 0 deletions _typos.toml
@@ -3,6 +3,7 @@
[default]
extend-ignore-re = [
"@Ther-nul", # valid Github user
"CANN", # CANN (Compute Architecture for Neural Networks) is a heterogeneous computing architecture for Ascend NPU
]
extend-ignore-identifiers-re = [
".*arange.*",
@@ -11,6 +12,8 @@ extend-ignore-identifiers-re = [

[type.py.extend-words]
"BA" = "BA" # used as a commented-out variable in tests
"cann" = "cann" # cann (Compute Architecture for Neural Networks) is a heterogeneous computing architecture for Ascend NPU


[type.cuda.extend-words]
"subtile" = "subtile"