Commit 03fcabd

doc update
1 parent 3d595f1 commit 03fcabd

4 files changed: +54 −3 lines changed


docs/source/_toctree.yml

+2
@@ -32,6 +32,8 @@
     title: Papers, resources & how to cite
 - title: API reference
   sections:
+  - title: Functional
+    local: reference/functional
   - title: Optimizers
     sections:
     - local: reference/optim/optim_overview

docs/source/explanations/resources.mdx

+1 −1

@@ -49,7 +49,7 @@ Authors: Tim Dettmers, Luke Zettlemoyer
 }
 ```
 
-## [LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (Nov 2022)](https://arxiv.org/abs/2208.07339)
+## [LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (Nov 2022)](https://arxiv.org/abs/2208.07339) [[llm-int8]]
 Authors: Tim Dettmers, Mike Lewis, Younes Belkada, Luke Zettlemoyer
 
 - [LLM.int8() Blog Post](https://huggingface.co/blog/hf-bitsandbytes-integration)

docs/source/reference/functional.mdx

+48
@@ -0,0 +1,48 @@
+# Overview
+The `bitsandbytes.functional` API provides the low-level building blocks for the library's features.
+
+## When to Use `bitsandbytes.functional`
+
+* When you need direct control over quantized operations and their parameters.
+* To build custom layers or operations leveraging low-bit arithmetic.
+* To integrate with other ecosystem tooling.
+* For experimental or research purposes requiring non-standard quantization or performance optimizations.
+
+## LLM.int8()
+[[autodoc]] functional.int8_double_quant
+
+[[autodoc]] functional.int8_linear_matmul
+
+[[autodoc]] functional.int8_mm_dequant
+
+[[autodoc]] functional.int8_vectorwise_dequant
+
+[[autodoc]] functional.int8_vectorwise_quant
+
+
+## 4-bit
+[[autodoc]] functional.dequantize_4bit
+
+[[autodoc]] functional.dequantize_fp4
+
+[[autodoc]] functional.dequantize_nf4
+
+[[autodoc]] functional.gemv_4bit
+
+[[autodoc]] functional.quantize_4bit
+
+[[autodoc]] functional.quantize_fp4
+
+[[autodoc]] functional.quantize_nf4
+
+[[autodoc]] functional.QuantState
+
+## General Quantization
+[[autodoc]] functional.dequantize_blockwise
+
+[[autodoc]] functional.quantize_blockwise
+
+## Utility
+[[autodoc]] functional.get_ptr
+
+[[autodoc]] functional.is_on_gpu
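
As orientation for the new page: the int8 entries compose into a quantize → int8 matmul → dequantize pipeline, and the 4-bit entries into a quantize/dequantize round trip. The sketch below is illustrative and not part of this commit; the tensor shapes, return tuples, and keyword names are assumptions about recent `bitsandbytes.functional` signatures, so check the generated autodoc before relying on them.

```python
# Illustrative sketch (not part of this commit): composing the functional
# primitives documented above. Argument names and return values are assumed
# from recent bitsandbytes releases and may differ between versions.
import torch
import bitsandbytes.functional as F

A = torch.randn(16, 64, dtype=torch.float16, device="cuda")  # activations
W = torch.randn(32, 64, dtype=torch.float16, device="cuda")  # weight (out_features, in_features)

# LLM.int8() pipeline: row-wise quantization of both operands to int8,
A_i8, A_stats, _ = F.int8_vectorwise_quant(A)
W_i8, W_stats, _ = F.int8_vectorwise_quant(W)

# an int8 matmul accumulating in int32 (computes A @ W.T),
C_i32 = F.int8_linear_matmul(A_i8, W_i8)

# and dequantization of the int32 result back to fp16 using the saved statistics.
out_fp16 = F.int8_mm_dequant(C_i32, A_stats, W_stats)

# 4-bit NF4 round trip: quantize_nf4 returns the packed tensor plus a QuantState
# carrying the absmax/blocksize metadata needed to dequantize later.
packed, state = F.quantize_nf4(W)
W_restored = F.dequantize_nf4(packed, quant_state=state)
```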

docs/source/reference/nn/linear8bit.mdx

+3 −2

@@ -1,6 +1,7 @@
-# 8-bit quantization
+# LLM.int8()
+[LLM.int8()](https://hf.co/papers/2208.07339) is a quantization method that aims to make large language model inference more accessible without significant degradation. Unlike naive 8-bit quantization, which can result in loss of critical information and accuracy, LLM.int8() dynamically adapts to ensure sensitive components of the computation retain higher precision when needed. The key is to extract the outliers from the inputs and weights and multiply them in 16-bit. All other values are multiplied in 8-bit before being dequantized back to 16-bits. The outputs from the 16-bit and 8-bit multiplication are combined to produce the final output.
 
-[LLM.int8()](https://hf.co/papers/2208.07339) is a quantization method that doesn't degrade performance which makes large model inference more accessible. The key is to extract the outliers from the inputs and weights and multiply them in 16-bit. All other values are multiplied in 8-bit and quantized to Int8 before being dequantized back to 16-bits. The outputs from the 16-bit and 8-bit multiplication are combined to produce the final output.
+[Further Resources](../../explanations/resources#llm-int8)
 
 ## Linear8bitLt
 
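
For context on the renamed page, a typical use of the `Linear8bitLt` module it documents looks roughly like the following. The sketch is illustrative and not part of this commit; the constructor keywords (`has_fp16_weights`, `threshold=6.0`) and the load-then-`.cuda()` workflow are assumptions drawn from the library's commonly documented usage pattern.

```python
# Illustrative sketch (not part of this commit): replacing an fp16 linear layer
# with its LLM.int8() counterpart. Values such as threshold=6.0 are assumptions.
import torch
import bitsandbytes as bnb

fp16_layer = torch.nn.Linear(1024, 1024, bias=True).half()

# has_fp16_weights=False stores the weights in int8; input dimensions whose
# magnitude exceeds the threshold take the fp16 outlier path at matmul time.
int8_layer = bnb.nn.Linear8bitLt(1024, 1024, bias=True, has_fp16_weights=False, threshold=6.0)
int8_layer.load_state_dict(fp16_layer.state_dict())
int8_layer = int8_layer.cuda()  # moving to the GPU triggers the int8 quantization

x = torch.randn(4, 1024, dtype=torch.float16, device="cuda")
y = int8_layer(x)  # mixed int8/fp16 matmul, fp16 output
```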