Q8Kernels is an efficient implementation of 8-bit kernels (FP8 and INT8). It provides (see the usage sketch after this list):
- 8-bit GEMM (with fused GELU and bias): 2x faster than cuBLAS FP8 and 3.5x faster than `torch.mm`
- FP8 Flash Attention 2 with Fast Hadamard Transform (also supports cross-attention masks): 2x faster than Flash Attention 2
- Mixed-precision Fast Hadamard Transform
- RMSNorm
- Mixed-precision FMA
- RoPE layer
- Quantizers
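The fused GEMM computes GELU(x·Wᵀ + bias) on FP8-quantized operands. Since this README does not document the Python API, the sketch below only emulates that computation in plain PyTorch with per-tensor FP8 (e4m3) scaling; the helper `emulated_fp8_gemm_gelu` and the per-tensor scaling scheme are illustrative assumptions, and the real q8_kernels entry points perform all of this in a single fused CUDA kernel.

```python
# Emulation of the fused 8-bit GEMM semantics: quantize to FP8 (e4m3) with
# per-tensor scales, matmul, add bias, apply GELU. The real kernel fuses
# these steps and keeps operands in FP8 on the GPU.
import torch

def emulated_fp8_gemm_gelu(x, w, bias):
    def to_fp8(t):
        # 448 is the largest finite value representable in float8_e4m3fn.
        scale = t.abs().max().clamp(min=1e-12) / 448.0
        return (t / scale).to(torch.float8_e4m3fn), scale

    xq, sx = to_fp8(x)
    wq, sw = to_fp8(w)
    # Dequantize for the emulated matmul; the fused kernel works on FP8 directly.
    out = (xq.to(torch.bfloat16) * sx) @ (wq.to(torch.bfloat16) * sw).t()
    return torch.nn.functional.gelu(out + bias)

x = torch.randn(1024, 4096, device="cuda", dtype=torch.bfloat16)
w = torch.randn(4096, 4096, device="cuda", dtype=torch.bfloat16)
b = torch.randn(4096, device="cuda", dtype=torch.bfloat16)
y = emulated_fp8_gemm_gelu(x, w, b)  # reference output in bf16
```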
All operations are implemented in CUDA. The current version supports the Ada architecture (Ampere optimizations are coming soon!).
q8_kernels requires CUDA >= 12.4 and PyTorch >= 2.4.
q8_kernels was tested on a Windows machine; there should be no problems building on Linux systems.
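A quick environment check before building (the compute-capability value 8.9 for Ada is an assumption inferred from the architecture note above):

```python
# Check that the environment matches the stated requirements:
# PyTorch >= 2.4, CUDA >= 12.4, and an Ada-generation GPU (compute capability 8.9).
import torch

print("PyTorch version:", torch.__version__)              # expect >= 2.4
print("CUDA version (torch build):", torch.version.cuda)  # expect >= 12.4
assert torch.cuda.is_available(), "No CUDA device available"

major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")
# Ada GPUs (e.g. RTX 40xx, L40) report 8.9; note that the locally installed
# CUDA toolkit (nvcc) used to compile the extensions must also be >= 12.4.
```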
Install ninja: `pip install ninja`
Make sure that ninja is installed and works correctly (e.g. `ninja --version`).
Without ninja, installation is very slow.
- `git clone https://github.com/KONAKONA666/q8_kernels`
- `cd q8_kernels`
- `git submodule init`
- `git submodule update`
- `python setup.py install`
- `pip install . # for utility`
It takes ~10-15 minutes to compile and install all modules.
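Once the build completes, a simple import check confirms the compiled extensions load (the import name `q8_kernels` is an assumption based on the repository name):

```python
# Smoke test: the import fails if the CUDA extensions did not build correctly.
import torch
import q8_kernels  # assumed import name, taken from the repository name

print("q8_kernels loaded; running on", torch.cuda.get_device_name(0))
```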
Speed-ups are computed relative to 16-bit transformer inference with Flash Attention 2.
| Model name | Speed-up |
|---|---|
| LTXVideo | up to 2.5x |
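As a rough illustration of how such end-to-end speed-ups can be measured (the baseline and 8-bit runner functions below are hypothetical placeholders, not the benchmark script used for the table above):

```python
# Timing sketch with CUDA events: compare a 16-bit + Flash Attention 2 baseline
# against the 8-bit path. `run_baseline` and `run_q8` are hypothetical callables
# wrapping the two inference pipelines.
import torch

def time_cuda_ms(fn, iters=20, warmup=5):
    for _ in range(warmup):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average milliseconds per run

# speedup = time_cuda_ms(run_baseline) / time_cuda_ms(run_q8)
```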
Thanks to:
- Flash Attention
- @weishengying: check his CuTe exercises and flash attention implementations
KONAKONA666
MIT License