
# Differentiable Lookup-Based Matrix Multiplication for Compressing Transformer Network

## Overview


In recent years, there has been research on replacing multiplication operations with cheaper ones, since multiplication is far more expensive in energy than addition. The graph below compares the energy cost of multiplication and addition.

*(figure: energy cost comparison of multiplication vs. addition)*

As a result, approaches like AdderNet replace the multiplications in convolutions with additions, while ShiftCNN represents weights as powers of two so that multiplication can be replaced with bit-shift operations.
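To make the ShiftCNN idea concrete, here is a minimal sketch (not ShiftCNN's actual implementation): a weight is rounded to the nearest signed power of two, so multiplying an integer activation by it reduces to a single shift.

```python
import math

def shift_mul(x, w):
    """Approximate x * w by quantizing w to the nearest signed power of two,
    turning the multiplication into a bit shift. Illustrative helper only."""
    sign = -1 if w < 0 else 1
    exp = round(math.log2(abs(w)))          # nearest power-of-two exponent
    # x * (sign * 2**exp) == sign * (x << exp) for exp >= 0;
    # a negative exponent becomes a (floor-dividing) right shift.
    return sign * (x << exp) if exp >= 0 else sign * (x >> -exp)
```

For weights that are already exact powers of two the result is exact; otherwise it is an approximation, which is why such networks are typically trained or fine-tuned with the quantization in the loop.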

Furthermore, recent research has replaced the Multiply-Accumulate (MAC) operations in matrix multiplication with table lookups and additions (source).
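The core trick can be sketched as product quantization: split each input vector into subvectors, snap each subvector to its nearest learned prototype, and precompute the prototype-weight dot products offline into a table. Inference then needs only lookups and additions. All names and shapes below are illustrative, not this repo's actual API.

```python
import numpy as np

def lookup_matmul(x, W, prototypes):
    """Approximate x @ W with table lookups instead of MACs.

    x:          (d,) input vector
    W:          (d, m) weight matrix
    prototypes: (C, k, s) -- k prototypes of length s per subspace, d = C * s
    """
    C, k, s = prototypes.shape
    # Offline: dot products of every prototype with the matching weight
    # sub-rows -> lookup table of shape (C, k, m)
    W_sub = W.reshape(C, s, -1)                               # (C, s, m)
    table = np.einsum('cks,csm->ckm', prototypes, W_sub)
    # Online: encode each input subvector as its nearest prototype id
    x_sub = x.reshape(C, s)
    dists = ((x_sub[:, None, :] - prototypes) ** 2).sum(-1)   # (C, k)
    ids = dists.argmin(axis=1)                                # (C,)
    # Replace the MACs with C table lookups plus additions
    return table[np.arange(C), ids].sum(axis=0)               # (m,)
```

When an input subvector coincides with one of its prototypes, the lookup reproduces the exact matrix product; otherwise the quality of the approximation depends on how well the prototypes cover the input distribution.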

This line of research is somewhat involved; if you find it interesting, further details are available here.

## Usage

This is a research-oriented project and does not yet have a complete usage guide, but here is what each file is intended to do.

  • demo.py: Compiles the model using TVM to find the optimal parameters (block size) for the target hardware, then runs inference.
  • prototype_learning.py: Initializes the prototypes using KMCUDA.
  • tensorrt_op.py: Compiles the model using torch_tensorrt and runs it on the GPU after compilation.
  • train.py and other files containing "train": Retrain the model after replacing standard matrix multiplication with the lookup-based version.
  • OpCounter.ipynb: Measures GFLOPs and model size with thop after replacing standard matrix multiplication with the lookup-based version.

Short flow: `prototype_learning.py` -> `train.py` -> `demo.py`
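The "differentiable" part of the title refers to training: a hard nearest-prototype lookup has no useful gradient, so during retraining the argmin is typically relaxed into a softmax over negative distances, letting gradients flow back into the prototypes. The sketch below illustrates that relaxation under assumed PQ-style shapes; names, shapes, and the temperature parameter are illustrative, not this repo's API.

```python
import numpy as np

def soft_lookup_matmul(x, W, prototypes, tau=0.1):
    """Training-time relaxation of lookup-based matmul: the hard argmin over
    prototypes is replaced by a softmax over negative squared distances, so
    the output is differentiable w.r.t. the prototypes.

    x: (d,), W: (d, m), prototypes: (C, k, s) with d = C * s.
    """
    C, k, s = prototypes.shape
    table = np.einsum('cks,csm->ckm', prototypes, W.reshape(C, s, -1))
    x_sub = x.reshape(C, s)
    dists = ((x_sub[:, None, :] - prototypes) ** 2).sum(-1)      # (C, k)
    logits = -dists / tau                                        # tau -> 0 recovers argmin
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)                      # soft assignment weights
    # Weighted sum over table rows instead of a single hard lookup per subspace
    return np.einsum('ck,ckm->m', attn, table)
```

At inference time the softmax is swapped back for a hard argmin, so the deployed model keeps the lookup-and-add cost profile.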

## Contributions

  • Developed a comprehensive training pipeline, particularly effective for ImageNet-scale training.
  • Surpassed LUT-NN (MobiCom 2023), achieving an accuracy improvement of up to 10%.

## Contact

Feel free to contact me (a0917bc(at)gmail(dot)com) if you have any questions.

## References