
# Differentiable Lookup-Based Matrix Multiplication for Compressing Transformer Network

## Overview


In recent years, there has been research on replacing multiplication operations with cheaper ones, since multiplication is far more expensive in energy than addition. The graph below compares the energy cost of multiplication and addition.

*(figure: energy cost comparison of multiplication vs. addition)*

As a result, approaches like AdderNet replace the multiplications in convolutions with additions, while ShiftCNN represents weights as powers of two so that multiplication can be replaced with bit-shift operations.
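To make the ShiftCNN idea concrete, here is a minimal sketch (not ShiftCNN's actual implementation): a weight is rounded to the nearest signed power of two, so multiplying an integer activation by it reduces to a single shift.

```python
import math

def shift_mul(x, w):
    """Approximate x * w by quantizing w to the nearest signed power of two,
    turning the multiplication into a bit shift. Illustrative helper only."""
    sign = -1 if w < 0 else 1
    exp = round(math.log2(abs(w)))          # nearest power-of-two exponent
    # x * (sign * 2**exp) == sign * (x << exp) for exp >= 0;
    # a negative exponent becomes a (floor-dividing) right shift.
    return sign * (x << exp) if exp >= 0 else sign * (x >> -exp)
```

For weights that are already exact powers of two the result is exact; otherwise it is an approximation, which is why such networks are typically trained or fine-tuned with the quantization in the loop.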

Furthermore, recent research has replaced the Multiply-Accumulate (MAC) operations in matrix multiplication with table lookups and additions (source).
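The core trick can be sketched as product quantization: split each input vector into subvectors, snap each subvector to its nearest learned prototype, and precompute the prototype-weight dot products offline into a table. Inference then needs only lookups and additions. All names and shapes below are illustrative, not this repo's actual API.

```python
import numpy as np

def lookup_matmul(x, W, prototypes):
    """Approximate x @ W with table lookups instead of MACs.

    x:          (d,) input vector
    W:          (d, m) weight matrix
    prototypes: (C, k, s) -- k prototypes of length s per subspace, d = C * s
    """
    C, k, s = prototypes.shape
    # Offline: dot products of every prototype with the matching weight
    # sub-rows -> lookup table of shape (C, k, m)
    W_sub = W.reshape(C, s, -1)                               # (C, s, m)
    table = np.einsum('cks,csm->ckm', prototypes, W_sub)
    # Online: encode each input subvector as its nearest prototype id
    x_sub = x.reshape(C, s)
    dists = ((x_sub[:, None, :] - prototypes) ** 2).sum(-1)   # (C, k)
    ids = dists.argmin(axis=1)                                # (C,)
    # Replace the MACs with C table lookups plus additions
    return table[np.arange(C), ids].sum(axis=0)               # (m,)
```

When an input subvector coincides with one of its prototypes, the lookup reproduces the exact matrix product; otherwise the quality of the approximation depends on how well the prototypes cover the input distribution.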

This line of research is somewhat involved; if you find it interesting, further details are available here.

## Usage

This is a research-oriented project and does not yet have a complete usage guide, but here is what each file is intended to do.

  • demo.py: Compiles the model using TVM to find the optimal parameters (block size) for the target hardware, then runs inference.
  • prototype_learning.py: Initializes the prototypes using KMCUDA.
  • tensorrt_op.py: Compiles the model using torch_tensorrt and runs it on the GPU after compilation.
  • train.py and other files containing "train": Retrain the model after replacing standard matrix multiplication with the lookup-based version.
  • OpCounter.ipynb: Measures GFLOPs and model size with thop after replacing standard matrix multiplication with the lookup-based version.

Short flow: `prototype_learning.py` -> `train.py` -> `demo.py`
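The "differentiable" part of the title refers to training: a hard nearest-prototype lookup has no useful gradient, so during retraining the argmin is typically relaxed into a softmax over negative distances, letting gradients flow back into the prototypes. The sketch below illustrates that relaxation under assumed PQ-style shapes; names, shapes, and the temperature parameter are illustrative, not this repo's API.

```python
import numpy as np

def soft_lookup_matmul(x, W, prototypes, tau=0.1):
    """Training-time relaxation of lookup-based matmul: the hard argmin over
    prototypes is replaced by a softmax over negative squared distances, so
    the output is differentiable w.r.t. the prototypes.

    x: (d,), W: (d, m), prototypes: (C, k, s) with d = C * s.
    """
    C, k, s = prototypes.shape
    table = np.einsum('cks,csm->ckm', prototypes, W.reshape(C, s, -1))
    x_sub = x.reshape(C, s)
    dists = ((x_sub[:, None, :] - prototypes) ** 2).sum(-1)      # (C, k)
    logits = -dists / tau                                        # tau -> 0 recovers argmin
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)                      # soft assignment weights
    # Weighted sum over table rows instead of a single hard lookup per subspace
    return np.einsum('ck,ckm->m', attn, table)
```

At inference time the softmax is swapped back for a hard argmin, so the deployed model keeps the lookup-and-add cost profile.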

## Contributions

  • Developed a comprehensive training pipeline, particularly effective for ImageNet-scale training.
  • Surpassed LUT-NN (MobiCom 2023), achieving an accuracy improvement of up to 10%.

## Contact

Feel free to contact me (a0917bc(at)gmail(dot)com) if you have any questions.

## References