
Differentiable Lookup-Based Matrix Multiplication for Compressing Transformer Network

Overview


In recent years, there has been research on replacing multiplication operations with cheaper alternatives, motivated by measurements showing that a multiplication consumes substantially more energy in hardware than an addition.

(Figure: energy cost comparison of multiplication vs. addition.)

As a result, approaches like AdderNet replace multiplication in convolutions with addition, while ShiftCNN represents weights as powers of two, allowing multiplication to be replaced with bit-shift operations.
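As a rough illustration of the bit-shift idea (this is not code from ShiftCNN, and the function names are mine), restricting a weight to a signed power of two turns each multiplication into a sign flip plus a shift:

```python
# Sketch: power-of-two weight quantization, so x * w becomes a bit shift.
import math

def quantize_to_power_of_two(w: float):
    """Return (sign, exponent) so that w is approximated by sign * 2**exponent."""
    sign = 1 if w >= 0 else -1
    exponent = round(math.log2(abs(w)))
    return sign, exponent

def shift_multiply(x: int, sign: int, exponent: int) -> int:
    """Compute x * sign * 2**exponent with shifts only (integer x)."""
    if exponent >= 0:
        return sign * (x << exponent)
    return sign * (x >> -exponent)

# Example: weight 0.25 is exactly 2**-2, so x * 0.25 becomes x >> 2.
sign, exp = quantize_to_power_of_two(0.25)
print(shift_multiply(16, sign, exp))  # 4
```

The quantization is lossy for weights that are not exact powers of two; practical schemes compensate for this, e.g. by retraining after quantization.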

More recently, some work has replaced the multiply-accumulate (MAC) operations in matrix multiplication with table lookups and additions (source).
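A minimal sketch of the idea, assuming a product-quantization-style scheme (all array names and sizes below are illustrative, not this repository's code): each input row is split into subvectors, each subvector is snapped to its nearest learned prototype, and the matmul then reduces to summing precomputed prototype-times-weight partial products from a table:

```python
# Sketch of lookup-based matrix multiplication (PQ-style, assumed).
import numpy as np

rng = np.random.default_rng(0)
D, C, K, M = 8, 4, 16, 3          # input dim, subspaces, prototypes, outputs
sub = D // C                      # subvector length

X = rng.standard_normal((5, D))   # activations (online)
W = rng.standard_normal((D, M))   # weights (known offline)
prototypes = rng.standard_normal((C, K, sub))  # per-subspace codebooks

# Offline: table[c, k, m] = prototypes[c, k] . W[c-th row slice, m]
table = np.einsum('cks,csm->ckm', prototypes, W.reshape(C, sub, M))

# Online: encode each subvector to its nearest prototype (the only
# distance computation), then the matmul is pure lookup + add.
Xs = X.reshape(len(X), C, sub)
dists = ((Xs[:, :, None, :] - prototypes[None]) ** 2).sum(-1)
codes = dists.argmin(-1)                        # (N, C) prototype indices
Y_lut = table[np.arange(C), codes].sum(axis=1)  # (N, M)
```

`Y_lut` approximates `X @ W`; the approximation is exact whenever each subvector of `X` coincides with one of its subspace's prototypes, which is why prototype quality (and retraining around the quantization) matters.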

The approach is a little involved; if you find it interesting, further details are available here.

Usage

This is a research-oriented project without a complete usage guide yet, but here is what each file is intended to do:

  • demo.py: Compiles the model using TVM to find the optimal parameters (block size) for hardware and runs inference.
  • prototype_learning.py: Initializes prototypes using KMCUDA.
  • tensorrt_op.py: Attempts to compile the model using torch_tensorrt and runs it on the GPU after compilation.
  • train.py and other files containing "train": Retrain the model after matrix multiplication has been replaced with the lookup-based version.
  • OpCounter.ipynb: Measures GFLOPs and model size (using thop) after matrix multiplication has been replaced with the lookup-based version.

Short Flow: prototype_learning.py -> train.py -> demo.py
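The prototype-learning step at the start of this flow can be pictured as plain k-means over activation subvectors. The toy NumPy sketch below is mine, not the repository's code; prototype_learning.py uses KMCUDA for the same job:

```python
# Toy sketch: initialize per-subspace prototypes with naive k-means.
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Very small Lloyd's-algorithm k-means; returns (k, dim) centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            pts = X[assign == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers

# Collected activation subvectors (here random stand-ins) -> 16 prototypes.
acts = np.random.default_rng(1).standard_normal((256, 4))
protos = kmeans(acts, k=16)
print(protos.shape)  # (16, 4)
```

After this initialization, the retraining step (train.py) fine-tunes the model with the lookup-based layers in place.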

Contributions

  • Developed a comprehensive training pipeline, including support for ImageNet-scale training.
  • Achieved an accuracy improvement of up to 10% over LUT-NN (MobiCom 2023).

Contact

Feel free to contact me (a0917bc(at)gmail(dot)com) if you have any questions.

