Laconic Deep Learning Inference Acceleration

Authors

Sayeh Sharify, Alberto Delmas Lascorz, Mostafa Mahmoud, Milos Nikolic, Kevin Siu, Dylan Malone Stuart, Zissis Poulos, Andreas Moshovos, University of Toronto

Keywords

Laconic Processing Element (LPE), inference, deep learning, bit sparsity, activations and weights, architecture

Summary

Challenge

The bulk of the energy and work in neural networks during inference is due to data transfers and the computation needed to perform multiply-accumulate (MAC) operations, which combine weight (W) and activation (A) values into partial sums (psums).
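
For context, a minimal sketch of the computation in question; the function name and values are illustrative, not from the paper:

```python
def dot_product(weights, activations):
    """Accumulate weight x activation products into a partial sum (psum)."""
    psum = 0
    for w, a in zip(weights, activations):
        psum += w * a  # one MAC operation: multiply, then accumulate
    return psum

print(dot_product([2, -1, 3], [4, 5, 1]))  # -> 6
```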

Contribution

  1. Find that neural networks exhibit high bit sparsity in their activations and weights.

  2. Propose Laconic, a hardware accelerator design that exploits bit sparsity in both activations and weights by representing data with Booth encoding (see the sketch below).
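
As an illustration of how a signed-digit recoding exposes bit sparsity, here is a minimal sketch using non-adjacent form (NAF), a canonical signed-digit recoding closely related to the Booth encoding the paper uses; the function name and example values are ours, not the paper's:

```python
def naf_terms(x):
    """Recode a non-negative int into signed power-of-two (sign, exponent) terms."""
    terms, e = [], 0
    while x:
        if x & 1:
            sign = 2 - (x % 4)  # +1 if the low bits are ...01, -1 if ...11
            terms.append((sign, e))
            x -= sign
        x >>= 1
        e += 1
    return terms

# 0b111 (7) has three one-bits but only two signed terms: 8 - 1.
print(naf_terms(7))   # [(-1, 0), (1, 3)]
# 0b111000 (56) likewise drops from three one-bits to two terms: 64 - 8.
print(naf_terms(56))  # [(-1, 3), (1, 6)]
```

Fewer terms per value means fewer effective operations per multiplication, which is the sparsity Laconic exploits.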

Innovation Points

Propose a data-parallel, histogram-based Laconic processing element (LPE) that is much smaller (about 5× for 8b data) and more energy-efficient than a conventional bit-parallel PE.
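
A behavioral sketch of the histogram idea, under our reading of the design (illustrative Python, not the paper's hardware): each activation-term/weight-term pair multiplies to a signed power of two, so the PE can tally products per exponent and perform a single reduction at the end instead of shifting and adding every product. For self-containment, `terms` here uses a plain one-bit decomposition; Booth encoding would yield fewer terms.

```python
from collections import Counter

def terms(x):
    """Plain one-bit decomposition of a non-negative int into (sign, exponent)."""
    return [(1, e) for e in range(x.bit_length()) if (x >> e) & 1]

def lpe_psum(activations, weights):
    """Histogram-style multiply-accumulate over term pairs (behavioral only)."""
    hist = Counter()
    for a, w in zip(activations, weights):
        for sa, ea in terms(a):
            for sw, ew in terms(w):
                hist[ea + ew] += sa * sw  # product of two powers of two
    # One final reduction: shift each per-exponent count into place and sum.
    return sum(count << exp for exp, count in hist.items())

A, W = [3, 5], [7, 2]
assert lpe_psum(A, W) == sum(a * w for a, w in zip(A, W))  # 3*7 + 5*2 = 31
```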

Result

By decomposing multiplications down to the bit level, the work needed for multiplications during inference can potentially be reduced by at least 40× across a wide selection of neural networks (8b and 16b).
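
A back-of-the-envelope version of that work metric, with made-up values (the 40× figure comes from the paper's measurements, not from this code): compare the bits × bits one-bit products a bit-parallel multiplier always performs against the number of essential term pairs after recoding.

```python
def naf_term_count(x):
    """Count signed power-of-two terms in x's NAF (Booth-style) recoding."""
    n = 0
    while x:
        if x & 1:
            x -= 2 - (x % 4)
            n += 1
        x >>= 1
    return n

def work_reduction(a, w, bits=16):
    """Bit-parallel work (bits*bits bit products) over essential term pairs."""
    pairs = max(1, naf_term_count(a) * naf_term_count(w))
    return (bits * bits) / pairs

print(work_reduction(3, 12))  # 2 x 2 term pairs -> 256 / 4 = 64.0
```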