Hardware-Friendly Neural Network Design and Deployment

Author: Guolin Yang (杨果林)

Repository: https://github.com/curryfromuestc/BNN_accelerator

Abstract

Hardware-friendly neural network design and deployment aims to balance model performance and hardware efficiency, bringing artificial intelligence to resource-constrained edge devices. Current deep learning models, with their vast parameter counts and high computational complexity, face severe compute, storage, and power constraints when deployed at the edge. Binarized Neural Networks (BNNs) reduce model complexity and computation dramatically through extreme quantization, but their accuracy loss and training difficulties must be addressed, and efficient hardware architectures are needed to realize their advantages. Research into BNN designs that balance algorithmic performance with hardware friendliness, together with the design of matching hardware accelerator modules, is therefore essential. This thesis addresses these challenges through innovative work on both algorithm optimization and hardware accelerator design. The main contributions are: (1) an improved BNN training strategy that approximates gradients with a parameterized sigmoid function and combines it with the Adam optimizer, improving training stability and model accuracy; (2) a hardware accelerator architecture for BNNs, using optimizations such as pipelining and sliding windows, with back-end physical implementation and verification completed in a 180nm process; the entire computing core occupies only 200,000 μm² (0.2 mm²) and reaches 0.91 recognition accuracy on the MNIST dataset; (3) application of the proposed methods to networks including LeNet, ResNet, and ViT, with validation on standard datasets showing that, after weight binarization and with techniques such as quantization-aware training, accuracy remains close to that of full-precision models.

Keywords: Neural Networks, Hardware Accelerator, Lightweight Deep Learning Models, AI Computational Efficiency
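For contribution (1), the core idea is to keep the hard sign() in the forward pass while backpropagating through the derivative of a parameterized sigmoid. Below is a minimal PyTorch sketch of that surrogate-gradient scheme; the class name, the steepness value k = 5.0, and the toy training step are illustrative assumptions, not code taken from the train/ directory.

```python
import torch

class ParamSigmoidSign(torch.autograd.Function):
    """Forward: binarize with sign(). Backward: approximate the
    gradient of sign() with the derivative of sigmoid(k * x)."""

    @staticmethod
    def forward(ctx, x, k):
        ctx.save_for_backward(x)
        ctx.k = k
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        s = torch.sigmoid(ctx.k * x)
        # d/dx sigmoid(k * x) = k * s * (1 - s)
        return grad_out * ctx.k * s * (1 - s), None

# Toy training step combining the binarizer with the Adam optimizer.
w = torch.randn(16, requires_grad=True)
opt = torch.optim.Adam([w], lr=1e-3)
loss = (ParamSigmoidSign.apply(w, 5.0) - 1.0).pow(2).mean()
loss.backward()
opt.step()
```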

Project Structure

.
├── dc_syn/         # Design Compiler synthesis scripts and files
├── fig/            # Project figures
├── ic2/            # IC design files
├── paper/          # LaTeX sources of the thesis
├── README.md       # This document
├── rtl/            # Verilog RTL source code
├── sim/            # Verilog testbenches
├── train/          # Neural network training scripts (PyTorch)
└── vivado_works/   # Vivado project files

BNN Accelerator Hardware Architecture

The system architecture is shown in the figure below (see the fig/ directory).

Convolution Module

The convolution unit consists of a sliding-window stage and a five-stage pipeline. The sliding-window module reads pixels left to right and top to bottom; after three rows have been read, the first-column pixels of those three rows are synchronized, and over the next two cycles the remaining pixels required by the first kernel position are synchronized as well. After the five pipeline stages, the first completed output pixel is produced; from then on, one valid pixel is emitted per cycle, together with a corresponding valid signal.
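As a reference for the behavior described above, here is a small Python golden model (not the RTL itself); the 3×3 window size is inferred from the three-rows-plus-two-cycles description, and ±1 values stand in for the binary pixels and weights.

```python
import numpy as np

def binary_conv2d(fmap, kernel):
    """Golden model of the sliding-window binary convolution: a 3x3
    window slides left to right, top to bottom, emitting one output
    pixel per step, mirroring the one-valid-pixel-per-cycle pipeline."""
    kh, kw = kernel.shape
    H, W = fmap.shape
    out = np.empty((H - kh + 1, W - kw + 1), dtype=np.int8)
    for r in range(out.shape[0]):        # top to bottom
        for c in range(out.shape[1]):    # left to right
            window = fmap[r:r + kh, c:c + kw]
            # elementwise product of +1/-1 values == XNOR on sign bits
            out[r, c] = np.sign((window * kernel).sum())
    return out
```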

Fully Connected Module

Once the fully connected module receives the valid signal, it XNORs the input with the weights read in by the testbench and accumulates the result in an internal register. After 12124 elements have been received, the fully connected output becomes valid.
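A behavioral sketch of this XNOR-and-accumulate step, using 0/1 bits for the binary values (the function name is illustrative; the 12124-element count comes from the description above):

```python
import numpy as np

def binary_fc_accumulate(in_bits, weight_bits):
    """XNOR each activation bit with its weight bit and accumulate
    the match count, mimicking the internal register; in the RTL the
    output-valid flag rises only after 12124 elements have arrived."""
    assert in_bits.shape == weight_bits.shape
    xnor = ~(in_bits ^ weight_bits) & 1   # 1 where the bits agree
    return int(xnor.sum())
```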

Max Pooling Module

The max pooling module starts when the second layer is being computed; its output is valid when both the row index and the column index are even, and the output value is fed to the fully connected module.
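A compact model of the same behavior, assuming the even-row/even-column valid condition corresponds to a 2×2, stride-2 pool:

```python
import numpy as np

def maxpool_2x2(fmap):
    """2x2 stride-2 max pooling: the RTL asserts output-valid once
    per 2x2 block (when both row and column counters are even); here
    the whole feature map is reduced in one shot."""
    H, W = fmap.shape
    return fmap.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))
```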

Control Module

The controller receives the convolution output and its valid signal, manages reads and writes of the fmap data, and triggers the max pooling module. When the fully connected output becomes valid, it compares the output values and emits the corresponding one-hot code.
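The final compare step amounts to an argmax followed by one-hot encoding; a sketch, assuming the 10 MNIST classes:

```python
import numpy as np

def onehot_of_max(fc_sums):
    """Pick the class with the largest fully connected sum and emit
    it as a one-hot code, as the controller does once the fully
    connected output is valid."""
    onehot = np.zeros(len(fc_sums), dtype=np.uint8)
    onehot[int(np.argmax(fc_sums))] = 1
    return onehot
```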

One-Hot to Binary Module

Because the number of chip I/O ports is limited, the one-hot code is converted to binary: the encoding is performed whenever the one-hot code output by the controller is valid.
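A sketch of the encoding, assuming the usual combinational scheme of ORing together the indices of the asserted bits (function name illustrative):

```python
def onehot_to_binary(onehot_bits):
    """Convert a one-hot code to its binary index, which needs far
    fewer output ports than the one-hot form."""
    code = 0
    for i, bit in enumerate(onehot_bits):
        if bit:
            code |= i
    return code

# Bit 3 set in the one-hot code yields the binary value 3.
assert onehot_to_binary([0, 0, 0, 1, 0, 0, 0, 0, 0, 0]) == 3
```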


About

Undergraduate thesis project, done in the first semester of junior year. It covers surrogate-gradient training of a BNN, a Verilog circuit implementation, and tape-out in a 180nm process.
