In conv2d, we describe how to implement a two-dimensional convolution kernel on AIE. While relu describes the implementation of the Rectified Linear Unit (ReLU) activation function on AIE. This README provides instructions for fusing convolution with the ReLU activation function on a single AI Engine (AIE) core
.
+-- conv2d_fused_relu.py # A Python script that defines the AIE array structural design using MLIR-AIE operations.
+-- conv2d_fused_relu_placed.py # A Python script that defines the AIE array structural design using MLIR-AIE operations.
+-- Makefile # Contains instructions for building and compiling software projects.
+-- README.md # This file.
+-- run_makefile.lit # For LLVM Integrated Tester (LIT) of the design.
+-- run_makefile_placed.lit # For LLVM Integrated Tester (LIT) of the placed design.
+-- test.py # Python code testbench for the design example.
Fusing ReLU into the convolution operation can optimize the performance by reducing unnecessary data movement, leading to lower external memory bandwidth requirements and computational overhead. The ReLU activation function introduces non-linearity by setting negative values to zero and leaving positive values unchanged. For fixed-point arithmetic, we can utilize the Shift-Round-Saturate (SRS) capability of AIE to apply an appropriate transformation involving shifting out lower-order bits, rounding, and saturation using the SRS family of intrinsics. Using SRS intrinsics, we can efficiently implement ReLU activation while the data is in the accumulation registers. Such an implementation completely eliminates any need for data movement by fusing at the vector register level.
After performing the convolution operation, we use aie::set_rounding() and aie::set_saturation() to set the rounding and saturation modes for the computed results in the accumulator. Setting round mode postitive_inf rounds halfway towards positive infinity while setting saturation to aie::saturation_mode::saturate saturation rounds an uint8 range (0, 255).
::aie::set_rounding(
aie::rounding_mode::positive_inf); # Needed to rounding properly to uint8
::aie::set_saturation(
aie::saturation_mode::saturate); # Needed to saturate properly to uint8
The output data is generated in Y{C/8}X{C8} layout. Please refer to our conv2d design for details on the data layout.
-
Reduced Memory Bandwidth: Fusing ReLU into the convolution operation eliminates unnecessary memory accesses and data transfers associated with separate ReLU computations, leading to reduced memory bandwidth requirements.
-
Improved Performance: Fusing ReLU reduces the number of instructions executed per element, resulting in improved computational efficiency and overall performance of the convolution operation.
-
Enhanced Resource Utilization: Combining convolution and ReLU operations allows computational resources to be utilized more efficiently, maximizing throughput and achieving better resource utilization.
To compile the design:
makeTo compile the placed design:
env use_placed=1 makeTo run the design:
make run_py