Vector Scalar Multiplication:

This IRON design flow example, called "Vector Scalar Multiplication", demonstrates a simple AIE implementation of a vectorized vector-scalar multiply on a vector of integers. In this design, a single AIE core multiplies every element of a vector (default length 4096) by a scalar factor. The kernel is configured to work on 1024-element subvectors and is invoked multiple times (four times for the default length) to scale the full vector. The example consists of two primary design files, vector_scalar_mul.py and scale.cc, and a testbench, test.cpp or test.py. vector_scalar_mul_jit.py demonstrates implementing the design using the pre-built algorithms library in combination with IRON JIT.
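As a plain-Python illustration of the tiling described above (not code from the example), the sketch below scales a 4096-element vector by calling a stand-in kernel on 1024-element sub-vectors. `scale_tile` and `vector_scalar_mul_tiled` are hypothetical names. With the default sizes, the kernel runs 4096 / 1024 = 4 times.

```python
# Hypothetical host-side model: scale a length-N vector by invoking a
# per-tile kernel on n-element sub-vectors, as the AIE core does.
N = 4096  # full vector length (the design's default)
n = 1024  # sub-vector size the kernel works on


def scale_tile(tile, factor):
    """Stand-in for the AIE kernel: multiply one n-element tile by factor."""
    return [x * factor for x in tile]


def vector_scalar_mul_tiled(data, factor, tile_size=n):
    """Apply scale_tile once per tile; return (result, number_of_invocations)."""
    assert len(data) % tile_size == 0, "length must be a multiple of the tile size"
    result = []
    invocations = 0
    for start in range(0, len(data), tile_size):
        result.extend(scale_tile(data[start:start + tile_size], factor))
        invocations += 1
    return result, invocations


if __name__ == "__main__":
    out, calls = vector_scalar_mul_tiled(list(range(N)), 3)
    print(calls)    # 4
    print(out[:4])  # [0, 3, 6, 9]
```

Python integers do not overflow, so this model ignores int32 wrap-around; keep test values small when comparing against the real kernel.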

Source Files Overview

vector_scalar_mul.py: A Python script that defines the AIE array structural design using MLIR-AIE operations. It generates MLIR that is then compiled using aiecc to produce design binaries (i.e., an XCLBIN and inst.bin for the NPU in Ryzen™ AI).
vector_scalar_mul_placed.py: An alternative version of the design in vector_scalar_mul.py, expressed in a lower-level version of IRON.
scale.cc: A C++ implementation of scalar and vectorized vector scalar multiply operations for AIE cores.
vector_scalar_mul_jit.py: A JIT version that passes scale.cc to the transform algorithm. JIT compilation allows the host code and the AIE design to be combined in a single file.
test.cpp: This C++ code is a testbench for the Vector Scalar Multiplication design example. The code is responsible for loading the compiled XCLBIN file, configuring the AIE module, providing input data, and executing the AIE design on the NPU. After executing, the testbench verifies the vector scalar multiply results against a CPU reference and optionally outputs trace data.
test.py: This Python code is a testbench for the Vector Scalar Multiplication design example. The code is responsible for loading the compiled XCLBIN file, configuring the AIE module, providing input data, and executing the AIE design on the NPU. After executing, the testbench verifies the vector scalar multiply results against a CPU reference and optionally outputs trace data.
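The verification step both testbenches perform can be sketched in Python as follows. This is an illustration rather than the contents of test.py: `verify` is a hypothetical helper, and the NPU output is simply passed in as a list instead of being read back from the device.

```python
def verify(input_data, factor, npu_output, max_report=10):
    """Compare NPU results with a CPU reference; return the mismatch count."""
    if len(input_data) != len(npu_output):
        raise ValueError("input and output lengths differ")
    errors = 0
    for i, (x, got) in enumerate(zip(input_data, npu_output)):
        expected = x * factor  # CPU reference value
        if got != expected:
            if errors < max_report:  # report only the first few mismatches
                print(f"Error at index {i}: expected {expected}, got {got}")
            errors += 1
    return errors


if __name__ == "__main__":
    data = list(range(1, 4097))
    result = [3 * x for x in data]  # pretend this buffer came back from the NPU
    errors = verify(data, 3, result)
    print("PASS" if errors == 0 else f"FAIL: {errors} mismatches")
```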

Design Overview

This simple example uses a single compute tile in the NPU's AIE array. The design is shown in the figure to the right. The overall design flow is as follows:

An object FIFO called "of_in" connects a Shim Tile to a Compute Tile, and another called "of_out" connects the Compute Tile back to the Shim Tile.
The runtime data movement is expressed to read 4096 int32_t values from host memory into the Compute Tile and to write the 4096 results back to host memory. A single int32_t scale factor is also transferred from host memory to the Compute Tile.
The Compute Tile acquires the input data from "of_in" in "object"-sized blocks (1024 elements) and stores the result in an output "object" it has acquired from "of_out". A scalar or vectorized kernel running on the Compute Tile's AIE core multiplies the data in the input "object" by the scale factor before storing it in the output "object".
After the computation, the Compute Tile releases both "objects", allowing the DMAs (abstracted by the object FIFOs) to transfer the results back to host memory through "of_out" and to copy further input blocks into the Compute Tile through "of_in".
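The acquire/compute/release cycle above can be modeled in plain Python. The sketch is a simplified, serialized view under stated assumptions: `ObjectFifoModel` and `run_design` are hypothetical toy names, not the IRON ObjectFifo API, and each 1024-element block (4096 bytes of int32 data) passes through of_in, the core, and of_out one after another, whereas on the hardware the DMAs and the core overlap these steps.

```python
from collections import deque

n = 1024  # elements per object (sub-vector)


class ObjectFifoModel:
    """Toy FIFO holding at most `depth` objects (hypothetical, not the IRON API)."""

    def __init__(self, name, depth=2):
        self.name = name
        self.depth = depth
        self.objects = deque()

    def produce(self, obj):
        """Producer acquires a free buffer, fills it, and releases it."""
        if len(self.objects) >= self.depth:
            raise RuntimeError(f"{self.name} is full: the producer would stall")
        self.objects.append(obj)

    def consume(self):
        """Consumer acquires the oldest filled buffer; taking it frees the slot."""
        if not self.objects:
            raise RuntimeError(f"{self.name} is empty: the consumer would stall")
        return self.objects.popleft()


def run_design(host_in, factor):
    """Serialized walk through the flow, one n-element block at a time."""
    of_in = ObjectFifoModel("of_in")
    of_out = ObjectFifoModel("of_out")
    host_out = []
    for start in range(0, len(host_in), n):
        of_in.produce(host_in[start:start + n])      # shim DMA: host memory -> of_in
        block = of_in.consume()                      # core: acquire input object
        of_out.produce([x * factor for x in block])  # core: scale, release output object
        host_out.extend(of_out.consume())            # shim DMA: of_out -> host memory
    return host_out
```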

It is important to note that the Shim Tile and Compute Tile DMAs move data concurrently, and the Compute Tile's AIE core processes data concurrently with that data movement. This overlap is possible because each object FIFO has a depth of 2 (the default), which provides ping-pong buffers: a DMA can fill or drain one buffer while the core works on the other.
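The effect of a depth-2 object FIFO can be imitated with threads and bounded queues. In this hypothetical sketch, `queue.Queue(maxsize=2)` stands in for each two-buffer FIFO: the producer (playing the shim DMA) can fill the second buffer while the core thread works on the first, and it blocks only when both are full. The `None` end-of-stream marker is a modeling aid, not something the AIE design uses.

```python
import queue
import threading

n = 1024   # elements per object
DEPTH = 2  # object FIFO depth: two buffers, i.e. ping-pong


def shim_to_core(host_in, of_in):
    """Producer: push n-element blocks; put() blocks while both buffers are full."""
    for start in range(0, len(host_in), n):
        of_in.put(host_in[start:start + n])
    of_in.put(None)  # end-of-stream marker (modeling aid only)


def core(of_in, of_out, factor):
    """Core: take a block, scale it, pass it on; runs alongside the producer."""
    while (block := of_in.get()) is not None:
        of_out.put([x * factor for x in block])
    of_out.put(None)


def run_overlapped(host_in, factor):
    of_in = queue.Queue(maxsize=DEPTH)
    of_out = queue.Queue(maxsize=DEPTH)
    workers = [
        threading.Thread(target=shim_to_core, args=(host_in, of_in)),
        threading.Thread(target=core, args=(of_in, of_out, factor)),
    ]
    for w in workers:
        w.start()
    host_out = []
    while (block := of_out.get()) is not None:  # drain results back to "host memory"
        host_out.extend(block)
    for w in workers:
        w.join()
    return host_out
```

Python threads illustrate the ordering and blocking behaviour of double buffering, not the speedup the hardware gets from it.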

Design Component Details

AIE Array Structural Design in `vector_scalar_mul_placed.py`

This design multiplies a vector of input data by a scalar factor. The AIE design is described in a Python module as follows:

Constants & Configuration: The script defines the input/output dimensions (N, n), the buffer size (N_in_bytes) and number of blocks (N_div_n), the object FIFO buffer depth, and booleans that select the vectorized or scalar kernel and enable tracing support.
AIE Device Definition: @device defines the target device. The device_body function contains the AIE array design definition.
Scaling Function Declarations: scale_scalar_int32 and scale_int32 are external functions imported from scale.cc.
Tile Definitions: ShimTile handles data movement, and ComputeTile2 processes the scaling operations.
Object Fifos: of_in and of_out are defined to facilitate the vector data communication between ShimTile and ComputeTile2. Similarly, of_factor facilitates the scale factor communication from the ShimTile to the ComputeTile2.
Tracing Flow Setup (Optional): A circuit-switched flow is set up for tracing information when enabled.
Core Definition: The core_body function loops through the sub-vectors of the input data, acquiring elements from of_in, processing them with vector_scalar_mul_aie_scalar() or vector_scalar_mul_aie(), and writing the result to of_out.
Data Movement Configuration: The aie.runtime_sequence operation configures data movement and synchronization on the ShimTile for input and output buffer management.
Tracing Configuration (Optional): Trace control, event groups, and buffer descriptors are set up in the aie.runtime_sequence operation when tracing is enabled.
Generate the design: The my_vector_scalar() function triggers the code generation process. The final print statement outputs the MLIR representation of the AIE array configuration.
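To make the constants and the core loop concrete, the following Python sketch derives N_in_bytes and N_div_n from N and n and mimics the core_body loop with plain lists. It is not the IRON code in vector_scalar_mul_placed.py: the list-based acquire/release, the `kernel_name` selection, and the list comprehension standing in for the kernel call are illustrative assumptions.

```python
import struct

# Configuration constants (N and n from the description; derivations sketched here)
N = 4096                                # total int32 elements moved to/from the host
n = 1024                                # elements per object, i.e. per kernel call
N_in_bytes = N * struct.calcsize("<i")  # 4 bytes per int32 -> 16384 bytes
N_div_n = N // n                        # sub-vectors per run -> 4
vectorized = True                       # kernel-selection flag

# Which external kernel the core would call, depending on the flag (illustrative)
kernel_name = "vector_scalar_mul_aie" if vectorized else "vector_scalar_mul_aie_scalar"


def core_body(of_in_objects, of_factor):
    """List-based stand-in for the core loop (hypothetical, not IRON code)."""
    factor = of_factor[0]                         # acquire the scale factor from of_factor
    of_out_objects = []
    for i in range(N_div_n):
        elem_in = of_in_objects[i]                # acquire an input object from of_in
        elem_out = [x * factor for x in elem_in]  # stand-in for the kernel call
        of_out_objects.append(elem_out)           # release the output object to of_out
    return of_out_objects
```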

AIE Core Kernel Code

scale.cc contains a C++ implementation of scalar and vectorized vector scalar multiplication operations designed for AIE cores. It consists of three main sections:

Scalar Scaling: The scale_scalar() function processes one data element at a time, using the AIE scalar datapath to load, multiply and store data elements.
Vectorized Scaling: The scale_vectorized() function processes multiple data elements simultaneously, taking advantage of AIE vector datapath capabilities to load, multiply and store data elements.
C-style Wrapper Functions: vector_scalar_mul_aie_scalar() and vector_scalar_mul_aie() are two C-style wrapper functions that let the AIE design implemented in vector_scalar_mul.py call the templated scale_scalar() and scale_vectorized() implementations, respectively. The functions are provided for int32_t.
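The two kernels differ mainly in how many elements each loop iteration handles. The Python model below contrasts them; it is not scale.cc. The vector width `VEC = 16` int32 lanes is an assumption made for illustration, the functions return their iteration counts only to make the contrast visible, and the C-style wrappers have no equivalent here.

```python
VEC = 16  # assumed vector width in int32 lanes (illustrative, not taken from scale.cc)


def scale_scalar(a, c, factor, N):
    """Scalar model: load, multiply, and store one element per iteration."""
    iterations = 0
    for i in range(N):
        c[i] = a[i] * factor
        iterations += 1
    return iterations


def scale_vectorized(a, c, factor, N):
    """Vector model: load VEC elements, multiply all lanes, store VEC results."""
    assert N % VEC == 0, "this sketch assumes N is a multiple of VEC"
    iterations = 0
    for i in range(0, N, VEC):
        lanes = a[i:i + VEC]                        # one vector load
        c[i:i + VEC] = [x * factor for x in lanes]  # lane-wise multiply, one store
        iterations += 1
    return iterations
```

With N = 1024, the scalar model loops 1024 times while the vector model loops 1024 / 16 = 64 times, and both produce the same output.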

Usage

Compilation

To compile the design:

make

To compile the placed design:

env use_placed=1 make

To compile the C++ testbench:

make vector_scalar_mul.exe

C++ Testbench

To run the design:

make run

JIT Approach Usage

To run the JIT version:

python vector_scalar_mul_jit.py
