Softmax

The softmax function is a mathematical function commonly used in machine learning, especially in classification tasks. It transforms a vector of real-valued scores (often called logits) into a probability distribution. The resulting probabilities are positive and sum up to 1, making them suitable for representing categorical distributions.

Key Characteristics

Exponential Normalization: The softmax function applies the exponential function to each element of the input vector and then normalizes these values by dividing by the sum of all these exponentials. This has the effect of amplifying the differences between the elements of the input vector, making the highest values stand out more prominently.
Formula: For a vector,
$$\mathbf{z} = \begin{bmatrix} z_1 & z_2 & \cdots & z_n \end{bmatrix}$$
the softmax function for each element is,
$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^n e^{z_j}}$$
where e is the base of the natural logarithm.
Output as Probabilities: The output of the softmax function is a vector where each component is between 0 and 1, and the sum of all components is 1. This makes it useful for interpreting the outputs as probabilities.
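As a reference point for the formula above, here is a minimal plain-Python sketch of softmax. It is not the AIE kernel from this example. Subtracting the maximum before exponentiating is a common numerical-stability step that is assumed here and not described in this README; it does not change the mathematical result.

```python
import math

def softmax(z):
    """Return softmax(z) for a list of real-valued scores (logits)."""
    # Assumption: shift by the maximum so exp() cannot overflow.
    # The shift cancels out in the ratio, so the result is unchanged.
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    # Normalize so that the outputs sum to 1.
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)       # three values in (0, 1), largest for the largest logit
print(sum(probs))  # approximately 1.0
```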

Compilation details

The softmax function uses the exponential function $e^x$, similar to the example found here. As in that example, softmax is implemented efficiently by approximating $e^x$ with a lookup table.
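To illustrate the general idea of a lookup-table approximation of $e^x$, the sketch below rewrites $e^x$ as $2^{x \log_2 e}$, handles the integer part of the exponent exactly, and reads the fractional part from a small table. The table size, the names `FRAC_TABLE` and `exp_lut`, and the scheme itself are assumptions chosen for illustration; the table layout actually used by the AIE runtime library may be quite different.

```python
import math

TABLE_BITS = 8                    # hypothetical table resolution
TABLE_SIZE = 1 << TABLE_BITS      # 256 entries
# FRAC_TABLE[k] holds 2^(k / TABLE_SIZE), covering fractions in [0, 1).
FRAC_TABLE = [2.0 ** (k / TABLE_SIZE) for k in range(TABLE_SIZE)]
LOG2_E = 1.0 / math.log(2.0)

def exp_lut(x):
    """Approximate e^x using a table for the fractional power of two."""
    t = x * LOG2_E                # e^x == 2^t
    i = math.floor(t)             # integer part, applied exactly via ldexp
    frac = t - i                  # fractional part in [0, 1]
    # Truncate to a table index; clamp in case frac rounds up to 1.0.
    k = min(int(frac * TABLE_SIZE), TABLE_SIZE - 1)
    return math.ldexp(FRAC_TABLE[k], i)

for x in (-2.0, 0.0, 0.5, 3.0):
    print(x, exp_lut(x), math.exp(x))
```

With 256 entries and truncation, the relative error of this sketch stays below about 0.3%; a larger table or interpolation between entries would reduce it.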

In addition, and unlike any of the other current design examples, this example uses MLIR dialects as direct input, including the vector, affine, arith and math dialects. This is shown in the source. The MLIR is intended to be generated from a higher-level description, but is shown here as an example of how other MLIR dialects can be used as input.

The compilation process is different from the other design examples, and is shown in the Makefile.

1. The input MLIR is first vectorized into chunks of size 16, and a C++ file is produced that maps the various MLIR dialects to AIE intrinsics, including vector loads and stores, vectorized arithmetic on those registers, and the $e^x$ approximation using lookup tables.
2. This generated C++ is compiled into a first object file.
3. A file called lut_based_ops.cpp from the AIE2 runtime library is compiled into a second object file. This file contains the lookup table contents used to approximate the $e^x$ function.
4. A wrapper file is also compiled into an object file. The wrapper prevents C++ name mangling and allows the wrapped C function to be called from the structural Python.
5. These three object files are combined into a single .a file, which is then referenced inside the softmax.py structural Python.
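The vectorization into chunks of 16 can be pictured with the plain-Python emulation below. This is only a conceptual sketch: the two-pass structure, the function name `softmax_chunked`, and the use of `math.exp` in place of the lookup-table approximation are assumptions, and the generated C++ may organize the computation differently.

```python
import math

VECTOR_WIDTH = 16  # chunk size stated in the compilation steps

def softmax_chunked(values):
    """Emulate softmax computed 16 elements at a time."""
    if len(values) % VECTOR_WIDTH != 0:
        raise ValueError("length must be a multiple of VECTOR_WIDTH")

    # Pass 1: exponentiate one chunk at a time and accumulate the sum.
    exps = []
    total = 0.0
    for start in range(0, len(values), VECTOR_WIDTH):
        chunk = values[start:start + VECTOR_WIDTH]
        chunk_exp = [math.exp(v) for v in chunk]  # stand-in for the vector e^x
        total += sum(chunk_exp)
        exps.extend(chunk_exp)

    # Pass 2: scale every chunk by the reciprocal of the sum.
    inv_total = 1.0 / total
    out = []
    for start in range(0, len(exps), VECTOR_WIDTH):
        out.extend(e * inv_total for e in exps[start:start + VECTOR_WIDTH])
    return out

result = softmax_chunked([0.1 * i for i in range(32)])
print(len(result), sum(result))
```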

This is a slightly more complex process than the rest of the examples, which typically only use a single object file containing the wrapped C++ function call, but is provided to show how a library-based flow can also be used.

softmax.py: A Python script that defines the AIE array structural design using MLIR-AIE operations. This generates MLIR that is then compiled using aiecc to produce design binaries (i.e., XCLBIN and inst.bin for the NPU in Ryzen™ AI).
softmax_placed.py: An alternative version of the design in softmax.py, expressed in a lower-level version of IRON.
softmax_whole_array_placed.py: This Python script extends the design to utilize the entire AIE array, scaling up from the use of two cores in softmax_placed.py. The number of cores of the AIE array (n_cores) is configurable via the n_col and n_cores_per_col variables.
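The relationship between these variables can be sketched as follows. The sketch assumes that n_cores is the product of n_col and n_cores_per_col and that the input is split evenly across cores; neither detail is confirmed here, and the helper name `elements_per_core` is hypothetical. The values used are the defaults listed under Configuration Variables.

```python
def elements_per_core(size, n_col, n_cores_per_col):
    """Return (n_cores, elements handled by each core) for an even split."""
    # Assumption: total core count is columns times cores per column.
    n_cores = n_col * n_cores_per_col
    # Assumption: the design requires the data to divide evenly.
    if size % n_cores != 0:
        raise ValueError("size must be divisible by the number of cores")
    return n_cores, size // n_cores

# Default size with 1 column of 4 cores.
print(elements_per_core(262144, 1, 4))   # (4, 65536)
# Same size on a 4 x 4 arrangement.
print(elements_per_core(262144, 4, 4))   # (16, 16384)
```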

Usage

For a quick reference of all available options, run:

make help

Build and Run

Build and run with default settings:

make run

Build and run with custom runtime parameters:

make run size=524288 n_iterations=100 n_warmup=20

Placement Modes

There are three placement modes available:

Default mode - Uses softmax.py:

make run

Manual placement mode - Uses softmax_placed.py:

make run use_placed=1

Whole array placement mode - Uses softmax_whole_array_placed.py:

make run use_whole_array=1
make run use_whole_array=1 whole_array_cols=4 whole_array_rows=4
make run use_whole_array=1 whole_array_cols=2 whole_array_rows=2

Configuration Variables

| Variable | Default | Description |
| --- | --- | --- |
| `size` | 262144 | Input data size (number of elements) |
| `n_iterations` | 20 | Number of benchmark iterations |
| `n_warmup` | 10 | Number of warmup iterations |
| `use_placed` | 0 | Enable manual placement mode |
| `use_whole_array` | 0 | Enable whole array placement mode |
| `whole_array_cols` | 1 | Number of columns (when `use_whole_array=1`) |
| `whole_array_rows` | 4 | Number of cores per column (when `use_whole_array=1`) |
| `devicename` | npu | Target device (`npu` or `npu2`) |

Note: Configuration changes are automatically detected. No need to run make clean when changing parameters.
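When sweeping over several configurations from a script, the make invocation can be assembled programmatically. The helper below is a hypothetical convenience, not part of this example; it only builds the argument list from the variables listed above and rejects names that are not in that list.

```python
import shlex

# Variable names taken from the Configuration Variables list.
KNOWN_VARIABLES = {
    "size", "n_iterations", "n_warmup", "use_placed",
    "use_whole_array", "whole_array_cols", "whole_array_rows", "devicename",
}

def make_command(target="run", **variables):
    """Build the argv list for `make <target> name=value ...`."""
    argv = ["make", target]
    for name, value in variables.items():
        if name not in KNOWN_VARIABLES:
            raise ValueError(f"unknown Makefile variable: {name}")
        argv.append(f"{name}={value}")
    return argv

cmd = make_command(size=524288, n_iterations=100, n_warmup=20)
print(shlex.join(cmd))  # make run size=524288 n_iterations=100 n_warmup=20
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the example directory to actually start the build.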

Profiling

To run with profiling (outputs to results.csv):

make profile

Hardware Tracing

To generate a trace file:

make use_placed=1 trace

Note: Tracing is currently supported with the use_placed=1 mode.
