The softmax function is a mathematical function commonly used in machine learning, especially in classification tasks. It transforms a vector of real-valued scores (often called logits) into a probability distribution. The resulting probabilities are positive and sum up to 1, making them suitable for representing categorical distributions.
- Exponential Normalization: The softmax function applies the exponential function to each element of the input vector and then normalizes these values by dividing by the sum of all the exponentials. Because the ratio of two outputs is $e^{z_i - z_j}$, an additive gap between two scores becomes a multiplicative gap between their probabilities, which makes the highest values stand out more prominently.
- Formula: For a vector
  $$\mathbf{z} = \begin{bmatrix} z_1 & z_2 & \cdots & z_n \end{bmatrix},$$
  the softmax function for each element is
  $$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^n e^{z_j}},$$
  where $e$ is the base of the natural logarithm.
- Output as Probabilities: The output of the softmax function is a vector where each component is between 0 and 1, and the sum of all components is 1. This makes it useful for interpreting the outputs as probabilities.
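As a reference point for the formula above, here is a minimal NumPy sketch of softmax on the host. It is an illustration, not code from this design; `softmax` is a hypothetical name, and the subtraction of $\max(\mathbf{z})$ is a standard numerical-stability step that the description above does not mention (it cancels out mathematically, so the result is unchanged).

```python
import numpy as np

def softmax(z):
    """Reference softmax: sigma(z)_i = exp(z_i) / sum_j exp(z_j)."""
    z = np.asarray(z, dtype=np.float64)
    # Subtracting max(z) multiplies numerator and denominator by the same
    # factor e^{-max(z)}, so the result is identical, but exp() cannot overflow.
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

if __name__ == "__main__":
    p = softmax([1.0, 2.0, 3.0])
    print(p)          # every entry lies between 0 and 1
    print(p.sum())    # equals 1 up to floating-point rounding
```

For example, `softmax([1.0, 2.0, 3.0])` gives approximately `[0.090, 0.245, 0.665]`: each unit step in the input multiplies the output by $e \approx 2.718$, which is the amplification described above.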
The softmax function employs the exponential function $e^x$. To compute it efficiently on the AIE, $e^x$ is approximated using lookup tables.
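To show how a lookup table can stand in for $e^x$, here is a minimal NumPy sketch of one generic scheme: rewrite $e^x = 2^{x \log_2 e}$, round the exponent to a multiple of $1/256$, read the fractional power of two from a 256-entry table, and apply the integer power of two exactly. This is an illustrative assumption, not the method used by the AIE2 runtime library, whose table layout may differ; `TABLE_BITS`, `POW2_FRAC_LUT` and `exp_lut` are hypothetical names.

```python
import math
import numpy as np

# Hypothetical table size: 2**8 = 256 entries (not taken from the AIE runtime).
TABLE_BITS = 8
TABLE_SIZE = 1 << TABLE_BITS
# Entry k holds 2**(k / TABLE_SIZE); all entries lie in [1, 2).
POW2_FRAC_LUT = np.exp2(np.arange(TABLE_SIZE) / TABLE_SIZE).astype(np.float32)

def exp_lut(x):
    """Approximate e**x with one table lookup and an exact power-of-two scale."""
    x = np.asarray(x, dtype=np.float32)
    # e**x == 2**(x * log2(e)); express that exponent in units of 1/TABLE_SIZE.
    t = np.rint(x * np.float32(math.log2(math.e)) * TABLE_SIZE).astype(np.int64)
    n = t >> TABLE_BITS            # integer part (arithmetic shift = floor, also for negatives)
    k = t & (TABLE_SIZE - 1)       # fractional part, used as the table index
    # 2**(t / TABLE_SIZE) == 2**n * 2**(k / TABLE_SIZE); ldexp applies 2**n exactly.
    return np.ldexp(POW2_FRAC_LUT[k], n.astype(np.int32))
```

With 256 entries the rounding step bounds the relative error by $2^{1/512} - 1 \approx 0.14\%$; a larger table trades memory for accuracy.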
Unlike any of the other current design examples, this example also uses MLIR dialects as direct input, including the `vector`, `affine`, `arith` and `math` dialects, as shown in the source. This MLIR is intended to be generated from a higher-level description, but is shown here as an example of how other MLIR dialects can be used as input.
The compilation process also differs from the other design examples and is defined in the `Makefile`:
- The input MLIR is first vectorized into chunks of size 16, and a C++ file is produced that maps the various MLIR dialects onto AIE intrinsics, including vector loads and stores, vectorized arithmetic on those registers, and the $e^x$ approximation using lookup tables.
- This generated C++ is compiled into a first object file.
- A file called `lut_based_ops.cpp` from the AIE2 runtime library is compiled into a second object file. This file contains the lookup table contents used to approximate the $e^x$ function.
- A wrapper file is also compiled into an object file. It prevents C++ name mangling, which allows the wrapped C function to be called from the structural Python.
- These 3 object files are combined into a single `.a` file, which is then referenced inside the `softmax.py` structural Python.
This is a slightly more complex process than the rest of the examples, which typically only use a single object file containing the wrapped C++ function call, but is provided to show how a library-based flow can also be used.
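The 16-element vectorization step can be pictured with the NumPy sketch below, which computes softmax one 16-element chunk at a time: a first pass exponentiates each chunk and accumulates a 16-lane partial sum, then the lanes are reduced and a second pass scales every chunk by a single reciprocal. This two-pass structure is an assumption for illustration, not the code generated from the MLIR; `softmax_chunked` and `VEC_LANES` are hypothetical names, and it uses NumPy's exact `exp` rather than a lookup table.

```python
import numpy as np

VEC_LANES = 16  # chunk width, matching the vectorization size described above

def softmax_chunked(x, lanes=VEC_LANES):
    """Softmax over all elements of x, processed `lanes` elements at a time.

    Hypothetical sketch, not the generated kernel. No max subtraction is done,
    so inputs above roughly 88 overflow float32 exp().
    """
    x = np.asarray(x, dtype=np.float32)
    if x.size % lanes != 0:
        raise ValueError(f"input length must be a multiple of {lanes}")
    chunks = x.reshape(-1, lanes)
    exps = np.empty_like(chunks)
    acc = np.zeros(lanes, dtype=np.float32)   # one "vector register" of partial sums
    for i, chunk in enumerate(chunks):        # pass 1: exponentiate, accumulate lane-wise
        exps[i] = np.exp(chunk)
        acc += exps[i]
    inv_total = np.float32(1.0) / acc.sum()   # horizontal reduction, then one reciprocal
    for i in range(len(exps)):                # pass 2: scale each chunk
        exps[i] *= inv_total
    return exps.reshape(x.shape)
```

Multiplying by one precomputed reciprocal, instead of dividing every element, is a common choice on vector hardware where division is more expensive than multiplication.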
- `softmax.py`: A Python script that defines the AIE array structural design using MLIR-AIE operations. This generates MLIR that is then compiled using `aiecc` to produce design binaries (i.e., XCLBIN and inst.bin for the NPU in Ryzen™ AI).
- `softmax_placed.py`: An alternative version of the design in `softmax.py` that is expressed in a lower-level version of IRON.
- `softmax_whole_array_placed.py`: This Python script extends the design to use the entire AIE array, scaling up from the two cores used in `softmax_placed.py`. The number of cores of the AIE array (`n_cores`) is configurable via the `n_col` and `n_cores_per_col` variables.
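To make the scaling concrete, here is a small sketch of how `size` elements might be split into one contiguous slice per core when `n_col` columns each use `n_cores_per_col` cores. The even split and the requirement that each slice be a multiple of the 16-element vector width are assumptions for illustration; `partition` is a hypothetical helper, not part of `softmax_whole_array_placed.py`.

```python
def partition(size, n_col, n_cores_per_col, lanes=16):
    """Split `size` elements into one contiguous slice per core.

    Hypothetical: assumes an even split where each slice is a multiple of
    `lanes`. Returns a list of (core_index, start, stop) tuples.
    """
    n_cores = n_col * n_cores_per_col
    per_core, remainder = divmod(size, n_cores)
    if remainder or per_core % lanes:
        raise ValueError(
            f"size={size} does not split into {n_cores} slices "
            f"that are multiples of {lanes}"
        )
    return [(core, core * per_core, (core + 1) * per_core) for core in range(n_cores)]
```

With a `size` of 262144, `partition(262144, 1, 4)` yields four slices of 65536 elements, and `partition(262144, 4, 4)` yields sixteen slices of 16384.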
For a quick reference of all available options, run:

```shell
make help
```

Build and run with default settings:

```shell
make run
```

Build and run with custom runtime parameters:

```shell
make run size=524288 n_iterations=100 n_warmup=20
```

There are three placement modes available:

Default mode, using `softmax.py`:

```shell
make run
```

Manual placement mode, using `softmax_placed.py`:

```shell
make run use_placed=1
```

Whole array placement mode, using `softmax_whole_array_placed.py`:
```shell
make run use_whole_array=1
make run use_whole_array=1 whole_array_cols=4 whole_array_rows=4
make run use_whole_array=1 whole_array_cols=2 whole_array_rows=2
```

| Variable | Default | Description |
|---|---|---|
| `size` | 262144 | Input data size (number of elements) |
| `n_iterations` | 20 | Number of benchmark iterations |
| `n_warmup` | 10 | Number of warmup iterations |
| `use_placed` | 0 | Enable manual placement mode |
| `use_whole_array` | 0 | Enable whole array placement mode |
| `whole_array_cols` | 1 | Number of columns (when `use_whole_array=1`) |
| `whole_array_rows` | 4 | Number of cores per column (when `use_whole_array=1`) |
| `devicename` | npu | Target device (`npu` or `npu2`) |
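Parameter sweeps over these variables can be scripted. The sketch below builds `make` command lines from keyword arguments and can run them with `subprocess.run`; `make_command` and `run_make` are hypothetical helpers, and actually running them assumes the current directory is the one containing this design's `Makefile`.

```python
import subprocess

def make_command(target="run", **variables):
    """Return a make invocation such as ['make', 'run', 'size=524288', ...].

    Keyword arguments become VAR=value overrides, in the order given.
    """
    return ["make", target] + [f"{name}={value}" for name, value in variables.items()]

def run_make(target="run", **variables):
    """Run the command from make_command(); raises CalledProcessError on failure."""
    subprocess.run(make_command(target, **variables), check=True)

if __name__ == "__main__":
    # Print (rather than run) a small sweep over whole-array column counts.
    for cols in (1, 2, 4):
        print(" ".join(make_command("run", use_whole_array=1,
                                    whole_array_cols=cols, whole_array_rows=4)))
```

For example, `make_command("run", size=524288, n_iterations=100, n_warmup=20)` reproduces the custom-parameter command above; pass the same arguments to `run_make` to execute it.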
Note: Configuration changes are detected automatically, so there is no need to run `make clean` when changing parameters.
To run with profiling (outputs to `results.csv`):

```shell
make profile
```

To generate a trace file:

```shell
make use_placed=1 trace
```

Note: Tracing is currently supported with the `use_placed=1` mode.