IRON is an open-source & close-to-metal Python API enabling fast and efficient execution on AMD Ryzen™ AI NPUs. It relies on language bindings around the MLIR-AIE dialect.
The IRON Python API for Ryzen™ AI NPUs is described in the following paper:
E. Hunhoff, J. Melber, K. Denolf, A. Bisca, S. Bayliss, S. Neuendorffer, J. Fifield, J. Lo, P. Vasireddy, P. James-Roxby, E. Keller. "Efficiency, Expressivity, and Extensibility in a Close-to-Metal NPU Programming Interface". In 33rd IEEE International Symposium On Field-Programmable Custom Computing Machines, May 2025.
| Section | Description | Datatype | AIE2 | AIE2P | Status | Design Example |
|---|---|---|---|---|---|---|
| Element-wise Add | Element-wise addition kernel | bfloat16 | ✓ | ✓ | 🟢 | example/elementwise_add/ |
| Element-wise Mul | Element-wise multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | example/elementwise_mul/ |
| GEMM | General Matrix Multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | example/gemm/ |
| GEMV | General Matrix-Vector Multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | example/matrix_vector_mul/ |
| GQA | Grouped Query Attention kernel (Single pipeline) | bfloat16 | | ✓ | 🟢 | example/mha/ |
| MHA | Multi-Head Attention kernel & Grouped Query Attention | bfloat16 | | ✓ | 🟢 | example/mha/ |
| RMSNorm | RMSNorm kernel | bfloat16 | ✓ | ✓ | 🟢 | example/rms_norm/ |
| RoPE | Rotary Positional Embedding kernel | bfloat16 | ✓ | ✓ | 🟢 | example/rope/ |
| SiLU | Sigmoid Linear Unit activation kernel | bfloat16 | ✓ | ✓ | 🟢 | example/silu/ |
| Softmax | Softmax kernel | bfloat16 | ✓ | ✓ | 🟢 | example/softmax/ |
| Weighted RMSNorm | Weighted RMSNorm kernel | bfloat16 | ✓ | ✓ | 🟢 | example/rms_norm/ |
| Copy | Copy | bfloat16 | ✓ | ✓ | 🟢 | example/mem_copy/ |
| Transpose | Transpose | bfloat16 | ✓ | ✓ | 🟢 | example/transpose/ |
| AXPY | AXPY | bfloat16 | ✓ | ✓ | 🟢 | example/axpy/ |
| Reduction | Reduction | bfloat16 | | | 🟡 | |
| Dequant | Dequant Q4NX from AWQ to bfloat16 | bfloat16 | ✓ | ✓ | 🟢 | example/dequant/ |
| RELU | RELU | bfloat16 | ✓ | ✓ | 🟢 | example/relu/ |
| Leaky RELU (WIP) | Leaky RELU kernel | bfloat16 | | ✓ | ⚪ | example/leaky_relu/ |
| GELU | GELU | bfloat16 | ✓ | ✓ | 🟢 | example/gelu/ |
| LayerNorm | LayerNorm | bfloat16 | ✓ | ✓ | 🟢 | example/layer_norm/ |
| Convolution | Convolution | bfloat16 | | | 🟡 | |
| MaxPool | MaxPool | bfloat16 | | | ⚪ | |
| AveragePool | AveragePool | bfloat16 | | | ⚪ | |
| Tanh | Tanh kernel | bfloat16 | ✓ | ✓ | 🟢 | example/tanh/ |
| Sigmoid | Sigmoid kernel | bfloat16 | ✓ | ✓ | 🟢 | example/sigmoid/ |
Use this dashboard to quickly check the status of each kernel and locate relevant setup, build, and usage information.
| Status | Meaning |
|---|---|
| 🟢 | Done |
| 🟡 | In Development |
| ⚪ | Not Assigned |
These instructions will guide you through everything required for building and executing a program on the Ryzen™ AI NPU, starting from a fresh bare-bones Ubuntu 24.04 or Ubuntu 24.10 install.
Make sure your laptop or mini-PC is running the latest BIOS, one that enables the NPU.
If you are starting from Ubuntu 24.04, you may need to update the Linux kernel to 6.11+ by installing the Hardware Enablement (HWE) stack:

    sudo apt update
    sudo apt install --install-recommends linux-generic-hwe-24.04
    sudo reboot

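To decide whether the HWE update is needed at all, you can compare the running kernel release against 6.11. The snippet below is a minimal sketch of such a check; the helper names `parse_kernel_release` and `kernel_is_new_enough` are hypothetical and not part of IRON.

```python
import platform
import re

MIN_KERNEL = (6, 11)  # minimum kernel version named in the paragraph above


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.11.0-21-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_new_enough(release: str, minimum: tuple[int, int] = MIN_KERNEL) -> bool:
    """Return True when the release is at least the minimum (major, minor)."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` prints on Linux
    try:
        ok = kernel_is_new_enough(release)
    except ValueError:
        ok = False
    print(f"kernel {release}: {'OK' if ok else 'consider installing the HWE stack'}")
```

The comparison uses tuples so that, for example, 6.8 is correctly treated as older than 6.11.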
1. Install the XDNA™ driver and XRT.

2. Install the packages needed for IRON and MLIR-AIE:

       # Python versions 3.10, 3.12 and 3.13 are currently supported by our wheels
       sudo apt install \
         build-essential clang clang-14 lld lld-14 python3-venv python3-pip

3. Set up a virtual environment and activate it:

       python3 -m venv ironenv
       source ironenv/bin/activate
       python3 -m pip install --upgrade pip

4. Source XRT (installed in step 1):

       source /opt/xilinx/xrt/setup.sh

5. Install the required Python packages (from requirements.txt):

       MLIR_PYTHON_EXTRAS_SET_VERSION="0.0.8.3" HOST_MLIR_PYTHON_PACKAGE_PREFIX="aie" pip install -r requirements.txt

6. To test your installation, build and run the example below:

       ./operators/axpy/test.py

All available operators can be found in `operators`. Each operator directory contains:
- `op.py`: The Python operator interface, an easy access point for integrating operators into your project. It prescribes how to compile the operator (build artifacts) and how to call it at runtime (buffer sizes, etc.).
- `design.py`: The implementation of the operator's NPU code. It often references a kernel in `aie_kernels` for the compute core code and describes the data movement using ObjectFIFOs.
- `reference.py`: A reference CPU implementation used to validate the correctness of the NPU implementation.
- `test.py`: An end-to-end test that instantiates and builds the operator, runs it, and verifies its outputs against the reference.
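To illustrate the role a CPU reference plays, here is a minimal sketch of what a reference for AXPY (`y = a * x + y`) and an output comparison could look like. It is not the repository's `reference.py`: the function names, the round-to-nearest-even bfloat16 emulation, and the tolerances are all assumptions chosen for illustration, and inputs are assumed to be finite values within float32 range.

```python
import math
import struct


def to_bfloat16(value: float) -> float:
    """Round a float to the nearest bfloat16 value (round-to-nearest-even).

    bfloat16 keeps the upper 16 bits of an IEEE float32, so the rounding is
    done on the float32 bit pattern. NaN and infinity are not handled here.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even mantissa
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def axpy_reference(a: float, x: list[float], y: list[float]) -> list[float]:
    """CPU reference for AXPY with the result rounded to bfloat16 (assumed)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [to_bfloat16(a * xi + yi) for xi, yi in zip(x, y)]


def outputs_match(actual, expected, rel_tol=2**-7, abs_tol=1e-3) -> bool:
    """Compare device output with the reference using illustrative tolerances."""
    return len(actual) == len(expected) and all(
        math.isclose(got, want, rel_tol=rel_tol, abs_tol=abs_tol)
        for got, want in zip(actual, expected)
    )


if __name__ == "__main__":
    expected = axpy_reference(2.0, [1.0, 2.0], [0.5, 0.25])
    print(expected, outputs_match([2.5, 4.26], expected))
```

A tolerance-based comparison is used because bfloat16 carries only 8 bits of significand precision, so bit-exact agreement between CPU and NPU results is not generally expected.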
NOTE: Be sure the XRT setup script has been sourced and the Python environment is activated:

    source /opt/xilinx/xrt/setup.sh
    source /path/to/ironenv/bin/activate

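If a test fails right away, it can help to confirm that both steps took effect in the current shell. The following is a rough sketch of such a check, not an IRON utility: `check_environment` is a hypothetical helper, and it assumes that the XRT setup script exports the `XILINX_XRT` environment variable.

```python
import os
import sys

# Python versions that the install instructions list as supported by the wheels.
SUPPORTED_PYTHONS = {(3, 10), (3, 12), (3, 13)}


def check_environment(environ=None, version=None, in_venv=None) -> list[str]:
    """Return a list of human-readable problems; an empty list means all good.

    All three arguments default to the live process state and exist only so
    the function can be exercised with fake values.
    """
    environ = os.environ if environ is None else environ
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    if in_venv is None:
        # Inside a venv, sys.prefix points at the venv, not the base install.
        in_venv = sys.prefix != sys.base_prefix

    problems = []
    if "XILINX_XRT" not in environ:  # assumption: set by the XRT setup script
        problems.append("XRT does not appear to be sourced (XILINX_XRT is unset)")
    if not in_venv:
        problems.append("no Python virtual environment is active")
    if version not in SUPPORTED_PYTHONS:
        problems.append(f"Python {version[0]}.{version[1]} is not a listed version")
    return problems


if __name__ == "__main__":
    for problem in check_environment() or ["environment looks ready"]:
        print(problem)
```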
To build and test all the operators, first generate a list of all test cases, then run them:

    mkdir testing && cd testing
    ../operators/common/discover_tests.py
    ../scripts/run_tests.py --iter 1

You can select a single test to run using the `--select` flag.
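For a quick look at which operators provide a test before running the full suite, a small directory scan is enough. The sketch below is illustrative only and does not reproduce how `discover_tests.py` works; `discover_operator_tests` and `missing_files` are hypothetical helpers that rely solely on the per-operator file layout described in this README.

```python
import tempfile
from pathlib import Path

# File names taken from the operator layout described in this README.
EXPECTED_FILES = ("op.py", "design.py", "reference.py", "test.py")


def discover_operator_tests(operators_dir):
    """Return the test.py path of every operator directory, sorted by path."""
    root = Path(operators_dir)
    return sorted(p for p in root.glob("*/test.py") if p.is_file())


def missing_files(operator_dir):
    """List the expected files that are absent from one operator directory."""
    directory = Path(operator_dir)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    # Build a throwaway tree so the sketch runs without the repository.
    with tempfile.TemporaryDirectory() as tmp:
        axpy = Path(tmp) / "axpy"
        axpy.mkdir()
        for name in EXPECTED_FILES:
            (axpy / name).touch()
        for test_file in discover_operator_tests(tmp):
            print(test_file.parent.name, "missing:", missing_files(test_file.parent))
```

Pointing `discover_operator_tests` at the real `operators` directory would list one `test.py` per operator and skip helper directories that have none.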
To ensure your code passes the CI linting checks before pushing, install the pre-push hook:

    cp scripts/hooks/pre-push .git/hooks/pre-push
    chmod +x .git/hooks/pre-push

The hook will run the same linting checks as CI:
- License checks (reuse)
- Python formatting (black)
- C++ formatting (clang-format)
To bypass the hook if needed: git push --no-verify
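
If you want to run comparable checks by hand, the idea can be sketched as a small runner that executes each tool and reports which ones failed. This is not the contents of `scripts/hooks/pre-push`: the exact invocations below (`reuse lint`, `black --check .`, `clang-format --dry-run --Werror`) and the helper names are assumptions, and the real hook may use different options or file lists.

```python
import subprocess
import sys


def default_checks(cpp_files):
    """Build (name, argv) pairs for the three kinds of check listed above."""
    checks = [
        ("license (reuse)", ["reuse", "lint"]),
        ("python formatting (black)", ["black", "--check", "."]),
    ]
    if cpp_files:  # clang-format reads stdin when given no files, so skip it
        checks.append(
            ("c++ formatting (clang-format)",
             ["clang-format", "--dry-run", "--Werror", *cpp_files])
        )
    return checks


def run_checks(checks):
    """Run every check and return the names of those that failed."""
    failed = []
    for name, argv in checks:
        try:
            ok = subprocess.run(argv, check=False).returncode == 0
        except FileNotFoundError:  # tool is not installed
            ok = False
        if not ok:
            failed.append(name)
    return failed


# Example usage (file name is a placeholder):
#   failed = run_checks(default_checks(["path/to/kernel.cc"]))
#   sys.exit(1 if failed else 0)
```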
Copyright© 2025 Advanced Micro Devices, Inc.
