This example showcases both JIT and non-JIT approaches for running IRON designs. A single tile performs a very simple reduction operation: the kernel loads data from local memory, performs a `min` reduction, and stores the resulting value back to local memory.
Input data is brought to the local memory of the Compute tile from a Shim tile. The size of the input data N from the Shim tile is configurable (default: `1024xi32` for the non-JIT version, customizable via command-line arguments for the JIT version). The data is copied to the AIE tile, where the reduction is performed. The single output value is then copied from the AIE tile back to the Shim tile. The two approaches use different compilation workflows, and the JIT version adds microseconds of runtime overhead.
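The data flow above means the host only needs a trivial software reference to check the single output value. The sketch below is illustrative and not taken from `test.cpp`: the helper names `make_input` and `reference_min`, the random value range, and the seed are all hypothetical. It generates N int32 inputs, computes the expected minimum, and shows the buffer sizes implied by `1024xi32` in and one i32 out.

```python
import random

# int32 limits, matching the i32 element type of the design.
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def make_input(n: int, seed: int = 0) -> list[int]:
    """Hypothetical helper: generate n random int32 values as the host input buffer."""
    rng = random.Random(seed)
    return [rng.randint(INT32_MIN, INT32_MAX) for _ in range(n)]


def reference_min(data: list[int]) -> int:
    """Hypothetical helper: the golden value the single AIE output should match."""
    if not data:
        raise ValueError("reduction over an empty buffer is undefined")
    return min(data)


N = 1024  # default non-JIT size (1024 x i32)
inp = make_input(N)
expected = reference_min(inp)
# N int32 values in (4 bytes each), one int32 value out.
print(f"input bytes: {N * 4}, output bytes: 4, expected min: {expected}")
```

Because the output is a single scalar, verification reduces to one comparison against `reference_min`.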
- `vector_reduce_min_jit.py`: A JIT (Just-In-Time) compiled version using IRON's `@iron.jit` decorator. This approach offers faster development iteration by compiling and executing the design at runtime, and supports command-line arguments to customize the number of elements.
- `vector_reduce_min.py`: A Python script that defines the AIE array structural design using MLIR-AIE operations. This generates MLIR that is then compiled using `aiecc` to produce design binaries (i.e., XCLBIN and `inst.bin` for the NPU in Ryzen™ AI).
- `vector_reduce_min_placed.py`: An alternative version of the design in `vector_reduce_min.py`, expressed in a lower-level version of IRON.
- `test.cpp`: This C++ code is a testbench for the non-JIT design example targeting Ryzen™ AI (AIE2). The code is responsible for loading the compiled XCLBIN file, configuring the AIE module, providing input data, and executing the AIE design on the NPU. After execution, the program verifies the results.
- `reduce_min.cc`: A C++ implementation of a vectorized `min` reduction operation for AIE cores. The code uses the AIE API, a C++ header-only library providing types and operations that get translated into efficient low-level intrinsics, and whose documentation can be found here. The source can be found here.
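A vectorized `min` reduction generally keeps a vector accumulator, folds each loaded vector into it with an element-wise min, and only at the end reduces the accumulator horizontally to one scalar. The Python sketch below models that pattern; it is not a port of `reduce_min.cc`. The 16-lane width (512 bits of int32) and the requirement that N be a multiple of the width are assumptions, and the real kernel's vector width and tail handling may differ.

```python
INT32_MAX = 2**31 - 1


def vectorized_min(data: list[int], lanes: int = 16) -> int:
    """Model a SIMD min reduction: lane-wise min across chunks, then a horizontal min.

    Assumption: `lanes` (vector width) is 16 and len(data) is a multiple of it.
    """
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    if not data or len(data) % lanes != 0:
        raise ValueError("input length must be a non-zero multiple of the vector width")
    # Accumulator vector initialised to the identity element for min.
    acc = [INT32_MAX] * lanes
    for start in range(0, len(data), lanes):
        chunk = data[start:start + lanes]  # one "vector load"
        # Element-wise min of the accumulator and the loaded vector.
        acc = [min(a, c) for a, c in zip(acc, chunk)]
    # Horizontal reduction of the accumulator to a single scalar.
    return min(acc)


values = [(i * 37) % 1000 - 500 for i in range(1024)]
print(vectorized_min(values))  # -500
```

Deferring the horizontal step to the very end keeps the inner loop purely element-wise, which is the part SIMD hardware executes many lanes at a time.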
The JIT approach uses IRON's @iron.jit decorator for runtime compilation, offering faster development iteration and more flexible parameterization.
To run the JIT version with default parameters (1024 elements):
```shell
python vector_reduce_min_jit.py
```

To run with a custom number of elements:

```shell
python vector_reduce_min_jit.py --num-elements 2048
```

Or using the short form:

```shell
python vector_reduce_min_jit.py -n 512
```

The non-JIT approach uses the traditional MLIR-AIE compilation flow, in which the design is compiled ahead of time to produce binaries.
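The JIT script's command-line interface can be modeled with the standard `argparse` module. The sketch below declares the documented `-n`/`--num-elements` flag with its 1024 default; `build_parser` is a hypothetical name, and this is an illustration of the interface rather than the script's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented JIT options."""
    parser = argparse.ArgumentParser(description="vector_reduce_min JIT example (sketch)")
    parser.add_argument(
        "-n",
        "--num-elements",
        type=int,
        default=1024,  # default number of int32 elements
        help="number of int32 elements to reduce (default: 1024)",
    )
    return parser


# Equivalent to: python vector_reduce_min_jit.py -n 512
args = build_parser().parse_args(["-n", "512"])
print(args.num_elements)  # 512
```

Declaring both spellings in one `add_argument` call makes `-n 512` and `--num-elements 512` land in the same `args.num_elements` attribute.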
To compile the design:

```shell
make
```

To compile the placed design:

```shell
env use_placed=1 make
```

To compile the C++ testbench:

```shell
make vector_reduce_min.exe
```

To run the design:
```shell
make run
```

| Aspect | Non-JIT Approach | JIT Approach |
|---|---|---|
| Compilation | Ahead-of-time via `aiecc` | Runtime compilation |
| Development Speed | Slower (manual `make`/compilation) | Faster (compilation integrated) |
| Host Code | C++ testbench (`test.cpp`) | Python script |
| Performance | Baseline execution time | Microseconds of overhead from the JIT runtime |
| Flexibility | Fixed at compile time | Runtime parameterization |
| Use Case | Explicit XCLBIN management | Dynamic compilation |
| Binary Output | Generates XCLBIN/`inst.bin` | Cached binaries in `NPU_CACHE_HOME` (defaults to `~/.npu/`) |
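The JIT approach caches its binaries under `NPU_CACHE_HOME`, falling back to `~/.npu/`. Below is a minimal sketch of that lookup, useful for example when locating the directory to clear in order to force a rebuild. `npu_cache_dir` is a hypothetical helper, treating an empty value as unset is an assumption, and IRON's own resolution logic may differ.

```python
import os
from pathlib import Path


def npu_cache_dir() -> Path:
    """Hypothetical helper: NPU_CACHE_HOME if set and non-empty, else ~/.npu."""
    override = os.environ.get("NPU_CACHE_HOME")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".npu"


print(npu_cache_dir())
```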
When to use each approach:
- Use JIT for rapid prototyping, experimentation, runtime flexibility, and when you don't need control over XCLBINs
- Use non-JIT when you need explicit XCLBIN control, are working with existing MLIR-AIE workflows, or are distributing pre-compiled binaries