Draft

[WIP] #3025

223 commits
16df05e
Unified compilation workflow: CompilableDesign, CallableDesign, @iron…
hunhoffe Apr 6, 2026
6d4af17
Re-apply migration to test_compile_cache_functionality.py
hunhoffe Apr 6, 2026
dd1cbb5
Finish unify-compilation-workflow plan: CMakeLists, __init__, test cl…
hunhoffe Apr 9, 2026
50821eb
Add comprehensive kernel factory wrappers with unit tests
hunhoffe Apr 9, 2026
7522a18
Add trace_config support to CallableDesign and @iron.jit
hunhoffe Apr 9, 2026
619a13b
migrate passthrough_kernel to @iron.jit
hunhoffe Apr 9, 2026
15c1515
migrate vector_reduce_add to @iron.jit
hunhoffe Apr 9, 2026
f2221c6
migrate vector_scalar_mul to @iron.jit
hunhoffe Apr 9, 2026
268d110
migrate eltwise_add and eltwise_mul to @iron.jit
hunhoffe Apr 9, 2026
7092069
Audit fixes: API quality, naming, validation, brevity
hunhoffe Apr 9, 2026
a04222f
Eliminate duplicate global-scanning logic between compilabledesign an…
hunhoffe Apr 9, 2026
8b8d2dc
Fix lower() to warn when call-time kwargs are overridden by pre-bound…
hunhoffe Apr 9, 2026
c57eff5
Fix lower(): call-time Compile[T] kwargs override pre-bound values
hunhoffe Apr 9, 2026
9247b0a
Fix ExternalFunction hash collision causing flaky NPU tests
hunhoffe Apr 9, 2026
4b3cccd
Fix ExternalFunction cache bugs causing flaky NPU tests
hunhoffe Apr 9, 2026
46e60ff
Replace xfail with skip_on_f32_failure fixture for Peano f32 bug
hunhoffe Apr 9, 2026
0de06cd
Revert programming_examples changes to main state
hunhoffe Apr 13, 2026
2d711fe
Merge branch 'main' into unify-compilation-workflow
hunhoffe Apr 13, 2026
05dc9f3
Move compile/jit/kernels code from iron to utils; split kernels into …
hunhoffe Apr 14, 2026
c439aac
Audit fixes: cache-key collisions, Phoenix LRU workaround, test wiring
hunhoffe May 7, 2026
1b3a762
test_kernels.py: parametrize 36 test classes into a single spec table
hunhoffe May 7, 2026
22188e7
iron/kernels: hoist factory boilerplate into _common.py
hunhoffe May 7, 2026
a2b1f75
CompilableDesign: promote leaked internals to public surface
hunhoffe May 7, 2026
57d29b2
test: dedup wrapping and split tests against CallableDesign
hunhoffe May 7, 2026
e4eba4b
Extract DMA-size parser; switch from regex to MLIR Python bindings
hunhoffe May 7, 2026
eeceb18
Merge remote-tracking branch 'origin/main' into unify-compilation-wor…
hunhoffe May 8, 2026
aadfa7e
Register _dma_size_parser.py in python/CMakeLists.txt
hunhoffe May 8, 2026
04c6f87
small fix
hunhoffe May 8, 2026
b3cc39f
Merge branch 'main' into unify-compilation-workflow
hunhoffe May 8, 2026
483ce01
Code-quality fixes from branch self-review
hunhoffe May 8, 2026
86df6df
passthrough_kernel: migrate Iron API variant to @iron.jit
hunhoffe May 8, 2026
c454f41
xrtruntime: cache read_insts content by (path, mtime)
hunhoffe May 8, 2026
6d8bf08
compilabledesign: memoise generator type-hint introspection
hunhoffe May 8, 2026
4fd1bd2
Revert cmake/modulesXilinx submodule bump to match main
hunhoffe May 8, 2026
83224cc
Apply black formatting to modified Python files
hunhoffe May 8, 2026
0ff4d36
passthrough_kernel: trim oversized comments + factor print_cycles_sum…
hunhoffe May 8, 2026
f10e5cb
passthrough_kernel: report NPU time alongside end-to-end latency
hunhoffe May 8, 2026
99595b8
benchmark: add aie.utils.benchmark.run_iters / print_benchmark helpers
hunhoffe May 8, 2026
0fb52d6
Merge branch 'main' into unify-compilation-workflow
hunhoffe May 8, 2026
b64ca34
passthrough_kernel: remove placed variant, JIT-only
hunhoffe May 8, 2026
cffe0f8
passthrough_kernel: trim lit tests after placed-flow removal
hunhoffe May 8, 2026
4e8fadf
00_memcpy: trim, use run_iters, report bandwidth from NPU time
hunhoffe May 8, 2026
dc20877
01_SAXPY: trim duplicated comments and unused imports
hunhoffe May 8, 2026
039eac4
02_vector_reduce_max: trim duplicated comments and unused imports
hunhoffe May 8, 2026
1e0161b
03_matmul: kernel-library migration + parametrized AOT showcase
hunhoffe May 8, 2026
9424b13
Merge remote-tracking branch 'origin/main' into unify-compilation-wor…
hunhoffe May 8, 2026
55a1670
03_matmul: keep @iron.jit, demonstrate opt-in AOT instead
hunhoffe May 8, 2026
3519ed3
Reformat branch-modified files with black 26
hunhoffe May 8, 2026
0eeb04f
Revert "01_SAXPY: trim duplicated comments and unused imports"
hunhoffe May 8, 2026
12c2360
Revert "02_vector_reduce_max: trim duplicated comments and unused imp…
hunhoffe May 8, 2026
57fc8bb
Add CompilableDesign.specialized + CallableDesign.specialize/compile
hunhoffe May 8, 2026
35f9add
03_matmul: drop hardcoded r/s/t numbers from L2->L1 figure
hunhoffe May 8, 2026
7b042f5
kernels: fix reduce_*() output alignment + starter NPU e2e tests
hunhoffe May 8, 2026
8a3e2be
verify: add aie.utils.verify with nearly_equal + count_mismatches
hunhoffe May 8, 2026
c8cdd49
vector_exp: migrate to @iron.jit + kernels.bf16_exp, drop placed flow
hunhoffe May 8, 2026
daba200
JIT: support multi-DMA-per-tensor + shared-buffer conv kernels
hunhoffe May 9, 2026
0d8ebee
resnet/layers_conv2_x: migrate to @iron.jit + kernels.conv2dk*, drop …
hunhoffe May 9, 2026
3bdffe6
JIT: catch RuntimeError when Peano absent during _compute_hash
hunhoffe May 9, 2026
f5b9070
docs: add whats-new notebook for the unified compilation workflow
hunhoffe May 9, 2026
a71df23
JIT: also populate _expected_tensor_sizes on cache hit
hunhoffe May 9, 2026
2487af0
docs: notebook audit fixes + black-jupyter formatting
hunhoffe May 9, 2026
327fe53
JIT: validate via runtime_sequence args; trace+repr polish; example s…
hunhoffe May 10, 2026
99260a7
docs: notebook audit — fix two stale §-refs after §8 trim
hunhoffe May 10, 2026
9f42c4d
Merge branch 'main' into unify-compilation-workflow
hunhoffe May 10, 2026
63e337f
JIT: reject unannotated scalar params with defaults (Guard 1-C)
hunhoffe May 10, 2026
796245c
CI fixes: parallel-compile test, mini_tutorial migration, npu2 device…
hunhoffe May 10, 2026
22969d6
kernels.mm: expose mac_dims for arch-aware matmul DMA layouts
hunhoffe May 10, 2026
db2d97b
JIT: cache-key honours iron-set device; notebook §11 cross-compile demo
hunhoffe May 10, 2026
ce11e9d
docs: tone down notebook flourishes for a more professional voice
hunhoffe May 10, 2026
5d73165
docs: drop residual CI/PR references from notebook §13
hunhoffe May 10, 2026
a066cb5
docs: notebook audit pass — TL;DR alignment + clarity polish
hunhoffe May 10, 2026
1188f9c
passthrough_kernel: tighten the JIT example as a porting reference
hunhoffe May 10, 2026
21c638e
vector_vector_mul: port to @iron.jit, preserve VCK5000 path
hunhoffe May 10, 2026
29d811a
iron: collapse N-D contiguous arg to match 1-D kernel signature
hunhoffe May 10, 2026
eec85fa
kernels.mm: add b_col_maj parameter
hunhoffe May 10, 2026
478ea2b
whole_array: port to @iron.jit, dual-mode with legacy MLIR-emit
hunhoffe May 10, 2026
44102e1
kernels.mm: add c_col_maj parameter
hunhoffe May 10, 2026
25711b8
whole_array: add c_col_maj support; bind matmul kernel once
hunhoffe May 10, 2026
80e3f5a
iron: kernels.X memoization + object_file_name auto-suffix; collision…
hunhoffe May 10, 2026
fc969cb
whole_array: drop placed/iron variants; collapse lit configs (Stage C)
hunhoffe May 10, 2026
302fd8d
iron: close the two kernels.X footguns from the matmul port
hunhoffe May 10, 2026
43aaa78
aie_kernels/aie2p/mm.cc: clang-format
hunhoffe May 10, 2026
e202c0c
whole_array: restore my_matmul shim for the visualization notebook
hunhoffe May 10, 2026
84e0039
iron: BaseKernel public arg_shape() / arg_dtype() introspection
hunhoffe May 10, 2026
1a1fe73
iron: add use_chess opt-in to JIT compile pipeline
hunhoffe May 10, 2026
aef4c63
test_symbol_prefix: isolate ExternalFunction registry between tests
hunhoffe May 10, 2026
2c06dd7
jit: compile() can write artifacts to caller-specified paths
hunhoffe May 10, 2026
33d9834
kernels.mm: emulate_bf16_mmul_with_bfp16 toggle
hunhoffe May 10, 2026
6d58f79
matmul: whole_array runs through one @iron.jit path
hunhoffe May 10, 2026
ee381da
matmul: single_core runs through one @iron.jit path
hunhoffe May 10, 2026
c14d234
single_core: drop accidentally-committed trace.txt + trace_mm.json
hunhoffe May 10, 2026
15552b1
kernels: replace mm_zero/mv_zero with .zero attribute on mm/mv
hunhoffe May 11, 2026
a4b7fa8
Merge branch 'main' into unify-compilation-workflow
hunhoffe May 11, 2026
2cba186
matmul: matrix_vector runs through one @iron.jit path
hunhoffe May 11, 2026
bbb2ec1
iron: CascadeFlow primitive (cherry-picked from PR #3059)
hunhoffe May 11, 2026
d5a7e86
kernels.cascade_mm: expose .get_only / .put_only / .put_get / .zero
hunhoffe May 11, 2026
1603c68
matmul: cascade runs through one @iron.jit path
hunhoffe May 11, 2026
6ca8a7d
cascade: fifo_depth=2 (dynamic objfifo lowering removes the depth=1 w…
hunhoffe May 11, 2026
26cdbb5
matmul: cosmetic cleanup (README, Makefile style, black)
hunhoffe May 12, 2026
fb4ff7b
docs: notebook covers aiecc_flags + matmul siblings + .zero attribute
hunhoffe May 12, 2026
d74fa5c
Merge remote-tracking branch 'origin/unify-compilation-workflow' into…
hunhoffe May 12, 2026
51adae2
Merge branch 'main' into unify-compilation-workflow
hunhoffe May 12, 2026
e6fc2b3
iron: ExternalFunction collision guard auto-suffixes defaulted object…
hunhoffe May 12, 2026
6464294
Merge remote-tracking branch 'origin/unify-compilation-workflow' into…
hunhoffe May 12, 2026
f6aa6d3
JIT: clear _EXTERN_CACHE on compile() entry
hunhoffe May 13, 2026
379a66c
Merge remote-tracking branch 'origin/main' into unify-compilation-wor…
hunhoffe May 20, 2026
77c09c5
basic/vector_scalar_mul: unify on a single @iron.jit design
hunhoffe May 20, 2026
11f796f
basic/dma_transpose: unify on a single @iron.jit design
hunhoffe May 20, 2026
c12caa2
basic/vector_vector_add: thin design body via algorithms library
hunhoffe May 20, 2026
1e77bf7
iron/algorithms: thread trace_size through transform helpers
hunhoffe May 20, 2026
0feea24
basic/vector_scalar_mul: thin design body via algorithms library
hunhoffe May 20, 2026
27cf190
basic/vector_vector_modulo: thin design body via algorithms library
hunhoffe May 21, 2026
db54784
black: format branch-touched files for latest black
hunhoffe May 21, 2026
59ad3a7
basic/matrix_scalar_add: unify on a single @iron.jit design
hunhoffe May 21, 2026
554a972
basic/passthrough_pykernel: unify on a single @iron.jit design
hunhoffe May 21, 2026
06b5af4
iron/algorithms: add reduce_typed / reduce helpers
hunhoffe May 21, 2026
44ab232
basic/vector_reduce_add: thin design body via reduce_typed
hunhoffe May 21, 2026
3865e7e
basic/vector_reduce_min: thin design body via reduce_typed
hunhoffe May 21, 2026
cf693b2
basic/vector_reduce_max (single_core): thin design body via reduce_typed
hunhoffe May 21, 2026
32105e9
vision/vision_passthrough: unify on a single @iron.jit design
hunhoffe May 21, 2026
a1c6efa
vision/edge_detect: unify on @iron.jit; all 5 kernels via iron.kernel…
hunhoffe May 21, 2026
4278342
basic/* + vision/vision_passthrough: use iron.kernels factories
hunhoffe May 21, 2026
4e1d95f
vision/color_threshold: unify on @iron.jit; kernel via iron.kernels.v…
hunhoffe May 21, 2026
84fbf9b
vision/color_detect: unify on @iron.jit; all 5 kernels via iron.kerne…
hunhoffe May 21, 2026
52bb5a4
vision/color_threshold: real numpy verifier in standalone mode
hunhoffe May 21, 2026
4b1488b
vision/edge_detect: standalone verifier matching test.cpp's OpenCV go…
hunhoffe May 21, 2026
ccae6d2
programming_examples: comment-density + docstring polish (audit pass)
hunhoffe May 21, 2026
0893d92
utils/callabledesign: rename .lower() -> .as_mlir()
hunhoffe May 21, 2026
7be9855
follow-up to .lower() -> .as_mlir() rename
hunhoffe May 21, 2026
53c090e
vision/color_detect: drop missed _placed lit pair; README phrasing up…
hunhoffe May 21, 2026
ae4cb4e
requirements_dev: black[jupyter] so notebook code cells get formatted
hunhoffe May 21, 2026
45efb28
iron.jit: emit ELF-wrapped insts on demand
hunhoffe May 21, 2026
d095297
basic/vector_scalar_add: port to @iron.jit; consolidate runlist variant
hunhoffe May 21, 2026
5e12b81
basic/passthrough_dmas: port to @iron.jit; preserve vck5000 path
hunhoffe May 21, 2026
ae36aa0
basic/row_wise_bias_add: port to @iron.jit; iron-managed kernel.o
hunhoffe May 21, 2026
531caee
basic/chaining_channels: drop _placed suffix; drive aiecc from Python
hunhoffe May 21, 2026
eb2156d
basic/dma_transpose_packet: port to @iron.jit; --packet-sw-objFifos v…
hunhoffe May 21, 2026
ab7ae1b
basic/packet_switch: merge add+mul placed designs; drive aiecc from P…
hunhoffe May 21, 2026
f9ef45d
basic/memcpy: port to @iron.jit; iron-managed passThrough.cc + ELF flow
hunhoffe May 21, 2026
4df8614
basic/transposes: merge four transpose examples into one dispatcher
hunhoffe May 21, 2026
03e874f
basic/vector_vector_add_BDs_init_values: drive aiecc from Python; dro…
hunhoffe May 21, 2026
f5de52c
basic/event_trace: port aie_trace.py to @iron.jit + iron Runtime API
hunhoffe May 21, 2026
80607c5
basic/event_trace/test.py: read bo via bo.map(), not bo.read()
hunhoffe May 21, 2026
6c1fccc
iron.kernels: add compute_max factory; let factories share an .o
hunhoffe May 21, 2026
1cc1625
basic/vector_reduce_max: port multi-core variants to @iron.jit + libr…
hunhoffe May 21, 2026
6d26937
basic/tiling_exploration: port both designs to @iron.jit; drop test.p…
hunhoffe May 21, 2026
2cf7bc1
iron.RuntimeEndpoint: accept ShimPLTile, not just ShimNOCTile
hunhoffe May 21, 2026
df38053
basic/passthrough_dmas: merge plio variants in via --plio flag
hunhoffe May 21, 2026
819887c
basic/packet_switch: correct docstring re ObjectFifo + packet_flow
hunhoffe May 21, 2026
d7813ce
iron: add Lock, Flow, TileDma — explicit-routing primitives
hunhoffe May 21, 2026
42bab99
basic/vector_vector_add_BDs_init_values: port to iron Flow + Lock + T…
hunhoffe May 21, 2026
2a63c28
iron: add PacketFlow, Bd.packet, Lock.acquire/release helpers
hunhoffe May 21, 2026
ccab6fe
basic/packet_switch + vvadd_BDs: clean up to use new iron Lock helpers
hunhoffe May 21, 2026
47e69d5
iron runtime: rt.fill/drain accept packet=(pkt_type, pkt_id) kwarg
hunhoffe May 21, 2026
980ec3f
compile_mlir_module: auto-build ExternalFunctions when device= is set
hunhoffe May 21, 2026
ceb0285
basic/ lits: tighten REQUIRES + fix silently-shared variant lits
hunhoffe May 21, 2026
e5be4cf
basic/ READMEs: factual corrections from audit pass
hunhoffe May 21, 2026
2aab3f4
basic/{event_trace,memcpy}: migrate local AIE kernels to aie.iron.ker…
hunhoffe May 21, 2026
f694eb3
basic/: hand-rolled TAPs → TensorTiler2D generators
hunhoffe May 21, 2026
e6b9e37
iron: add iron.device.from_name(name, n_cols=) helper
hunhoffe May 21, 2026
321aa7d
iron.device.from_name: accept n_cols=None for the family's max device
hunhoffe May 21, 2026
44ee562
basic/: replace per-example _device_for() with iron.device.from_name
hunhoffe May 21, 2026
41b5583
basic/event_trace: drop accidentally-committed trace_timeline.png
hunhoffe May 21, 2026
45c114e
programming_examples/makefile-common: add reusable rule templates
hunhoffe May 22, 2026
dd77644
basic/ Makefiles: adopt jit_xclbin / build_host_exe macros
hunhoffe May 22, 2026
8f788ff
iron utils.verify: add assert_pass() helper
hunhoffe May 22, 2026
45bfe9e
basic/: adopt aie.utils.verify.assert_pass() in 24 designs
hunhoffe May 22, 2026
b181e1b
iron utils.hostruntime: add argparse helpers for the basic/ CLI flags
hunhoffe May 22, 2026
7c0ce7c
basic/: adopt argparse helpers in 27 designs
hunhoffe May 22, 2026
379a4ff
basic/matmul/whole_array/README: rewrite walkthrough to match iron
hunhoffe May 22, 2026
147c208
iron utils.hostruntime: add run_design_cli() dispatcher
hunhoffe May 22, 2026
9b189b0
hostruntime.argparse: add help= text to every flag the helpers expose
hunhoffe May 22, 2026
1d17cda
basic/: adopt run_design_cli() dispatcher in 24 designs
hunhoffe May 22, 2026
eeb8afc
basic/: fix critical sweep bugs (main() left calling deleted _compile…
hunhoffe May 22, 2026
0c7c95e
hostruntime.cli: black-format the dispatcher module
hunhoffe May 22, 2026
e935a83
basic/: polish pass — drop unused imports, run black across touched f…
hunhoffe May 22, 2026
0e036ac
basic/vector_exp/Makefile: drop dead devicename arg
hunhoffe May 22, 2026
045596c
basic/: round-2 polish — fix 2 more _compile_kwargs bugs + drop dead …
hunhoffe May 22, 2026
d6cdfd5
vision/: adopt basic/'s argparse + cli + verify + Makefile helpers
hunhoffe May 22, 2026
8c18ce3
vision/edge_detect: drop accidentally-committed edgeDetectOut_test.jpg
hunhoffe May 22, 2026
187eb94
ml/eltwise_{add,mul}: hand-rolled TAPs → TensorTiler2D.simple_tiler
hunhoffe May 22, 2026
6b6839c
getting_started/: adopt assert_pass + TensorTiler2D in 4 designs
hunhoffe May 22, 2026
77d815a
ml/relu: end-to-end port to @iron.jit + helpers + library kernel
hunhoffe May 22, 2026
17bc8a1
ml/{gelu,silu,swiglu,softmax}: port to @iron.jit + library kernels
hunhoffe May 22, 2026
57f251d
ml/conv2d{,_fused_relu}: port to @iron.jit + library kernels
hunhoffe May 22, 2026
a8d7072
ml/{bottleneck,conv2d_14x14}: port to @iron.jit + library kernels
hunhoffe May 22, 2026
fbedd45
ml/conv2d_14x14: add @iron.jit 32-core variant alongside single-core
hunhoffe May 22, 2026
08a366a
Merge origin/main into unify-compilation-workflow
hunhoffe May 26, 2026
c636fff
python/CMakeLists: ship hostruntime/{argparse,cli}.py in the install
hunhoffe May 26, 2026
6a306d7
whole_array: import NPU2 (latent NameError under @iron.jit)
hunhoffe May 26, 2026
dea1395
hostruntime.cli: make run_and_verify Optional (compile-only designs)
hunhoffe May 26, 2026
94230f7
Revert "hostruntime.cli: make run_and_verify Optional (compile-only d…
hunhoffe May 26, 2026
131b9dc
ml/{gelu,silu,softmax,swiglu}: pass rtol to assert_pass (LUT kernels)
hunhoffe May 26, 2026
9f0cd31
basic/matrix_scalar_add: process the whole matrix, not just the first…
hunhoffe May 26, 2026
64e2ba0
basic/vector_scalar_mul: reorder entry signature to (A, C, F) for tra…
hunhoffe May 26, 2026
159b9af
basic/cascade: Python argparse defaults to m=k=n=32 (match Makefile)
hunhoffe May 26, 2026
1bb8706
Merge remote-tracking branch 'origin/main' into unify-compilation-wor…
hunhoffe May 29, 2026
468be10
ml/eltwise_{add,mul}: port to @iron.jit + library kernels
hunhoffe May 30, 2026
87a07e9
ml/{rmsnorm,scale_shift}: port to @iron.jit + ExternalFunction
hunhoffe May 30, 2026
9677ed3
ml/{layernorm,rope}: port to @iron.jit + ExternalFunction
hunhoffe May 30, 2026
ebfa73c
basic/{packet_switch,vector_vector_add_BDs_init_values}: squash split…
hunhoffe May 30, 2026
3fb0b60
black format
hunhoffe May 30, 2026
15795bc
iron: pythonic cleanups — mutable defaults, counters, typed exceptions
hunhoffe May 30, 2026
50073ee
iron.Worker: kill mutable `[]` default; add Worker.grid staticmethod
hunhoffe May 30, 2026
e41ce83
iron/device: add device.pyi covering the 15 runtime-synthesized classes
hunhoffe May 30, 2026
f03baf7
iron/device: add device_from_args helper
hunhoffe May 30, 2026
213f291
iron.tensor: validate `dtype=` kwarg against typed ndarray
hunhoffe May 30, 2026
e89b707
programming_examples: small idiom fixes
hunhoffe May 30, 2026
11379e1
programming_guide: add implicit MLIR context page
hunhoffe May 30, 2026
2fccef2
iron.tensor: don't strip matching dtype= kwarg
hunhoffe May 30, 2026
cfa9572
argparse: standardise test.py CLI on add_runtime_args; remove create_…
hunhoffe May 30, 2026
d7c0850
programming_examples: migrate 7 designs to device_from_args helper
hunhoffe May 30, 2026
716c59f
programming_examples: migrate 4 designs to Worker.grid
hunhoffe May 30, 2026
04a42b9
Merge remote-tracking branch 'origin/main' into unify-compilation-wor…
hunhoffe May 30, 2026
bbe42bc
framework: dedup helpers, prune dead imports, trim narrative comments
hunhoffe May 30, 2026
0b3f656
programming_examples: migrate 4 nested-Worker designs to Worker.grid
hunhoffe May 30, 2026
1e7d3f2
ml/conv2d_14x14: add torch Conv2d ref + migrate to Worker.grid
hunhoffe May 30, 2026
103da59
test/python: drop 4 redundant tests, fix testsplit_ → test_split_ rename
hunhoffe May 30, 2026
e645c5a
test/python: extract npu2_device fixture, parametrize chess test, hoi…
hunhoffe May 30, 2026
21bdf65
test/python: split test_kernels.py by theme (specs / memoization / ch…
hunhoffe May 30, 2026
2b14749
Merge remote-tracking branch 'origin/main' into unify-compilation-wor…
hunhoffe May 30, 2026
7d0a49e
black: apply formatter to recently-touched files
hunhoffe May 30, 2026
3909a0a
trim trailing whitespace across branch (pre-commit auto-fix)
hunhoffe May 30, 2026
42 changes: 21 additions & 21 deletions .github/workflows/buildRyzenWheels.yml
Original file line number Diff line number Diff line change
Expand Up @@ -221,14 +221,14 @@ jobs:

build-windows:
name: Build and upload mlir_aie wheels (Windows)

runs-on: windows-2022

permissions:
id-token: write
contents: write
packages: read

strategy:
fail-fast: false
matrix:
Expand All @@ -250,13 +250,13 @@ jobs:

- python_version: "3.13"
ENABLE_RTTI: OFF

- python_version: "3.14"
ENABLE_RTTI: ON

- python_version: "3.14"
ENABLE_RTTI: OFF

steps:
- uses: actions/checkout@v6
with:
Expand Down Expand Up @@ -355,32 +355,32 @@ jobs:
CMAKE_ARGS: -DAIE_BUILD_CHESS_CLANG=OFF -DOPENSSL_USE_STATIC_LIBS=TRUE -DAIE_ENABLE_XRT_PYTHON_BINDINGS=OFF
run: |
set -euo pipefail

# No Vitis on Windows!
unset VITIS XILINXD_LICENSE_FILE || true

git config --global --add safe.directory "$PWD"
MLIR_VERSION=$(git rev-parse --short HEAD)
echo "Building mlir-aie version $MLIR_VERSION"

python -m venv aie-venv
source aie-venv/Scripts/activate

python -m pip install --upgrade pip
pip install -r python/requirements_ml.txt
pip install -r python/requirements_dev.txt

export ENABLE_RTTI="${ENABLE_RTTI}"

NO_RTTI="" # Set a default value
NO_RTTI_UNDERSCORE="" # Set a default value
if [ x"$ENABLE_RTTI" == x"OFF" ]; then
NO_RTTI="-no-rtti"
NO_RTTI_UNDERSCORE="_no_rtti"
fi

VERSION=$(utils/clone-llvm.sh --get-wheel-version)

# Grab the MLIR distro wheel and extract
for attempt in 1 2 3; do
rm -f mlir*.whl
Expand All @@ -394,10 +394,10 @@ jobs:
sleep $((attempt * 5))
done
python -m zipfile -e mlir*.whl .

# Linux-style timestamp magic should work fine on Windows.
find "mlir${NO_RTTI_UNDERSCORE}" -exec touch -a -m -t 201108231405.14 {} \;

# Match Linux wheel version metadata on non-tag builds.
export DATETIME=$(date +"%Y%m%d%H")
if [ x"${{ inputs.AIE_COMMIT }}" == x"" ]; then
Expand All @@ -416,19 +416,19 @@ jobs:
export MLIR_AIE_SOURCE_DIR="${ROOT_WIN}"
export WHEELHOUSE_DIR="${ROOT_WIN}/wheelhouse"
export CMAKE_MODULE_PATH="${ROOT_WIN}/cmake/modulesXilinx"

mkdir -p "${WHEELHOUSE_DIR}"
pushd utils/mlir_aie_wheels

pip install wheel importlib_metadata "ninja!=1.13.0"
CIBW_ARCHS=AMD64 pip wheel . -v -w "${WHEELHOUSE_DIR}" --no-build-isolation

popd

# Try to repair the wheel on Windows using delvewheel (closest auditwheel equivalent).
pip install delvewheel
python -m delvewheel repair --ignore-existing --analyze-existing-exes -w "${WHEELHOUSE_DIR}/repaired_wheel" "${WHEELHOUSE_DIR}"/mlir_aie*.whl

- name: Upload mlir_aie
uses: actions/upload-artifact@v4
with:
Expand Down
32 changes: 16 additions & 16 deletions .github/workflows/mlirDistro.yml
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,7 @@ jobs:
- name: set ENV
shell: bash
run: |

PIP_FIND_LINKS_URL="https://github.com/Xilinx/mlir-aie/releases/expanded_assets/mlir-distro"
if [ x"${{ github.event_name }}" == x"pull_request" ]; then
PIP_FIND_LINKS_URL="$PIP_FIND_LINKS_URL https://github.com/Xilinx/mlir-aie/releases/expanded_assets/dev-wheels"
Expand Down Expand Up @@ -202,15 +202,15 @@ jobs:
shell: bash
working-directory: ${{ env.TEMP }}
run: |

ls "${{ steps.setup_base.outputs.WORKSPACE_ROOT }}"

if [ x"${{ matrix.OS }}" == x"windows-2022" ]; then
WORKSPACE_ROOT="${{ steps.setup_base.outputs.WORKSPACE_ROOT }}\utils\mlir_wheels"
else
WORKSPACE_ROOT="${{ steps.setup_base.outputs.WORKSPACE_ROOT }}/utils/mlir_wheels"
fi

echo "WORKSPACE_ROOT=$WORKSPACE_ROOT" | tee -a $GITHUB_OUTPUT

# setup llvm
Expand All @@ -219,7 +219,7 @@ jobs:
working-directory: ${{ steps.workspace_root.outputs.WORKSPACE_ROOT }}
shell: bash
run: |

curl -s https://codeload.github.com/llvm/llvm-project/zip/${{ needs.get_llvm_project_commit.outputs.LLVM_PROJECT_COMMIT }} -o llvm.zip
unzip -q llvm.zip
rm -rf llvm.zip
Expand All @@ -233,7 +233,7 @@ jobs:
shell: bash
working-directory: ${{ steps.workspace_root.outputs.WORKSPACE_ROOT }}
run: |

APPLY_PATCHES=${{ inputs.APPLY_PATCHES == '' && 'true' || inputs.APPLY_PATCHES }} \
CIBW_ARCHS=${{ matrix.ARCH }} \
CMAKE_GENERATOR=Ninja \
Expand All @@ -249,12 +249,12 @@ jobs:
working-directory: ${{ steps.workspace_root.outputs.WORKSPACE_ROOT }}
shell: bash
run: |

export APPLY_PATCHES=${{ inputs.APPLY_PATCHES == '' && 'true' || inputs.APPLY_PATCHES }}
./scripts/apply_patches.sh

pip install --upgrade pip

CIBW_ARCHS=${{ matrix.ARCH }} \
CMAKE_GENERATOR=Ninja \
DATETIME=${{ needs.get_llvm_project_commit.outputs.DATETIME }} \
Expand All @@ -267,15 +267,15 @@ jobs:
working-directory: ${{ steps.workspace_root.outputs.WORKSPACE_ROOT }}
shell: bash
run: |

rm -rf llvm-project
rm -rf build

- name: Docker prune
if: contains(inputs.MATRIX_OS, 'ubuntu')
shell: bash
run: |

docker system prune -a -f

- name: Get wheel version
Expand All @@ -293,7 +293,7 @@ jobs:
working-directory: ${{ steps.workspace_root.outputs.WORKSPACE_ROOT }}
shell: bash
run: |

ccache -s
HOST_CCACHE_DIR="$(ccache --get-config cache_dir)"
rm -rf $HOST_CCACHE_DIR
Expand All @@ -309,7 +309,7 @@ jobs:
working-directory: ${{ steps.workspace_root.outputs.WORKSPACE_ROOT }}
shell: bash
run: |

ccache --print-stats
HOST_CCACHE_DIR="$(ccache --get-config cache_dir)"
# Set the timestamp to the beginning of the current hour.
Expand All @@ -334,19 +334,19 @@ jobs:
fi
unzip -j wheelhouse/mlir*whl "mlir/bin/$TOOL" -d native_tools/
done

if [ x"${{ matrix.OS }}" == x"ubuntu-22.04" ]; then
PLAT="linux"
elif [ x"${{ matrix.OS }}" == x"windows-2022" ]; then
PLAT="win"
fi

PLAT=${PLAT}_$(echo ${{ matrix.ARCH }} | tr '[:upper:]' '[:lower:]')
pushd native_tools

MLIR_WHEEL_VERSION=${{ steps.get_wheel_version.outputs.MLIR_WHEEL_VERSION }} \
python setup.py bdist_wheel --dist-dir ../wheelhouse --plat $PLAT

popd

# done
Expand Down
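Note on the `PLAT` computation in the hunk above: the shell fragment maps the runner OS to a wheel platform prefix and appends the lower-cased architecture. A minimal Python sketch of the same mapping follows. The function name `wheel_platform` is hypothetical and is not part of the workflow. One deliberate difference is labeled in the code: the shell version leaves `PLAT` unset for any other OS value, whereas this sketch raises an error.

```python
def wheel_platform(os_name: str, arch: str) -> str:
    """Hypothetical helper mirroring the PLAT logic in mlirDistro.yml.

    Only the two OS values tested in the workflow step are mapped.
    """
    prefixes = {
        "ubuntu-22.04": "linux",
        "windows-2022": "win",
    }
    if os_name not in prefixes:
        # Assumption: fail loudly. The shell step would instead continue
        # with an empty prefix and produce a tag such as "_x86_64".
        raise ValueError(f"no platform prefix known for OS {os_name!r}")
    # Equivalent of: echo $ARCH | tr '[:upper:]' '[:lower:]'
    return f"{prefixes[os_name]}_{arch.lower()}"


linux_tag = wheel_platform("ubuntu-22.04", "x86_64")
windows_tag = wheel_platform("windows-2022", "AMD64")
```

Raising on an unknown OS is a design choice for the sketch only; whether the workflow should do the same depends on which matrix entries it is expected to support.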
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -140,8 +140,8 @@ xrt-smi examine
1. Install IRON library by installing the `mlir-aie` wheels:

For installing the `mlir-aie` wheels, there are 3 options. Note that for whichever path you take,
it is important to sync the `mlir-aie` wheels version, the github repo commit, and the requirements versions.
If you install from something other than the latest wheels, make sure
it is important to sync the `mlir-aie` wheels version, the github repo commit, and the requirements versions.
If you install from something other than the latest wheels, make sure
you use the repo commit -- and installation instructions -- from that point in time.

1. **Latest:** For the latest wheels (not necessarily a release):
Expand Down
6 changes: 4 additions & 2 deletions cmake/install_headers.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,10 @@ function(install_headers SRCPATH BUILDPATH INSTALLPATH HEADERS_NAME)

message("Copying ${HEADERS_NAME} includes from ${SRCPATH} to ${BUILDPATH}/${HEADERS_NAME}")

# copy header files into build area
file(GLOB_RECURSE headers_to_copy ${SRCPATH}/*.h ${SRCPATH}/*.hpp)
# Include .cc/.cpp so build/include matches install/include for in-tree
# tests that resolve kernel sources via cxx_header_path().
file(GLOB_RECURSE headers_to_copy
${SRCPATH}/*.h ${SRCPATH}/*.hpp ${SRCPATH}/*.cc ${SRCPATH}/*.cpp)
foreach(header ${headers_to_copy})
file(RELATIVE_PATH rel_path ${SRCPATH} ${header})

Expand Down
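Note on the `install_headers` hunk above: the glob now collects `.cc` and `.cpp` files alongside `.h` and `.hpp`, and each match is then made relative to `SRCPATH`. The following is a rough Python sketch of that selection step only, assuming nothing about the rest of the CMake function, which this diff does not show. The names `PATTERNS`, `files_to_copy`, and `demo` are illustrative and do not exist in the repository.

```python
import tempfile
from pathlib import Path

# Same four patterns as the updated file(GLOB_RECURSE ...) call.
PATTERNS = ("*.h", "*.hpp", "*.cc", "*.cpp")


def files_to_copy(src_root):
    """Return matching files under src_root as sorted POSIX-style relative paths."""
    root = Path(src_root)
    found = set()
    for pattern in PATTERNS:
        # rglob walks the tree recursively, like GLOB_RECURSE.
        found.update(p for p in root.rglob(pattern) if p.is_file())
    # Counterpart of file(RELATIVE_PATH rel_path ${SRCPATH} ${header}).
    return sorted(p.relative_to(root).as_posix() for p in found)


def demo():
    """Build a throwaway tree and list what the glob would pick up."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub").mkdir()
        for name in ("a.h", "b.hpp", "sub/k.cc", "sub/k.cpp", "notes.txt"):
            (root / name).write_text("")
        return files_to_copy(root)


selected = demo()
```

In this sketch `notes.txt` is skipped while both kernel sources under `sub/` are picked up, which is the behavior the new comment in the hunk describes for in-tree tests.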
8 changes: 4 additions & 4 deletions docs/Building.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Building the MLIR-AIE Codebase on Linux

These instructions will guide you through everything required for building and executing a program on the Ryzen™ AI NPU, starting from a fresh bare-bones **Ubuntu 24.04** or **Ubuntu 24.10** install. It is possible to use **Ubuntu 22.04** however you must follow the documentation on the [xdna-driver](https://github.com/amd/xdna-driver) repository to configure the Linux kernel, driver and runtime for deployment.
These instructions will guide you through everything required for building and executing a program on the Ryzen™ AI NPU, starting from a fresh bare-bones **Ubuntu 24.04** or **Ubuntu 24.10** install. It is possible to use **Ubuntu 22.04** however you must follow the documentation on the [xdna-driver](https://github.com/amd/xdna-driver) repository to configure the Linux kernel, driver and runtime for deployment.

## Initial Setup

Expand All @@ -9,7 +9,7 @@ These instructions will guide you through everything required for building and e
If starting from `Ubuntu 24.04` you may need to update the Linux kernel to 6.11+ by installing the Hardware Enablement (HWE) stack:

```bash
sudo apt update
sudo apt update
sudo apt install --install-recommends linux-generic-hwe-24.04
sudo reboot
```
Expand Down Expand Up @@ -124,7 +124,7 @@ xrt-smi examine

1. Install required Python packages:
```bash
# Install basic Python requirements
# Install basic Python requirements
python3 -m pip install -r python/requirements.txt
```

Expand Down Expand Up @@ -204,7 +204,7 @@ If the [upstream packages](#install-from-upstream-packages-ubuntu-2404) do not s
### Update BIOS:

Be sure you have the latest BIOS for your laptop or mini PC, this will ensure the NPU (sometimes referred to as IPU) is enabled in the system. You may need to manually enable the NPU:
```Advanced → CPU Configuration → IPU```
```Advanced → CPU Configuration → IPU```

> **NOTE:** Some manufacturers only provide Windows executables to update the BIOS, please do this before installing Ubuntu.

Expand Down
4 changes: 2 additions & 2 deletions docs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -108,8 +108,8 @@ Turn off SecureBoot (Allows for unsigned drivers to be installed):
1. Install IRON library by installing the `mlir-aie` wheels:

For installing the `mlir-aie` wheels, there are 3 options. Note that for whichever path you take,
it is important to sync the `mlir-aie` wheels version, the github repo commit, and the requirements versions.
If you install from something other than the latest wheels, make sure
it is important to sync the `mlir-aie` wheels version, the github repo commit, and the requirements versions.
If you install from something other than the latest wheels, make sure
you use the repo commit -- and installation instructions -- from that point in time.

1. **Latest:** For the latest wheels (not necessarily a release):
Expand Down
6 changes: 2 additions & 4 deletions programming_examples/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,17 +4,15 @@
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
// Copyright (C) 2024, Advanced Micro Devices, Inc.
// Copyright (C) 2024-2026, Advanced Micro Devices, Inc.
//
//===----------------------------------------------------------------------===//-->

# <ins>Programming Examples</ins>

These programming examples are provided so that application programmers can learn how to leverage the IRON design flow with mlir-aie python bindings, and the mlir-aie intermediate representation directly to build applications targeting AI Engines.

Each IRON example has one or more implementations:
* `<example_name>.py` - These designs are generally written using a higher-level version of IRON
* `<example_name>_placed.py` - These designs are generally written using a lower-level verion of IRON
Most examples are a single `<example_name>.py` design driven by `@iron.jit` — one file describes the AIE-array dataflow, JIT-compiles to xclbin/insts, and runs end-to-end (or feeds the prebuilt artifacts to a C++ host). A few examples additionally provide an `<example_name>_placed.py` variant written against a lower-level form of IRON for the cases where explicit tile/core placement is the pedagogical point.

They are organized into the following directories:

Expand Down
8 changes: 7 additions & 1 deletion programming_examples/algorithms/for_each.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,13 @@ def main():
initial_tensor = tensor.numpy().copy()

# JIT compile the algorithm
iron.jit(for_each)(lambda a: a + 1, tensor, tile_size=16)
iron.jit(for_each)(
tensor,
func=lambda a: a + 1,
N=int(tensor.shape[0]),
dtype=tensor.dtype,
tile_size=16,
)

# Check the correctness of the result
e = np.equal(initial_tensor + 1, tensor.numpy())
Expand Down
9 changes: 8 additions & 1 deletion programming_examples/algorithms/transform.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,14 @@ def main():
output = iron.zeros_like(input)

# JIT compile the algorithm
iron.jit(transform)(lambda a: a + 1, input, output, tile_size=16)
iron.jit(transform)(
input,
output,
func=lambda a: a + 1,
N=int(input.shape[0]),
dtype=input.dtype,
tile_size=16,
)

# Check the correctness of the result
e = np.equal(input.numpy() + 1, output.numpy())
Expand Down
10 changes: 9 additions & 1 deletion programming_examples/algorithms/transform_binary.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,15 @@ def main():
output = iron.zeros_like(input0)

# JIT compile the algorithm
iron.jit(transform_binary)(lambda a, b: a + b, input0, input1, output, tile_size=16)
iron.jit(transform_binary)(
input0,
input1,
output,
func=lambda a, b: a + b,
N=int(input0.shape[0]),
dtype=input0.dtype,
tile_size=16,
)

# Check the correctness of the result
e = np.equal(input0.numpy() + input1.numpy(), output.numpy())
Expand Down
9 changes: 8 additions & 1 deletion programming_examples/algorithms/transform_parallel.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,14 @@ def main():
output = iron.zeros_like(input)

# JIT compile the algorithm
iron.jit(transform_parallel)(lambda a: a + 1, input, output, tile_size=16)
iron.jit(transform_parallel)(
input,
output,
func=lambda a: a + 1,
N=int(input.shape[0]),
dtype=input.dtype,
tile_size=16,
)

# Check the correctness of the result
e = np.equal(input.numpy() + 1, output.numpy())
Expand Down
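Note on the four `programming_examples/algorithms` hunks above: each call site moves from passing the function first and positionally to passing tensors positionally and everything else (`func`, `N`, `dtype`, `tile_size`) by keyword. The sketch below illustrates that calling shape with a pure-Python stand-in. It is not the iron implementation and does not touch an NPU. `for_each_reference` is a hypothetical name, a plain list stands in for an iron tensor, and `int` stands in for the tensor dtype.

```python
def for_each_reference(tensor, *, func, N, dtype, tile_size):
    """Hypothetical host-side stand-in for the keyword-style call.

    Applies func element-wise and in place, one tile_size-sized tile at a time.
    """
    if len(tensor) != N:
        raise ValueError(f"expected {N} elements, got {len(tensor)}")
    if not all(isinstance(x, dtype) for x in tensor):
        raise ValueError(f"all elements must be of type {dtype.__name__}")
    if N % tile_size != 0:
        raise ValueError("N must be a multiple of tile_size")
    for start in range(0, N, tile_size):
        tile = tensor[start : start + tile_size]
        # Slice assignment writes the transformed tile back in place.
        tensor[start : start + tile_size] = [func(x) for x in tile]
    return tensor


data = list(range(64))
initial = list(data)

# Same argument layout as the updated call sites: the tensor is positional,
# the remaining parameters are keyword-only.
for_each_reference(
    data,
    func=lambda a: a + 1,
    N=len(data),
    dtype=int,
    tile_size=16,
)
```

Making the non-tensor parameters keyword-only in the stand-in means the old argument order (function first) fails immediately with a `TypeError` rather than being silently misread. Whether the real `iron.jit` wrapper enforces this is not stated in the diff.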