Summary
A phi node carrying a large struct ({i32, i64 x 128}, 129 fields) with zeroinitializer as the entry value has two issues on aie2p-none-unknown-elf:
- -O0 crash: LLVM ERROR: unable to legalize instruction: G_INSERT_VECTOR_ELT
- -O2 silent miscompile: The generated code fails to zero the ACC2048 hardware registers, causing alternating correct/incorrect results across repeated kernel invocations on the NPU.
A smaller struct ({i32, i64 x 32}, 33 fields) with the same pattern works correctly at both -O0 and -O2.
Environment
- llvm-aie: 20.0.0.2025090701+8c084497 (commit 8c084497)
- Target: aie2p-none-unknown-elf
- Hardware (for the runtime bug): AMD Ryzen AI NPU (Krackan Point, AIE2P)
Reproducer
See attached peano_bug_repro.ll.
The struct packs a loop counter + 4 ACC2048 vectors (32 × i64 each = 128 i64 fields total):
%acc_state = type { i32, i64, i64, ..., i64 } ; 129 fields
k_loop:
%state = phi %acc_state [ zeroinitializer, %entry ], [ %state_next, %k_continue ]
; extract 4 groups of 32 × i64 → <32 x i64>, call mac.conf, insertvalue back
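The field indices of each ACC2048 group follow directly from this layout: field 0 is the i32 loop counter, followed by four consecutive runs of 32 i64 fields. The short Python sketch below computes those ranges. The helper name `group_field_ranges` is hypothetical, and the sketch assumes the groups are stored in order with no other fields between them.

```python
"""Compute the struct field indices occupied by each ACC2048 group in %acc_state."""

COUNTER_FIELDS = 1   # field 0: the i32 loop counter
LANES_PER_ACC = 32   # each ACC2048 accumulator is <32 x i64>
NUM_ACCS = 4         # four accumulators -> 128 i64 fields


def group_field_ranges(num_accs: int = NUM_ACCS, lanes: int = LANES_PER_ACC) -> list[tuple[int, int]]:
    """Return inclusive (first, last) field indices for each accumulator group."""
    ranges = []
    for g in range(num_accs):
        first = COUNTER_FIELDS + g * lanes
        ranges.append((first, first + lanes - 1))
    return ranges


if __name__ == "__main__":
    total = COUNTER_FIELDS + NUM_ACCS * LANES_PER_ACC
    print(f"total fields: {total}")  # 129
    for g, (first, last) in enumerate(group_field_ranges()):
        print(f"ACC group {g}: fields {first}-{last}")
```

With this layout, groups 0 and 1 cover fields 1–64 and groups 2 and 3 cover fields 65–128.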
Reproducing the -O0 crash (no hardware needed)
llc peano_bug_repro.ll --mtriple=aie2p-none-unknown-elf -O0 --filetype=obj -o /dev/null
Expected: compiles successfully.
Actual: LLVM ERROR: unable to legalize instruction: G_INSERT_VECTOR_ELT
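For automated re-checks (for example against newer llvm-aie builds), the -O0 command above can be wrapped in a small script. This is a minimal sketch, assuming the llvm-aie `llc` is first on PATH and `peano_bug_repro.ll` is in the working directory; the function names are hypothetical. It checks for "unable to legalize instruction" and "G_INSERT_VECTOR_ELT" separately, because the full LLVM diagnostic may print the offending instruction between them rather than the abbreviated form quoted above.

```python
"""Run the -O0 reproducer and report whether llc hits the legalizer error."""

import shutil
import subprocess

REPRO = "peano_bug_repro.ll"
TRIPLE = "aie2p-none-unknown-elf"


def llc_command(opt_level: str, output: str) -> list[str]:
    """Build the same llc invocation used in this report for a given -O level."""
    return ["llc", REPRO, f"--mtriple={TRIPLE}", opt_level, "--filetype=obj", "-o", output]


def hits_legalizer_error(stderr: str) -> bool:
    """True if llc's diagnostics mention the G_INSERT_VECTOR_ELT legalization failure."""
    return "unable to legalize instruction" in stderr and "G_INSERT_VECTOR_ELT" in stderr


def main() -> None:
    if shutil.which("llc") is None:
        print("llc not found on PATH; skipping")
        return
    result = subprocess.run(llc_command("-O0", "/dev/null"), capture_output=True, text=True)
    if hits_legalizer_error(result.stderr):
        print("reproduced: -O0 legalization failure")
    elif result.returncode == 0:
        print("compiled successfully (expected behavior)")
    else:
        print(f"llc failed with a different error (exit {result.returncode}):")
        print(result.stderr)


if __name__ == "__main__":
    main()
```

`llc_command("-O2", "repro.o")` builds the -O2 compile command used for the runtime bug.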
Reproducing the -O2 runtime bug (requires AIE2P NPU)
llc peano_bug_repro.ll --mtriple=aie2p-none-unknown-elf -O2 --filetype=obj -o repro.o
Compiles successfully, but when linked into an AIE2P kernel and invoked repeatedly:
- Even invocations: wrong results (the accumulator retains stale values from the previous invocation, e.g. 68 instead of the expected 32)
- Odd invocations: correct results (all values are 32)
- Only fields 1–64 (first two ACC2048 groups) are affected; fields 65–128 (last two) are always correct
- Pattern is 100% deterministic
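One way to capture this pattern in a host-side test is to compare each invocation's accumulator values against the expected 32 and record which ACC2048 groups are wrong. The sketch below assumes, hypothetically, that the host can read back all 128 accumulator lanes per invocation in struct field order, and it numbers invocations from 1. The function names, data layout, and sample values are illustrative and are not part of the attached reproducer.

```python
"""Hypothetical host-side check: which ACC2048 groups are wrong on each invocation."""

LANES_PER_ACC = 32
NUM_ACCS = 4
EXPECTED = 32  # every accumulator lane should read 32 in this report's test


def bad_groups(values: list[int], expected: int = EXPECTED) -> list[int]:
    """Return the indices of accumulator groups with at least one unexpected lane.

    `values` holds the 128 accumulator lanes in struct order (fields 1-128).
    """
    if len(values) != NUM_ACCS * LANES_PER_ACC:
        raise ValueError(f"expected {NUM_ACCS * LANES_PER_ACC} lanes, got {len(values)}")
    bad = []
    for g in range(NUM_ACCS):
        lanes = values[g * LANES_PER_ACC:(g + 1) * LANES_PER_ACC]
        if any(v != expected for v in lanes):
            bad.append(g)
    return bad


def summarize(runs: list[list[int]]) -> dict[int, list[int]]:
    """Map 1-based invocation number to its bad groups, skipping clean invocations."""
    report = {}
    for i, values in enumerate(runs, start=1):
        bad = bad_groups(values)
        if bad:
            report[i] = bad
    return report


if __name__ == "__main__":
    # Synthetic data shaped like the report: groups 0-1 stale on even invocations.
    good = [EXPECTED] * 128
    stale = [68] * 64 + [EXPECTED] * 64
    print(summarize([good, stale, good, stale]))  # {2: [0, 1], 4: [0, 1]}
```

On output that matches the report, only even-numbered invocations appear in the summary, and each lists groups 0 and 1.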
Working case
The same phi pattern with a 33-field struct (1 accumulator instead of 4) works correctly:
%acc_state_small = type { i32, i64, i64, ..., i64 } ; 33 fields
; → correct ACC zeroing on every invocation, no crash at -O0
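This report does not cover the sizes between the working 33-field case and the failing 129-field case (2 or 3 accumulators, i.e. 65 or 97 fields). When narrowing down where the behavior changes, a small generator like the sketch below can produce the matching type declarations. It only emits the type line, so the extractvalue/insertvalue indices and the number of mac.conf calls in the reproducer would still need to be adjusted by hand. The function names are hypothetical.

```python
"""Emit the %acc_state type declaration for a given number of <32 x i64> accumulators."""

LANES_PER_ACC = 32


def acc_state_type(num_accs: int, name: str = "acc_state") -> str:
    """Return an LLVM IR struct type declaration: one i32 counter + num_accs*32 i64 fields."""
    if num_accs < 1:
        raise ValueError("need at least one accumulator")
    fields = ["i32"] + ["i64"] * (num_accs * LANES_PER_ACC)
    return f"%{name} = type {{ {', '.join(fields)} }}"


def field_count(num_accs: int) -> int:
    """Number of struct fields for num_accs accumulators (1 counter + 32 lanes each)."""
    return 1 + num_accs * LANES_PER_ACC


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(f"{n} accumulator(s): {field_count(n)} fields")
```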
Workaround
Restructure the kernel to carry only one <32 x i64> accumulator per loop phi node instead of four.
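One reading of this workaround is to replace the single struct phi with one phi per loop-carried value: an i32 counter plus one <32 x i64> phi per accumulator, each starting from zero just as the zeroinitializer entry value does. The Python sketch below emits those phi lines as text. The function name and the `%acc{g}_next` / `%k_next` value names are hypothetical placeholders for whatever the loop body produces, and the block names follow the snippet earlier in this report.

```python
"""Emit per-accumulator loop phis instead of one 129-field struct phi (illustrative)."""


def split_phis(num_accs: int = 4, entry: str = "entry", latch: str = "k_continue") -> list[str]:
    """Return LLVM IR phi lines: an i32 counter plus one <32 x i64> phi per accumulator.

    Initial values mirror the original zeroinitializer entry value (counter = 0,
    every accumulator lane = 0). The *_next names are placeholders for values
    computed in the loop body.
    """
    lines = [f"%k = phi i32 [ 0, %{entry} ], [ %k_next, %{latch} ]"]
    for g in range(num_accs):
        lines.append(
            f"%acc{g} = phi <32 x i64> [ zeroinitializer, %{entry} ], "
            f"[ %acc{g}_next, %{latch} ]"
        )
    return lines


if __name__ == "__main__":
    print("k_loop:")
    for line in split_phis():
        print("  " + line)
```

This report does not confirm that four separate <32 x i64> phis avoid both bugs; only the single-accumulator 33-field struct case is confirmed to work.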
peano_bug_repro.txt