
[AIE2P] Large struct phi with zeroinitializer fails to zero ACC2048 registers; crashes at -O0 #805

@paintedsnipedotll


Summary

A phi node carrying a large struct ({i32, i64 x 128}, 129 fields) with zeroinitializer as the entry value has two issues on aie2p-none-unknown-elf:

  1. -O0 crash: LLVM ERROR: unable to legalize instruction: G_INSERT_VECTOR_ELT
  2. -O2 silent miscompile: the generated code fails to zero the ACC2048 hardware registers, causing alternating correct and incorrect results across repeated kernel invocations on the NPU.

A smaller struct ({i32, i64 x 32}, 33 fields) with the same pattern works correctly at both -O0 and -O2.

Environment

  • llvm-aie: 20.0.0.2025090701+8c084497 (commit 8c084497)
  • Target: aie2p-none-unknown-elf
  • Hardware (for runtime bug): AMD Ryzen AI NPU (Krackan Point, AIE2P)

Reproducer

See the attached reproducer, peano_bug_repro.ll (attachment: peano_bug_repro.txt).

The struct packs a loop counter and four ACC2048 accumulators (32 × i64 each, 128 i64 fields in total):

```llvm
%acc_state = type { i32, i64, i64, ..., i64 }  ; 129 fields

k_loop:
  %state = phi %acc_state [ zeroinitializer, %entry ], [ %state_next, %k_continue ]
  ; extract 4 groups of 32 × i64 → <32 x i64>, call mac.conf, insertvalue back
```
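To make the field numbering used in this report concrete, here is a small Python sketch of the layout: field 0 is the i32 loop counter, and accumulator group g occupies struct fields 1 + 32g through 32 + 32g, each group being 32 × 64 = 2048 bits (one ACC2048 register). The helper names (`acc_state_type`, `field_count`, `group_fields`) are hypothetical and only generate the type text and index ranges; they are not part of llvm-aie or the attached reproducer.

```python
LANES = 32       # i64 lanes per ACC2048 accumulator
LANE_BITS = 64   # bits per i64 lane; 32 * 64 = 2048 bits per accumulator


def acc_state_type(num_accs: int) -> str:
    """Return the LLVM IR literal struct type: one i32 counter + num_accs * 32 i64 fields."""
    fields = ["i32"] + ["i64"] * (num_accs * LANES)
    return "{ " + ", ".join(fields) + " }"


def field_count(num_accs: int) -> int:
    """Total struct fields: the counter plus 32 lanes per accumulator."""
    return 1 + num_accs * LANES


def group_fields(g: int) -> range:
    """Struct field indices (as used by extractvalue/insertvalue) for accumulator group g."""
    start = 1 + g * LANES
    return range(start, start + LANES)


if __name__ == "__main__":
    for n in (4, 1):
        print(f"{n} accumulator(s): {field_count(n)} fields")
    for g in range(4):
        r = group_fields(g)
        print(f"group {g}: fields {r.start}-{r.stop - 1}, {LANES * LANE_BITS} bits")
```

With this numbering, four accumulators give the 129-field failing type and one accumulator gives the 33-field working type, and the affected fields 1–64 correspond exactly to groups 0 and 1.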

Reproducing the -O0 crash (no hardware needed)

```sh
llc peano_bug_repro.ll --mtriple=aie2p-none-unknown-elf -O0 --filetype=obj -o /dev/null
```

Expected: compiles successfully.
Actual: LLVM ERROR: unable to legalize instruction: G_INSERT_VECTOR_ELT

Reproducing the -O2 runtime bug (requires AIE2P NPU)

```sh
llc peano_bug_repro.ll --mtriple=aie2p-none-unknown-elf -O2 --filetype=obj -o repro.o
```

The object compiles successfully, but when it is linked into an AIE2P kernel and invoked repeatedly:

  • Even invocations: wrong results (the accumulator retains stale values from the previous invocation, e.g. 68 instead of the expected 32)
  • Odd invocations: correct results (all 32)
  • Only fields 1–64 (the first two ACC2048 groups) are affected; fields 65–128 (the last two groups) are always correct
  • The pattern is 100% deterministic
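When reading back results on hardware, it can help to map each mismatching field to its accumulator group. The following Python sketch assumes a hypothetical host harness that returns the 128 i64 accumulator lanes (struct fields 1–128, counter excluded) after each invocation; `wrong_groups` and the sample inputs are illustrative only, and the inputs are synthetic values shaped like the pattern described above, not measured output.

```python
LANES = 32     # i64 lanes per ACC2048 accumulator group
EXPECTED = 32  # expected value of every accumulator lane in this kernel (per the report)


def wrong_groups(lanes: list[int], expected: int = EXPECTED) -> list[int]:
    """Return the sorted accumulator groups containing at least one lane != expected.

    lanes[i] holds struct field i + 1; field 0 (the i32 counter) is not included.
    """
    bad = {i // LANES for i, v in enumerate(lanes) if v != expected}
    return sorted(bad)


if __name__ == "__main__":
    # Synthetic input shaped like an "even" invocation: stale values in
    # groups 0-1 (fields 1-64), correct values in groups 2-3 (fields 65-128).
    even_run = [68] * 64 + [32] * 64
    odd_run = [32] * 128
    print(wrong_groups(even_run))
    print(wrong_groups(odd_run))
```

Grouping by `i // 32` keeps the triage tied to ACC2048 register boundaries, which is the granularity at which the zeroing appears to be lost.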

Working case

The same phi pattern with a 33-field struct (1 accumulator instead of 4) works correctly:

```llvm
%acc_state_small = type { i32, i64, i64, ..., i64 }  ; 33 fields
; → correct ACC zeroing on every invocation, no crash at -O0
```

Workaround

Restructure the kernel to carry only one <32 x i64> accumulator per loop phi node instead of four.

peano_bug_repro.txt
