Summary
The aie2p target (and aie2ps, which has identical code) cannot legalize a G_ZEXTLOAD with an s128 destination and an s64 memory operand. This comes up for 128-bit scalar types like _BitInt(128), and more visibly for the built-in sparsity_t mask field embedded in sparse vector types (v128int8_sparse, v256int4_sparse, v64int16_sparse, …). In practice, any C++ code path that materializes a sparsity_t from memory fails to compile, which makes the entire sparse-MAC intrinsic family (VMAC_vmul_cm_core_X_QX / _Y_QY and relatives) unreachable from user code.
Versions affected
Reproduced on both:
llvm-aie 20.0.0.2026031401+f19b0ed6 (commit f19b0ed6, 2026-03-14)
llvm-aie 21.0.0.2026040201+986b3623 (commit 986b3623, 2026-04-02 nightly)
Minimal reproducer (12 lines, no aie2p/aie_api headers)
#include <stdint.h>
typedef unsigned _BitInt(128) sparsity_t;
extern "C" void repro(const uint8_t *src, sparsity_t *dst) {
    sparsity_t m{};
    uint8_t *mb = reinterpret_cast<uint8_t *>(&m);
    __builtin_memcpy(mb + 0, src + 0, 8);
    __builtin_memcpy(mb + 8, src + 8, 8);
    *dst = m;
}
Compile with:
clang++ --target=aie2p-none-unknown-elf -std=c++20 -O2 \
    -D__AIENGINE__=2 -D__AIEARCH__=21 \
    -c repro_s128.cc -o /tmp/repro.o
Error (verbatim, v21)
fatal error: error in backend: unable to legalize instruction:
%7:_(s128) = G_ZEXTLOAD %5:_(p0) :: (load (s64) from %ir.add.ptr3, align 1)
(in function: repro)
Stack dump:
...
3. Running pass 'Function Pass Manager' on module '.../repro_s128.cc'.
4. Running pass 'Legalizer' on function '@repro'
Root cause (from source read)
llvm/lib/Target/AIE/aie2p/AIE2PLegalizerInfo.cpp:561–566:
getActionDefinitionsBuilder({G_ZEXTLOAD, G_SEXTLOAD})
    .legalForTypesWithMemDesc({{S32, P0, S8, 8}, {S32, P0, S16, 16}})
    .widenScalarToNextPow2(0)
    .lowerIfMemSizeNotPow2()
    .clampScalar(0, S32, S32)
    .lower();
Only s32 destinations are legal. An s128 destination falls through clampScalar(0, S32, S32) (which cannot narrow a load with s128 users) and then .lower() (which has no rule for s128). The corresponding block in aie2ps/AIE2PSLegalizerInfo.cpp:563–568 is byte-identical, so AIE2PS has the same gap.
Notably, the full-width G_LOAD is already legal at {S128, P0, S128, 128} (line 507), and G_MERGE_VALUES is custom-handled for scalar destinations ≥128 bits (line 643, via AIELegalizerHelper::legalizeG_MERGE_VALUES), so the downstream pieces are already in place — only the extload entry is missing.
Proposed fix
Add a customIf rule that matches {S128 dest, S64 mem}, plus a helper that emits:
%lo = G_LOAD s64 from <ptr>
%hi = G_CONSTANT s64 0 ; for ZEXT; AShr(lo, 63) for SEXT
%dst = G_MERGE_VALUES s128, %lo, %hi
This lets the existing legalization of G_MERGE_VALUES (scalar≥128 → G_BUILD_VECTOR + G_BITCAST) do the heavy lifting. The same change applies to AIE2PS.
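For reference, here is a small host-side model of the value the proposed expansion should produce. It is only a sketch of the intended semantics, not the legalizer patch itself: the U128 struct and the zextLoad64To128 / sextLoad64To128 / checkExtLoads names are hypothetical and exist only for this illustration. It could serve as an oracle when writing tests for the real helper.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an s128 value, kept as the two s64 halves that
// G_MERGE_VALUES would combine (low half first).
struct U128 {
  uint64_t lo;
  uint64_t hi;
};

// Models the ZEXT case:
//   %lo  = G_LOAD s64 from <ptr>
//   %hi  = G_CONSTANT s64 0
//   %dst = G_MERGE_VALUES s128, %lo, %hi
inline U128 zextLoad64To128(const void *ptr) {
  uint64_t lo;
  std::memcpy(&lo, ptr, sizeof(lo)); // alignment-safe 64-bit load
  return U128{lo, 0};
}

// Models the SEXT case, where %hi = AShr(%lo, 63): the high half is all ones
// when the sign bit of the loaded value is set, and zero otherwise.
inline U128 sextLoad64To128(const void *ptr) {
  uint64_t lo;
  std::memcpy(&lo, ptr, sizeof(lo));
  const uint64_t hi = (lo >> 63) ? ~uint64_t{0} : uint64_t{0};
  return U128{lo, hi};
}

// Small self-check covering a value with the sign bit set.
inline void checkExtLoads() {
  const uint64_t value = 0x8000000000000001ULL;
  unsigned char bytes[sizeof(value)];
  std::memcpy(bytes, &value, sizeof(value));

  const U128 z = zextLoad64To128(bytes);
  assert(z.lo == value && z.hi == 0);

  const U128 s = sextLoad64To128(bytes);
  assert(s.lo == value && s.hi == ~uint64_t{0});
}
```

The high half is computed with an explicit sign-bit test rather than a signed right shift, so the model does not depend on how the host compiler treats shifts of negative values.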
Impact
This is currently the single blocker for exposing the XDNA2 sparse MAC intrinsics to C++ user code on the Peano toolchain. The hardware instructions exist and are documented in AMD's AIE2p ISA reference (VMAC_vmul_cm_core_X_QX, _Y_QY, with srSparse_of flag, 2-cycle latency). The aie_api headers that ship with mlir-aie (aie::sparse_vector_input_buffer_stream<T, N>, aie::mmul<M, K, N, T1, T2> sparse overloads) all fail at compile time on Peano today with this same root cause.
Status on my side
I'm drafting the patch on a local branch of this repo and will open a PR once it builds + the XDNA2 sparse MAC kernel compiles against it. Happy to coordinate if someone is already working on this, or to split the fix across aie2p and aie2ps if preferred.