[aie2p/aie2ps] Backend fails to legalize G_ZEXTLOAD (s128) from (s64) — blocks sparse MAC in C++ #962

@syntax1killer

Summary

The aie2p target (and aie2ps, which has identical code) cannot legalize a G_ZEXTLOAD (s128) with an s64 memory operand. This comes up for 128-bit scalar types like _BitInt(128), and more visibly for the built-in sparsity_t mask field embedded in sparse vector types (v128int8_sparse, v256int4_sparse, v64int16_sparse, …). The practical effect is that any C++ code path that materializes a sparsity_t from memory fails to compile, which blocks the entire sparse-MAC intrinsic family (VMAC_vmul_cm_core_X_QX / _Y_QY and relatives) from being reached from user code.

Versions affected

Reproduced on both:

  • llvm-aie 20.0.0.2026031401+f19b0ed6 (commit f19b0ed6, 2026-03-14)
  • llvm-aie 21.0.0.2026040201+986b3623 (commit 986b3623, 2026-04-02 nightly)

Minimal reproducer (12 lines, no aie2p/aie_api headers)

#include <stdint.h>

typedef unsigned _BitInt(128) sparsity_t;

extern "C" void repro(const uint8_t *src, sparsity_t *dst) {
    sparsity_t m{};
    uint8_t *mb = reinterpret_cast<uint8_t *>(&m);
    __builtin_memcpy(mb + 0, src + 0, 8);
    __builtin_memcpy(mb + 8, src + 8, 8);
    *dst = m;
}

Compile with:

clang++ --target=aie2p-none-unknown-elf -std=c++20 -O2 \
        -D__AIENGINE__=2 -D__AIEARCH__=21 \
        -c repro_s128.cc -o /tmp/repro.o

Error (verbatim, v21)

fatal error: error in backend: unable to legalize instruction:
  %7:_(s128) = G_ZEXTLOAD %5:_(p0) :: (load (s64) from %ir.add.ptr3, align 1)
  (in function: repro)
Stack dump:
...
3.  Running pass 'Function Pass Manager' on module '.../repro_s128.cc'.
4.  Running pass 'Legalizer' on function '@repro'

Root cause (from source read)

llvm/lib/Target/AIE/aie2p/AIE2PLegalizerInfo.cpp:561–566:

getActionDefinitionsBuilder({G_ZEXTLOAD, G_SEXTLOAD})
    .legalForTypesWithMemDesc({{S32, P0, S8, 8}, {S32, P0, S16, 16}})
    .widenScalarToNextPow2(0)
    .lowerIfMemSizeNotPow2()
    .clampScalar(0, S32, S32)
    .lower();

Only s32 destinations (with s8 or s16 memory operands) are legal. An s128 destination falls through clampScalar(0, S32, S32) (which cannot narrow a load with s128 users) and then .lower() (which has no rule for s128). The corresponding block in aie2ps/AIE2PSLegalizerInfo.cpp:563–568 is byte-identical, so AIE2PS has the same gap.

Notably, the full-width G_LOAD is already legal at {S128, P0, S128, 128} (line 507), and G_MERGE_VALUES is custom-handled for scalar destinations ≥128 bits (line 643, via AIELegalizerHelper::legalizeG_MERGE_VALUES), so the downstream pieces are already in place — only the extload entry is missing.

Proposed fix

Add a customIf rule that matches {S128 dest, S64 mem}, plus a helper that emits:

%lo = G_LOAD s64 from <ptr>
%hi = G_CONSTANT s64 0          ; for ZEXT; AShr(lo, 63) for SEXT
%dst = G_MERGE_VALUES s128, %lo, %hi

This lets the existing legalization of G_MERGE_VALUES (scalar≥128 → G_BUILD_VECTOR + G_BITCAST) do the heavy lifting. The same change applies to AIE2PS.
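
For reference, here is a host-side model of what the proposed sequence should compute. It is only a sketch of the intended semantics, not the LLVM patch itself. The names U128, zextload_s128_from_s64 and sextload_s128_from_s64 are hypothetical and exist only for this illustration. It assumes the low half comes first, matching the operand order of G_MERGE_VALUES above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an s128 value: two 64-bit halves, low half first
// (the same order as the G_MERGE_VALUES operands in the proposed sequence).
struct U128 {
    uint64_t lo;
    uint64_t hi;
};

// Models G_ZEXTLOAD (s128) from (s64): load 64 bits, zero the high half.
inline U128 zextload_s128_from_s64(const uint8_t *src) {
    assert(src != nullptr);
    uint64_t lo;
    std::memcpy(&lo, src, sizeof lo);  // byte-wise copy, so align 1 is fine
    return U128{lo, 0};                // %hi = G_CONSTANT s64 0
}

// Models G_SEXTLOAD (s128) from (s64): the high half replicates the sign bit
// of the low half, i.e. the result of AShr(lo, 63).
inline U128 sextload_s128_from_s64(const uint8_t *src) {
    assert(src != nullptr);
    uint64_t lo;
    std::memcpy(&lo, src, sizeof lo);
    uint64_t hi = (lo >> 63) ? ~uint64_t{0} : uint64_t{0};
    return U128{lo, hi};
}
```

A patched backend could be checked against this model by comparing the 16 bytes stored through dst with the {lo, hi} pair computed on the host for the same input bytes.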

Impact

This is currently the single blocker for exposing the XDNA2 sparse MAC intrinsics to C++ user code on the Peano toolchain. The hardware instructions exist and are in AMD's AIE2p ISA reference (VMAC_vmul_cm_core_X_QX, _Y_QY, with srSparse_of flag, 2-cycle latency). The aie_api headers that ship with mlir-aie (aie::sparse_vector_input_buffer_stream<T, N>, aie::mmul<M, K, N, T1, T2> sparse overloads) all fail to compile on Peano today with this same root cause.

Status on my side

I'm drafting the patch on a local branch of this repo and will open a PR once it builds and the XDNA2 sparse MAC kernel compiles against it. Happy to coordinate if someone is already working on this, or to split the fix across aie2p and aie2ps if preferred.
