[AIE2PS] Vec Accumulator Spilling #975

Draft
F-Stuckmann wants to merge 2 commits into aie-public from stuckmann.vecc.acc.spilling

F-Stuckmann (Collaborator) commented Apr 29, 2026

[Images: Core_Compute_Insn_Count_perf_sorted_reduced, Core_PMSize_perf_sorted_reduced_absolute, Core_StackSize_perf_sorted_reduced_absolute]

Mirror llvm/test/CodeGen/AIE/aie2p/ra/spill-vec-acc.mir for aie2ps.
The test runs the full register-allocation pipeline (greedy +
virtregrewriter) on a kernel that builds an acc1024 in BB0, broadcasts
many vec512 values, and consumes them across a tight self-loop in BB1.
Without a combined acc/vec spill class the allocator must spill at
least one 512-bit value to the stack; the captured CHECK lines show
the resulting VST_X_SPILL / VLDA_X_SPILL traffic against %stack.0.

This baseline establishes the pre-feature behavior so the upcoming
combined spill class change can be measured by a CHECK-line diff
(memory spill -> cross-bank allocation).

Introduce a combined 512-bit spill register class that unions the
vector (mXm), accumulator (mBMm), and FIFO (lfh0/lfh1/lfl0/lfl1/
sfl/sfh/lfe) physical registers. Exposing this class to the register
coalescer via getLargestLegalSuperClass lets a 512-bit value stored
in an ACC512 vreg be allocated to a free X register instead of
spilling to the stack when the accumulator bank is under pressure.
This mirrors the existing AIE2P optimization.

The widening is opt-in only for ACC512 and VEC512 (compared by
pointer equality, not hasSubClassEq) to limit ripple effects on
operand-restricted sub-classes that would otherwise alter coalescing
and pre-RA scheduling.
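
To illustrate why the comparison style matters, here is a minimal standalone sketch. It does not use the LLVM API: `RegClass`, `Composed512`, `VEC512Restricted`, and both widening helpers are hypothetical stand-ins, and the class hierarchy is an assumption made only for the example. It contrasts a pointer-equality opt-in with a `hasSubClassEq`-style check.

```cpp
// Toy model only; RegClass is NOT llvm::TargetRegisterClass.
struct RegClass {
  const char *Name;
  const RegClass *Parent; // immediate super-class, or nullptr

  // True if RC is this class or one of its (transitive) sub-classes.
  bool hasSubClassEq(const RegClass *RC) const {
    for (; RC; RC = RC->Parent)
      if (RC == this)
        return true;
    return false;
  }
};

// Hypothetical hierarchy: the composed class sits above VEC512 and ACC512,
// and VEC512Restricted models an operand-restricted sub-class of VEC512.
static const RegClass Composed512{"Composed512", nullptr};
static const RegClass VEC512{"VEC512", &Composed512};
static const RegClass ACC512{"ACC512", &Composed512};
static const RegClass VEC512Restricted{"VEC512Restricted", &VEC512};

// Opt-in by pointer equality: only the two exact classes are widened.
const RegClass *widenByPointerEquality(const RegClass *RC) {
  if (RC == &VEC512 || RC == &ACC512)
    return &Composed512;
  return RC;
}

// Alternative using hasSubClassEq: restricted sub-classes are widened too.
const RegClass *widenBySubClassEq(const RegClass *RC) {
  if (VEC512.hasSubClassEq(RC) || ACC512.hasSubClassEq(RC))
    return &Composed512;
  return RC;
}
```

In this model `widenByPointerEquality(&VEC512Restricted)` returns the restricted class unchanged, while `widenBySubClassEq(&VEC512Restricted)` returns `Composed512`. The second behavior is the kind of ripple effect the pointer comparison is meant to avoid.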

Spill / reload of a composite-class vreg goes through two new
pseudos, VST_512_COMPOSED_REG_SPILL and VLDA_512_COMPOSED_REG_SPILL.
eliminateFrameIndex resolves the frame index to an SP-relative
immediate, and expandPostRAPseudo swaps the descriptor to the native
opcode that matches the actual physical register chosen by the
allocator: VST_dmx_sts_x_spill / VLDA_dmx_lda_x_spill for VEC512,
and VST_dmx_sts_bm_spill / VLDA_dmx_lda_bm_spill for ACC512. AIE2PS
has no native FIFO spill opcode, so the FIFO branch falls through to
report_fatal_error; in practice the allocator should not assign a
FIFO physreg to a composite-class vreg.
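
The descriptor swap can be pictured with the following standalone sketch. It is not the actual `expandPostRAPseudo` code: the `Bank` and `Pseudo` enums and `selectNativeSpillOpcode` are hypothetical, opcodes are plain strings rather than instruction descriptors, and the FIFO case throws an exception where the real code calls `report_fatal_error`. Only the opcode names are taken from the description above.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical classification of the physical register the allocator chose.
enum class Bank { Vec, Acc, Fifo };

// Stand-ins for VST_512_COMPOSED_REG_SPILL / VLDA_512_COMPOSED_REG_SPILL.
enum class Pseudo { Store512, Load512 };

// Pick the native spill opcode name for a composed-class pseudo,
// based on the bank of the assigned physical register.
std::string selectNativeSpillOpcode(Pseudo P, Bank B) {
  const bool IsStore = (P == Pseudo::Store512);
  switch (B) {
  case Bank::Vec:
    return IsStore ? "VST_dmx_sts_x_spill" : "VLDA_dmx_lda_x_spill";
  case Bank::Acc:
    return IsStore ? "VST_dmx_sts_bm_spill" : "VLDA_dmx_lda_bm_spill";
  case Bank::Fifo:
    break; // no native FIFO spill opcode on AIE2PS
  }
  // The real implementation reports a fatal error; an exception is used
  // here only so the sketch can be exercised in isolation.
  throw std::runtime_error("no native FIFO spill opcode");
}
```

For example, `selectNativeSpillOpcode(Pseudo::Load512, Bank::Acc)` yields `"VLDA_dmx_lda_bm_spill"`, while any `Bank::Fifo` request throws.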

A new test exercises both branches of the post-RA descriptor swap
end-to-end through prologepilog and postrapseudos.