[AIE2PS] Vec Accumulator Spilling #975
Draft
F-Stuckmann wants to merge 2 commits into
Mirror llvm/test/CodeGen/AIE/aie2p/ra/spill-vec-acc.mir for aie2ps. The test runs the full register-allocation pipeline (greedy + virtregrewriter) on a kernel that builds an acc1024 in BB0, broadcasts many vec512 values, and consumes them across a tight self-loop in BB1. Without a combined acc/vec spill class the allocator must spill at least one 512-bit value to the stack; the captured CHECK lines show the resulting VST_X_SPILL / VLDA_X_SPILL traffic against %stack.0. This baseline establishes the pre-feature behavior so the upcoming combined spill class change can be measured by a CHECK-line diff (memory spill -> cross-bank allocation).
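The expected shift from memory spills to cross-bank allocation can be illustrated with a toy pressure model. This is only a sketch: the bank sizes and live-value counts below are hypothetical placeholders rather than the real AIE2PS register file, and `Pressure`, `Banks`, `stackSpillsSeparate`, and `stackSpillsCombined` are names invented for illustration, not part of the backend.

```cpp
#include <algorithm>
#include <cassert>

// Toy model only. All numbers used with these types are hypothetical.
struct Pressure {
  int liveVec; // simultaneously live 512-bit vector values
  int liveAcc; // simultaneously live 512-bit accumulator values
};

struct Banks {
  int vecRegs; // registers available in the vector bank
  int accRegs; // registers available in the accumulator bank
};

// Separate classes: a value may only live in its own bank, so any
// overflow in either bank has to go to the stack.
constexpr int stackSpillsSeparate(Pressure p, Banks b) {
  return std::max(0, p.liveVec - b.vecRegs) +
         std::max(0, p.liveAcc - b.accRegs);
}

// Combined class: overflow from one bank may use free registers in the
// other bank, so only the total across both banks matters.
constexpr int stackSpillsCombined(Pressure p, Banks b) {
  return std::max(0, p.liveVec + p.liveAcc - (b.vecRegs + b.accRegs));
}

// Hypothetical scenario: the vector bank is over-subscribed by one value
// while the accumulator bank still has room.
static_assert(stackSpillsSeparate(Pressure{13, 2}, Banks{12, 8}) == 1,
              "baseline: one value goes to the stack");
static_assert(stackSpillsCombined(Pressure{13, 2}, Banks{12, 8}) == 0,
              "combined class: the overflow fits in the other bank");
```

In this model the baseline test corresponds to the first assertion (a stack spill is present), and the follow-up change is expected to move the same kernel toward the second.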
Introduce a combined 512-bit spill register class that unions the vector (mXm), accumulator (mBMm), and FIFO (lfh0/lfh1/lfl0/lfl1/sfl/sfh/lfe) physical registers. Exposing this class to the register coalescer via getLargestLegalSuperClass lets a 512-bit value stored in an ACC512 vreg be allocated to a free X register instead of spilling to the stack when the accumulator bank is under pressure. This mirrors the existing AIE2P optimization.

The widening is opt-in only for ACC512 and VEC512 (compared by pointer equality, not hasSubClassEq) to limit ripple effects on operand-restricted sub-classes that would otherwise alter coalescing and pre-RA scheduling.

Spill / reload of a composite-class vreg goes through two new pseudos, VST_512_COMPOSED_REG_SPILL and VLDA_512_COMPOSED_REG_SPILL. eliminateFrameIndex resolves the frame index to an SP-relative immediate, and expandPostRAPseudo swaps the descriptor to the native opcode that matches the actual physical register chosen by the allocator: VST_dmx_sts_x_spill / VLDA_dmx_lda_x_spill for VEC512, and VST_dmx_sts_bm_spill / VLDA_dmx_lda_bm_spill for ACC512. AIE2PS has no native FIFO spill opcode, so the FIFO branch falls through to report_fatal_error; in practice the allocator should not assign a FIFO physreg to a composite-class vreg.

A new test exercises both branches of the post-RA descriptor swap end-to-end through prologepilog and postrapseudos.
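The two mechanisms described here, the pointer-equality opt-in and the post-RA descriptor swap, can be modelled in a few lines of standalone C++. This is a simplified sketch and not the patch's code: `RegClass`, `Bank`, `VEC512_RESTRICTED`, `SPILL512_COMPOSED`, `largestLegalSuperClass`, and `expandComposedSpill` are hypothetical stand-ins, opcodes are represented as strings, and a C++ exception stands in for report_fatal_error. Only the opcode names are taken from the description.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for register classes; the real ones are
// TableGen-generated TargetRegisterClass objects.
struct RegClass { const char *name; };
const RegClass VEC512{"VEC512"};
const RegClass ACC512{"ACC512"};
const RegClass VEC512_RESTRICTED{"VEC512_restricted"}; // hypothetical sub-class
const RegClass SPILL512_COMPOSED{"spill512_composed"}; // hypothetical name

// Widen only when the class is exactly VEC512 or ACC512. Comparing
// pointers (identity) instead of a sub-class test leaves
// operand-restricted sub-classes unchanged.
const RegClass *largestLegalSuperClass(const RegClass *rc) {
  if (rc == &VEC512 || rc == &ACC512)
    return &SPILL512_COMPOSED;
  return rc;
}

// Bank of the physical register the allocator actually picked.
enum class Bank { Vector, Accumulator, Fifo };

// Post-RA step: choose the native opcode for a composed spill (store)
// or reload (load) pseudo based on the assigned physical register.
std::string expandComposedSpill(bool isStore, Bank bank) {
  switch (bank) {
  case Bank::Vector:
    return isStore ? "VST_dmx_sts_x_spill" : "VLDA_dmx_lda_x_spill";
  case Bank::Accumulator:
    return isStore ? "VST_dmx_sts_bm_spill" : "VLDA_dmx_lda_bm_spill";
  case Bank::Fifo:
    break; // no native FIFO spill opcode
  }
  // Stand-in for report_fatal_error in this sketch.
  throw std::runtime_error("no native FIFO spill opcode");
}
```

With this model, `largestLegalSuperClass(&ACC512)` yields the combined class while `largestLegalSuperClass(&VEC512_RESTRICTED)` returns its argument unchanged, which is the ripple-limiting effect the pointer comparison is meant to achieve.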
dcb9fb6 to 4318ee0