Skip to content

[AIE] ReservedRegsLICM: extend hoisting/sinking to nested loops and save/restore brackets#976

Open
andcarminati wants to merge 5 commits into
aie-publicfrom
andreu.reserved.regs
Open

[AIE] ReservedRegsLICM: extend hoisting/sinking to nested loops and save/restore brackets#976
andcarminati wants to merge 5 commits into
aie-publicfrom
andreu.reserved.regs

Conversation

@andcarminati
Copy link
Copy Markdown
Collaborator

This series extends the ReservedRegsLICM pass in three ways:

  1. Nested loop support. The pass previously only processed the innermost
    loop. It now iterates over all loops in the nest (innermost first via
    reverse preorder), allowing loop-invariant control register assignments
    to be hoisted or sinked out of outer loops as well.

  2. Correctness fix for multi-block loops. processForExitSink could
    incorrectly sink a register's last def to the exit block in loops where
    the register is defined in one conditional branch but not another. On
    the path that skips the def the register is live at the latch entry,
    carrying a value from a predecessor; sinking the def would expose the
    wrong value on that path. A LatchEntryLive pre-computation (walking the
    latch backward) guards sinking with !LatchEntryLive.contains(reg).

  3. Save/restore bracket transformation. Some loops protect a reserved
    register with a save/restore bracket:

    vreg = COPY $reserved_reg ; save pre-loop value
    $reserved_reg = MOVX imm ; set loop-invariant value
    ... uses of $reserved_reg ...
    $reserved_reg = COPY vreg ; restore pre-loop value

    The existing hoisting/sinking cannot handle this because the register
    is a loop livein (the save reads it) and has two defs (set + restore).
    The new findSaveRestoreBracket / processSaveRestoreBracket functions
    detect and transform the pattern as a unit: save+set are moved to the
    preheader (in order, so the save captures the pre-loop value before the
    set overwrites it) and the restore is sinked to the exit block. The loop
    body then always sees the loop-invariant value with no per-iteration
    save/restore overhead. Safety checks ensure no uses of the register
    appear before the save, after the restore, or in non-latch blocks of
    the loop.

Tests are added for all three cases, including a regression test that
verifies brackets with uses outside the [set, restore] window are
correctly rejected.

@andcarminati
Copy link
Copy Markdown
Collaborator Author

QoR:
image

Avg reduction of: -0.17%

@konstantinschwarz
Copy link
Copy Markdown
Collaborator

I had a competing local implementation, so I asked AI to compare the two implementations.
It found two bugs in this implementation, I added the reproducers
reserved-reg-licm-livein-bug.mir.txt
reserved-reg-licm-latch-vs-exiting-bug.mir.txt

In a multi-block loop with conditional branches, a candidate register
may be defined on some paths to the latch but not others. On the paths
that skip the definition the register is live at the latch entry,
carrying a value from a previous iteration. Sinking the last def from
the latch to the exit block would expose that stale value to the
instructions that use the register before the def in the latch.

Fix this by pre-computing LatchEntryLive — the set of registers live
at the entry of the latch block — and rejecting any sink candidate
whose register appears in that set.

Add a test (srs_dyn_sign_conditional_no_sink) where $srssign0 is
defined in one conditional branch but not the other, and the latch
uses it before redefining it. The pass must leave both the COPY and
the MOVX_mvx_cr_imm in place.
Some loops protect a reserved register with a save/restore bracket:

  vreg = COPY $reserved_reg    ; save pre-loop value
  $reserved_reg = MOVX imm     ; set loop-invariant value
  ... uses of $reserved_reg ...
  $reserved_reg = COPY vreg    ; restore pre-loop value

The existing hoisting and sinking passes cannot handle this pattern:
the register is a loop livein (the save reads it before the set
defines it), and it has two defs (set + restore), so both
getUniqueDef and the livein guard block any transformation.

Add findSaveRestoreBracket / processSaveRestoreBracket to detect and
transform the pattern as a unit. The detection checks that:
- the save is a vreg = COPY $reserved_reg with exactly one use (the
  restore),
- no use of the register occurs between the save and the set,
- the set is loop-invariant,
- the restore is the last def of the register in the latch and copies
  exactly the vreg from the save,
- the register is not used in any non-latch block of the loop (which
  would observe the wrong value after the set is hoisted).

When the pattern matches, the save and set are moved to the preheader
(in order, so the save captures the pre-loop value before the set
overwrites it) and the restore is sinked to the exit block. The loop
body then always sees the loop-invariant value with no per-iteration
save/restore overhead.

Add a test (crsat_save_restore_bracket) that verifies the
transformation for $crsat.
The flat sequential backward walk in collectLoopReservedLiveins was
incorrect for loops with conditional paths. When L.getBlocks() returns
blocks in RPO order (header first, latch last), a def in one branch
could remove a register from the live set after a sibling branch's use
had already added it back. This caused the register to be incorrectly
classified as not a loop livein, allowing processForExitSink to sink
its definition to the exit block even though the register carried a
pre-loop value on some paths.

For example, in a loop where bb.2 uses $srssign0 and bb.3 defines it
(both successors of the header), the old code would process bb.2's use
first (adding $srssign0 to the live set) and then bb.3's def (removing
it), concluding $srssign0 is not a livein. But $srssign0 IS a livein
because bb.2 can execute on the first iteration without going through
bb.3.

Fix this by replacing the flat walk with a backward dataflow worklist
algorithm that correctly propagates liveness through the CFG until a
fixed point is reached. The algorithm initialises live-in sets to empty
and grows them monotonically, re-queuing predecessors whenever a
block's live-in set changes.

Add a test case (livein_bug_no_sink) to the aie2ps reserved-reg-licm
test that reproduces the bug and verifies the MOVX is not sunk.
…the latch

processForExitSink was scanning L.getLoopLatch() for sink candidates.
When the latch and the exiting block differ, instructions in the latch
are not on the exit path: sinking them to the exit block inserts a def
that was never executed on that path, corrupting the register's value
at the exit.

Fix this by scanning L.getExitingBlock() instead. The guard that checks
whether a register is live at the entry of the scanned block is also
updated to use the exiting block (renamed LatchEntryLive ->
ExitingBlockEntryLive).

Add a test case (latch_vs_exiting_no_sink) that reproduces the bug:
a loop where the latch (bb.3) defines $srssign0 = MOVX 0 but the
exiting block (bb.2) exits directly to bb.4 without going through bb.3.
The MOVX must remain in bb.3.
@andcarminati andcarminati force-pushed the andreu.reserved.regs branch from 447ee79 to 40e5a8c Compare May 4, 2026 20:46
@andcarminati
Copy link
Copy Markdown
Collaborator Author

I had a competing local implementation, so I asked AI to compare the two implementations. It found two bugs in this implementation, I added the reproducers reserved-reg-licm-livein-bug.mir.txt reserved-reg-licm-latch-vs-exiting-bug.mir.txt

Hi, thank you very much for catching those cases. The implementation covers those cases now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants