Skip to content

[AIE2PS] Fix alignment of 64-bit spills and improve CSR spills of L registers#1001

Merged
khallouh merged 5 commits into
aie-publicfrom
khallouh.fix.64.align
May 21, 2026
Merged

[AIE2PS] Fix alignment of 64-bit spills and improve CSR spills of L registers#1001
khallouh merged 5 commits into
aie-publicfrom
khallouh.fix.64.align

Conversation

@khallouh
Copy link
Copy Markdown
Collaborator

No description provided.

Comment thread llvm/lib/Target/AIE/aie2ps/AIE2PSInstrInfo.cpp Outdated
@khallouh khallouh force-pushed the khallouh.fix.64.align branch from c5e3584 to eb7dacd Compare May 15, 2026 12:54
niwinanto
niwinanto previously approved these changes May 15, 2026
Copy link
Copy Markdown
Collaborator

@niwinanto niwinanto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@khallouh khallouh force-pushed the khallouh.fix.64.align branch 2 times, most recently from 00e3cfd to f514ecf Compare May 15, 2026 14:06
Comment thread llvm/test/CodeGen/AIE/aie2ps/bfp16_13_mul.ll Outdated
; CHECK-NEXT: lda m2, [p6], #-4; mov dj3, #0
; CHECK-NEXT: lda dj2, [p6], #-4; mov dc2, dj4
; CHECK-NEXT: lda dj6, [p6], #-4; movs dc6, dj4; or r30, r5, r5; mov r5, dj4
; CHECK-NEXT: lda dn2, [p6, #0]; st r9:r8, [sp, #-64]; or r8, r7, r7; mov r7, dj4 // 8-byte Folded Spill
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason we are spilling more. luckily we could pack the stores within available bundles and we also have more unused stack because of the alignment (64 bytes). Do you have an idea?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, see comment above

@khallouh khallouh force-pushed the khallouh.fix.64.align branch from f514ecf to 63e04b5 Compare May 18, 2026 12:29
@khallouh khallouh changed the title [AIE2PS] Fix alignment of 64-bit spills [AIE2PS] Fix alignment of 64-bit spills and improve CSR spills of L registers May 18, 2026
@khallouh
Copy link
Copy Markdown
Collaborator Author

Updated. There are no regressions anymore including in mllib L1 (full run). Also conv2 and gemm don't change in the tests. The improvements in those kernels are mostly in the setup and main functions. I am currently running mllib again (reduced run) to get updated improvement numbers since they are supposed to increase with the last commit.

@khallouh
Copy link
Copy Markdown
Collaborator Author

There are only improvements, zero regressions
image

Comment thread llvm/lib/Target/AIE/aie2ps/AIE2PSFrameLowering.cpp Outdated
// Determine if both subregisters actually need saving.
// LRegMarked alone doesn't mean both - check individual GPR marks.
bool BothNeeded =
(EvenMarked && OddMarked) || (LRegMarked && !EvenMarked && !OddMarked);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have both L and R at the same time? Can we assert?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check !EvenMarked && !OddMarked is meant for the case where only one GPR is touched, since we have its super-register in the CSR list it will also be marked by the generic determineCalleeSaves for saving, but we don't want to save the entire L just the gpr. This is then handled in the else case where we reset the L and mark the single GPR that is actually needed.

    } else {
      // Only one GPR needs saving - clear L and keep only the needed GPR.
      SavedRegs.reset(LReg);
      if (EvenMarked || (LRegMarked && !OddMarked))
        SavedRegs.set(EvenGPR);
      else
        SavedRegs.reset(EvenGPR);
      if (OddMarked || (LRegMarked && !EvenMarked))
        SavedRegs.set(OddGPR);
      else
        SavedRegs.reset(OddGPR);
    }

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means, there could be cases L register and corresponding R register marked at the same time, right?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, but only one of them. This is a big limitation of the generic determineCalleeSaves as it will happily mark all super registers for saving even if only of their subregisters is actually used. So far this has not been an issue as our CSR list only contained atomic registers.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You could add an assert assert((!EvenMarked || LRegMarked) && (!OddMarked || LRegMarked) && "sub-reg mark without L pair mark violates invariant");

Comment thread llvm/lib/Target/AIE/aie2ps/AIE2PSFrameLowering.cpp Outdated
Comment on lines +100 to +101
static const MCPhysReg LRegs[] = {AIE2PS::l4, AIE2PS::l5, AIE2PS::l6,
AIE2PS::l7};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static const MCPhysReg LRegs[] = {AIE2PS::l4, AIE2PS::l5, AIE2PS::l6,
AIE2PS::l7};
const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
for (unsigned i = 0; CSRegs[i]; ++i) {
MCPhysReg Reg = CSRegs[i];
if (!AIE2PS::eLRegClass.contains(Reg))
continue;
...

Since we use multiple calling conventions, would be nice to derive this list than hardcoding. At this moment it has no correctness impact, but it can change in the future.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a hardcoded list anyway as part of the ABI. I don't think it will change for this target, this is already in an aie2ps target specific hook. We won't be sharing this with another target. If the same list is needed somewhere else in the future, we can factor it out for reuse.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: The list should not change for aie2ps but we have a test save_partial_L that overrides the calee saved list to exclude $l4. Once I added the assert below, it triggered it

assert((!EvenMarked || LRegMarked) && (!OddMarked || LRegMarked) &&
   "sub-reg mark without L pair mark violates invariant"); 

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I will take your suggestion to derive the list


if (BothNeeded) {
// Both subregisters need saving.
if (MFI.hasCalls()) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don`t fully understand why we have to do special handling with a call in the function, since we are working on callee saved registers. Would be nice to include a comment for future refernce.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, when there is calls we mark the L register so that we get a single spill instead of 2. When there are no calls, we prefer marking the subregsiters since they can be copied to non CSR registers instead of spilled to memory (There is no move instruction between L registers). For the call case we have no choice but to spill anyway since we don't know which registers the callee is going to use. I'll add this as a comment.

Comment thread llvm/lib/Target/AIE/aie2ps/AIE2PSFrameLowering.cpp
@khallouh khallouh force-pushed the khallouh.fix.64.align branch from 63e04b5 to 51e5ec7 Compare May 20, 2026 17:55
@khallouh
Copy link
Copy Markdown
Collaborator Author

I addressed the comments and fixed up the test for the call case to reflect realistic MIR (not using CSR registers in the PseudoRET)

khallouh added 3 commits May 20, 2026 11:58
l registers are only considered callee-saved by the prologepilog emitter
via their subregisters, so when a greedy allocates an l register that is
callee-saved, the prologeemitter will save each of its gpr subregisters
instead of saving the entire l directly. This adds those l registers to
the csr_aie2ps list
@khallouh khallouh force-pushed the khallouh.fix.64.align branch from 51e5ec7 to 7feb5da Compare May 20, 2026 18:07
Comment thread llvm/lib/Target/AIE/aie2ps/AIE2PSFrameLowering.cpp Outdated
Comment thread llvm/lib/Target/AIE/aie2ps/AIE2PSFrameLowering.cpp Outdated
Comment thread llvm/lib/Target/AIE/aie2ps/AIE2PSFrameLowering.cpp
@andcarminati
Copy link
Copy Markdown
Collaborator

This looks good, thank you for fixing this issue! I I just left some comments.

@khallouh khallouh force-pushed the khallouh.fix.64.align branch from 7feb5da to 8e0f01c Compare May 21, 2026 09:28
Comment thread llvm/lib/Target/AIE/aie2ps/AIE2PSFrameLowering.cpp Outdated
@khallouh khallouh force-pushed the khallouh.fix.64.align branch from 8e0f01c to 0923f24 Compare May 21, 2026 10:16
niwinanto
niwinanto previously approved these changes May 21, 2026
Copy link
Copy Markdown
Collaborator

@niwinanto niwinanto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for addressing the comments.

Comment thread llvm/lib/Target/AIE/aie2ps/AIE2PSFrameLowering.cpp Outdated
khallouh added 2 commits May 21, 2026 04:43
The test was using CSR registers for function returns which led to the
reload happening before the PseudoRET where it would have to be
afterwards for a normal use. We ended up returning the l register that
was saved from the caller function instead of the one defined inside the
function.
This fixes the test to show realistic MIR where the register is used by
a KILL instruction instead of the PseudoRET, now the reload happens
after the KILL and right before the return restoring the CSR value
coming from the caller.
@khallouh khallouh force-pushed the khallouh.fix.64.align branch from 220b48e to cac0f5e Compare May 21, 2026 10:43
@khallouh khallouh enabled auto-merge (rebase) May 21, 2026 10:46
@khallouh khallouh merged commit b6cddea into aie-public May 21, 2026
7 checks passed
@khallouh khallouh deleted the khallouh.fix.64.align branch May 21, 2026 11:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants