[AIE2PS] Fix alignment of 64-bit spills and improve CSR spills of L registers #1001
| ; CHECK-NEXT: lda m2, [p6], #-4; mov dj3, #0 | ||
| ; CHECK-NEXT: lda dj2, [p6], #-4; mov dc2, dj4 | ||
| ; CHECK-NEXT: lda dj6, [p6], #-4; movs dc6, dj4; or r30, r5, r5; mov r5, dj4 | ||
| ; CHECK-NEXT: lda dn2, [p6, #0]; st r9:r8, [sp, #-64]; or r8, r7, r7; mov r7, dj4 // 8-byte Folded Spill |
For some reason we are spilling more. Luckily, we could pack the stores within the available bundles, and we also have more unused stack because of the alignment (64 bytes). Do you have an idea why?
Yes, see the comment above.
Updated. There are no regressions anymore, including in mllib L1 (full run). Also, conv2 and gemm don't change in the tests. The improvements in those kernels are mostly in the setup and main functions. I am currently running mllib again (reduced run) to get updated improvement numbers, since they are expected to increase with the last commit.
| // Determine if both subregisters actually need saving. | ||
| // LRegMarked alone doesn't mean both - check individual GPR marks. | ||
| bool BothNeeded = | ||
| (EvenMarked && OddMarked) || (LRegMarked && !EvenMarked && !OddMarked); |
Do we have both L and R at the same time? Can we assert?
The check !EvenMarked && !OddMarked is meant for the case where only one GPR is touched. Since its super-register is in the CSR list, the generic determineCalleeSaves will also mark the super-register for saving, but we don't want to save the entire L, just the GPR. This is handled in the else case, where we reset the L and mark the single GPR that is actually needed.
} else {
// Only one GPR needs saving - clear L and keep only the needed GPR.
SavedRegs.reset(LReg);
if (EvenMarked || (LRegMarked && !OddMarked))
SavedRegs.set(EvenGPR);
else
SavedRegs.reset(EvenGPR);
if (OddMarked || (LRegMarked && !EvenMarked))
SavedRegs.set(OddGPR);
else
SavedRegs.reset(OddGPR);
}
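The marking rules in this thread can be checked in isolation. The following is a minimal standalone sketch, not the code from this PR: `Marks`, `bothNeeded` and `resolvePartial` are hypothetical names, and plain bools stand in for the `SavedRegs` bits of one L pair. It only covers the expression for BothNeeded and the else branch quoted above.

```cpp
#include <cassert>

// Hypothetical stand-in for the three SavedRegs bits of one L pair.
struct Marks {
  bool L;    // the L register itself
  bool Even; // its even GPR subregister
  bool Odd;  // its odd GPR subregister
};

// Mirrors the BothNeeded expression quoted in the review.
bool bothNeeded(const Marks &M) {
  return (M.Even && M.Odd) || (M.L && !M.Even && !M.Odd);
}

// Mirrors the else branch: clear L and keep only the GPR that is needed.
// Precondition (assumed by this sketch): !bothNeeded(In).
Marks resolvePartial(const Marks &In) {
  Marks Out = In;
  Out.L = false;
  Out.Even = In.Even || (In.L && !In.Odd);
  Out.Odd = In.Odd || (In.L && !In.Even);
  return Out;
}
```

For example, an input with L and the even GPR marked (the case where determineCalleeSaves marked the super-register because of one subregister) resolves to only the even GPR being saved.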
So there could be cases where an L register and its corresponding R register are marked at the same time, right?
Exactly, but only one of them. This is a big limitation of the generic determineCalleeSaves, as it will happily mark all super-registers for saving even if only one of their subregisters is actually used. So far this has not been an issue, as our CSR list only contained atomic registers.
Nit: You could add an assert: assert((!EvenMarked || LRegMarked) && (!OddMarked || LRegMarked) && "sub-reg mark without L pair mark violates invariant");
| static const MCPhysReg LRegs[] = {AIE2PS::l4, AIE2PS::l5, AIE2PS::l6, | ||
| AIE2PS::l7}; |
| static const MCPhysReg LRegs[] = {AIE2PS::l4, AIE2PS::l5, AIE2PS::l6, | |
| AIE2PS::l7}; | |
| const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs(); | |
| for (unsigned i = 0; CSRegs[i]; ++i) { | |
| MCPhysReg Reg = CSRegs[i]; | |
| if (!AIE2PS::eLRegClass.contains(Reg)) | |
| continue; | |
| ... |
Since we use multiple calling conventions, it would be nice to derive this list rather than hardcoding it. At the moment it has no correctness impact, but that can change in the future.
It is a hardcoded list anyway, as part of the ABI. I don't think it will change for this target, and this is already in an aie2ps target-specific hook, so we won't be sharing it with another target. If the same list is needed somewhere else in the future, we can factor it out for reuse.
Update: The list should not change for aie2ps, but we have a test, save_partial_L, that overrides the callee-saved list to exclude $l4. Once I added the assert below, that test triggered it:
assert((!EvenMarked || LRegMarked) && (!OddMarked || LRegMarked) &&
"sub-reg mark without L pair mark violates invariant");
So I will take your suggestion to derive the list.
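Deriving the list amounts to filtering the zero-terminated callee-saved array by register class, as in the suggestion above. The following is a minimal standalone sketch of that filtering, not the PR's code: `PhysReg`, the register numbers, `isLReg` and `collectLRegs` are hypothetical stand-ins for `MCPhysReg`, the AIE2PS register enum and `AIE2PS::eLRegClass.contains`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using PhysReg = uint16_t; // hypothetical stand-in for MCPhysReg

// Hypothetical register numbers; 0 is reserved as the list terminator.
enum : PhysReg { NoReg = 0, R8 = 1, R9, L4, L5, L6, L7 };

// Hypothetical stand-in for AIE2PS::eLRegClass.contains(Reg).
bool isLReg(PhysReg Reg) { return Reg >= L4 && Reg <= L7; }

// Collect the L registers that are present in a zero-terminated CSR list.
std::vector<PhysReg> collectLRegs(const PhysReg *CSRegs) {
  std::vector<PhysReg> LRegs;
  for (unsigned I = 0; CSRegs[I]; ++I)
    if (isLReg(CSRegs[I]))
      LRegs.push_back(CSRegs[I]);
  return LRegs;
}
```

With a CSR list that leaves out one L register, as the save_partial_L test does for $l4, that register is never visited, so the pair-level logic and its assert do not run on it.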
| if (BothNeeded) { | ||
| // Both subregisters need saving. | ||
| if (MFI.hasCalls()) { |
I don't fully understand why we need special handling when there is a call in the function, since we are working on callee-saved registers. It would be nice to include a comment for future reference.
As discussed offline, when there are calls we mark the L register so that we get a single spill instead of two. When there are no calls, we prefer marking the subregisters, since they can be copied to non-CSR registers instead of being spilled to memory (there is no move instruction between L registers). In the call case we have no choice but to spill anyway, since we don't know which registers the callee is going to use. I'll add this as a comment.
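The choice described here can be summarized in a few lines. This is a minimal standalone sketch under stated assumptions, not the PR's implementation: `Marks` and `resolveBoth` are hypothetical names, and it assumes that marking L clears the two GPR marks and vice versa.

```cpp
#include <cassert>

// Hypothetical stand-in for the SavedRegs bits of one L pair.
struct Marks {
  bool L;
  bool Even;
  bool Odd;
};

// Both halves of the pair need saving; pick how to save them.
Marks resolveBoth(bool HasCalls) {
  if (HasCalls) {
    // A callee may use any register, so a spill is unavoidable:
    // mark L to get one spill of the pair instead of two GPR spills.
    return Marks{true, false, false};
  }
  // No calls: mark the GPRs, which can be copied to free non-CSR
  // registers; there is no move instruction between L registers.
  return Marks{false, true, true};
}
```

In this model, `resolveBoth(true)` leaves only L marked and `resolveBoth(false)` leaves only the two GPRs marked.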
I addressed the comments and fixed up the test for the call case to reflect realistic MIR (not using CSR registers in the PseudoRET).
l registers are only considered callee-saved by the prologepilog emitter via their subregisters, so when the greedy allocator allocates an l register that is callee-saved, the prologepilog emitter will save each of its gpr subregisters instead of saving the entire l directly. This adds those l registers to the csr_aie2ps list.
This looks good, thank you for fixing this issue! I just left some comments.
niwinanto left a comment
LGTM! Thanks for addressing the comments.
…r members is used
The test was using CSR registers for function returns, which led to the reload happening before the PseudoRET, whereas for a normal use it would have to happen afterwards. We ended up returning the l register value that was saved from the caller function instead of the one defined inside the function. This fixes the test to show realistic MIR, where the register is used by a KILL instruction instead of the PseudoRET. Now the reload happens after the KILL and right before the return, restoring the CSR value coming from the caller.
