[TRI] Remove reserved registers in getRegPressureSetLimit #118787
base: users/wangpc-pp/spr/main.tri-remove-reserved-registers-in-getregpressuresetlimit
Created using spr 1.3.6-beta.1
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-powerpc Author: Pengcheng Wang (wangpc-pp) ChangesThere are two
It seems that we shouldn't use However, there exists some passes that use it directly. For example, These two This change helps to reduce the number of spills/reloads as well. Here are the RISC-V's statistics of spills/reloads on llvm-test-suite
Patch is 163.45 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118787.diff 26 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/RegisterClassInfo.h b/llvm/include/llvm/CodeGen/RegisterClassInfo.h
index 800bebea0dddb0..417a1e40d02b95 100644
--- a/llvm/include/llvm/CodeGen/RegisterClassInfo.h
+++ b/llvm/include/llvm/CodeGen/RegisterClassInfo.h
@@ -141,16 +141,11 @@ class RegisterClassInfo {
}
/// Get the register unit limit for the given pressure set index.
- ///
- /// RegisterClassInfo adjusts this limit for reserved registers.
unsigned getRegPressureSetLimit(unsigned Idx) const {
if (!PSetLimits[Idx])
- PSetLimits[Idx] = computePSetLimit(Idx);
+ PSetLimits[Idx] = TRI->getRegPressureSetLimit(*MF, Idx);
return PSetLimits[Idx];
}
-
-protected:
- unsigned computePSetLimit(unsigned Idx) const;
};
} // end namespace llvm
diff --git a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
index 292fa3c94969be..f7cd7cfe1aa15b 100644
--- a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
@@ -913,9 +913,14 @@ class TargetRegisterInfo : public MCRegisterInfo {
virtual const char *getRegPressureSetName(unsigned Idx) const = 0;
/// Get the register unit pressure limit for this dimension.
- /// This limit must be adjusted dynamically for reserved registers.
+ /// TargetRegisterInfo adjusts this limit for reserved registers.
virtual unsigned getRegPressureSetLimit(const MachineFunction &MF,
- unsigned Idx) const = 0;
+ unsigned Idx) const;
+
+ /// Get the raw register unit pressure limit for this dimension.
+ /// This limit must be adjusted dynamically for reserved registers.
+ virtual unsigned getRawRegPressureSetLimit(const MachineFunction &MF,
+ unsigned Idx) const = 0;
/// Get the dimensions of register pressure impacted by this register class.
/// Returns a -1 terminated array of pressure set IDs.
diff --git a/llvm/lib/CodeGen/MachinePipeliner.cpp b/llvm/lib/CodeGen/MachinePipeliner.cpp
index 7a10bd39e2695d..3ee0ba1fea5079 100644
--- a/llvm/lib/CodeGen/MachinePipeliner.cpp
+++ b/llvm/lib/CodeGen/MachinePipeliner.cpp
@@ -1327,47 +1327,6 @@ class HighRegisterPressureDetector {
void computePressureSetLimit(const RegisterClassInfo &RCI) {
for (unsigned PSet = 0; PSet < PSetNum; PSet++)
PressureSetLimit[PSet] = TRI->getRegPressureSetLimit(MF, PSet);
-
- // We assume fixed registers, such as stack pointer, are already in use.
- // Therefore subtracting the weight of the fixed registers from the limit of
- // each pressure set in advance.
- SmallDenseSet<Register, 8> FixedRegs;
- for (const TargetRegisterClass *TRC : TRI->regclasses()) {
- for (const MCPhysReg Reg : *TRC)
- if (isFixedRegister(Reg))
- FixedRegs.insert(Reg);
- }
-
- LLVM_DEBUG({
- for (auto Reg : FixedRegs) {
- dbgs() << printReg(Reg, TRI, 0, &MRI) << ": [";
- for (MCRegUnit Unit : TRI->regunits(Reg)) {
- const int *Sets = TRI->getRegUnitPressureSets(Unit);
- for (; *Sets != -1; Sets++) {
- dbgs() << TRI->getRegPressureSetName(*Sets) << ", ";
- }
- }
- dbgs() << "]\n";
- }
- });
-
- for (auto Reg : FixedRegs) {
- LLVM_DEBUG(dbgs() << "fixed register: " << printReg(Reg, TRI, 0, &MRI)
- << "\n");
- for (MCRegUnit Unit : TRI->regunits(Reg)) {
- auto PSetIter = MRI.getPressureSets(Unit);
- unsigned Weight = PSetIter.getWeight();
- for (; PSetIter.isValid(); ++PSetIter) {
- unsigned &Limit = PressureSetLimit[*PSetIter];
- assert(
- Limit >= Weight &&
- "register pressure limit must be greater than or equal weight");
- Limit -= Weight;
- LLVM_DEBUG(dbgs() << "PSet=" << *PSetIter << " Limit=" << Limit
- << " (decreased by " << Weight << ")\n");
- }
- }
- }
}
// There are two patterns of last-use.
diff --git a/llvm/lib/CodeGen/RegisterClassInfo.cpp b/llvm/lib/CodeGen/RegisterClassInfo.cpp
index 9312bc03bc522a..976d41a54da56f 100644
--- a/llvm/lib/CodeGen/RegisterClassInfo.cpp
+++ b/llvm/lib/CodeGen/RegisterClassInfo.cpp
@@ -195,40 +195,3 @@ void RegisterClassInfo::compute(const TargetRegisterClass *RC) const {
// RCI is now up-to-date.
RCI.Tag = Tag;
}
-
-/// This is not accurate because two overlapping register sets may have some
-/// nonoverlapping reserved registers. However, computing the allocation order
-/// for all register classes would be too expensive.
-unsigned RegisterClassInfo::computePSetLimit(unsigned Idx) const {
- const TargetRegisterClass *RC = nullptr;
- unsigned NumRCUnits = 0;
- for (const TargetRegisterClass *C : TRI->regclasses()) {
- const int *PSetID = TRI->getRegClassPressureSets(C);
- for (; *PSetID != -1; ++PSetID) {
- if ((unsigned)*PSetID == Idx)
- break;
- }
- if (*PSetID == -1)
- continue;
-
- // Found a register class that counts against this pressure set.
- // For efficiency, only compute the set order for the largest set.
- unsigned NUnits = TRI->getRegClassWeight(C).WeightLimit;
- if (!RC || NUnits > NumRCUnits) {
- RC = C;
- NumRCUnits = NUnits;
- }
- }
- assert(RC && "Failed to find register class");
- compute(RC);
- unsigned NAllocatableRegs = getNumAllocatableRegs(RC);
- unsigned RegPressureSetLimit = TRI->getRegPressureSetLimit(*MF, Idx);
- // If all the regs are reserved, return raw RegPressureSetLimit.
- // One example is VRSAVERC in PowerPC.
- // Avoid returning zero, getRegPressureSetLimit(Idx) assumes computePSetLimit
- // return non-zero value.
- if (NAllocatableRegs == 0)
- return RegPressureSetLimit;
- unsigned NReserved = RC->getNumRegs() - NAllocatableRegs;
- return RegPressureSetLimit - TRI->getRegClassWeight(RC).RegWeight * NReserved;
-}
diff --git a/llvm/lib/CodeGen/TargetRegisterInfo.cpp b/llvm/lib/CodeGen/TargetRegisterInfo.cpp
index 032f1a33e75c43..4cede283a7232c 100644
--- a/llvm/lib/CodeGen/TargetRegisterInfo.cpp
+++ b/llvm/lib/CodeGen/TargetRegisterInfo.cpp
@@ -674,6 +674,50 @@ TargetRegisterInfo::prependOffsetExpression(const DIExpression *Expr,
PrependFlags & DIExpression::EntryValue);
}
+unsigned TargetRegisterInfo::getRegPressureSetLimit(const MachineFunction &MF,
+ unsigned Idx) const {
+ const TargetRegisterClass *RC = nullptr;
+ unsigned NumRCUnits = 0;
+ for (const TargetRegisterClass *C : regclasses()) {
+ const int *PSetID = getRegClassPressureSets(C);
+ for (; *PSetID != -1; ++PSetID) {
+ if ((unsigned)*PSetID == Idx)
+ break;
+ }
+ if (*PSetID == -1)
+ continue;
+
+ // Found a register class that counts against this pressure set.
+ // For efficiency, only compute the set order for the largest set.
+ unsigned NUnits = getRegClassWeight(C).WeightLimit;
+ if (!RC || NUnits > NumRCUnits) {
+ RC = C;
+ NumRCUnits = NUnits;
+ }
+ }
+ assert(RC && "Failed to find register class");
+
+ unsigned NReserved = 0;
+ const BitVector Reserved = MF.getRegInfo().getReservedRegs();
+ for (unsigned PhysReg : RC->getRawAllocationOrder(MF))
+ if (Reserved.test(PhysReg))
+ NReserved++;
+
+ unsigned NAllocatableRegs = RC->getNumRegs() - NReserved;
+ unsigned RegPressureSetLimit = getRawRegPressureSetLimit(MF, Idx);
+ // If all the regs are reserved, return raw RegPressureSetLimit.
+ // One example is VRSAVERC in PowerPC.
+ // Avoid returning zero, RegisterClassInfo::getRegPressureSetLimit(Idx)
+ // assumes this returns non-zero value.
+ if (NAllocatableRegs == 0) {
+ LLVM_DEBUG({
+ dbgs() << "All registers of " << getRegClassName(RC) << " are reserved!";
+ });
+ return RegPressureSetLimit;
+ }
+ return RegPressureSetLimit - getRegClassWeight(RC).RegWeight * NReserved;
+}
+
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD
void TargetRegisterInfo::dumpReg(Register Reg, unsigned SubRegIndex,
diff --git a/llvm/test/CodeGen/LoongArch/jr-without-ra.ll b/llvm/test/CodeGen/LoongArch/jr-without-ra.ll
index d1c4459aaa6ee0..2bd89dacb2b37a 100644
--- a/llvm/test/CodeGen/LoongArch/jr-without-ra.ll
+++ b/llvm/test/CodeGen/LoongArch/jr-without-ra.ll
@@ -20,101 +20,101 @@ define void @jr_without_ra(ptr %rtwdev, ptr %chan, ptr %h2c, i8 %.pre, i1 %cmp.i
; CHECK-NEXT: st.d $s6, $sp, 24 # 8-byte Folded Spill
; CHECK-NEXT: st.d $s7, $sp, 16 # 8-byte Folded Spill
; CHECK-NEXT: st.d $s8, $sp, 8 # 8-byte Folded Spill
-; CHECK-NEXT: move $s7, $zero
-; CHECK-NEXT: move $s0, $zero
+; CHECK-NEXT: move $s6, $zero
+; CHECK-NEXT: move $s1, $zero
; CHECK-NEXT: ld.d $t0, $sp, 184
-; CHECK-NEXT: ld.d $s2, $sp, 176
-; CHECK-NEXT: ld.d $s1, $sp, 168
-; CHECK-NEXT: ld.d $t1, $sp, 160
-; CHECK-NEXT: ld.d $t2, $sp, 152
-; CHECK-NEXT: ld.d $t3, $sp, 144
-; CHECK-NEXT: ld.d $t4, $sp, 136
-; CHECK-NEXT: ld.d $t5, $sp, 128
-; CHECK-NEXT: ld.d $t6, $sp, 120
-; CHECK-NEXT: ld.d $t7, $sp, 112
-; CHECK-NEXT: ld.d $t8, $sp, 104
-; CHECK-NEXT: ld.d $fp, $sp, 96
+; CHECK-NEXT: ld.d $t1, $sp, 176
+; CHECK-NEXT: ld.d $s2, $sp, 168
+; CHECK-NEXT: ld.d $t2, $sp, 160
+; CHECK-NEXT: ld.d $t3, $sp, 152
+; CHECK-NEXT: ld.d $t4, $sp, 144
+; CHECK-NEXT: ld.d $t5, $sp, 136
+; CHECK-NEXT: ld.d $t6, $sp, 128
+; CHECK-NEXT: ld.d $t7, $sp, 120
+; CHECK-NEXT: ld.d $t8, $sp, 112
+; CHECK-NEXT: ld.d $fp, $sp, 104
+; CHECK-NEXT: ld.d $s0, $sp, 96
; CHECK-NEXT: andi $a4, $a4, 1
-; CHECK-NEXT: alsl.d $a6, $a6, $s1, 4
-; CHECK-NEXT: pcalau12i $s1, %pc_hi20(.LJTI0_0)
-; CHECK-NEXT: addi.d $s1, $s1, %pc_lo12(.LJTI0_0)
-; CHECK-NEXT: slli.d $s3, $s2, 2
-; CHECK-NEXT: alsl.d $s2, $s2, $s3, 1
-; CHECK-NEXT: add.d $s2, $t5, $s2
-; CHECK-NEXT: addi.w $s4, $zero, -41
+; CHECK-NEXT: alsl.d $a6, $a6, $s2, 4
+; CHECK-NEXT: pcalau12i $s2, %pc_hi20(.LJTI0_0)
+; CHECK-NEXT: addi.d $s2, $s2, %pc_lo12(.LJTI0_0)
; CHECK-NEXT: ori $s3, $zero, 1
-; CHECK-NEXT: slli.d $s4, $s4, 3
-; CHECK-NEXT: ori $s6, $zero, 3
-; CHECK-NEXT: lu32i.d $s6, 262144
+; CHECK-NEXT: ori $s4, $zero, 50
+; CHECK-NEXT: ori $s5, $zero, 3
+; CHECK-NEXT: lu32i.d $s5, 262144
; CHECK-NEXT: b .LBB0_4
; CHECK-NEXT: .p2align 4, , 16
; CHECK-NEXT: .LBB0_1: # %sw.bb27.i.i
; CHECK-NEXT: # in Loop: Header=BB0_4 Depth=1
-; CHECK-NEXT: ori $s8, $zero, 1
+; CHECK-NEXT: ori $s7, $zero, 1
; CHECK-NEXT: .LBB0_2: # %if.else.i106
; CHECK-NEXT: # in Loop: Header=BB0_4 Depth=1
-; CHECK-NEXT: alsl.d $s5, $s0, $s0, 3
-; CHECK-NEXT: alsl.d $s0, $s5, $s0, 1
-; CHECK-NEXT: add.d $s0, $t0, $s0
-; CHECK-NEXT: ldx.bu $s8, $s0, $s8
+; CHECK-NEXT: alsl.d $s8, $s1, $s1, 3
+; CHECK-NEXT: alsl.d $s1, $s8, $s1, 1
+; CHECK-NEXT: add.d $s1, $t0, $s1
+; CHECK-NEXT: ldx.bu $s7, $s1, $s7
; CHECK-NEXT: .LBB0_3: # %phy_tssi_get_ofdm_de.exit
; CHECK-NEXT: # in Loop: Header=BB0_4 Depth=1
-; CHECK-NEXT: st.b $zero, $t5, 0
-; CHECK-NEXT: st.b $s7, $t3, 0
-; CHECK-NEXT: st.b $zero, $t8, 0
-; CHECK-NEXT: st.b $zero, $t1, 0
-; CHECK-NEXT: st.b $zero, $a1, 0
+; CHECK-NEXT: st.b $zero, $t6, 0
+; CHECK-NEXT: st.b $s6, $t4, 0
+; CHECK-NEXT: st.b $zero, $fp, 0
; CHECK-NEXT: st.b $zero, $t2, 0
-; CHECK-NEXT: st.b $s8, $a5, 0
-; CHECK-NEXT: ori $s0, $zero, 1
-; CHECK-NEXT: move $s7, $a3
+; CHECK-NEXT: st.b $zero, $a1, 0
+; CHECK-NEXT: st.b $zero, $t3, 0
+; CHECK-NEXT: st.b $s7, $a5, 0
+; CHECK-NEXT: ori $s1, $zero, 1
+; CHECK-NEXT: move $s6, $a3
; CHECK-NEXT: .LBB0_4: # %for.body
; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
; CHECK-NEXT: beqz $a4, .LBB0_9
; CHECK-NEXT: # %bb.5: # %calc_6g.i
; CHECK-NEXT: # in Loop: Header=BB0_4 Depth=1
-; CHECK-NEXT: move $s7, $zero
+; CHECK-NEXT: move $s6, $zero
; CHECK-NEXT: bnez $zero, .LBB0_8
; CHECK-NEXT: # %bb.6: # %calc_6g.i
; CHECK-NEXT: # in Loop: Header=BB0_4 Depth=1
-; CHECK-NEXT: slli.d $s8, $zero, 3
-; CHECK-NEXT: ldx.d $s8, $s8, $s1
-; CHECK-NEXT: jr $s8
+; CHECK-NEXT: slli.d $s7, $zero, 3
+; CHECK-NEXT: ldx.d $s7, $s7, $s2
+; CHECK-NEXT: jr $s7
; CHECK-NEXT: .LBB0_7: # %sw.bb12.i.i
; CHECK-NEXT: # in Loop: Header=BB0_4 Depth=1
-; CHECK-NEXT: ori $s7, $zero, 1
+; CHECK-NEXT: ori $s6, $zero, 1
; CHECK-NEXT: .LBB0_8: # %if.else58.i
; CHECK-NEXT: # in Loop: Header=BB0_4 Depth=1
-; CHECK-NEXT: ldx.bu $s7, $a6, $s7
+; CHECK-NEXT: ldx.bu $s6, $a6, $s6
; CHECK-NEXT: b .LBB0_11
; CHECK-NEXT: .p2align 4, , 16
; CHECK-NEXT: .LBB0_9: # %if.end.i
; CHECK-NEXT: # in Loop: Header=BB0_4 Depth=1
-; CHECK-NEXT: andi $s7, $s7, 255
-; CHECK-NEXT: ori $s5, $zero, 50
-; CHECK-NEXT: bltu $s5, $s7, .LBB0_15
+; CHECK-NEXT: andi $s6, $s6, 255
+; CHECK-NEXT: bltu $s4, $s6, .LBB0_15
; CHECK-NEXT: # %bb.10: # %if.end.i
; CHECK-NEXT: # in Loop: Header=BB0_4 Depth=1
-; CHECK-NEXT: sll.d $s7, $s3, $s7
-; CHECK-NEXT: and $s8, $s7, $s6
-; CHECK-NEXT: move $s7, $fp
-; CHECK-NEXT: beqz $s8, .LBB0_15
+; CHECK-NEXT: sll.d $s6, $s3, $s6
+; CHECK-NEXT: and $s7, $s6, $s5
+; CHECK-NEXT: move $s6, $s0
+; CHECK-NEXT: beqz $s7, .LBB0_15
; CHECK-NEXT: .LBB0_11: # %phy_tssi_get_ofdm_trim_de.exit
; CHECK-NEXT: # in Loop: Header=BB0_4 Depth=1
-; CHECK-NEXT: move $s8, $zero
-; CHECK-NEXT: st.b $zero, $t7, 0
-; CHECK-NEXT: ldx.b $ra, $s2, $t4
+; CHECK-NEXT: move $s7, $zero
+; CHECK-NEXT: st.b $zero, $t8, 0
+; CHECK-NEXT: slli.d $s8, $t1, 2
+; CHECK-NEXT: alsl.d $s8, $t1, $s8, 1
+; CHECK-NEXT: add.d $s8, $t6, $s8
+; CHECK-NEXT: ldx.b $s8, $s8, $t5
; CHECK-NEXT: st.b $zero, $a2, 0
; CHECK-NEXT: st.b $zero, $a7, 0
-; CHECK-NEXT: st.b $zero, $t6, 0
-; CHECK-NEXT: st.b $ra, $a0, 0
+; CHECK-NEXT: st.b $zero, $t7, 0
+; CHECK-NEXT: st.b $s8, $a0, 0
; CHECK-NEXT: bnez $s3, .LBB0_13
; CHECK-NEXT: # %bb.12: # %phy_tssi_get_ofdm_trim_de.exit
; CHECK-NEXT: # in Loop: Header=BB0_4 Depth=1
+; CHECK-NEXT: addi.w $s8, $zero, -41
+; CHECK-NEXT: slli.d $s8, $s8, 3
; CHECK-NEXT: pcalau12i $ra, %pc_hi20(.LJTI0_1)
; CHECK-NEXT: addi.d $ra, $ra, %pc_lo12(.LJTI0_1)
-; CHECK-NEXT: ldx.d $s5, $s4, $ra
-; CHECK-NEXT: jr $s5
+; CHECK-NEXT: ldx.d $s8, $s8, $ra
+; CHECK-NEXT: jr $s8
; CHECK-NEXT: .LBB0_13: # %phy_tssi_get_ofdm_trim_de.exit
; CHECK-NEXT: # in Loop: Header=BB0_4 Depth=1
; CHECK-NEXT: bnez $s3, .LBB0_1
diff --git a/llvm/test/CodeGen/NVPTX/misched_func_call.ll b/llvm/test/CodeGen/NVPTX/misched_func_call.ll
index e036753ce90306..ee6b5869111c6f 100644
--- a/llvm/test/CodeGen/NVPTX/misched_func_call.ll
+++ b/llvm/test/CodeGen/NVPTX/misched_func_call.ll
@@ -17,7 +17,6 @@ define ptx_kernel void @wombat(i32 %arg, i32 %arg1, i32 %arg2) {
; CHECK-NEXT: ld.param.u32 %r2, [wombat_param_0];
; CHECK-NEXT: mov.b32 %r10, 0;
; CHECK-NEXT: mov.u64 %rd1, 0;
-; CHECK-NEXT: mov.b32 %r6, 1;
; CHECK-NEXT: $L__BB0_1: // %bb3
; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
; CHECK-NEXT: { // callseq 0, 0
@@ -29,16 +28,16 @@ define ptx_kernel void @wombat(i32 %arg, i32 %arg1, i32 %arg2) {
; CHECK-NEXT: (
; CHECK-NEXT: param0
; CHECK-NEXT: );
+; CHECK-NEXT: ld.param.f64 %fd1, [retval0];
+; CHECK-NEXT: } // callseq 0
; CHECK-NEXT: mul.lo.s32 %r7, %r10, %r3;
; CHECK-NEXT: or.b32 %r8, %r4, %r7;
; CHECK-NEXT: mul.lo.s32 %r9, %r2, %r8;
; CHECK-NEXT: cvt.rn.f64.s32 %fd3, %r9;
-; CHECK-NEXT: ld.param.f64 %fd1, [retval0];
-; CHECK-NEXT: } // callseq 0
; CHECK-NEXT: cvt.rn.f64.u32 %fd4, %r10;
; CHECK-NEXT: add.rn.f64 %fd5, %fd4, %fd3;
; CHECK-NEXT: st.global.f64 [%rd1], %fd5;
-; CHECK-NEXT: mov.u32 %r10, %r6;
+; CHECK-NEXT: mov.b32 %r10, 1;
; CHECK-NEXT: bra.uni $L__BB0_1;
bb:
br label %bb3
diff --git a/llvm/test/CodeGen/PowerPC/aix-csr-alloc.mir b/llvm/test/CodeGen/PowerPC/aix-csr-alloc.mir
index fba410dc0dafce..7c8a5848b402f4 100644
--- a/llvm/test/CodeGen/PowerPC/aix-csr-alloc.mir
+++ b/llvm/test/CodeGen/PowerPC/aix-csr-alloc.mir
@@ -17,5 +17,4 @@ body: |
...
# CHECK-DAG: AllocationOrder(GPRC) = [ $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r0 $r31 $r30 $r29 $r28 $r27 $r26 $r25 $r24 $r23 $r22 $r21 $r20 $r19 $r18 $r17 $r16 $r15 $r14 $r13 ]
-# CHECK-DAG: AllocationOrder(F4RC) = [ $f0 $f1 $f2 $f3 $f4 $f5 $f6 $f7 $f8 $f9 $f10 $f11 $f12 $f13 $f31 $f30 $f29 $f28 $f27 $f26 $f25 $f24 $f23 $f22 $f21 $f20 $f19 $f18 $f17 $f16 $f15 $f14 ]
# CHECK-DAG: AllocationOrder(GPRC_and_GPRC_NOR0) = [ $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r31 $r30 $r29 $r28 $r27 $r26 $r25 $r24 $r23 $r22 $r21 $r20 $r19 $r18 $r17 $r16 $r15 $r14 $r13 ]
diff --git a/llvm/test/CodeGen/PowerPC/aix64-csr-alloc.mir b/llvm/test/CodeGen/PowerPC/aix64-csr-alloc.mir
index 584b6b0ad46dd9..3617b95b2a6af7 100644
--- a/llvm/test/CodeGen/PowerPC/aix64-csr-alloc.mir
+++ b/llvm/test/CodeGen/PowerPC/aix64-csr-alloc.mir
@@ -16,6 +16,5 @@ body: |
$f1 = COPY %2
BLR8 implicit $lr8, implicit undef $rm, implicit $x3, implicit $f1
...
-# CHECK-DAG: AllocationOrder(VFRC) = [ $vf2 $vf3 $vf4 $vf5 $vf0 $vf1 $vf6 $vf7 $vf8 $vf9 $vf10 $vf11 $vf12 $vf13 $vf14 $vf15 $vf16 $vf17 $vf18 $vf19 $vf31 $vf30 $vf29 $vf28 $vf27 $vf26 $vf25 $vf24 $vf23 $vf22 $vf21 $vf20 ]
# CHECK-DAG: AllocationOrder(G8RC_and_G8RC_NOX0) = [ $x3 $x4 $x5 $x6 $x7 $x8 $x9 $x10 $x11 $x12 $x2 $x31 $x30 $x29 $x28 $x27 $x26 $x25 $x24 $x23 $x22 $x21 $x20 $x19 $x18 $x17 $x16 $x15 $x14 ]
# CHECK-DAG: AllocationOrder(F8RC) = [ $f0 $f1 $f2 $f3 $f4 $f5 $f6 $f7 $f8 $f9 $f10 $f11 $f12 $f13 $f31 $f30 $f29 $f28 $f27 $f26 $f25 $f24 $f23 $f22 $f21 $f20 $f19 $f18 $f17 $f16 $f15 $f14 ]
diff --git a/llvm/test/CodeGen/PowerPC/compute-regpressure.ll b/llvm/test/CodeGen/PowerPC/compute-regpressure.ll
index 9a1b057c2e38d4..9d893b8dbebee2 100644
--- a/llvm/test/CodeGen/PowerPC/compute-regpressure.ll
+++ b/llvm/test/CodeGen/PowerPC/compute-regpressure.ll
@@ -1,7 +1,7 @@
; REQUIRES: asserts
-; RUN: llc -debug-only=regalloc < %s 2>&1 |FileCheck %s --check-prefix=DEBUG
+; RUN: llc -debug-only=target-reg-info < %s 2>&1 |FileCheck %s --check-prefix=DEBUG
-; DEBUG-COUNT-1: AllocationOrder(VRSAVERC) = [ ]
+; DEBUG-COUNT-1: All registers of VRSAVERC are reserved!
target triple = "powerpc64le-unknown-linux-gnu"
diff --git a/llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll b/llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll
index c35f05be304cce..ec2448cb3965f3 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll
@@ -489,8 +489,9 @@ define void @test1(ptr nocapture noundef writeonly %dst, i32 noundef signext %i_
; RV64-NEXT: j .LBB0_11
; RV64-NEXT: .LBB0_8: # %vector.ph
; RV64-NEXT: # in Loop: Header=BB0_6 Depth=1
-; RV64-NEXT: slli t6, t0, 28
-; RV64-NEXT: sub t6, t6, t1
+; RV64-NEXT: slli t6, t0, 1
+; RV64-NEXT: slli s0, t0, 28
+; RV64-NEXT: sub t6, s0, t6
; RV64-NEXT: and t6, t6, a6
; RV64-NEXT: csrwi vxrm, 0
; RV64-NEXT: mv s0, a2
diff --git a/llvm/test/CodeGen/Thumb2/mve-blockplacement.ll b/llvm/test/CodeGen/Thumb2/mve-blockplacement.ll
index 7087041e8dace6..6d082802f9cd75 100644
--- a/llvm/test/CodeGen/Thumb2/mve-blockplacement.ll
+++ b/llvm/test/CodeGen/Thumb2/mve-blockplacement.ll
@@ -353,8 +353,8 @@ define i32 @d(i64 %e, i32 %f, i64 %g, i32 %h) {
; CHECK-NEXT: push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr}
; CHECK-NEXT: .pad #4
; CHECK-NEXT: sub sp, #4
-; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
-; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
+; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13}
+; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13}
; CHECK-NEXT: .pad #16
; CHECK-NEXT: sub sp, #16
; CHECK-NEXT: mov lr, r0
@@ -364,50 +364,48 @@ define i32 @d(i64 %e, i32 %f, i64 %g, i32 %h) {
; CHECK-NEXT: @ %bb.1: @ %for.cond2.preheader.lr.ph
; CHECK-NEXT: movs r0, #1
; CHECK-NEXT: cmp r2, #1
-; CHECK-NEXT: csel r7, r2, r0, lt
+; CHECK-NEXT: csel r3, r2, r0, lt
; CHECK-NEXT: mov r12, r1
-; CHECK-NEXT: mov r1, r7
-; CHECK-NEXT: cmp r7, #3
+; CHECK-NEXT: m...
[truncated]
This looks right. Two small changes below
Do you know why this doesn't have as many changes to the RISC-V tests as the RISC-V specific patch?
I did a rough investigation; I think it is because we only changed the limit of
…egPressureSetLimit directly Created using spr 1.3.6-beta.1
unsigned NReserved = 0;
const BitVector Reserved = MF.getRegInfo().getReservedRegs();
for (MCPhysReg PhysReg : RC->getRawAllocationOrder(MF))
There's still the pre-existing bug where the pressure number is wrong for overlapping registers in the allocation order; this should probably report the number of distinct allocatable registers.
I already have a WIP patch that generates the mapping from pressure set to RegisterClass; I will post it when it's ready.
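One way to read the "distinct allocatable registers" remark is that two overlapping registers in an allocation order should not both count toward the available capacity. The sketch below illustrates that reading only; `ToyReg`, `countAllocatableRegs`, and `countDistinctAllocatableUnits` are hypothetical names, and this is not the WIP patch mentioned above nor LLVM's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model: each register covers one or more register units, and
// two registers overlap when they share a unit.
struct ToyReg {
  std::vector<unsigned> Units;
  bool Reserved;
};

// Counts every non-reserved entry of the allocation order, so overlapping
// registers are counted once each.
std::size_t countAllocatableRegs(const std::vector<ToyReg> &Order) {
  std::size_t N = 0;
  for (const ToyReg &R : Order)
    if (!R.Reserved)
      ++N;
  return N;
}

// Counts the distinct register units covered by non-reserved registers, so
// capacity shared by overlapping registers is counted only once.
std::size_t countDistinctAllocatableUnits(const std::vector<ToyReg> &Order) {
  std::set<unsigned> Seen;
  for (const ToyReg &R : Order)
    if (!R.Reserved)
      Seen.insert(R.Units.begin(), R.Units.end());
  return Seen.size();
}
```

With a pair register covering units 0 and 1 plus its two halves in the same order, the first count is 3 while only 2 units are really available, which is the kind of mismatch the comment appears to describe.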
}
}
assert(RC && "Failed to find register class");
What is the reason we drop the call to `compute` from `computePSetLimit` here? It looks like we're doing this instead:
unsigned NReserved = 0;
const BitVector Reserved = MF.getRegInfo().getReservedRegs();
for (MCPhysReg PhysReg : RC->getRawAllocationOrder(MF))
  if (Reserved.test(PhysReg))
    NReserved++;
It looks like this is related to what is in `compute`, but we seem to drop some costing logic.
The reason we needed `compute` is that we need the number of allocatable registers in order to derive the number of reserved registers (though I actually think we don't need to call `compute` explicitly, because `getNumAllocatableRegs` is cached and it will call `compute` in `get(RC)`):
// ...
compute(RC);
unsigned NAllocatableRegs = getNumAllocatableRegs(RC);
// ...
unsigned NReserved = RC->getNumRegs() - NAllocatableRegs;
unsigned getNumAllocatableRegs(const TargetRegisterClass *RC) const {
  return get(RC).NumRegs;
}
The logic of calculating `NumRegs` in `compute` is:
// FIXME: Once targets reserve registers instead of removing them from the
// allocation order, we can simply use begin/end here.
ArrayRef<MCPhysReg> RawOrder = RC->getRawAllocationOrder(*MF);
for (unsigned PhysReg : RawOrder) {
  // Remove reserved registers from the allocation order.
  if (Reserved.test(PhysReg))
    continue;
  uint8_t Cost = RegCosts[PhysReg];
  MinCost = std::min(MinCost, Cost);
  if (getLastCalleeSavedAlias(PhysReg) &&
      !STI.ignoreCSRForAllocationOrder(*MF, PhysReg))
    // PhysReg aliases a CSR, save it for later.
    CSRAlias.push_back(PhysReg);
  else {
    if (Cost != LastCost)
      LastCostChange = N;
    RCI.Order[N++] = PhysReg;
    LastCost = Cost;
  }
}
RCI.NumRegs = N + CSRAlias.size();
`RCI.NumRegs` is the total number of registers minus the number of reserved registers.
In this patch, we instead count the reserved registers first and derive the number of allocatable registers from that. The effect should be the same, I think.
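The equivalence claimed here can be checked on a toy model. The following is a minimal sketch, not LLVM code: `ToyRegClass`, `reservedViaAllocatable`, `reservedDirect`, and `adjustedLimit` are hypothetical names, and it assumes the raw allocation order lists every register of the class exactly once.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a register class: just its raw allocation order,
// assumed here to list every register of the class exactly once.
struct ToyRegClass {
  std::vector<unsigned> RawOrder;
};

// Old route: count the allocatable registers, then subtract from the total.
unsigned reservedViaAllocatable(const ToyRegClass &RC,
                                const std::vector<bool> &Reserved) {
  unsigned NAllocatable = 0;
  for (unsigned PhysReg : RC.RawOrder)
    if (!Reserved[PhysReg])
      ++NAllocatable;
  return static_cast<unsigned>(RC.RawOrder.size()) - NAllocatable;
}

// New route: count the reserved registers directly.
unsigned reservedDirect(const ToyRegClass &RC,
                        const std::vector<bool> &Reserved) {
  unsigned NReserved = 0;
  for (unsigned PhysReg : RC.RawOrder)
    if (Reserved[PhysReg])
      ++NReserved;
  return NReserved;
}

// Mirrors the final adjustment in the patch: keep the raw limit when every
// register is reserved, otherwise subtract RegWeight per reserved register.
unsigned adjustedLimit(unsigned RawLimit, unsigned RegWeight, unsigned NumRegs,
                       unsigned NReserved) {
  assert(NReserved <= NumRegs && "cannot reserve more registers than exist");
  if (NReserved == NumRegs)
    return RawLimit;
  return RawLimit - RegWeight * NReserved;
}
```

Under the stated assumption both routes return the same count, so the adjusted limit is unchanged; the two would only diverge if the raw order omitted or repeated registers.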
@@ -1327,47 +1327,6 @@ class HighRegisterPressureDetector {
void computePressureSetLimit(const RegisterClassInfo &RCI) {
for (unsigned PSet = 0; PSet < PSetNum; PSet++)
PressureSetLimit[PSet] = TRI->getRegPressureSetLimit(MF, PSet);
Hi @kasuga-fj! What do you think about this part? I just removed the code below as it seems to be unnecessary now. Related patch: #74807
And I think just removing `fixed` registers is not enough here.
Thanks for checking. I previously moved this code away from `RegisterClassInfo::getRegPressureSetLimit` on purpose (in #87312), because the registers that `RegisterClassInfo::getRegPressureSetLimit` removes overlapped with the `fixed` registers mentioned here. If `RegisterClassInfo::getRegPressureSetLimit` now takes care of both, I don't think it's a problem to remove this part as you did.
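The overlap concern can be illustrated with a small model of the removed pipeliner logic, which subtracted the weight of each fixed register unit from every pressure set the unit belongs to. This is a sketch under assumptions: `ToyUnit` and `subtractFixedWeights` are hypothetical names, and the real code walked `MRI.getPressureSets(Unit)` rather than a plain vector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a register unit: its weight and the indices of the
// pressure sets it counts against.
struct ToyUnit {
  unsigned Weight;
  std::vector<unsigned> PSets;
};

// Subtracts the weight of each fixed unit from every pressure set it belongs
// to, as the removed block did for fixed registers.
void subtractFixedWeights(std::vector<unsigned> &Limits,
                          const std::vector<ToyUnit> &FixedUnits) {
  for (const ToyUnit &U : FixedUnits)
    for (unsigned PSet : U.PSets) {
      assert(Limits[PSet] >= U.Weight &&
             "register pressure limit must be greater than or equal weight");
      Limits[PSet] -= U.Weight;
    }
}
```

Applying the subtraction once to raw limits gives the intended result; applying it to limits that were already reduced for the same registers lowers them a second time, which is presumably why the adjustment should live in exactly one place.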
I need more eyes on the test diffs of targets other than RISC-V! Please help identify any regressions!
Is there a compile time impact for this patch?
This should increase compile time somewhat, but I don't know if it is significant. The limits are cached in
The limits aren’t cached for passes like MachineLICM, right? And they will be recomputed for each function? My understanding of RegisterClassInfo is that it maintains the cache across functions as long as they have the same subtarget or something like that.
Yes, you are right! I will force these passes to use
Can't the pass just be a wrapper around RegClassInfo? Why do we need to remove the ability to use RegClassInfo as a utility?
Aha! I never thought about that! Good idea!
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper of `TargetRegisterInfo::getRegPressureSetLimit` with some logics to adjust the limit by removing reserved registers. It seems that we shouldn't use `TargetRegisterInfo::getRegPressureSetLimit` directly, just like the comment "This limit must be adjusted dynamically for reserved registers" said. Separate from llvm#118787
Ping.
Description needs to be updated if MachineLICM, MachineSink, MachinePipeliner have been migrated to RegisterClassInfo.
llvm/test/CodeGen/X86/unfold-masked-merge-vector-variablemask.ll
Do you know what caused the X86 changes? I don't see any uses of getRegPressureSetLimit in the X86 directory.
Just checked line by line, I have no idea why X86 has some changes...
The reason may be mentally absorbing (and cost me a lot of time debugging...): For some
# CHECK-DAG: AllocationOrder(G8RC_and_G8RC_NOX0) = [ $x3 $x4 $x5 $x6 $x7 $x8 $x9 $x10 $x11 $x12 $x2 $x31 $x30 $x29 $x28 $x27 $x26 $x25 $x24 $x23 $x22 $x21 $x20 $x19 $x18 $x17 $x16 $x15 $x14 ]
# CHECK-DAG: AllocationOrder(G8RC) = [ $x3 $x4 $x5 $x6 $x7 $x8 $x9 $x10 $x11 $x12 $x0 $x2 $x31 $x30 $x29 $x28 $x27 $x26 $x25 $x24 $x23 $x22 $x21 $x20 $x19 $x18 $x17 $x16 $x15 $x14 ]
These PPC changes are just because we take a different code path now, so the debug dumps are different.
…it` (#119830) `RegisterClassInfo::getRegPressureSetLimit` is a wrapper of `TargetRegisterInfo::getRegPressureSetLimit` with some logics to adjust the limit by removing reserved registers. It seems that we shouldn't use `TargetRegisterInfo::getRegPressureSetLimit` directly, just like the comment "This limit must be adjusted dynamically for reserved registers" said. Separate from llvm/llvm-project#118787
…etLimit` (#119827) `RegisterClassInfo::getRegPressureSetLimit` is a wrapper of `TargetRegisterInfo::getRegPressureSetLimit` with some logics to adjust the limit by removing reserved registers. It seems that we shouldn't use `TargetRegisterInfo::getRegPressureSetLimit` directly, just like the comment "This limit must be adjusted dynamically for reserved registers" said. Thus we should use `RegisterClassInfo::getRegPressureSetLimit` and remove replicated code. Separate from llvm/llvm-project#118787
…0377) `RegisterClassInfo::getRegPressureSetLimit` is a wrapper of `TargetRegisterInfo::getRegPressureSetLimit` with some logics to adjust the limit by removing reserved registers. It seems that we shouldn't use `TargetRegisterInfo::getRegPressureSetLimit` directly, just like the comment "This limit must be adjusted dynamically for reserved registers" said. Separate from llvm/llvm-project#118787
…(#120383) `RegisterClassInfo::getRegPressureSetLimit` is a wrapper of `TargetRegisterInfo::getRegPressureSetLimit` with some logics to adjust the limit by removing reserved registers. It seems that we shouldn't use `TargetRegisterInfo::getRegPressureSetLimit` directly, just like the comment "This limit must be adjusted dynamically for reserved registers" said. Separate from llvm/llvm-project#118787
…it` (#119826) `RegisterClassInfo::getRegPressureSetLimit` is a wrapper of `TargetRegisterInfo::getRegPressureSetLimit` with some logics to adjust the limit by removing reserved registers. It seems that we shouldn't use `TargetRegisterInfo::getRegPressureSetLimit` directly, just like the comment "This limit must be adjusted dynamically for reserved registers" said. Separate from llvm/llvm-project#118787
There are two `getRegPressureSetLimit`:
1. `RegisterClassInfo::getRegPressureSetLimit`.
2. `TargetRegisterInfo::getRegPressureSetLimit`.
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper of `TargetRegisterInfo::getRegPressureSetLimit` with some logic to adjust the limit by removing reserved registers.
It seems that we shouldn't use `TargetRegisterInfo::getRegPressureSetLimit` directly, just as the comment "This limit must be adjusted dynamically for reserved registers" says.
These two `getRegPressureSetLimit`s are messy and easy for users to confuse. So here we move the logic of adjusting these limits for reserved registers from `RegisterClassInfo::getRegPressureSetLimit` to `TargetRegisterInfo::getRegPressureSetLimit`. This makes the former a thin cached wrapper of the latter.
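The "thin cached wrapper" shape can be sketched as follows. This is an illustration only: `ToyLimitCache` and its `Compute` callback are hypothetical stand-ins for `RegisterClassInfo` and `TRI->getRegPressureSetLimit(*MF, Idx)`. As in the diff above, a stored value of zero marks an entry that has not been computed yet, which is why the underlying function avoids returning zero.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical cache in the style of RegisterClassInfo's PSetLimits array.
class ToyLimitCache {
  // Stands in for the (potentially expensive) adjusted-limit computation.
  std::function<unsigned(unsigned)> Compute;
  // One slot per pressure set; 0 means "not computed yet".
  mutable std::vector<unsigned> PSetLimits;

public:
  ToyLimitCache(unsigned NumPSets, std::function<unsigned(unsigned)> Compute)
      : Compute(std::move(Compute)), PSetLimits(NumPSets, 0) {}

  // Computes the limit on first use and returns the stored value afterwards.
  unsigned getRegPressureSetLimit(unsigned Idx) const {
    assert(Idx < PSetLimits.size() && "pressure set index out of range");
    if (!PSetLimits[Idx])
      PSetLimits[Idx] = Compute(Idx);
    return PSetLimits[Idx];
  }
};
```

A pass that queries limits repeatedly would go through such a wrapper, so that the loop over register classes runs at most once per pressure set; a pass that calls the underlying function directly pays for the computation on every call.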