
[ARM] Stop gluing ALU nodes to branches / selects #116970

Merged (6 commits) on Nov 30, 2024

Conversation

s-barannikov
Contributor

Following #116547 and #116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue.

Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change.
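
To make the shape of the change concrete, here is a minimal before/after sketch (not code from the patch; it assumes the usual lowering context with DAG, dl, VT, LHS, RHS, ARMcc, TrueVal and FalseVal in scope, and FlagsVT is the flags value type introduced in the preceding PRs):

  // Before: the comparison produces an MVT::Glue result, and CMOV consumes it
  // together with an explicit CPSR register operand. A glue value can have
  // only one use, so every additional consumer needs a duplicated compare.
  SDValue CmpGlue = DAG.getNode(ARMISD::CMP, dl, MVT::Glue, LHS, RHS);
  SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
  SDValue OldSel =
      DAG.getNode(ARMISD::CMOV, dl, VT, FalseVal, TrueVal, ARMcc, CCR, CmpGlue);

  // After: the comparison returns an ordinary value of type FlagsVT, which
  // CMOV takes as a regular operand; it can be CSE'd and reused freely.
  SDValue Flags = DAG.getNode(ARMISD::CMP, dl, FlagsVT, LHS, RHS);
  SDValue NewSel =
      DAG.getNode(ARMISD::CMOV, dl, VT, FalseVal, TrueVal, ARMcc, Flags);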

@llvmbot
Member

llvmbot commented Nov 20, 2024

@llvm/pr-subscribers-backend-arm

Author: Sergei Barannikov (s-barannikov)

Changes

Following #116547 and #116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue.

Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change.


Patch is 1.12 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/116970.diff

82 Files Affected:

  • (modified) llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp (+21-28)
  • (modified) llvm/lib/Target/ARM/ARMISelLowering.cpp (+64-126)
  • (modified) llvm/lib/Target/ARM/ARMISelLowering.h (+1-3)
  • (modified) llvm/lib/Target/ARM/ARMInstrFormats.td (+3-10)
  • (modified) llvm/lib/Target/ARM/ARMInstrInfo.td (+76-57)
  • (modified) llvm/lib/Target/ARM/ARMInstrThumb.td (+11-9)
  • (modified) llvm/lib/Target/ARM/ARMInstrThumb2.td (+78-59)
  • (modified) llvm/lib/Target/ARM/ARMInstrVFP.td (+26-17)
  • (modified) llvm/test/CodeGen/ARM/add-like-or.ll (+5-5)
  • (modified) llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll (+51-43)
  • (modified) llvm/test/CodeGen/ARM/atomic-64bit.ll (+20-40)
  • (modified) llvm/test/CodeGen/ARM/atomic-ops-v8.ll (+8-20)
  • (modified) llvm/test/CodeGen/ARM/atomicrmw-cond-sub-clamp.ll (+12-16)
  • (modified) llvm/test/CodeGen/ARM/atomicrmw-uinc-udec-wrap.ll (+15-19)
  • (modified) llvm/test/CodeGen/ARM/atomicrmw_exclusive_monitor_ints.ll (+56-68)
  • (modified) llvm/test/CodeGen/ARM/bfi.ll (+5-4)
  • (modified) llvm/test/CodeGen/ARM/cmov_fp16.ll (+54-54)
  • (modified) llvm/test/CodeGen/ARM/cse-call.ll (+3-1)
  • (modified) llvm/test/CodeGen/ARM/cttz.ll (+98-106)
  • (modified) llvm/test/CodeGen/ARM/fadd-select-fneg-combine.ll (+10-10)
  • (modified) llvm/test/CodeGen/ARM/fcmp-xo.ll (+10-10)
  • (modified) llvm/test/CodeGen/ARM/fpclamptosat.ll (+694-1050)
  • (modified) llvm/test/CodeGen/ARM/fpclamptosat_vec.ll (+1564-1640)
  • (modified) llvm/test/CodeGen/ARM/fptoi-sat-store.ll (+77-80)
  • (modified) llvm/test/CodeGen/ARM/fptosi-sat-scalar.ll (+797-863)
  • (modified) llvm/test/CodeGen/ARM/fptoui-sat-scalar.ll (+695-781)
  • (modified) llvm/test/CodeGen/ARM/funnel-shift-rot.ll (+4-4)
  • (modified) llvm/test/CodeGen/ARM/funnel-shift.ll (+70-74)
  • (modified) llvm/test/CodeGen/ARM/ifcvt1.ll (+3-3)
  • (modified) llvm/test/CodeGen/ARM/minnum-maxnum-intrinsics.ll (+34-34)
  • (modified) llvm/test/CodeGen/ARM/neon_vabd.ll (+20-20)
  • (modified) llvm/test/CodeGen/ARM/overflow-intrinsic-optimizations.ll (+3-3)
  • (modified) llvm/test/CodeGen/ARM/sadd_sat.ll (+21-23)
  • (modified) llvm/test/CodeGen/ARM/sadd_sat_plus.ll (+11-11)
  • (modified) llvm/test/CodeGen/ARM/select.ll (+24-24)
  • (modified) llvm/test/CodeGen/ARM/select_const.ll (+7-5)
  • (modified) llvm/test/CodeGen/ARM/shift-i64.ll (+7-7)
  • (modified) llvm/test/CodeGen/ARM/ssub_sat.ll (+10-11)
  • (modified) llvm/test/CodeGen/ARM/ssub_sat_plus.ll (+11-11)
  • (modified) llvm/test/CodeGen/ARM/sub-cmp-peephole.ll (+1-1)
  • (modified) llvm/test/CodeGen/ARM/uadd_sat.ll (+3-5)
  • (modified) llvm/test/CodeGen/ARM/uadd_sat_plus.ll (+11-13)
  • (modified) llvm/test/CodeGen/ARM/umulo-128-legalisation-lowering.ll (+162-160)
  • (modified) llvm/test/CodeGen/ARM/umulo-64-legalisation-lowering.ll (+26-27)
  • (modified) llvm/test/CodeGen/ARM/usub_sat.ll (+3-4)
  • (modified) llvm/test/CodeGen/ARM/usub_sat_plus.ll (+3-4)
  • (modified) llvm/test/CodeGen/ARM/vselect_imax.ll (+177-181)
  • (modified) llvm/test/CodeGen/ARM/wide-compares.ll (+29-56)
  • (modified) llvm/test/CodeGen/Thumb/arm_q15_to_q31.ll (+1-2)
  • (modified) llvm/test/CodeGen/Thumb/select.ll (+2-2)
  • (modified) llvm/test/CodeGen/Thumb/smul_fix_sat.ll (+198-228)
  • (modified) llvm/test/CodeGen/Thumb/stack-guard-xo.ll (+175-75)
  • (modified) llvm/test/CodeGen/Thumb/umul_fix_sat.ll (+82-79)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/arm_cmplx_dot_prod_f32.ll (+3-3)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/exitcount.ll (+8-8)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/fast-fp-loops.ll (+5-5)
  • (modified) llvm/test/CodeGen/Thumb2/float-ops.ll (+5-5)
  • (modified) llvm/test/CodeGen/Thumb2/mve-blockplacement.ll (+4-4)
  • (modified) llvm/test/CodeGen/Thumb2/mve-doublereduct.ll (+24-24)
  • (modified) llvm/test/CodeGen/Thumb2/mve-float16regloops.ll (+1-1)
  • (modified) llvm/test/CodeGen/Thumb2/mve-float32regloops.ll (+3-3)
  • (modified) llvm/test/CodeGen/Thumb2/mve-fmas.ll (+58-58)
  • (modified) llvm/test/CodeGen/Thumb2/mve-fpclamptosat_vec.ll (+323-381)
  • (modified) llvm/test/CodeGen/Thumb2/mve-fptosi-sat-vector.ll (+1748-2041)
  • (modified) llvm/test/CodeGen/Thumb2/mve-fptoui-sat-vector.ll (+1350-1603)
  • (modified) llvm/test/CodeGen/Thumb2/mve-gather-scatter-ptr-address.ll (+8-8)
  • (modified) llvm/test/CodeGen/Thumb2/mve-laneinterleaving.ll (+47-46)
  • (modified) llvm/test/CodeGen/Thumb2/mve-minmaxi.ll (+24-40)
  • (modified) llvm/test/CodeGen/Thumb2/mve-pipelineloops.ll (+1-1)
  • (modified) llvm/test/CodeGen/Thumb2/mve-pred-ext.ll (+151-143)
  • (modified) llvm/test/CodeGen/Thumb2/mve-pred-or.ll (+8-10)
  • (modified) llvm/test/CodeGen/Thumb2/mve-pred-vctpvpsel.ll (+8-8)
  • (modified) llvm/test/CodeGen/Thumb2/mve-pred-vselect.ll (+53-57)
  • (modified) llvm/test/CodeGen/Thumb2/mve-pred-xor.ll (+10-12)
  • (modified) llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll (+59-64)
  • (modified) llvm/test/CodeGen/Thumb2/mve-vcmpf.ll (+78-78)
  • (modified) llvm/test/CodeGen/Thumb2/mve-vcmpfr.ll (+156-156)
  • (modified) llvm/test/CodeGen/Thumb2/mve-vcmpfz.ll (+156-156)
  • (modified) llvm/test/CodeGen/Thumb2/mve-vmaxv-vminv-scalar.ll (+16-32)
  • (modified) llvm/test/CodeGen/Thumb2/mve-vqdmulh.ll (+2-2)
  • (modified) llvm/test/CodeGen/Thumb2/umulo-128-legalisation-lowering.ll (+94-96)
  • (modified) llvm/test/CodeGen/Thumb2/umulo-64-legalisation-lowering.ll (+12-13)
diff --git a/llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp b/llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
index 73ee8cf81adcd60..3306573236266e0 100644
--- a/llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
+++ b/llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
@@ -4123,17 +4123,15 @@ void ARMDAGToDAGISel::Select(SDNode *N) {
     SDValue Chain = N->getOperand(0);
     SDValue N1 = N->getOperand(1);
     SDValue N2 = N->getOperand(2);
-    SDValue N3 = N->getOperand(3);
-    SDValue InGlue = N->getOperand(4);
+    SDValue Flags = N->getOperand(3);
     assert(N1.getOpcode() == ISD::BasicBlock);
     assert(N2.getOpcode() == ISD::Constant);
-    assert(N3.getOpcode() == ISD::Register);
 
     unsigned CC = (unsigned)N2->getAsZExtVal();
 
-    if (InGlue.getOpcode() == ARMISD::CMPZ) {
-      if (InGlue.getOperand(0).getOpcode() == ISD::INTRINSIC_W_CHAIN) {
-        SDValue Int = InGlue.getOperand(0);
+    if (Flags.getOpcode() == ARMISD::CMPZ) {
+      if (Flags.getOperand(0).getOpcode() == ISD::INTRINSIC_W_CHAIN) {
+        SDValue Int = Flags.getOperand(0);
         uint64_t ID = Int->getConstantOperandVal(1);
 
         // Handle low-overhead loops.
@@ -4155,15 +4153,15 @@ void ARMDAGToDAGISel::Select(SDNode *N) {
 
           ReplaceUses(N, LoopEnd);
           CurDAG->RemoveDeadNode(N);
-          CurDAG->RemoveDeadNode(InGlue.getNode());
+          CurDAG->RemoveDeadNode(Flags.getNode());
           CurDAG->RemoveDeadNode(Int.getNode());
           return;
         }
       }
 
       bool SwitchEQNEToPLMI;
-      SelectCMPZ(InGlue.getNode(), SwitchEQNEToPLMI);
-      InGlue = N->getOperand(4);
+      SelectCMPZ(Flags.getNode(), SwitchEQNEToPLMI);
+      Flags = N->getOperand(3);
 
       if (SwitchEQNEToPLMI) {
         switch ((ARMCC::CondCodes)CC) {
@@ -4178,26 +4176,22 @@ void ARMDAGToDAGISel::Select(SDNode *N) {
       }
     }
 
+    SDValue InChain =
+        CurDAG->getCopyToReg(Chain, dl, ARM::CPSR, Flags, SDValue());
+    SDValue InGlue = InChain.getValue(1);
+
     SDValue Tmp2 = CurDAG->getTargetConstant(CC, dl, MVT::i32);
-    SDValue Ops[] = { N1, Tmp2, N3, Chain, InGlue };
-    SDNode *ResNode = CurDAG->getMachineNode(Opc, dl, MVT::Other,
-                                             MVT::Glue, Ops);
-    Chain = SDValue(ResNode, 0);
-    if (N->getNumValues() == 2) {
-      InGlue = SDValue(ResNode, 1);
-      ReplaceUses(SDValue(N, 1), InGlue);
-    }
-    ReplaceUses(SDValue(N, 0),
-                SDValue(Chain.getNode(), Chain.getResNo()));
-    CurDAG->RemoveDeadNode(N);
+    SDValue Ops[] = {N1, Tmp2, CurDAG->getRegister(ARM::CPSR, MVT::i32),
+                     InChain, InGlue};
+    CurDAG->SelectNodeTo(N, Opc, MVT::Other, Ops);
     return;
   }
 
   case ARMISD::CMPZ: {
     // select (CMPZ X, #-C) -> (CMPZ (ADDS X, #C), #0)
     //   This allows us to avoid materializing the expensive negative constant.
-    //   The CMPZ #0 is useless and will be peepholed away but we need to keep it
-    //   for its glue output.
+    //   The CMPZ #0 is useless and will be peepholed away but we need to keep
+    //   it for its flags output.
     SDValue X = N->getOperand(0);
     auto *C = dyn_cast<ConstantSDNode>(N->getOperand(1).getNode());
     if (C && C->getSExtValue() < 0 && Subtarget->isThumb()) {
@@ -4224,7 +4218,7 @@ void ARMDAGToDAGISel::Select(SDNode *N) {
       }
       if (Add) {
         SDValue Ops2[] = {SDValue(Add, 0), CurDAG->getConstant(0, dl, MVT::i32)};
-        CurDAG->MorphNodeTo(N, ARMISD::CMPZ, CurDAG->getVTList(MVT::Glue), Ops2);
+        CurDAG->MorphNodeTo(N, ARMISD::CMPZ, N->getVTList(), Ops2);
       }
     }
     // Other cases are autogenerated.
@@ -4232,11 +4226,11 @@ void ARMDAGToDAGISel::Select(SDNode *N) {
   }
 
   case ARMISD::CMOV: {
-    SDValue InGlue = N->getOperand(4);
+    SDValue Flags = N->getOperand(3);
 
-    if (InGlue.getOpcode() == ARMISD::CMPZ) {
+    if (Flags.getOpcode() == ARMISD::CMPZ) {
       bool SwitchEQNEToPLMI;
-      SelectCMPZ(InGlue.getNode(), SwitchEQNEToPLMI);
+      SelectCMPZ(Flags.getNode(), SwitchEQNEToPLMI);
 
       if (SwitchEQNEToPLMI) {
         SDValue ARMcc = N->getOperand(2);
@@ -4253,10 +4247,9 @@ void ARMDAGToDAGISel::Select(SDNode *N) {
         }
         SDValue NewARMcc = CurDAG->getConstant((unsigned)CC, dl, MVT::i32);
         SDValue Ops[] = {N->getOperand(0), N->getOperand(1), NewARMcc,
-                         N->getOperand(3), N->getOperand(4)};
+                         N->getOperand(3)};
         CurDAG->MorphNodeTo(N, ARMISD::CMOV, N->getVTList(), Ops);
       }
-
     }
     // Other cases are autogenerated.
     break;
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp
index 6b290135c5bcbac..1b83902da006ef9 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -4924,14 +4924,11 @@ SDValue ARMTargetLowering::getARMCmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
       CC == ISD::SETUGT && isa<ConstantSDNode>(LHS.getOperand(1)) &&
       LHS.getConstantOperandVal(1) < 31) {
     unsigned ShiftAmt = LHS.getConstantOperandVal(1) + 1;
-    SDValue Shift = DAG.getNode(ARMISD::LSLS, dl,
-                                DAG.getVTList(MVT::i32, MVT::i32),
-                                LHS.getOperand(0),
-                                DAG.getConstant(ShiftAmt, dl, MVT::i32));
-    SDValue Chain = DAG.getCopyToReg(DAG.getEntryNode(), dl, ARM::CPSR,
-                                     Shift.getValue(1), SDValue());
+    SDValue Shift =
+        DAG.getNode(ARMISD::LSLS, dl, DAG.getVTList(MVT::i32, FlagsVT),
+                    LHS.getOperand(0), DAG.getConstant(ShiftAmt, dl, MVT::i32));
     ARMcc = DAG.getConstant(ARMCC::HI, dl, MVT::i32);
-    return Chain.getValue(1);
+    return Shift.getValue(1);
   }
 
   ARMCC::CondCodes CondCode = IntCCToARMCC(CC);
@@ -4963,7 +4960,7 @@ SDValue ARMTargetLowering::getARMCmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
     break;
   }
   ARMcc = DAG.getConstant(CondCode, dl, MVT::i32);
-  return DAG.getNode(CompareType, dl, MVT::Glue, LHS, RHS);
+  return DAG.getNode(CompareType, dl, FlagsVT, LHS, RHS);
 }
 
 /// Returns a appropriate VFP CMP (fcmp{s|d}+fmstat) for the given operands.
@@ -4978,24 +4975,7 @@ SDValue ARMTargetLowering::getVFPCmp(SDValue LHS, SDValue RHS,
   else
     Flags = DAG.getNode(Signaling ? ARMISD::CMPFPEw0 : ARMISD::CMPFPw0, dl,
                         FlagsVT, LHS);
-  return DAG.getNode(ARMISD::FMSTAT, dl, MVT::Glue, Flags);
-}
-
-/// duplicateCmp - Glue values can have only one use, so this function
-/// duplicates a comparison node.
-SDValue
-ARMTargetLowering::duplicateCmp(SDValue Cmp, SelectionDAG &DAG) const {
-  unsigned Opc = Cmp.getOpcode();
-  SDLoc DL(Cmp);
-  if (Opc == ARMISD::CMP || Opc == ARMISD::CMPZ)
-    return DAG.getNode(Opc, DL, MVT::Glue, Cmp.getOperand(0),Cmp.getOperand(1));
-
-  assert(Opc == ARMISD::FMSTAT && "unexpected comparison operation");
-  SDValue Flags = Cmp.getOperand(0);
-  assert((Flags.getOpcode() == ARMISD::CMPFP ||
-          Flags.getOpcode() == ARMISD::CMPFPw0) &&
-         "unexpected operand of FMSTAT");
-  return DAG.getNode(ARMISD::FMSTAT, DL, MVT::Glue, Flags);
+  return DAG.getNode(ARMISD::FMSTAT, dl, FlagsVT, Flags);
 }
 
 // This function returns three things: the arithmetic computation itself
@@ -5023,7 +5003,7 @@ ARMTargetLowering::getARMXALUOOp(SDValue Op, SelectionDAG &DAG,
   case ISD::SADDO:
     ARMcc = DAG.getConstant(ARMCC::VC, dl, MVT::i32);
     Value = DAG.getNode(ISD::ADD, dl, Op.getValueType(), LHS, RHS);
-    OverflowCmp = DAG.getNode(ARMISD::CMP, dl, MVT::Glue, Value, LHS);
+    OverflowCmp = DAG.getNode(ARMISD::CMP, dl, FlagsVT, Value, LHS);
     break;
   case ISD::UADDO:
     ARMcc = DAG.getConstant(ARMCC::HS, dl, MVT::i32);
@@ -5032,17 +5012,17 @@ ARMTargetLowering::getARMXALUOOp(SDValue Op, SelectionDAG &DAG,
     Value = DAG.getNode(ARMISD::ADDC, dl,
                         DAG.getVTList(Op.getValueType(), MVT::i32), LHS, RHS)
                 .getValue(0);
-    OverflowCmp = DAG.getNode(ARMISD::CMP, dl, MVT::Glue, Value, LHS);
+    OverflowCmp = DAG.getNode(ARMISD::CMP, dl, FlagsVT, Value, LHS);
     break;
   case ISD::SSUBO:
     ARMcc = DAG.getConstant(ARMCC::VC, dl, MVT::i32);
     Value = DAG.getNode(ISD::SUB, dl, Op.getValueType(), LHS, RHS);
-    OverflowCmp = DAG.getNode(ARMISD::CMP, dl, MVT::Glue, LHS, RHS);
+    OverflowCmp = DAG.getNode(ARMISD::CMP, dl, FlagsVT, LHS, RHS);
     break;
   case ISD::USUBO:
     ARMcc = DAG.getConstant(ARMCC::HS, dl, MVT::i32);
     Value = DAG.getNode(ISD::SUB, dl, Op.getValueType(), LHS, RHS);
-    OverflowCmp = DAG.getNode(ARMISD::CMP, dl, MVT::Glue, LHS, RHS);
+    OverflowCmp = DAG.getNode(ARMISD::CMP, dl, FlagsVT, LHS, RHS);
     break;
   case ISD::UMULO:
     // We generate a UMUL_LOHI and then check if the high word is 0.
@@ -5050,7 +5030,7 @@ ARMTargetLowering::getARMXALUOOp(SDValue Op, SelectionDAG &DAG,
     Value = DAG.getNode(ISD::UMUL_LOHI, dl,
                         DAG.getVTList(Op.getValueType(), Op.getValueType()),
                         LHS, RHS);
-    OverflowCmp = DAG.getNode(ARMISD::CMP, dl, MVT::Glue, Value.getValue(1),
+    OverflowCmp = DAG.getNode(ARMISD::CMP, dl, FlagsVT, Value.getValue(1),
                               DAG.getConstant(0, dl, MVT::i32));
     Value = Value.getValue(0); // We only want the low 32 bits for the result.
     break;
@@ -5061,7 +5041,7 @@ ARMTargetLowering::getARMXALUOOp(SDValue Op, SelectionDAG &DAG,
     Value = DAG.getNode(ISD::SMUL_LOHI, dl,
                         DAG.getVTList(Op.getValueType(), Op.getValueType()),
                         LHS, RHS);
-    OverflowCmp = DAG.getNode(ARMISD::CMP, dl, MVT::Glue, Value.getValue(1),
+    OverflowCmp = DAG.getNode(ARMISD::CMP, dl, FlagsVT, Value.getValue(1),
                               DAG.getNode(ISD::SRA, dl, Op.getValueType(),
                                           Value.getValue(0),
                                           DAG.getConstant(31, dl, MVT::i32)));
@@ -5081,15 +5061,14 @@ ARMTargetLowering::LowerSignedALUO(SDValue Op, SelectionDAG &DAG) const {
   SDValue Value, OverflowCmp;
   SDValue ARMcc;
   std::tie(Value, OverflowCmp) = getARMXALUOOp(Op, DAG, ARMcc);
-  SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
   SDLoc dl(Op);
   // We use 0 and 1 as false and true values.
   SDValue TVal = DAG.getConstant(1, dl, MVT::i32);
   SDValue FVal = DAG.getConstant(0, dl, MVT::i32);
   EVT VT = Op.getValueType();
 
-  SDValue Overflow = DAG.getNode(ARMISD::CMOV, dl, VT, TVal, FVal,
-                                 ARMcc, CCR, OverflowCmp);
+  SDValue Overflow =
+      DAG.getNode(ARMISD::CMOV, dl, VT, TVal, FVal, ARMcc, OverflowCmp);
 
   SDVTList VTs = DAG.getVTList(Op.getValueType(), MVT::i32);
   return DAG.getNode(ISD::MERGE_VALUES, dl, VTs, Value, Overflow);
@@ -5226,11 +5205,9 @@ SDValue ARMTargetLowering::LowerSELECT(SDValue Op, SelectionDAG &DAG) const {
     SDValue Value, OverflowCmp;
     SDValue ARMcc;
     std::tie(Value, OverflowCmp) = getARMXALUOOp(Cond, DAG, ARMcc);
-    SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
     EVT VT = Op.getValueType();
 
-    return getCMOV(dl, VT, SelectTrue, SelectFalse, ARMcc, CCR,
-                   OverflowCmp, DAG);
+    return getCMOV(dl, VT, SelectTrue, SelectFalse, ARMcc, OverflowCmp, DAG);
   }
 
   // Convert:
@@ -5258,14 +5235,9 @@ SDValue ARMTargetLowering::LowerSELECT(SDValue Op, SelectionDAG &DAG) const {
         False = SelectTrue;
       }
 
-      if (True.getNode() && False.getNode()) {
-        EVT VT = Op.getValueType();
-        SDValue ARMcc = Cond.getOperand(2);
-        SDValue CCR = Cond.getOperand(3);
-        SDValue Cmp = duplicateCmp(Cond.getOperand(4), DAG);
-        assert(True.getValueType() == VT);
-        return getCMOV(dl, VT, True, False, ARMcc, CCR, Cmp, DAG);
-      }
+      if (True.getNode() && False.getNode())
+        return getCMOV(dl, Op.getValueType(), True, False, Cond.getOperand(2),
+                       Cond.getOperand(3), DAG);
     }
   }
 
@@ -5330,8 +5302,8 @@ static void checkVSELConstraints(ISD::CondCode CC, ARMCC::CondCodes &CondCode,
 }
 
 SDValue ARMTargetLowering::getCMOV(const SDLoc &dl, EVT VT, SDValue FalseVal,
-                                   SDValue TrueVal, SDValue ARMcc, SDValue CCR,
-                                   SDValue Cmp, SelectionDAG &DAG) const {
+                                   SDValue TrueVal, SDValue ARMcc, SDValue Cmp,
+                                   SelectionDAG &DAG) const {
   if (!Subtarget->hasFP64() && VT == MVT::f64) {
     FalseVal = DAG.getNode(ARMISD::VMOVRRD, dl,
                            DAG.getVTList(MVT::i32, MVT::i32), FalseVal);
@@ -5343,15 +5315,14 @@ SDValue ARMTargetLowering::getCMOV(const SDLoc &dl, EVT VT, SDValue FalseVal,
     SDValue FalseLow = FalseVal.getValue(0);
     SDValue FalseHigh = FalseVal.getValue(1);
 
-    SDValue Low = DAG.getNode(ARMISD::CMOV, dl, MVT::i32, FalseLow, TrueLow,
-                              ARMcc, CCR, Cmp);
+    SDValue Low =
+        DAG.getNode(ARMISD::CMOV, dl, MVT::i32, FalseLow, TrueLow, ARMcc, Cmp);
     SDValue High = DAG.getNode(ARMISD::CMOV, dl, MVT::i32, FalseHigh, TrueHigh,
-                               ARMcc, CCR, duplicateCmp(Cmp, DAG));
+                               ARMcc, Cmp);
 
     return DAG.getNode(ARMISD::VMOVDRR, dl, MVT::f64, Low, High);
   } else {
-    return DAG.getNode(ARMISD::CMOV, dl, VT, FalseVal, TrueVal, ARMcc, CCR,
-                       Cmp);
+    return DAG.getNode(ARMISD::CMOV, dl, VT, FalseVal, TrueVal, ARMcc, Cmp);
   }
 }
 
@@ -5625,12 +5596,11 @@ SDValue ARMTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
     }
 
     SDValue ARMcc;
-    SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
     SDValue Cmp = getARMCmp(LHS, RHS, CC, ARMcc, DAG, dl);
     // Choose GE over PL, which vsel does now support
     if (ARMcc->getAsZExtVal() == ARMCC::PL)
       ARMcc = DAG.getConstant(ARMCC::GE, dl, MVT::i32);
-    return getCMOV(dl, VT, FalseVal, TrueVal, ARMcc, CCR, Cmp, DAG);
+    return getCMOV(dl, VT, FalseVal, TrueVal, ARMcc, Cmp, DAG);
   }
 
   ARMCC::CondCodes CondCode, CondCode2;
@@ -5660,13 +5630,10 @@ SDValue ARMTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
 
   SDValue ARMcc = DAG.getConstant(CondCode, dl, MVT::i32);
   SDValue Cmp = getVFPCmp(LHS, RHS, DAG, dl);
-  SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
-  SDValue Result = getCMOV(dl, VT, FalseVal, TrueVal, ARMcc, CCR, Cmp, DAG);
+  SDValue Result = getCMOV(dl, VT, FalseVal, TrueVal, ARMcc, Cmp, DAG);
   if (CondCode2 != ARMCC::AL) {
     SDValue ARMcc2 = DAG.getConstant(CondCode2, dl, MVT::i32);
-    // FIXME: Needs another CMP because flag can have but one use.
-    SDValue Cmp2 = getVFPCmp(LHS, RHS, DAG, dl);
-    Result = getCMOV(dl, VT, Result, TrueVal, ARMcc2, CCR, Cmp2, DAG);
+    Result = getCMOV(dl, VT, Result, TrueVal, ARMcc2, Cmp, DAG);
   }
   return Result;
 }
@@ -5767,9 +5734,8 @@ ARMTargetLowering::OptimizeVFPBrcond(SDValue Op, SelectionDAG &DAG) const {
       RHS = DAG.getNode(ISD::AND, dl, MVT::i32,
                         bitcastf32Toi32(RHS, DAG), Mask);
       SDValue Cmp = getARMCmp(LHS, RHS, CC, ARMcc, DAG, dl);
-      SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
-      return DAG.getNode(ARMISD::BRCOND, dl, MVT::Other,
-                         Chain, Dest, ARMcc, CCR, Cmp);
+      return DAG.getNode(ARMISD::BRCOND, dl, MVT::Other, Chain, Dest, ARMcc,
+                         Cmp);
     }
 
     SDValue LHS1, LHS2;
@@ -5780,9 +5746,8 @@ ARMTargetLowering::OptimizeVFPBrcond(SDValue Op, SelectionDAG &DAG) const {
     RHS2 = DAG.getNode(ISD::AND, dl, MVT::i32, RHS2, Mask);
     ARMCC::CondCodes CondCode = IntCCToARMCC(CC);
     ARMcc = DAG.getConstant(CondCode, dl, MVT::i32);
-    SDVTList VTList = DAG.getVTList(MVT::Other, MVT::Glue);
     SDValue Ops[] = { Chain, ARMcc, LHS1, LHS2, RHS1, RHS2, Dest };
-    return DAG.getNode(ARMISD::BCC_i64, dl, VTList, Ops);
+    return DAG.getNode(ARMISD::BCC_i64, dl, MVT::Other, Ops);
   }
 
   return SDValue();
@@ -5816,9 +5781,8 @@ SDValue ARMTargetLowering::LowerBRCOND(SDValue Op, SelectionDAG &DAG) const {
         (ARMCC::CondCodes)cast<const ConstantSDNode>(ARMcc)->getZExtValue();
     CondCode = ARMCC::getOppositeCondition(CondCode);
     ARMcc = DAG.getConstant(CondCode, SDLoc(ARMcc), MVT::i32);
-    SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
 
-    return DAG.getNode(ARMISD::BRCOND, dl, MVT::Other, Chain, Dest, ARMcc, CCR,
+    return DAG.getNode(ARMISD::BRCOND, dl, MVT::Other, Chain, Dest, ARMcc,
                        OverflowCmp);
   }
 
@@ -5870,18 +5834,15 @@ SDValue ARMTargetLowering::LowerBR_CC(SDValue Op, SelectionDAG &DAG) const {
       CondCode = ARMCC::getOppositeCondition(CondCode);
       ARMcc = DAG.getConstant(CondCode, SDLoc(ARMcc), MVT::i32);
     }
-    SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
 
-    return DAG.getNode(ARMISD::BRCOND, dl, MVT::Other, Chain, Dest, ARMcc, CCR,
+    return DAG.getNode(ARMISD::BRCOND, dl, MVT::Other, Chain, Dest, ARMcc,
                        OverflowCmp);
   }
 
   if (LHS.getValueType() == MVT::i32) {
     SDValue ARMcc;
     SDValue Cmp = getARMCmp(LHS, RHS, CC, ARMcc, DAG, dl);
-    SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
-    return DAG.getNode(ARMISD::BRCOND, dl, MVT::Other,
-                       Chain, Dest, ARMcc, CCR, Cmp);
+    return DAG.getNode(ARMISD::BRCOND, dl, MVT::Other, Chain, Dest, ARMcc, Cmp);
   }
 
   if (getTargetMachine().Options.UnsafeFPMath &&
@@ -5896,14 +5857,12 @@ SDValue ARMTargetLowering::LowerBR_CC(SDValue Op, SelectionDAG &DAG) const {
 
   SDValue ARMcc = DAG.getConstant(CondCode, dl, MVT::i32);
   SDValue Cmp = getVFPCmp(LHS, RHS, DAG, dl);
-  SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
-  SDVTList VTList = DAG.getVTList(MVT::Other, MVT::Glue);
-  SDValue Ops[] = { Chain, Dest, ARMcc, CCR, Cmp };
-  SDValue Res = DAG.getNode(ARMISD::BRCOND, dl, VTList, Ops);
+  SDValue Ops[] = {Chain, Dest, ARMcc, Cmp};
+  SDValue Res = DAG.getNode(ARMISD::BRCOND, dl, MVT::Other, Ops);
   if (CondCode2 != ARMCC::AL) {
     ARMcc = DAG.getConstant(CondCode2, dl, MVT::i32);
-    SDValue Ops[] = { Res, Dest, ARMcc, CCR, Res.getValue(1) };
-    Res = DAG.getNode(ARMISD::BRCOND, dl, VTList, Ops);
+    SDValue Ops[] = {Res, Dest, ARMcc, Cmp};
+    Res = DAG.getNode(ARMISD::BRCOND, dl, MVT::Other, Ops);
   }
   return Res;
 }
@@ -6408,7 +6367,6 @@ SDValue ARMTargetLowering::LowerShiftRightParts(SDValue Op,
   SDValue ShOpHi = Op.getOperand(1);
   SDValue ShAmt  = Op.getOperand(2);
   SDValue ARMcc;
-  SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
   unsigned Opc = (Op.getOpcode() == ISD::SRA_PARTS) ? ISD::SRA : ISD::SRL;
 
   assert(Op.getOpcode() == ISD::SRA_PARTS || Op.getOpcode() == ISD::SRL_PARTS);
@@ -6423,8 +6381,8 @@ SDValue ARMTargetLowering::LowerShiftRightParts(SDValue Op,
   SDValue LoBigShift = DAG.getNode(Opc, dl, VT, ShOpHi, ExtraShAmt);
   SDValue CmpLo = getARMCmp(ExtraShAmt, DAG.getConstant(0, dl, MVT::i32),
                             ISD::SETGE, ARMcc, DAG, dl);
-  SDValue Lo = DAG.getNode(ARMISD::CMOV, dl, VT, LoSmallShift, LoBigShift,
-                           ARMcc, CCR, CmpLo);
+  SDValue Lo =
+      DAG.getNode(ARMISD::CMOV, dl, VT, LoSmallShift, LoBigShift, ARMcc, CmpLo);
 
   SDValue HiSmallShift = DAG.getNode(Opc, dl, VT, ShOpHi, ShAmt);
   SDValue HiBigShift = Opc == ISD::SRA
@@ -6433,8 +6391,8 @@ SDValue ARMTargetLowering::LowerShiftRightParts(SDValue Op,
                            : DAG.getConstant(0, dl, VT);
   SDValue CmpHi = getARMCmp(ExtraShAmt, DAG.getConstant(0, dl, MVT::i32),
                             ISD::SETGE, ARMcc, DAG, dl);
-  SDValue Hi = DAG.getNode(ARMISD::CMOV, dl, VT, HiSmallShift, HiBigShift,
-                           ARMcc, CCR, CmpHi);
+  SDValue Hi =
+      DAG.getNode(ARMISD::CMOV, dl, VT, HiSmallShift, HiBigShift, ARMcc, CmpHi);
 
   SDValue Ops[2] = { Lo, Hi };
   return DAG.getMergeValues(Ops, dl);
@@ -6452,7 +6410,6 @@ SDValue ARMTargetLowering::LowerShiftLeftParts(SDValue Op,
   SDValue ShOpHi = Op.getOperand(1);
   SDValue ShAmt  = Op.getOperand(2);
   SDValue ARMcc;
-  SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
 
   assert(Op.getOpcode() == ISD::SHL_PARTS);
   SDValue RevShAmt = DAG.getNode(ISD::SUB, dl, MVT::i32,
@@ -6466,14 +6423,14 @@ SDValue ARMTargetLowering::LowerShiftLeftPart...
[truncated]

RegConstraint<"$false = $Rd">, Sched<[WriteALU]>;

} // hasSideEffects

def : ARMPat<(ARMcmov i32:$false, i32:$Rm, imm:$cc, CPSR),
Contributor Author

Some patterns have to be defined out-of-line because the number of instruction operands no longer matches the number of SDNode operands: on an instruction, pred is considered a single operand, while on the SDNode there are two operands that should match it (cc and flags).

Collaborator

That seems to me like the kind of thing it's worth mentioning in the commit message, so that it'll be available to anyone in the future reading this patch.

(LLVM has moved its git hosting once already, so I think it's better not to assume the Github PR comments will always be available.)

Contributor Author

I added comments to three places. Please let me know if you think it wasn't necessary and I should've just updated the description.

Collaborator

Fine by me – as long as it's in the git history somewhere, people will be able to work it out. And you may be right that the comment has ongoing value outside the context of the change.


github-actions bot commented Nov 20, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Comment on lines -649 to +651
; THUMB2-NEXT: movne.w lr, #65536
; THUMB2-NEXT: it ne
; THUMB2-NEXT: movne.w r12, #1
; THUMB2-NEXT: it ne
; THUMB2-NEXT: movne.w lr, #65536
Contributor Author

This is not optimized to a single IT block because the optimization pass does not hoist mov.w lr, #1 appearing between conditional moves. It only hoists register copies.

@@ -105,77 +105,177 @@ entry:
@bb = hidden local_unnamed_addr global i64 0, align 8

define dso_local i64 @cc() local_unnamed_addr #1 {
; CHECK-LABEL: cc:
Contributor Author

Apparently, check lines were only present for V6M.

; CHECK-MVE-NEXT: lsls r0, r0, #31
; CHECK-MVE-NEXT: lsls r1, r1, #31
; CHECK-MVE-NEXT: vseleq.f16 s4, s8, s6
; CHECK-MVE-NEXT: cmp r0, #0
Contributor Author

This should've been optimized further; let me investigate this small regression.

Contributor Author

The reason is that the nodes are scheduled a little differently, which ARMBaseInstrInfo::optimizeCompareInstr is unable to handle. I think this can be fixed, but should be done separately. Given the overall improvement this PR provides, I suppose it's okay to do this later?

@@ -1965,32 +1965,34 @@ define float @debug_info(float %gamma, float %slopeLimit, i1 %or.cond, double %t
; ARM-ENABLE-NEXT: @ %bb.1: @ %bb3
; ARM-ENABLE-NEXT: push {r4, r7, lr}
; ARM-ENABLE-NEXT: add r7, sp, #4
; ARM-ENABLE-NEXT: sub r4, sp, #16
; ARM-ENABLE-NEXT: sub r4, sp, #24
Collaborator

Any idea what's up with this increased size of stack frame? I can see that it goes with a newly introduced spill and reload of d10 to the [r4, #16] slot, but why has this change created the need for an extra spill at all? Has pre-RA scheduling somehow increased pressure?

Contributor Author

I didn't investigate this regression as the test appears to be testing something different. I'll take a look.

Contributor Author

I think the reason is that the DAG scheduler placed the VADD before the call to pow, and this increased register pressure over the limit. (Unfortunately, the machine scheduler can't reorder instructions around calls.)

; CHECK-NEXT: cmp r2, #42
; CHECK-NEXT: orrne r0, r0, #16
; CHECK-NEXT: and r0, r0, #4
; CHECK-NEXT: bic r1, r1, #255
Collaborator

This looks a bit odd too – it seemed obviously better the previous way, and there's no obvious reason why your change should have caused it.

Contributor Author (@s-barannikov, Nov 20, 2024)

I'll check this out, too. Thanks for noticing; I'd missed this one.

Contributor Author

Ah, I actually did look into this.
Here is the diff after finalize-isel:
(screenshot of the diff after finalize-isel)
As can be seen, the new code is identical to the old one, modulo instruction order.
I think the issue is with register allocation and subsequent copy elimination as described in #98087 (see also follow-up #105562).

@statham-arm (Collaborator) left a comment

This is a big patch, but it's my absolute favourite kind: reduces the total amount of code and makes things better! Overall the diff stats are negative in llvm/lib and in llvm/test.

I didn't read absolutely all of the code diffs through, but I did spot-check a few things and came across a couple more tiny regressions, which I've highlighted just in case you happen to know what caused them. But I agree with you that this seems overall a good thing, and knock-on effects on scheduling can be dealt with later. (Also, this patch is big enough by itself and doesn't need extra things folded into it!)

@s-barannikov
Contributor Author

That seems to me like the kind of thing it's worth mentioning in the commit message, so that it'll be available to anyone in the future reading this patch.

(LLVM has moved its git hosting once already, so I think it's better not to assume the Github PR comments will always be available.)

Good point; I'll put the comment in code.

@s-barannikov s-barannikov merged commit a348f22 into llvm:main Nov 30, 2024
8 checks passed
@s-barannikov s-barannikov deleted the sdag/arm-glue branch November 30, 2024 05:14
@mstorsjo
Member

mstorsjo commented Dec 1, 2024

I've run into another crash in Wine compiled for armv7 with clang after this landed; I've finished bisecting the code and will now look into pinpointing which individual object file contains the faulty generated code.

@s-barannikov
Contributor Author

s-barannikov commented Dec 1, 2024

I kind of expected that.
It is likely that there are multiple object files containing faulty generated code. It might help to compile with -mllvm -verify-machineinstrs.
If the issue is pressing, don't hesitate to push the revert button. Otherwise I'll look into it tomorrow (it is past midnight here).

@mstorsjo
Member

mstorsjo commented Dec 1, 2024

I've pinpointed it to one single object file (all the others can be compiled with this change without breakage), but I'll try to pinpoint the individual function(s) that contain the issue as well. Compiling with -mllvm -verify-machineinstrs doesn't point anything out, and neither does compiling with -S like it did last time around.

The issue does have some knock-on effects on other projects in my testing setup, so I'll revert it now and continue pinpointing it tomorrow, and I'll hand over the narrowed down case to you then.

mstorsjo added a commit that referenced this pull request Dec 1, 2024
Reverts #116970.

This change broke Wine compiled for armv7, causing segfaults when
starting Wine. See #116970 for more detailed discussion
about the issue.
@mstorsjo
Member

mstorsjo commented Dec 2, 2024

I haven't yet managed to pinpoint exactly which function the breakage happens in (I didn't have quite as much time to spend on it today as I had hoped), but the faulty translation unit can be reproduced with https://martin.st/temp/virtual-preproc.c:

clang-good -target armv7-linux-gnueabihf -fcf-protection=none -fvisibility=hidden -fno-stack-protector -fno-strict-aliasing -fPIC -fasynchronous-unwind-tables -O2 -mthumb virtual-preproc.c -c -o virtual.o

(Not sure if all those -f options are needed to trigger the issue; this is the command currently in use, at least.)

@s-barannikov
Contributor Author

s-barannikov commented Dec 3, 2024

Thank you. The diff between the generated assembly files is not very big (~76 differences), so I might be able to figure out what's wrong without reducing the test further.

@s-barannikov
Contributor Author

There are three functions under suspicion: get_zero_bits_limit, set_protection, and fill_basic_memory_info.
The first one changed the most, but if I manually reduced it correctly, it is not the one containing the faulty code.
The two others have similar changes like this:
(screenshot of the assembly diff showing the change)
This looks suspicious, but needs further analysis.

@mstorsjo
Member

mstorsjo commented Dec 3, 2024

I managed to pinpoint the faulting change. Instead of trying to split the source C file (which turned out to be quite annoying here), I diffed the output assembly and selectively applied parts of the differences to it. The faulty change is the one in set_protection, with the following output diff:

@@ -11514,18 +11515,18 @@
 .LBB59_6:                               @ %sw.bb5.i
        movs    r6, #29
        movs    r5, #13
-       ands    r4, lr, #16777216
-       it      eq                       
-       moveq   r6, #23                  
+       lsls.w  r4, lr, #7
+       it      pl
+       movpl   r6, #23
        it      eq
        moveq   r5, #7
        b       .LBB59_11
 .LBB59_7:                               @ %sw.bb1.i
        movs    r6, #25
        movs    r5, #9
-       ands    r4, lr, #16777216
-       it      eq
-       moveq   r6, #19
+       lsls.w  r4, lr, #7
+       it      pl                       
+       movpl   r6, #19 
        it      eq
        moveq   r5, #3
        b       .LBB59_11

@mstorsjo
Member

mstorsjo commented Dec 3, 2024

Now I see the fault with the code change.

@@ -11514,18 +11515,18 @@
 .LBB59_6:                               @ %sw.bb5.i
        movs    r6, #29
        movs    r5, #13
-       ands    r4, lr, #16777216
-       it      eq                       
-       moveq   r6, #23                  
+       lsls.w  r4, lr, #7
+       it      pl
+       movpl   r6, #23
        it      eq
        moveq   r5, #7
        b       .LBB59_11

The updated lsls.w followed by movpl should be equivalent to the previous ands and moveq, as far as I can see.

But after the flag-setting instruction and the conditional mov, we have another moveq r5, whose eq condition now tests the Z flag of the shift result rather than the result of the original ands, so it no longer gets the specific condition it is expecting. I would guess that this is the bug.

Edit: Also, if I edit the following it eq and moveq into it pl and movpl (and same for the other similar diff in the same function), the code no longer crashes, so that seems to be exactly it.

@s-barannikov
Contributor Author

The guilty function is ARMDAGToDAGISel::SelectCMPZ. Before the patch, CMPZ could only have one use. This function replaces an AND feeding the CMPZ with a shift and notifies the caller that it should change the condition code in the consuming node (eq -> pl).

After this patch, CMPZ can have multiple uses, but only one of them is updated.

This can be worked around by checking that CMPZ has exactly one use, but ideally this optimization should happen earlier, in PerformTargetDAGCombine. I'll add a workaround now and will try to move this optimization to ARMTargetLowering in a later patch.
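
A minimal sketch of that workaround, in the shape of the Select() code shown in the diff above (an assumption about the fix, not the exact re-landed code):

  // Only let SelectCMPZ rewrite the compare (and flip eq -> pl in this node)
  // when this node is the sole user of the flags value; any other users would
  // otherwise keep a condition code that no longer matches the new compare.
  if (Flags.getOpcode() == ARMISD::CMPZ && Flags.hasOneUse()) {
    bool SwitchEQNEToPLMI;
    SelectCMPZ(Flags.getNode(), SwitchEQNEToPLMI);
    // ... adjust this node's condition code if SwitchEQNEToPLMI is set,
    // as in the existing BRCOND/CMOV handling ...
  }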

s-barannikov added a commit that referenced this pull request Dec 7, 2024
Re-landing #116970 after fixing miscompilation error.

The original change made it possible for CMPZ to have multiple uses;
`ARMDAGToDAGISel::SelectCMPZ` was not prepared for this.

Pull Request: #118887


Original commit message:

Following #116547 and #116676, this PR changes the type of results and
operands of some nodes to accept / return a normal type instead of Glue.

Unfortunately, changing the result type of one node requires changing
the operand types of all potential consumer nodes, which in turn
requires changing the result types of all other possible producer nodes.
So this is a bulk change.