Add cond_sub and sub_clamp operations to atomicrmw #96661
Conversation
These both perform conditional subtraction: when the difference would be negative, cond_sub leaves the memory value (the minuend) unchanged and sub_clamp stores zero. AMDGPU has instructions for these. Currently we use target intrinsics for them, but those do not carry the ordering and syncscope. Add these operations to atomicrmw so we can carry that information and benefit from the regular legalization processes. Drop llvm.amdgcn.atomic.cond.sub/csub and auto-upgrade them to atomicrmw. Note that from GFX12 onwards "csub" is renamed to "sub_clamp"; the atomicrmw operation uses the newer name.
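A minimal sketch of the proposed syntax, inferred from the LangRef changes in the diff below (the pointer/value names and the orderings are illustrative, not taken from the patch):

  ; cond_sub: store old - val if old u>= val, otherwise leave memory unchanged
  %old1 = atomicrmw cond_sub ptr %counter, i32 %dec syncscope("agent") seq_cst
  ; sub_clamp: store old - val if old u>= val, otherwise store 0
  %old2 = atomicrmw sub_clamp ptr %counter, i32 %dec monotonic

As with every other atomicrmw operation, the instruction returns the value that was in memory before the update.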
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository; in that case you can instead tag reviewers by name in a comment. If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR with a comment. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
@llvm/pr-subscribers-backend-hexagon @llvm/pr-subscribers-backend-arm @llvm/pr-subscribers-backend-powerpc

Author: None (anjenner)

Changes

These both perform conditional subtraction: when the difference would be negative, cond_sub leaves the memory value (the minuend) unchanged and sub_clamp stores zero. AMDGPU has instructions for these. Currently we use target intrinsics for them, but those do not carry the ordering and syncscope. Add these operations to atomicrmw so we can carry that information and benefit from the regular legalization processes. Drop llvm.amdgcn.atomic.cond.sub/csub and auto-upgrade them to atomicrmw. Note that from GFX12 onwards "csub" is renamed to "sub_clamp"; the atomicrmw operation uses the newer name.

Patch is 353.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/96661.diff

81 Files Affected:
diff --git a/llvm/bindings/ocaml/llvm/llvm.ml b/llvm/bindings/ocaml/llvm/llvm.ml
index 86b010e0ac22d..ae42b1eea93d6 100644
--- a/llvm/bindings/ocaml/llvm/llvm.ml
+++ b/llvm/bindings/ocaml/llvm/llvm.ml
@@ -296,6 +296,12 @@ module AtomicRMWBinOp = struct
| UMin
| FAdd
| FSub
+ | FMax
+ | FMin
+ | UInc_Wrap
+ | UDec_Wrap
+ | Cond_Sub
+ | Sub_Clamp
end
module ValueKind = struct
diff --git a/llvm/bindings/ocaml/llvm/llvm.mli b/llvm/bindings/ocaml/llvm/llvm.mli
index c16530d3a70cb..9a6ed2ae80043 100644
--- a/llvm/bindings/ocaml/llvm/llvm.mli
+++ b/llvm/bindings/ocaml/llvm/llvm.mli
@@ -331,6 +331,12 @@ module AtomicRMWBinOp : sig
| UMin
| FAdd
| FSub
+ | FMax
+ | FMin
+ | UInc_Wrap
+ | UDec_Wrap
+ | Cond_Sub
+ | Sub_Clamp
end
(** The kind of an [llvalue], the result of [classify_value v].
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 5a16457412d24..cf0ad1b578759 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1321,11 +1321,6 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
The iglp_opt strategy implementations are subject to change.
- llvm.amdgcn.atomic.cond.sub.u32 Provides direct access to flat_atomic_cond_sub_u32, global_atomic_cond_sub_u32
- and ds_cond_sub_u32 based on address space on gfx12 targets. This
- performs subtraction only if the memory value is greater than or
- equal to the data value.
-
llvm.amdgcn.s.getpc Provides access to the s_getpc_b64 instruction, but with the return value
sign-extended from the width of the underlying PC hardware register even on
processors where the s_getpc_b64 instruction returns a zero-extended value.
diff --git a/llvm/docs/GlobalISel/GenericOpcode.rst b/llvm/docs/GlobalISel/GenericOpcode.rst
index 42f56348885b4..67bd134174644 100644
--- a/llvm/docs/GlobalISel/GenericOpcode.rst
+++ b/llvm/docs/GlobalISel/GenericOpcode.rst
@@ -825,7 +825,9 @@ operands.
G_ATOMICRMW_MIN, G_ATOMICRMW_UMAX,
G_ATOMICRMW_UMIN, G_ATOMICRMW_FADD,
G_ATOMICRMW_FSUB, G_ATOMICRMW_FMAX,
- G_ATOMICRMW_FMIN
+ G_ATOMICRMW_FMIN, G_ATOMICRMW_UINC_WRAP,
+ G_ATOMICRMW_UDEC_WRAP, G_ATOMICRMW_COND_SUB,
+ G_ATOMICRMW_SUB_CLAMP
Generic atomicrmw. Expects a MachineMemOperand in addition to explicit
operands.
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index edb362c617565..ed76bd454002a 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -11209,6 +11209,8 @@ operation. The operation must be one of the following keywords:
- fmin
- uinc_wrap
- udec_wrap
+- cond_sub
+- sub_clamp
For most of these operations, the type of '<value>' must be an integer
type whose bit width is a power of two greater than or equal to eight
@@ -11259,6 +11261,8 @@ operation argument:
- fmin: ``*ptr = minnum(*ptr, val)`` (match the `llvm.minnum.*`` intrinsic)
- uinc_wrap: ``*ptr = (*ptr u>= val) ? 0 : (*ptr + 1)`` (increment value with wraparound to zero when incremented above input value)
- udec_wrap: ``*ptr = ((*ptr == 0) || (*ptr u> val)) ? val : (*ptr - 1)`` (decrement with wraparound to input value when decremented below zero).
+- cond_sub: ``*ptr = (*ptr u>= val) ? *ptr - val : *ptr`` (subtract only if result would be positive).
+- sub_clamp: ``*ptr = (*ptr u>= val) ? *ptr - val : 0`` (subtract with clamping of negative results to zero).
Example:
diff --git a/llvm/docs/ReleaseNotes.rst b/llvm/docs/ReleaseNotes.rst
index 76356dd76f1d2..a6162127b3b74 100644
--- a/llvm/docs/ReleaseNotes.rst
+++ b/llvm/docs/ReleaseNotes.rst
@@ -80,6 +80,8 @@ Changes to the LLVM IR
removed. The next argument has been changed from byte index to bit
index.
+* Added ``cond_sub`` and ``sub_clamp`` operations to ``atomicrmw``.
+
Changes to LLVM infrastructure
------------------------------
@@ -132,6 +134,10 @@ Changes to the AMDGPU Backend
* Implemented :ref:`llvm.get.rounding <int_get_rounding>` and :ref:`llvm.set.rounding <int_set_rounding>`
+* Removed ``llvm.amdgcn.atomic.cond.sub.u32`` and
+ ``llvm.amdgcn.atomic.csub.u32`` intrinsics. :ref:`atomicrmw <i_atomicrmw>`
+ should be used instead with ``cond_sub`` and ``sub_clamp``.
+
Changes to the ARM Backend
--------------------------
diff --git a/llvm/include/llvm/AsmParser/LLToken.h b/llvm/include/llvm/AsmParser/LLToken.h
index db6780b70ca5a..8ee04f25095f2 100644
--- a/llvm/include/llvm/AsmParser/LLToken.h
+++ b/llvm/include/llvm/AsmParser/LLToken.h
@@ -268,6 +268,8 @@ enum Kind {
kw_fmin,
kw_uinc_wrap,
kw_udec_wrap,
+ kw_cond_sub,
+ kw_sub_clamp,
// Instruction Opcodes (Opcode in UIntVal).
kw_fneg,
diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
index 5b5e08b5cbc3f..20980695499e6 100644
--- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h
+++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
@@ -484,7 +484,9 @@ enum RMWOperations {
RMW_FMAX = 13,
RMW_FMIN = 14,
RMW_UINC_WRAP = 15,
- RMW_UDEC_WRAP = 16
+ RMW_UDEC_WRAP = 16,
+ RMW_COND_SUB = 17,
+ RMW_SUB_CLAMP = 18
};
/// OverflowingBinaryOperatorOptionalFlags - Flags for serializing
diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h
index 6bb89fb58a296..21ac93a3b4b9b 100644
--- a/llvm/include/llvm/CodeGen/ISDOpcodes.h
+++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -1292,6 +1292,8 @@ enum NodeType {
ATOMIC_LOAD_FMIN,
ATOMIC_LOAD_UINC_WRAP,
ATOMIC_LOAD_UDEC_WRAP,
+ ATOMIC_LOAD_COND_SUB,
+ ATOMIC_LOAD_SUB_CLAMP,
// Masked load and store - consecutive vector load and store operations
// with additional mask operand that prevents memory accesses to the
diff --git a/llvm/include/llvm/CodeGen/SelectionDAGNodes.h b/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
index 2f36c2e86b1c3..1f3dd4ac1eda6 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
@@ -1470,6 +1470,8 @@ class MemSDNode : public SDNode {
case ISD::ATOMIC_LOAD_FMIN:
case ISD::ATOMIC_LOAD_UINC_WRAP:
case ISD::ATOMIC_LOAD_UDEC_WRAP:
+ case ISD::ATOMIC_LOAD_COND_SUB:
+ case ISD::ATOMIC_LOAD_SUB_CLAMP:
case ISD::ATOMIC_LOAD:
case ISD::ATOMIC_STORE:
case ISD::MLOAD:
@@ -1536,27 +1538,29 @@ class AtomicSDNode : public MemSDNode {
// Methods to support isa and dyn_cast
static bool classof(const SDNode *N) {
- return N->getOpcode() == ISD::ATOMIC_CMP_SWAP ||
+ return N->getOpcode() == ISD::ATOMIC_CMP_SWAP ||
N->getOpcode() == ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS ||
- N->getOpcode() == ISD::ATOMIC_SWAP ||
- N->getOpcode() == ISD::ATOMIC_LOAD_ADD ||
- N->getOpcode() == ISD::ATOMIC_LOAD_SUB ||
- N->getOpcode() == ISD::ATOMIC_LOAD_AND ||
- N->getOpcode() == ISD::ATOMIC_LOAD_CLR ||
- N->getOpcode() == ISD::ATOMIC_LOAD_OR ||
- N->getOpcode() == ISD::ATOMIC_LOAD_XOR ||
- N->getOpcode() == ISD::ATOMIC_LOAD_NAND ||
- N->getOpcode() == ISD::ATOMIC_LOAD_MIN ||
- N->getOpcode() == ISD::ATOMIC_LOAD_MAX ||
- N->getOpcode() == ISD::ATOMIC_LOAD_UMIN ||
- N->getOpcode() == ISD::ATOMIC_LOAD_UMAX ||
- N->getOpcode() == ISD::ATOMIC_LOAD_FADD ||
- N->getOpcode() == ISD::ATOMIC_LOAD_FSUB ||
- N->getOpcode() == ISD::ATOMIC_LOAD_FMAX ||
- N->getOpcode() == ISD::ATOMIC_LOAD_FMIN ||
+ N->getOpcode() == ISD::ATOMIC_SWAP ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_ADD ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_SUB ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_AND ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_CLR ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_OR ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_XOR ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_NAND ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_MIN ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_MAX ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_UMIN ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_UMAX ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_FADD ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_FSUB ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_FMAX ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_FMIN ||
N->getOpcode() == ISD::ATOMIC_LOAD_UINC_WRAP ||
N->getOpcode() == ISD::ATOMIC_LOAD_UDEC_WRAP ||
- N->getOpcode() == ISD::ATOMIC_LOAD ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_COND_SUB ||
+ N->getOpcode() == ISD::ATOMIC_LOAD_SUB_CLAMP ||
+ N->getOpcode() == ISD::ATOMIC_LOAD ||
N->getOpcode() == ISD::ATOMIC_STORE;
}
};
diff --git a/llvm/include/llvm/IR/Instructions.h b/llvm/include/llvm/IR/Instructions.h
index ab58edd1bf78c..ab5c20758abde 100644
--- a/llvm/include/llvm/IR/Instructions.h
+++ b/llvm/include/llvm/IR/Instructions.h
@@ -750,8 +750,16 @@ class AtomicRMWInst : public Instruction {
/// *p = ((old == 0) || (old u> v)) ? v : (old - 1)
UDecWrap,
+ /// Subtract only if result would be positive.
+ /// *p = (old u>= v) ? old - v : old
+ CondSub,
+
+ /// Subtract with clamping of negative results to zero.
+ /// *p = (old u>= v) ? old - v : 0
+ SubClamp,
+
FIRST_BINOP = Xchg,
- LAST_BINOP = UDecWrap,
+ LAST_BINOP = SubClamp,
BAD_BINOP
};
diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 7a5e919fe26e3..5a3c80ebaa6e2 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -1284,7 +1284,6 @@ def int_amdgcn_raw_buffer_atomic_or : AMDGPURawBufferAtomic;
def int_amdgcn_raw_buffer_atomic_xor : AMDGPURawBufferAtomic;
def int_amdgcn_raw_buffer_atomic_inc : AMDGPURawBufferAtomic;
def int_amdgcn_raw_buffer_atomic_dec : AMDGPURawBufferAtomic;
-def int_amdgcn_raw_buffer_atomic_cond_sub_u32 : AMDGPURawBufferAtomic;
def int_amdgcn_raw_buffer_atomic_cmpswap : Intrinsic<
[llvm_anyint_ty],
[LLVMMatchType<0>, // src(VGPR)
@@ -1321,7 +1320,6 @@ def int_amdgcn_raw_ptr_buffer_atomic_or : AMDGPURawPtrBufferAtomic;
def int_amdgcn_raw_ptr_buffer_atomic_xor : AMDGPURawPtrBufferAtomic;
def int_amdgcn_raw_ptr_buffer_atomic_inc : AMDGPURawPtrBufferAtomic;
def int_amdgcn_raw_ptr_buffer_atomic_dec : AMDGPURawPtrBufferAtomic;
-def int_amdgcn_raw_ptr_buffer_atomic_cond_sub_u32 : AMDGPURawPtrBufferAtomic;
def int_amdgcn_raw_ptr_buffer_atomic_cmpswap : Intrinsic<
[llvm_anyint_ty],
[LLVMMatchType<0>, // src(VGPR)
@@ -1362,7 +1360,6 @@ def int_amdgcn_struct_buffer_atomic_or : AMDGPUStructBufferAtomic;
def int_amdgcn_struct_buffer_atomic_xor : AMDGPUStructBufferAtomic;
def int_amdgcn_struct_buffer_atomic_inc : AMDGPUStructBufferAtomic;
def int_amdgcn_struct_buffer_atomic_dec : AMDGPUStructBufferAtomic;
-def int_amdgcn_struct_buffer_atomic_cond_sub_u32 : AMDGPUStructBufferAtomic;
def int_amdgcn_struct_buffer_atomic_cmpswap : Intrinsic<
[llvm_anyint_ty],
[LLVMMatchType<0>, // src(VGPR)
@@ -1398,7 +1395,6 @@ def int_amdgcn_struct_ptr_buffer_atomic_or : AMDGPUStructPtrBufferAtomic;
def int_amdgcn_struct_ptr_buffer_atomic_xor : AMDGPUStructPtrBufferAtomic;
def int_amdgcn_struct_ptr_buffer_atomic_inc : AMDGPUStructPtrBufferAtomic;
def int_amdgcn_struct_ptr_buffer_atomic_dec : AMDGPUStructPtrBufferAtomic;
-def int_amdgcn_struct_ptr_buffer_atomic_cond_sub_u32 : AMDGPUStructPtrBufferAtomic;
def int_amdgcn_struct_ptr_buffer_atomic_cmpswap : Intrinsic<
[llvm_anyint_ty],
[LLVMMatchType<0>, // src(VGPR)
@@ -2392,8 +2388,6 @@ class AMDGPUAtomicRtn<LLVMType vt, LLVMType pt = llvm_anyptr_ty> : Intrinsic <
[IntrArgMemOnly, IntrWillReturn, NoCapture<ArgIndex<0>>, IntrNoCallback, IntrNoFree], "",
[SDNPMemOperand]>;
-def int_amdgcn_global_atomic_csub : AMDGPUAtomicRtn<llvm_i32_ty>;
-
// uint4 llvm.amdgcn.image.bvh.intersect.ray <node_ptr>, <ray_extent>, <ray_origin>,
// <ray_dir>, <ray_inv_dir>, <texture_descr>
// <node_ptr> is i32 or i64.
@@ -2594,8 +2588,6 @@ def int_amdgcn_flat_atomic_fmax_num : AMDGPUAtomicRtn<llvm_anyfloat_ty>;
def int_amdgcn_global_atomic_fmin_num : AMDGPUAtomicRtn<llvm_anyfloat_ty>;
def int_amdgcn_global_atomic_fmax_num : AMDGPUAtomicRtn<llvm_anyfloat_ty>;
-def int_amdgcn_atomic_cond_sub_u32 : AMDGPUAtomicRtn<llvm_i32_ty>;
-
class AMDGPULoadIntrinsic<LLVMType ptr_ty>:
Intrinsic<
[llvm_any_ty],
diff --git a/llvm/include/llvm/Support/TargetOpcodes.def b/llvm/include/llvm/Support/TargetOpcodes.def
index df4b264af72a8..75bc350fa8d6b 100644
--- a/llvm/include/llvm/Support/TargetOpcodes.def
+++ b/llvm/include/llvm/Support/TargetOpcodes.def
@@ -411,12 +411,14 @@ HANDLE_TARGET_OPCODE(G_ATOMICRMW_FMAX)
HANDLE_TARGET_OPCODE(G_ATOMICRMW_FMIN)
HANDLE_TARGET_OPCODE(G_ATOMICRMW_UINC_WRAP)
HANDLE_TARGET_OPCODE(G_ATOMICRMW_UDEC_WRAP)
+HANDLE_TARGET_OPCODE(G_ATOMICRMW_COND_SUB)
+HANDLE_TARGET_OPCODE(G_ATOMICRMW_SUB_CLAMP)
// Marker for start of Generic AtomicRMW opcodes
HANDLE_TARGET_OPCODE_MARKER(GENERIC_ATOMICRMW_OP_START, G_ATOMICRMW_XCHG)
// Marker for end of Generic AtomicRMW opcodes
-HANDLE_TARGET_OPCODE_MARKER(GENERIC_ATOMICRMW_OP_END, G_ATOMICRMW_UDEC_WRAP)
+HANDLE_TARGET_OPCODE_MARKER(GENERIC_ATOMICRMW_OP_END, G_ATOMICRMW_SUB_CLAMP)
// Generic atomic fence
HANDLE_TARGET_OPCODE(G_FENCE)
diff --git a/llvm/include/llvm/Target/GenericOpcodes.td b/llvm/include/llvm/Target/GenericOpcodes.td
index 4abffe6476c85..1691e83eae377 100644
--- a/llvm/include/llvm/Target/GenericOpcodes.td
+++ b/llvm/include/llvm/Target/GenericOpcodes.td
@@ -1291,6 +1291,8 @@ def G_ATOMICRMW_FMAX : G_ATOMICRMW_OP;
def G_ATOMICRMW_FMIN : G_ATOMICRMW_OP;
def G_ATOMICRMW_UINC_WRAP : G_ATOMICRMW_OP;
def G_ATOMICRMW_UDEC_WRAP : G_ATOMICRMW_OP;
+def G_ATOMICRMW_COND_SUB : G_ATOMICRMW_OP;
+def G_ATOMICRMW_SUB_CLAMP : G_ATOMICRMW_OP;
def G_FENCE : GenericInstruction {
let OutOperandList = (outs);
diff --git a/llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td b/llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td
index 560d3b434d07d..43d4e8d37e9b0 100644
--- a/llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td
+++ b/llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td
@@ -252,6 +252,8 @@ def : GINodeEquiv<G_ATOMICRMW_FMAX, atomic_load_fmax>;
def : GINodeEquiv<G_ATOMICRMW_FMIN, atomic_load_fmin>;
def : GINodeEquiv<G_ATOMICRMW_UINC_WRAP, atomic_load_uinc_wrap>;
def : GINodeEquiv<G_ATOMICRMW_UDEC_WRAP, atomic_load_udec_wrap>;
+def : GINodeEquiv<G_ATOMICRMW_COND_SUB, atomic_load_cond_sub>;
+def : GINodeEquiv<G_ATOMICRMW_SUB_CLAMP, atomic_load_sub_clamp>;
def : GINodeEquiv<G_FENCE, atomic_fence>;
def : GINodeEquiv<G_PREFETCH, prefetch>;
def : GINodeEquiv<G_TRAP, trap>;
diff --git a/llvm/include/llvm/Target/TargetSelectionDAG.td b/llvm/include/llvm/Target/TargetSelectionDAG.td
index 8cbf98cd58ca9..ac6cfd823eb44 100644
--- a/llvm/include/llvm/Target/TargetSelectionDAG.td
+++ b/llvm/include/llvm/Target/TargetSelectionDAG.td
@@ -722,6 +722,10 @@ def atomic_load_uinc_wrap : SDNode<"ISD::ATOMIC_LOAD_UINC_WRAP", SDTAtomic2,
[SDNPHasChain, SDNPMayStore, SDNPMayLoad, SDNPMemOperand]>;
def atomic_load_udec_wrap : SDNode<"ISD::ATOMIC_LOAD_UDEC_WRAP", SDTAtomic2,
[SDNPHasChain, SDNPMayStore, SDNPMayLoad, SDNPMemOperand]>;
+def atomic_load_cond_sub : SDNode<"ISD::ATOMIC_LOAD_COND_SUB", SDTAtomic2,
+ [SDNPHasChain, SDNPMayStore, SDNPMayLoad, SDNPMemOperand]>;
+def atomic_load_sub_clamp : SDNode<"ISD::ATOMIC_LOAD_SUB_CLAMP", SDTAtomic2,
+ [SDNPHasChain, SDNPMayStore, SDNPMayLoad, SDNPMemOperand]>;
def atomic_load : SDNode<"ISD::ATOMIC_LOAD", SDTAtomicLoad,
[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
diff --git a/llvm/lib/AsmParser/LLLexer.cpp b/llvm/lib/AsmParser/LLLexer.cpp
index 7d7fe19568e8a..cbd039bf98c44 100644
--- a/llvm/lib/AsmParser/LLLexer.cpp
+++ b/llvm/lib/AsmParser/LLLexer.cpp
@@ -704,6 +704,8 @@ lltok::Kind LLLexer::LexIdentifier() {
KEYWORD(umin); KEYWORD(fmax); KEYWORD(fmin);
KEYWORD(uinc_wrap);
KEYWORD(udec_wrap);
+ KEYWORD(cond_sub);
+ KEYWORD(sub_clamp);
KEYWORD(splat);
KEYWORD(vscale);
diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp
index 21d386097fc63..2d086c859c14c 100644
--- a/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm/lib/AsmParser/LLParser.cpp
@@ -8331,6 +8331,12 @@ int LLParser::parseAtomicRMW(Instruction *&Inst, PerFunctionState &PFS) {
case lltok::kw_udec_wrap:
Operation = AtomicRMWInst::UDecWrap;
break;
+ case lltok::kw_cond_sub:
+ Operation = AtomicRMWInst::CondSub;
+ break;
+ case lltok::kw_sub_clamp:
+ Operation = AtomicRMWInst::SubClamp;
+ break;
case lltok::kw_fadd:
Operation = AtomicRMWInst::FAdd;
IsFP = true;
diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
index 05c9697123371..eea10e9221b53 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -1349,6 +1349,10 @@ static AtomicRMWInst::BinOp getDecodedRMWOperation(unsigned Val) {
return AtomicRMWInst::UIncWrap;
case bitc::RMW_UDEC_WRAP:
return AtomicRMWInst::UDecWrap;
+ case bitc::RMW_COND_SUB:
+ return AtomicRMWInst::CondSub;
+ case bitc::RMW_SUB_CLAMP:
+ return AtomicRMWInst::SubClamp;
}
}
diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
index ba16c0851e1fd..12002803ca54e 100644
--- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -658,6 +658,10 @@ static unsigned getEncodedRMWOperation(AtomicRMWInst::BinOp Op) {
return bitc::RMW_UINC_WRAP;
case AtomicRMWInst::UDecWrap:
return bitc::RMW_UDEC_WRAP;
+ case AtomicRMWInst::CondSub:
+ return bitc::RMW_COND_SUB;
+ case AtomicRMWInst::SubClamp:
+ return bitc::RMW_SUB_CLAMP;
}
}
diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index 7728cc50fc9f9..c54875e4bc0a2 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -868,7 +868,9 @@ static Value *performMaskedAtomicOp(AtomicRMWInst::BinOp Op,
case AtomicRMWInst::FMin:
case AtomicRMWInst::FMax:
case AtomicRMWInst::UIncWrap:
- case AtomicRMWInst::UDecWrap: {
+ case AtomicRMWInst::UDecWrap:
+ case AtomicRMWInst::CondSub:
+ case AtomicRMWInst::SubClamp: {
// Finally, other ops will operate on the full value, so truncate down to
// the original size, and expand out again after doing the
// operation. Bitcasts will be inserted for FP values.
@@ -1542,6 +1544,8 @@ bool AtomicExpandImpl::isIdempotentRMW(AtomicRMWInst *RMWI) {
case AtomicRMWInst::Sub:
case AtomicRMWInst::Or:
case AtomicRMWInst::Xor:
+ case AtomicRMWInst::CondSub:
+ case AtomicRMWInst::SubClamp:
return C->isZero();
case AtomicRMWInst::And:
return C->isMinusOne();
@@ -1783,6 +1787,8 @@ static ArrayRef<RTLIB::Libcall> GetRMWLibcall(AtomicRMWInst::BinOp Op) {
case AtomicRMWInst::FSub:
case AtomicRMWInst::UIncWrap:
case AtomicRMWInst::UDecWrap:
+ case AtomicRMWInst::CondSub:
+ case AtomicRMWInst::SubClamp:
// No atomic libcalls are available for max/min/umax/umin.
return {};
}
diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
index c06b35a98e434..452d8a1599636 100644
--- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -3289,6 +3289,12 @@ bool IRTranslator::translateAtomicRMW(const User &U,
case AtomicRMWInst::UDecWrap:
Opcode = TargetOpcode::G_ATOMICRMW_UDEC_WRAP;
break;
+ case AtomicRMWInst::CondSub:
+ Opcode = TargetOpcode::G_ATOMICRMW_COND_SUB;
+ b...
[truncated]
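To illustrate the upgrade path the description mentions — a hedged sketch only, since the AutoUpgrade hunk itself falls in the truncated part of the diff (the intrinsic signature follows the pre-patch definitions above; the syncscope and ordering shown on the replacement are illustrative, not necessarily what the upgrade emits):

  ; before: target intrinsic, which cannot express ordering or syncscope
  %r0 = call i32 @llvm.amdgcn.atomic.cond.sub.u32.p1(ptr addrspace(1) %p, i32 %v)
  ; after: first-class atomicrmw carrying both
  %r1 = atomicrmw cond_sub ptr addrspace(1) %p, i32 %v syncscope("agent") monotonic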
Could you please extend […]? Thanks.
llvm/docs/LangRef.rst (outdated)

- cond_sub: ``*ptr = (*ptr u>= val) ? *ptr - val : *ptr`` (subtract only if result would be positive).
- sub_clamp: ``*ptr = (*ptr u>= val) ? *ptr - val : 0`` (subtract with clamping of negative results to zero).
I find "positive" and "negative" confusing here. Suggestion:
Suggested change:

- cond_sub: ``*ptr = (*ptr u>= val) ? *ptr - val : *ptr`` (subtract only if no unsigned overflow).
- sub_clamp: ``*ptr = (*ptr u>= val) ? *ptr - val : 0`` (subtract with clamping to zero).
It looks like sub_clamp does the operation of the usub.sat intrinsic, so maybe call it usub_sat?
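For comparison, a scalar sketch of the stored value using the existing ``llvm.usub.sat`` intrinsic (the value names here are illustrative):

  declare i32 @llvm.usub.sat.i32(i32, i32)

  ; (%old u>= %val) ? %old - %val : 0 - exactly the value sub_clamp stores
  %new = call i32 @llvm.usub.sat.i32(i32 %old, i32 %val)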
And maybe cond_sub should be usub_cond.
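Whatever the final names, a target without a native instruction could in principle expand either operation to a compare-exchange loop. A minimal hand-written sketch for cond_sub (the function name and seq_cst orderings are hypothetical, not necessarily what AtomicExpandPass emits):

  define i32 @cond_sub_loop(ptr %p, i32 %val) {
  entry:
    %first = load i32, ptr %p, align 4
    br label %loop

  loop:
    %old = phi i32 [ %first, %entry ], [ %seen, %loop ]
    ; new = (old u>= val) ? old - val : old
    %uge = icmp uge i32 %old, %val
    %sub = sub i32 %old, %val
    %new = select i1 %uge, i32 %sub, i32 %old
    %pair = cmpxchg ptr %p, i32 %old, i32 %new seq_cst seq_cst
    %seen = extractvalue { i32, i1 } %pair, 0
    %ok = extractvalue { i32, i1 } %pair, 1
    br i1 %ok, label %done, label %loop

  done:
    ret i32 %seen          ; the original value, matching atomicrmw's result
  }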
@@ -0,0 +1,153 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple x86_64-pc-linux < %s | FileCheck %s
Please add 32-bit test coverage as well.
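For instance, the test could carry a second RUN line along these lines (the exact triple and check prefixes are illustrative):

  ; RUN: llc -mtriple x86_64-pc-linux < %s | FileCheck %s --check-prefix=X64
  ; RUN: llc -mtriple i686-pc-linux < %s | FileCheck %s --check-prefix=X86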
defm int_amdgcn_atomic_cond_sub_u32 : local_addr_space_atomic_op;
defm int_amdgcn_atomic_cond_sub_u32 : flat_addr_space_atomic_op;
defm int_amdgcn_atomic_cond_sub_u32 : global_addr_space_atomic_op;
//defm int_amdgcn_atomic_cond_sub_u32 : local_addr_space_atomic_op;
What?
@@ -132,6 +134,10 @@ Changes to the AMDGPU Backend

* Implemented :ref:`llvm.get.rounding <int_get_rounding>` and :ref:`llvm.set.rounding <int_set_rounding>`

* Removed ``llvm.amdgcn.atomic.cond.sub.u32`` and
Best to do this in a separate change and just introduce the new operation here.
Ping. It would be best to split all of the AMDGPU implementation pieces into a separate PR.
@@ -0,0 +1,109 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 -verify-machineinstrs < %s | FileCheck %s -check-prefix=GFX12
Drop -verify-machineinstrs. Also just move this up to the main test directory, and test SDAG and GlobalISel in the same file.
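Concretely, one shared test file could carry both RUN lines, something like the following (the prefix names are illustrative):

  ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 < %s | FileCheck %s -check-prefixes=GFX12,SDAG
  ; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 < %s | FileCheck %s -check-prefixes=GFX12,GISEL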
declare i32 @llvm.amdgcn.atomic.csub.i32.p1(ptr addrspace(1) nocapture, i32, i32 immarg, i32 immarg, i1 immarg) #0
declare i32 @llvm.amdgcn.atomic.csub.i32.p3(ptr addrspace(3) nocapture, i32, i32 immarg, i32 immarg, i1 immarg) #0
declare i32 @llvm.amdgcn.atomic.csub.i32.p0(ptr nocapture, i32, i32 immarg, i32 immarg, i1 immarg) #0
Check that an i64 case is handled correctly.
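A hypothetical i64 test input might look like the following (names, syncscope, and ordering are illustrative; the interesting question is how the operation gets legalized, given that the hardware instruction is 32-bit):

  define i64 @sub_clamp_i64(ptr addrspace(1) %p, i64 %v) {
    %r = atomicrmw sub_clamp ptr addrspace(1) %p, i64 %v syncscope("agent") monotonic
    ret i64 %r
  }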
I have created a new pull request, #105553, which addresses all the mentioned issues, including splitting the commit in two.