Skip to content

Add cond_sub and sub_clamp operations to atomicrmw #96661

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions llvm/bindings/ocaml/llvm/llvm.ml
Original file line number Diff line number Diff line change
Expand Up @@ -296,6 +296,12 @@ module AtomicRMWBinOp = struct
| UMin
| FAdd
| FSub
| FMax
| FMin
| UInc_Wrap
| UDec_Wrap
| Cond_Sub
| Sub_Clamp
end

module ValueKind = struct
Expand Down
6 changes: 6 additions & 0 deletions llvm/bindings/ocaml/llvm/llvm.mli
Original file line number Diff line number Diff line change
Expand Up @@ -331,6 +331,12 @@ module AtomicRMWBinOp : sig
| UMin
| FAdd
| FSub
| FMax
| FMin
| UInc_Wrap
| UDec_Wrap
| Cond_Sub
| Sub_Clamp
end

(** The kind of an [llvalue], the result of [classify_value v].
Expand Down
5 changes: 0 additions & 5 deletions llvm/docs/AMDGPUUsage.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1321,11 +1321,6 @@ The AMDGPU backend implements the following LLVM IR intrinsics.

The iglp_opt strategy implementations are subject to change.

llvm.amdgcn.atomic.cond.sub.u32 Provides direct access to flat_atomic_cond_sub_u32, global_atomic_cond_sub_u32
and ds_cond_sub_u32 based on address space on gfx12 targets. This
performs subtraction only if the memory value is greater than or
equal to the data value.

llvm.amdgcn.s.getpc Provides access to the s_getpc_b64 instruction, but with the return value
sign-extended from the width of the underlying PC hardware register even on
processors where the s_getpc_b64 instruction returns a zero-extended value.
Expand Down
4 changes: 3 additions & 1 deletion llvm/docs/GlobalISel/GenericOpcode.rst
Original file line number Diff line number Diff line change
Expand Up @@ -825,7 +825,9 @@ operands.
G_ATOMICRMW_MIN, G_ATOMICRMW_UMAX,
G_ATOMICRMW_UMIN, G_ATOMICRMW_FADD,
G_ATOMICRMW_FSUB, G_ATOMICRMW_FMAX,
G_ATOMICRMW_FMIN
G_ATOMICRMW_FMIN, G_ATOMICRMW_UINC_WRAP,
G_ATOMICRMW_UDEC_WRAP, G_ATOMICRMW_COND_SUB,
G_ATOMICRMW_SUB_CLAMP

Generic atomicrmw. Expects a MachineMemOperand in addition to explicit
operands.
Expand Down
4 changes: 4 additions & 0 deletions llvm/docs/LangRef.rst
Original file line number Diff line number Diff line change
Expand Up @@ -11209,6 +11209,8 @@ operation. The operation must be one of the following keywords:
- fmin
- uinc_wrap
- udec_wrap
- cond_sub
- sub_clamp

For most of these operations, the type of '<value>' must be an integer
type whose bit width is a power of two greater than or equal to eight
Expand Down Expand Up @@ -11259,6 +11261,8 @@ operation argument:
- fmin: ``*ptr = minnum(*ptr, val)`` (match the `llvm.minnum.*`` intrinsic)
- uinc_wrap: ``*ptr = (*ptr u>= val) ? 0 : (*ptr + 1)`` (increment value with wraparound to zero when incremented above input value)
- udec_wrap: ``*ptr = ((*ptr == 0) || (*ptr u> val)) ? val : (*ptr - 1)`` (decrement with wraparound to input value when decremented below zero).
- cond_sub: ``*ptr = (*ptr u>= val) ? *ptr - val : *ptr`` (subtract only if result would be positive).
- sub_clamp: ``*ptr = (*ptr u>= val) ? *ptr - val : 0`` (subtract with clamping of negative results to zero).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find "positive" and "negative" confusing here. Suggestion:

Suggested change
- cond_sub: ``*ptr = (*ptr u>= val) ? *ptr - val : *ptr`` (subtract only if result would be positive).
- sub_clamp: ``*ptr = (*ptr u>= val) ? *ptr - val : 0`` (subtract with clamping of negative results to zero).
- cond_sub: ``*ptr = (*ptr u>= val) ? *ptr - val : *ptr`` (subtract only if no unsigned overflow).
- sub_clamp: ``*ptr = (*ptr u>= val) ? *ptr - val : 0`` (subtract with clamping to zero).

Copy link
Contributor

@nikic nikic Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like sub_clamp does the operation of the usub.sat intrinsic, so maybe call it usub_sat?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe cond_sub should be usub_cond.



Example:
Expand Down
6 changes: 6 additions & 0 deletions llvm/docs/ReleaseNotes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,8 @@ Changes to the LLVM IR
removed. The next argument has been changed from byte index to bit
index.

* Added ``cond_sub`` and ``sub_clamp`` operations to ``atomicrmw``.

Changes to LLVM infrastructure
------------------------------

Expand Down Expand Up @@ -132,6 +134,10 @@ Changes to the AMDGPU Backend

* Implemented :ref:`llvm.get.rounding <int_get_rounding>` and :ref:`llvm.set.rounding <int_set_rounding>`

* Removed ``llvm.amdgcn.atomic.cond.sub.u32`` and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best to do this in a separate change. Best to just introduce the new operation here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping, would be best to split all of the AMDGPU implementation pieces into a separate pR

``llvm.amdgcn.atomic.csub.u32`` intrinsics. :ref:`atomicrmw <i_atomicrmw>`
should be used instead with ``cond_sub`` and ``sub_clamp``.

Changes to the ARM Backend
--------------------------

Expand Down
2 changes: 2 additions & 0 deletions llvm/include/llvm/AsmParser/LLToken.h
Original file line number Diff line number Diff line change
Expand Up @@ -268,6 +268,8 @@ enum Kind {
kw_fmin,
kw_uinc_wrap,
kw_udec_wrap,
kw_cond_sub,
kw_sub_clamp,

// Instruction Opcodes (Opcode in UIntVal).
kw_fneg,
Expand Down
4 changes: 3 additions & 1 deletion llvm/include/llvm/Bitcode/LLVMBitCodes.h
Original file line number Diff line number Diff line change
Expand Up @@ -484,7 +484,9 @@ enum RMWOperations {
RMW_FMAX = 13,
RMW_FMIN = 14,
RMW_UINC_WRAP = 15,
RMW_UDEC_WRAP = 16
RMW_UDEC_WRAP = 16,
RMW_COND_SUB = 17,
RMW_SUB_CLAMP = 18
};

/// OverflowingBinaryOperatorOptionalFlags - Flags for serializing
Expand Down
2 changes: 2 additions & 0 deletions llvm/include/llvm/CodeGen/ISDOpcodes.h
Original file line number Diff line number Diff line change
Expand Up @@ -1292,6 +1292,8 @@ enum NodeType {
ATOMIC_LOAD_FMIN,
ATOMIC_LOAD_UINC_WRAP,
ATOMIC_LOAD_UDEC_WRAP,
ATOMIC_LOAD_COND_SUB,
ATOMIC_LOAD_SUB_CLAMP,

// Masked load and store - consecutive vector load and store operations
// with additional mask operand that prevents memory accesses to the
Expand Down
40 changes: 22 additions & 18 deletions llvm/include/llvm/CodeGen/SelectionDAGNodes.h
Original file line number Diff line number Diff line change
Expand Up @@ -1470,6 +1470,8 @@ class MemSDNode : public SDNode {
case ISD::ATOMIC_LOAD_FMIN:
case ISD::ATOMIC_LOAD_UINC_WRAP:
case ISD::ATOMIC_LOAD_UDEC_WRAP:
case ISD::ATOMIC_LOAD_COND_SUB:
case ISD::ATOMIC_LOAD_SUB_CLAMP:
case ISD::ATOMIC_LOAD:
case ISD::ATOMIC_STORE:
case ISD::MLOAD:
Expand Down Expand Up @@ -1536,27 +1538,29 @@ class AtomicSDNode : public MemSDNode {

// Methods to support isa and dyn_cast
static bool classof(const SDNode *N) {
return N->getOpcode() == ISD::ATOMIC_CMP_SWAP ||
return N->getOpcode() == ISD::ATOMIC_CMP_SWAP ||
N->getOpcode() == ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS ||
N->getOpcode() == ISD::ATOMIC_SWAP ||
N->getOpcode() == ISD::ATOMIC_LOAD_ADD ||
N->getOpcode() == ISD::ATOMIC_LOAD_SUB ||
N->getOpcode() == ISD::ATOMIC_LOAD_AND ||
N->getOpcode() == ISD::ATOMIC_LOAD_CLR ||
N->getOpcode() == ISD::ATOMIC_LOAD_OR ||
N->getOpcode() == ISD::ATOMIC_LOAD_XOR ||
N->getOpcode() == ISD::ATOMIC_LOAD_NAND ||
N->getOpcode() == ISD::ATOMIC_LOAD_MIN ||
N->getOpcode() == ISD::ATOMIC_LOAD_MAX ||
N->getOpcode() == ISD::ATOMIC_LOAD_UMIN ||
N->getOpcode() == ISD::ATOMIC_LOAD_UMAX ||
N->getOpcode() == ISD::ATOMIC_LOAD_FADD ||
N->getOpcode() == ISD::ATOMIC_LOAD_FSUB ||
N->getOpcode() == ISD::ATOMIC_LOAD_FMAX ||
N->getOpcode() == ISD::ATOMIC_LOAD_FMIN ||
N->getOpcode() == ISD::ATOMIC_SWAP ||
N->getOpcode() == ISD::ATOMIC_LOAD_ADD ||
N->getOpcode() == ISD::ATOMIC_LOAD_SUB ||
N->getOpcode() == ISD::ATOMIC_LOAD_AND ||
N->getOpcode() == ISD::ATOMIC_LOAD_CLR ||
N->getOpcode() == ISD::ATOMIC_LOAD_OR ||
N->getOpcode() == ISD::ATOMIC_LOAD_XOR ||
N->getOpcode() == ISD::ATOMIC_LOAD_NAND ||
N->getOpcode() == ISD::ATOMIC_LOAD_MIN ||
N->getOpcode() == ISD::ATOMIC_LOAD_MAX ||
N->getOpcode() == ISD::ATOMIC_LOAD_UMIN ||
N->getOpcode() == ISD::ATOMIC_LOAD_UMAX ||
N->getOpcode() == ISD::ATOMIC_LOAD_FADD ||
N->getOpcode() == ISD::ATOMIC_LOAD_FSUB ||
N->getOpcode() == ISD::ATOMIC_LOAD_FMAX ||
N->getOpcode() == ISD::ATOMIC_LOAD_FMIN ||
N->getOpcode() == ISD::ATOMIC_LOAD_UINC_WRAP ||
N->getOpcode() == ISD::ATOMIC_LOAD_UDEC_WRAP ||
N->getOpcode() == ISD::ATOMIC_LOAD ||
N->getOpcode() == ISD::ATOMIC_LOAD_COND_SUB ||
N->getOpcode() == ISD::ATOMIC_LOAD_SUB_CLAMP ||
N->getOpcode() == ISD::ATOMIC_LOAD ||
N->getOpcode() == ISD::ATOMIC_STORE;
}
};
Expand Down
10 changes: 9 additions & 1 deletion llvm/include/llvm/IR/Instructions.h
Original file line number Diff line number Diff line change
Expand Up @@ -750,8 +750,16 @@ class AtomicRMWInst : public Instruction {
/// *p = ((old == 0) || (old u> v)) ? v : (old - 1)
UDecWrap,

/// Subtract only if result would be positive.
/// *p = (old u>= v) ? old - v : old
CondSub,

/// Subtract with clamping of negative results to zero.
/// *p = (old u>= v) ? old - v : 0
SubClamp,

FIRST_BINOP = Xchg,
LAST_BINOP = UDecWrap,
LAST_BINOP = SubClamp,
BAD_BINOP
};

Expand Down
8 changes: 0 additions & 8 deletions llvm/include/llvm/IR/IntrinsicsAMDGPU.td
Original file line number Diff line number Diff line change
Expand Up @@ -1284,7 +1284,6 @@ def int_amdgcn_raw_buffer_atomic_or : AMDGPURawBufferAtomic;
def int_amdgcn_raw_buffer_atomic_xor : AMDGPURawBufferAtomic;
def int_amdgcn_raw_buffer_atomic_inc : AMDGPURawBufferAtomic;
def int_amdgcn_raw_buffer_atomic_dec : AMDGPURawBufferAtomic;
def int_amdgcn_raw_buffer_atomic_cond_sub_u32 : AMDGPURawBufferAtomic;
def int_amdgcn_raw_buffer_atomic_cmpswap : Intrinsic<
[llvm_anyint_ty],
[LLVMMatchType<0>, // src(VGPR)
Expand Down Expand Up @@ -1321,7 +1320,6 @@ def int_amdgcn_raw_ptr_buffer_atomic_or : AMDGPURawPtrBufferAtomic;
def int_amdgcn_raw_ptr_buffer_atomic_xor : AMDGPURawPtrBufferAtomic;
def int_amdgcn_raw_ptr_buffer_atomic_inc : AMDGPURawPtrBufferAtomic;
def int_amdgcn_raw_ptr_buffer_atomic_dec : AMDGPURawPtrBufferAtomic;
def int_amdgcn_raw_ptr_buffer_atomic_cond_sub_u32 : AMDGPURawPtrBufferAtomic;
def int_amdgcn_raw_ptr_buffer_atomic_cmpswap : Intrinsic<
[llvm_anyint_ty],
[LLVMMatchType<0>, // src(VGPR)
Expand Down Expand Up @@ -1362,7 +1360,6 @@ def int_amdgcn_struct_buffer_atomic_or : AMDGPUStructBufferAtomic;
def int_amdgcn_struct_buffer_atomic_xor : AMDGPUStructBufferAtomic;
def int_amdgcn_struct_buffer_atomic_inc : AMDGPUStructBufferAtomic;
def int_amdgcn_struct_buffer_atomic_dec : AMDGPUStructBufferAtomic;
def int_amdgcn_struct_buffer_atomic_cond_sub_u32 : AMDGPUStructBufferAtomic;
def int_amdgcn_struct_buffer_atomic_cmpswap : Intrinsic<
[llvm_anyint_ty],
[LLVMMatchType<0>, // src(VGPR)
Expand Down Expand Up @@ -1398,7 +1395,6 @@ def int_amdgcn_struct_ptr_buffer_atomic_or : AMDGPUStructPtrBufferAtomic;
def int_amdgcn_struct_ptr_buffer_atomic_xor : AMDGPUStructPtrBufferAtomic;
def int_amdgcn_struct_ptr_buffer_atomic_inc : AMDGPUStructPtrBufferAtomic;
def int_amdgcn_struct_ptr_buffer_atomic_dec : AMDGPUStructPtrBufferAtomic;
def int_amdgcn_struct_ptr_buffer_atomic_cond_sub_u32 : AMDGPUStructPtrBufferAtomic;
def int_amdgcn_struct_ptr_buffer_atomic_cmpswap : Intrinsic<
[llvm_anyint_ty],
[LLVMMatchType<0>, // src(VGPR)
Expand Down Expand Up @@ -2392,8 +2388,6 @@ class AMDGPUAtomicRtn<LLVMType vt, LLVMType pt = llvm_anyptr_ty> : Intrinsic <
[IntrArgMemOnly, IntrWillReturn, NoCapture<ArgIndex<0>>, IntrNoCallback, IntrNoFree], "",
[SDNPMemOperand]>;

def int_amdgcn_global_atomic_csub : AMDGPUAtomicRtn<llvm_i32_ty>;

// uint4 llvm.amdgcn.image.bvh.intersect.ray <node_ptr>, <ray_extent>, <ray_origin>,
// <ray_dir>, <ray_inv_dir>, <texture_descr>
// <node_ptr> is i32 or i64.
Expand Down Expand Up @@ -2594,8 +2588,6 @@ def int_amdgcn_flat_atomic_fmax_num : AMDGPUAtomicRtn<llvm_anyfloat_ty>;
def int_amdgcn_global_atomic_fmin_num : AMDGPUAtomicRtn<llvm_anyfloat_ty>;
def int_amdgcn_global_atomic_fmax_num : AMDGPUAtomicRtn<llvm_anyfloat_ty>;

def int_amdgcn_atomic_cond_sub_u32 : AMDGPUAtomicRtn<llvm_i32_ty>;

class AMDGPULoadIntrinsic<LLVMType ptr_ty>:
Intrinsic<
[llvm_any_ty],
Expand Down
4 changes: 3 additions & 1 deletion llvm/include/llvm/Support/TargetOpcodes.def
Original file line number Diff line number Diff line change
Expand Up @@ -411,12 +411,14 @@ HANDLE_TARGET_OPCODE(G_ATOMICRMW_FMAX)
HANDLE_TARGET_OPCODE(G_ATOMICRMW_FMIN)
HANDLE_TARGET_OPCODE(G_ATOMICRMW_UINC_WRAP)
HANDLE_TARGET_OPCODE(G_ATOMICRMW_UDEC_WRAP)
HANDLE_TARGET_OPCODE(G_ATOMICRMW_COND_SUB)
HANDLE_TARGET_OPCODE(G_ATOMICRMW_SUB_CLAMP)

// Marker for start of Generic AtomicRMW opcodes
HANDLE_TARGET_OPCODE_MARKER(GENERIC_ATOMICRMW_OP_START, G_ATOMICRMW_XCHG)

// Marker for end of Generic AtomicRMW opcodes
HANDLE_TARGET_OPCODE_MARKER(GENERIC_ATOMICRMW_OP_END, G_ATOMICRMW_UDEC_WRAP)
HANDLE_TARGET_OPCODE_MARKER(GENERIC_ATOMICRMW_OP_END, G_ATOMICRMW_SUB_CLAMP)

// Generic atomic fence
HANDLE_TARGET_OPCODE(G_FENCE)
Expand Down
2 changes: 2 additions & 0 deletions llvm/include/llvm/Target/GenericOpcodes.td
Original file line number Diff line number Diff line change
Expand Up @@ -1291,6 +1291,8 @@ def G_ATOMICRMW_FMAX : G_ATOMICRMW_OP;
def G_ATOMICRMW_FMIN : G_ATOMICRMW_OP;
def G_ATOMICRMW_UINC_WRAP : G_ATOMICRMW_OP;
def G_ATOMICRMW_UDEC_WRAP : G_ATOMICRMW_OP;
def G_ATOMICRMW_COND_SUB : G_ATOMICRMW_OP;
def G_ATOMICRMW_SUB_CLAMP : G_ATOMICRMW_OP;

def G_FENCE : GenericInstruction {
let OutOperandList = (outs);
Expand Down
2 changes: 2 additions & 0 deletions llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td
Original file line number Diff line number Diff line change
Expand Up @@ -252,6 +252,8 @@ def : GINodeEquiv<G_ATOMICRMW_FMAX, atomic_load_fmax>;
def : GINodeEquiv<G_ATOMICRMW_FMIN, atomic_load_fmin>;
def : GINodeEquiv<G_ATOMICRMW_UINC_WRAP, atomic_load_uinc_wrap>;
def : GINodeEquiv<G_ATOMICRMW_UDEC_WRAP, atomic_load_udec_wrap>;
def : GINodeEquiv<G_ATOMICRMW_COND_SUB, atomic_load_cond_sub>;
def : GINodeEquiv<G_ATOMICRMW_SUB_CLAMP, atomic_load_sub_clamp>;
def : GINodeEquiv<G_FENCE, atomic_fence>;
def : GINodeEquiv<G_PREFETCH, prefetch>;
def : GINodeEquiv<G_TRAP, trap>;
Expand Down
4 changes: 4 additions & 0 deletions llvm/include/llvm/Target/TargetSelectionDAG.td
Original file line number Diff line number Diff line change
Expand Up @@ -722,6 +722,10 @@ def atomic_load_uinc_wrap : SDNode<"ISD::ATOMIC_LOAD_UINC_WRAP", SDTAtomic2,
[SDNPHasChain, SDNPMayStore, SDNPMayLoad, SDNPMemOperand]>;
def atomic_load_udec_wrap : SDNode<"ISD::ATOMIC_LOAD_UDEC_WRAP", SDTAtomic2,
[SDNPHasChain, SDNPMayStore, SDNPMayLoad, SDNPMemOperand]>;
def atomic_load_cond_sub : SDNode<"ISD::ATOMIC_LOAD_COND_SUB", SDTAtomic2,
[SDNPHasChain, SDNPMayStore, SDNPMayLoad, SDNPMemOperand]>;
def atomic_load_sub_clamp : SDNode<"ISD::ATOMIC_LOAD_SUB_CLAMP", SDTAtomic2,
[SDNPHasChain, SDNPMayStore, SDNPMayLoad, SDNPMemOperand]>;

def atomic_load : SDNode<"ISD::ATOMIC_LOAD", SDTAtomicLoad,
[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
Expand Down
2 changes: 2 additions & 0 deletions llvm/lib/AsmParser/LLLexer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -704,6 +704,8 @@ lltok::Kind LLLexer::LexIdentifier() {
KEYWORD(umin); KEYWORD(fmax); KEYWORD(fmin);
KEYWORD(uinc_wrap);
KEYWORD(udec_wrap);
KEYWORD(cond_sub);
KEYWORD(sub_clamp);

KEYWORD(splat);
KEYWORD(vscale);
Expand Down
6 changes: 6 additions & 0 deletions llvm/lib/AsmParser/LLParser.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8331,6 +8331,12 @@ int LLParser::parseAtomicRMW(Instruction *&Inst, PerFunctionState &PFS) {
case lltok::kw_udec_wrap:
Operation = AtomicRMWInst::UDecWrap;
break;
case lltok::kw_cond_sub:
Operation = AtomicRMWInst::CondSub;
break;
case lltok::kw_sub_clamp:
Operation = AtomicRMWInst::SubClamp;
break;
case lltok::kw_fadd:
Operation = AtomicRMWInst::FAdd;
IsFP = true;
Expand Down
4 changes: 4 additions & 0 deletions llvm/lib/Bitcode/Reader/BitcodeReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1349,6 +1349,10 @@ static AtomicRMWInst::BinOp getDecodedRMWOperation(unsigned Val) {
return AtomicRMWInst::UIncWrap;
case bitc::RMW_UDEC_WRAP:
return AtomicRMWInst::UDecWrap;
case bitc::RMW_COND_SUB:
return AtomicRMWInst::CondSub;
case bitc::RMW_SUB_CLAMP:
return AtomicRMWInst::SubClamp;
}
}

Expand Down
4 changes: 4 additions & 0 deletions llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -658,6 +658,10 @@ static unsigned getEncodedRMWOperation(AtomicRMWInst::BinOp Op) {
return bitc::RMW_UINC_WRAP;
case AtomicRMWInst::UDecWrap:
return bitc::RMW_UDEC_WRAP;
case AtomicRMWInst::CondSub:
return bitc::RMW_COND_SUB;
case AtomicRMWInst::SubClamp:
return bitc::RMW_SUB_CLAMP;
}
}

Expand Down
8 changes: 7 additions & 1 deletion llvm/lib/CodeGen/AtomicExpandPass.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -868,7 +868,9 @@ static Value *performMaskedAtomicOp(AtomicRMWInst::BinOp Op,
case AtomicRMWInst::FMin:
case AtomicRMWInst::FMax:
case AtomicRMWInst::UIncWrap:
case AtomicRMWInst::UDecWrap: {
case AtomicRMWInst::UDecWrap:
case AtomicRMWInst::CondSub:
case AtomicRMWInst::SubClamp: {
// Finally, other ops will operate on the full value, so truncate down to
// the original size, and expand out again after doing the
// operation. Bitcasts will be inserted for FP values.
Expand Down Expand Up @@ -1542,6 +1544,8 @@ bool AtomicExpandImpl::isIdempotentRMW(AtomicRMWInst *RMWI) {
case AtomicRMWInst::Sub:
case AtomicRMWInst::Or:
case AtomicRMWInst::Xor:
case AtomicRMWInst::CondSub:
case AtomicRMWInst::SubClamp:
return C->isZero();
case AtomicRMWInst::And:
return C->isMinusOne();
Expand Down Expand Up @@ -1783,6 +1787,8 @@ static ArrayRef<RTLIB::Libcall> GetRMWLibcall(AtomicRMWInst::BinOp Op) {
case AtomicRMWInst::FSub:
case AtomicRMWInst::UIncWrap:
case AtomicRMWInst::UDecWrap:
case AtomicRMWInst::CondSub:
case AtomicRMWInst::SubClamp:
// No atomic libcalls are available for max/min/umax/umin.
return {};
}
Expand Down
6 changes: 6 additions & 0 deletions llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3289,6 +3289,12 @@ bool IRTranslator::translateAtomicRMW(const User &U,
case AtomicRMWInst::UDecWrap:
Opcode = TargetOpcode::G_ATOMICRMW_UDEC_WRAP;
break;
case AtomicRMWInst::CondSub:
Opcode = TargetOpcode::G_ATOMICRMW_COND_SUB;
break;
case AtomicRMWInst::SubClamp:
Opcode = TargetOpcode::G_ATOMICRMW_SUB_CLAMP;
break;
}

MIRBuilder.buildAtomicRMW(
Expand Down
Loading
Loading