-
Notifications
You must be signed in to change notification settings - Fork 13.3k
insert readfirstlane in the function returns in sgpr #135326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
insert readfirstlane in the function returns in sgpr #135326
Conversation
@llvm/pr-subscribers-backend-amdgpu Author: Pankaj Dwivedi (PankajDwivedi-25) ChangesIsel introduces direct copy from intermediate result to out reg which leads to illegal VReg to SReg copy, Introducing Full diff: https://github.com/llvm/llvm-project/pull/135326.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 250963b3019a0..f9384c80fa654 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -3246,6 +3246,8 @@ SITargetLowering::LowerReturn(SDValue Chain, CallingConv::ID CallConv,
SmallVector<SDValue, 48> RetOps;
RetOps.push_back(Chain); // Operand #0 = Chain (updated below)
+ SDValue ReadFirstLane =
+ DAG.getTargetConstant(Intrinsic::amdgcn_readfirstlane, DL, MVT::i32);
// Copy the result values into the output registers.
for (unsigned I = 0, RealRVLocIdx = 0, E = RVLocs.size(); I != E;
++I, ++RealRVLocIdx) {
@@ -3273,7 +3275,8 @@ SITargetLowering::LowerReturn(SDValue Chain, CallingConv::ID CallConv,
default:
llvm_unreachable("Unknown loc info!");
}
-
+ Arg = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, Arg.getValueType(),
+ ReadFirstLane, Arg);
Chain = DAG.getCopyToReg(Chain, DL, VA.getLocReg(), Arg, Glue);
Glue = Chain.getValue(1);
RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));
diff --git a/llvm/test/CodeGen/AMDGPU/fix-illegal-copy.ll b/llvm/test/CodeGen/AMDGPU/fix-illegal-copy.ll
new file mode 100644
index 0000000000000..34b5a3f358225
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fix-illegal-copy.ll
@@ -0,0 +1,21 @@
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs -stop-after=finalize-isel | FileCheck %s -check-prefixes=GFX11
+
+define amdgpu_ps i32 @s_copysign_f32_bf16(float inreg %mag, bfloat inreg %sign.bf16) {
+ ; GFX11-LABEL: name: s_copysign_f32_bf16
+ ; GFX11: bb.0 (%ir-block.0):
+ ; GFX11-NEXT: liveins: $sgpr0, $sgpr1
+ ; GFX11-NEXT: {{ $}}
+ ; GFX11-NEXT: [[COPY:%[0-9]+]]:sgpr_32 = COPY $sgpr1
+ ; GFX11-NEXT: [[COPY1:%[0-9]+]]:sgpr_32 = COPY $sgpr0
+ ; GFX11-NEXT: [[V_LSHLREV_B32_e64_:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_e64 16, [[COPY]], implicit $exec
+ ; GFX11-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 2147483647
+ ; GFX11-NEXT: [[V_BFI_B32_e64_:%[0-9]+]]:vgpr_32 = V_BFI_B32_e64 killed [[S_MOV_B32_]], [[COPY1]], killed [[V_LSHLREV_B32_e64_]], implicit $exec
+ ; GFX11-NEXT: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 killed [[V_BFI_B32_e64_]], implicit $exec
+ ; GFX11-NEXT: $sgpr0 = COPY [[V_READFIRSTLANE_B32_]]
+ ; GFX11-NEXT: SI_RETURN_TO_EPILOG $sgpr0
+ %sign = fpext bfloat %sign.bf16 to float
+ %op = call float @llvm.copysign.f32(float %mag, float %sign)
+ %cast = bitcast float %op to i32
+ ret i32 %cast
+}
|
I am not too familiar with calling convention lowering, but for "normal" instructions this kind of thing is handled in |
Actual problem is #131752 |
We should not be trying to recover semantic intent down there. The calling convention has to handle this, and gisel already does this |
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 | ||
; RUN: llc < %s -mtriple=amdgcn -mcpu=gfx1100 | FileCheck %s -check-prefixes=GFX11 | ||
|
||
define amdgpu_ps i32 @s_copysign_uniform(float inreg %x, float inreg %y) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The copysign isn't relevant to the issue (actually in this instance, the fact we emitted a BFI is a separate bug that should be fixed). Test with the fpext for instance, that always has to be VALU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in case of fpxet, it returns in vgpr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description should be more specific, this is inserting readfirstlane on function returning in sgprs
4a991f3
to
ec5c6be
Compare
@arsenm waiting for your feedback. |
Title isn't fixed |
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 | ||
; RUN: llc < %s -mtriple=amdgcn -mcpu=gfx1100 | FileCheck %s -check-prefixes=GFX11 | ||
|
||
define amdgpu_ps float @uniform_fpext(half inreg %x) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test filename isn't right, this isn't about fixing illegal copies. The function test names also aren't really right, this is about returning uniform VGPRs in SGPRs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this is too specific and does not solve all the illegal V to S copies.
also, so far I explored DAG->DAG Isel. This is just one place where we could emit an illegal copy. can we have something generic to solve this issue? For now, copyToReg is handled in the generic target opcode as COPY.
Do you have any suggestions on how it can be handled if not inside SIFixSGPRCopies?
Patch title is still wrong |
insert
readfirstlane
in the function returns in sgpr.