Skip to content

insert readfirstlane in the function returns in sgpr #135326

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

PankajDwivedi-25
Copy link
Contributor

@PankajDwivedi-25 PankajDwivedi-25 commented Apr 11, 2025

insert readfirstlane in the function returns in sgpr.

@llvmbot
Copy link
Member

llvmbot commented Apr 11, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Pankaj Dwivedi (PankajDwivedi-25)

Changes

Isel introduces direct copy from intermediate result to out reg which leads to illegal VReg to SReg copy, Introducing readfirstlane can avoid such cases?


Full diff: https://github.com/llvm/llvm-project/pull/135326.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+4-1)
  • (added) llvm/test/CodeGen/AMDGPU/fix-illegal-copy.ll (+21)
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 250963b3019a0..f9384c80fa654 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -3246,6 +3246,8 @@ SITargetLowering::LowerReturn(SDValue Chain, CallingConv::ID CallConv,
   SmallVector<SDValue, 48> RetOps;
   RetOps.push_back(Chain); // Operand #0 = Chain (updated below)
 
+  SDValue ReadFirstLane =
+      DAG.getTargetConstant(Intrinsic::amdgcn_readfirstlane, DL, MVT::i32);
   // Copy the result values into the output registers.
   for (unsigned I = 0, RealRVLocIdx = 0, E = RVLocs.size(); I != E;
        ++I, ++RealRVLocIdx) {
@@ -3273,7 +3275,8 @@ SITargetLowering::LowerReturn(SDValue Chain, CallingConv::ID CallConv,
     default:
       llvm_unreachable("Unknown loc info!");
     }
-
+    Arg = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, Arg.getValueType(),
+                      ReadFirstLane, Arg);
     Chain = DAG.getCopyToReg(Chain, DL, VA.getLocReg(), Arg, Glue);
     Glue = Chain.getValue(1);
     RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));
diff --git a/llvm/test/CodeGen/AMDGPU/fix-illegal-copy.ll b/llvm/test/CodeGen/AMDGPU/fix-illegal-copy.ll
new file mode 100644
index 0000000000000..34b5a3f358225
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fix-illegal-copy.ll
@@ -0,0 +1,21 @@
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs -stop-after=finalize-isel | FileCheck %s -check-prefixes=GFX11
+
+define amdgpu_ps i32 @s_copysign_f32_bf16(float inreg %mag, bfloat inreg %sign.bf16) {
+  ; GFX11-LABEL: name: s_copysign_f32_bf16
+  ; GFX11: bb.0 (%ir-block.0):
+  ; GFX11-NEXT:   liveins: $sgpr0, $sgpr1
+  ; GFX11-NEXT: {{  $}}
+  ; GFX11-NEXT:   [[COPY:%[0-9]+]]:sgpr_32 = COPY $sgpr1
+  ; GFX11-NEXT:   [[COPY1:%[0-9]+]]:sgpr_32 = COPY $sgpr0
+  ; GFX11-NEXT:   [[V_LSHLREV_B32_e64_:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_e64 16, [[COPY]], implicit $exec
+  ; GFX11-NEXT:   [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 2147483647
+  ; GFX11-NEXT:   [[V_BFI_B32_e64_:%[0-9]+]]:vgpr_32 = V_BFI_B32_e64 killed [[S_MOV_B32_]], [[COPY1]], killed [[V_LSHLREV_B32_e64_]], implicit $exec
+  ; GFX11-NEXT:   [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 killed [[V_BFI_B32_e64_]], implicit $exec
+  ; GFX11-NEXT:   $sgpr0 = COPY [[V_READFIRSTLANE_B32_]]
+  ; GFX11-NEXT:   SI_RETURN_TO_EPILOG $sgpr0
+  %sign = fpext bfloat %sign.bf16 to float
+  %op = call float @llvm.copysign.f32(float %mag, float %sign)
+  %cast = bitcast float %op to i32
+  ret i32 %cast
+}

@jayfoad
Copy link
Contributor

jayfoad commented Apr 11, 2025

I am not too familiar with calling convention lowering, but for "normal" instructions this kind of thing is handled in SIInstrInfo::legalizeOperands.

@PankajDwivedi-25
Copy link
Contributor Author

I am not too familiar with calling convention lowering, but for "normal" instructions this kind of thing is handled in SIInstrInfo::legalizeOperands.

Actual problem is #131752

@arsenm
Copy link
Contributor

arsenm commented Apr 11, 2025

I am not too familiar with calling convention lowering, but for "normal" instructions this kind of thing is handled in SIInstrInfo::legalizeOperands.

We should not be trying to recover semantic intent down there. The calling convention has to handle this, and gisel already does this

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc < %s -mtriple=amdgcn -mcpu=gfx1100 | FileCheck %s -check-prefixes=GFX11

define amdgpu_ps i32 @s_copysign_uniform(float inreg %x, float inreg %y) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The copysign isn't relevant to the issue (actually in this instance, the fact we emitted a BFI is a separate bug that should be fixed). Test with the fpext for instance, that always has to be VALU

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case of fpxet, it returns in vgpr.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description should be more specific, this is inserting readfirstlane on function returning in sgprs

@PankajDwivedi-25 PankajDwivedi-25 force-pushed the users/pkd-25/introduce-readfirstlane-during-return-lowering branch from 4a991f3 to ec5c6be Compare April 15, 2025 08:46
@PankajDwivedi-25
Copy link
Contributor Author

@arsenm waiting for your feedback.

@arsenm
Copy link
Contributor

arsenm commented Apr 17, 2025

@arsenm waiting for your feedback.

Title isn't fixed

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc < %s -mtriple=amdgcn -mcpu=gfx1100 | FileCheck %s -check-prefixes=GFX11

define amdgpu_ps float @uniform_fpext(half inreg %x) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test filename isn't right, this isn't about fixing illegal copies. The function test names also aren't really right, this is about returning uniform VGPRs in SGPRs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this is too specific and does not solve all the illegal V to S copies.

also, so far I explored DAG->DAG Isel. This is just one place where we could emit an illegal copy. can we have something generic to solve this issue? For now, copyToReg is handled in the generic target opcode as COPY.

Do you have any suggestions on how it can be handled if not inside SIFixSGPRCopies?

@arsenm
Copy link
Contributor

arsenm commented Apr 18, 2025

Patch title is still wrong

@PankajDwivedi-25 PankajDwivedi-25 changed the title stop emitting direct copy from intermediate result to out reg insert readfirstlane in the function returns in sgpr Apr 18, 2025
@PankajDwivedi-25
Copy link
Contributor Author

@arsenm, @cdevadas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants