Skip to content

[WIP][AMDGPU] Fix emitting illegal COPY #131752

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

PankajDwivedi-25
Copy link
Contributor

Shader kernel calling convention expects the return value to be in SGPR, instruction selection introduced COPY which is illegal/wrong here. Instead of a copy, it should insert readfirstlane?

bb.0 (%ir-block.0):
liveins: $sgpr0, $sgpr1
%1:sgpr_32 = COPY $sgpr1
%0:sgpr_32 = COPY $sgpr0
%3:sreg_32 = S_MOV_B32 2147483647
%5:vgpr_32 = COPY %0:sgpr_32
%6:vgpr_32 = COPY %1:sgpr_32
%4:vgpr_32 = V_BFI_B32_e64 killed %3:sreg_32, %5:vgpr_32, %6:vgpr_32, implicit $exec
$sgpr0 = COPY %4:vgpr_32
SI_RETURN_TO_EPILOG $sgpr0

@llvmbot
Copy link
Member

llvmbot commented Mar 18, 2025

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-amdgpu

Author: Pankaj Dwivedi (PankajDwivedi-25)

Changes

Shader kernel calling convention expects the return value to be in SGPR, instruction selection introduced COPY which is illegal/wrong here. Instead of a copy, it should insert readfirstlane?

> bb.0 (%ir-block.0):
> liveins: $sgpr0, $sgpr1
> %1:sgpr_32 = COPY $sgpr1
> %0:sgpr_32 = COPY $sgpr0
> %3:sreg_32 = S_MOV_B32 2147483647
> %5:vgpr_32 = COPY %0:sgpr_32
> %6:vgpr_32 = COPY %1:sgpr_32
> %4:vgpr_32 = V_BFI_B32_e64 killed %3:sreg_32, %5:vgpr_32, %6:vgpr_32, implicit $exec
> $sgpr0 = COPY %4:vgpr_32
> SI_RETURN_TO_EPILOG $sgpr0


Full diff: https://github.com/llvm/llvm-project/pull/131752.diff

1 Files Affected:

  • (added) llvm/test/CodeGen/AMDGPU/fix-illegal-copy.ll (+25)
diff --git a/llvm/test/CodeGen/AMDGPU/fix-illegal-copy.ll b/llvm/test/CodeGen/AMDGPU/fix-illegal-copy.ll
new file mode 100644
index 0000000000000..6b14a660f580d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fix-illegal-copy.ll
@@ -0,0 +1,25 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; XFAIL: *
+; RUN: llc < %s -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs | FileCheck %s -check-prefixes=GCN
+
+define amdgpu_ps i32 @s_copysign_f32_bf16(float inreg %mag, bfloat inreg %sign.bf16) {
+  %sign = fpext bfloat %sign.bf16 to float
+  %op = call float @llvm.copysign.f32(float %mag, float %sign)
+  %cast = bitcast float %op to i32
+  ret i32 %cast
+}
+
+; define i32 @s_copysign_f32_bf16(float %mag, bfloat %sign.bf16) {
+;   %sign = fpext bfloat %sign.bf16 to float
+;   %op = call float @llvm.copysign.f32(float %mag, float %sign)
+;   %cast = bitcast float %op to i32
+;   ret i32 %cast
+; }
+
+; define i32 @s_copysign_f32_bf16(float inreg %mag, bfloat inreg %sign.bf16) {
+;   %sign = fpext bfloat %sign.bf16 to float
+;   %op = call float @llvm.copysign.f32(float %mag, float %sign)
+;   %cast = bitcast float %op to i32
+;   %readlane = call i32 @llvm.amdgcn.readfirstlane(i32 %cast)
+;   ret i32 %readlane
+; }
\ No newline at end of file

@PankajDwivedi-25 PankajDwivedi-25 added the llvm:SelectionDAG SelectionDAGISel as well label Mar 18, 2025
@@ -899,6 +899,55 @@ bool SIFixSGPRCopies::lowerSpecialCase(MachineInstr &MI,
MI, MI.getDebugLoc())) {
I = std::next(I);
MI.eraseFromParent();
} else {
// At this point, if we still have a VGPR → SGPR copy, it is completely
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a good idea to use in the comment. -> is preferred.

@@ -899,6 +899,55 @@ bool SIFixSGPRCopies::lowerSpecialCase(MachineInstr &MI,
MI, MI.getDebugLoc())) {
I = std::next(I);
MI.eraseFromParent();
} else {
// At this point, if we still have a VGPR → SGPR copy, it is completely
// illegal. We assume that it was intentionally introduced and should be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to assume? For example, #130443 can also introduce VGPR to SGPR copy but that doesn't seem to be the case here.

This fix can only fix one case IIUC so the comment needs to be adapted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go with no. I don't want to make semantic guesses in this pass, and require inserting these up front where required. We're already suffering from the assumed usage of SCC in this pass

Copy link
Contributor Author

@PankajDwivedi-25 PankajDwivedi-25 Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go with no. I don't want to make semantic guesses in this pass, and require inserting these up front where required. We're already suffering from the assumed usage of SCC in this pass

Agree with you.
This COPY itself is completely wrong at this point, Typically, if a value needs to be moved from a VGPR to an SGPR, instruction selection (ISel) should emit readfirstlane instead of a direct COPY.

It arises at many places after enabling the AMDGPUUniformIntrinsicCombine pass into llc pipeline PR: #128687, which optimizes certain intrinsic by propagating uniform values directly to their uses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to assume? For example, #130443 can also introduce VGPR to SGPR copy but that doesn't seem to be the case here

I think it is similar case here as llvm/test/CodeGen/AMDGPU/fix-illegal-copy.ll?
Is it wrong to replace those COPY emitted with readfirstlane here in llvm/test/CodeGen/AMDGPU/issue130443.ll?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants