-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[WIP][AMDGPU] Fix emitting illegal COPY #131752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[WIP][AMDGPU] Fix emitting illegal COPY #131752
Conversation
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-amdgpu Author: Pankaj Dwivedi (PankajDwivedi-25) ChangesShader kernel calling convention expects the return value to be in SGPR, instruction selection introduced > bb.0 (%ir-block.0): Full diff: https://github.com/llvm/llvm-project/pull/131752.diff 1 Files Affected:
diff --git a/llvm/test/CodeGen/AMDGPU/fix-illegal-copy.ll b/llvm/test/CodeGen/AMDGPU/fix-illegal-copy.ll
new file mode 100644
index 0000000000000..6b14a660f580d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fix-illegal-copy.ll
@@ -0,0 +1,25 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; XFAIL: *
+; RUN: llc < %s -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs | FileCheck %s -check-prefixes=GCN
+
+define amdgpu_ps i32 @s_copysign_f32_bf16(float inreg %mag, bfloat inreg %sign.bf16) {
+ %sign = fpext bfloat %sign.bf16 to float
+ %op = call float @llvm.copysign.f32(float %mag, float %sign)
+ %cast = bitcast float %op to i32
+ ret i32 %cast
+}
+
+; define i32 @s_copysign_f32_bf16(float %mag, bfloat %sign.bf16) {
+; %sign = fpext bfloat %sign.bf16 to float
+; %op = call float @llvm.copysign.f32(float %mag, float %sign)
+; %cast = bitcast float %op to i32
+; ret i32 %cast
+; }
+
+; define i32 @s_copysign_f32_bf16(float inreg %mag, bfloat inreg %sign.bf16) {
+; %sign = fpext bfloat %sign.bf16 to float
+; %op = call float @llvm.copysign.f32(float %mag, float %sign)
+; %cast = bitcast float %op to i32
+; %readlane = call i32 @llvm.amdgcn.readfirstlane(i32 %cast)
+; ret i32 %readlane
+; }
\ No newline at end of file
|
@@ -899,6 +899,55 @@ bool SIFixSGPRCopies::lowerSpecialCase(MachineInstr &MI, | |||
MI, MI.getDebugLoc())) { | |||
I = std::next(I); | |||
MI.eraseFromParent(); | |||
} else { | |||
// At this point, if we still have a VGPR → SGPR copy, it is completely |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a good idea to use →
in the comment. ->
is preferred.
@@ -899,6 +899,55 @@ bool SIFixSGPRCopies::lowerSpecialCase(MachineInstr &MI, | |||
MI, MI.getDebugLoc())) { | |||
I = std::next(I); | |||
MI.eraseFromParent(); | |||
} else { | |||
// At this point, if we still have a VGPR → SGPR copy, it is completely | |||
// illegal. We assume that it was intentionally introduced and should be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe to assume? For example, #130443 can also introduce VGPR to SGPR copy but that doesn't seem to be the case here.
This fix can only fix one case IIUC so the comment needs to be adapted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd go with no. I don't want to make semantic guesses in this pass, and require inserting these up front where required. We're already suffering from the assumed usage of SCC in this pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd go with no. I don't want to make semantic guesses in this pass, and require inserting these up front where required. We're already suffering from the assumed usage of SCC in this pass
Agree with you.
This COPY itself is completely wrong at this point, Typically, if a value needs to be moved from a VGPR to an SGPR, instruction selection (ISel) should emit readfirstlane instead of a direct COPY.
It arises at many places after enabling the AMDGPUUniformIntrinsicCombine pass into llc pipeline PR: #128687, which optimizes certain intrinsic by propagating uniform values directly to their uses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe to assume? For example, #130443 can also introduce VGPR to SGPR copy but that doesn't seem to be the case here
I think it is similar case here as llvm/test/CodeGen/AMDGPU/fix-illegal-copy.ll?
Is it wrong to replace those COPY emitted with readfirstlane here in llvm/test/CodeGen/AMDGPU/issue130443.ll?
Shader kernel calling convention expects the return value to be in SGPR, instruction selection introduced
COPY
which is illegal/wrong here. Instead of a copy, it should insertreadfirstlane
?