-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[SelectionDAG] Improve value type selection for inline asm within selected register class #135097
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[SelectionDAG] Improve value type selection for inline asm within selected register class #135097
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-backend-nvptx @llvm/pr-subscribers-llvm-selectiondag Author: Da Li (李达) (dlee992) ChangesSummaryThis patch improves value type selection for inline assembly when a specific register class is involved. It avoids generating imprecise load/store instructions and unnecessary type extensions or truncations. Previously, the first legal value type was always selected, but this patch chooses a more accurate type when possible. Example (NVPTX)In the NVPTX target, a new def Int16Regs : NVPTXRegClass<[i16, f16, bf16], 16, (add (sequence "RS%u", 0, 4))>; ImpactThis is a target-independent improvement and may benefit other targets in similar scenarios. Full diff: https://github.com/llvm/llvm-project/pull/135097.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 89793c30f3710..528c8bd648332 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -9738,8 +9738,9 @@ getRegistersForValue(SelectionDAG &DAG, const SDLoc &DL,
// register class, find it.
unsigned AssignedReg;
const TargetRegisterClass *RC;
+ const MVT RefValueVT = RefOpInfo.ConstraintVT;
std::tie(AssignedReg, RC) = TLI.getRegForInlineAsmConstraint(
- &TRI, RefOpInfo.ConstraintCode, RefOpInfo.ConstraintVT);
+ &TRI, RefOpInfo.ConstraintCode, RefValueVT);
// RC is unset only on failure. Return immediately.
if (!RC)
return std::nullopt;
@@ -9747,7 +9748,17 @@ getRegistersForValue(SelectionDAG &DAG, const SDLoc &DL,
// Get the actual register value type. This is important, because the user
// may have asked for (e.g.) the AX register in i32 type. We need to
// remember that AX is actually i16 to get the right extension.
- const MVT RegVT = *TRI.legalclasstypes_begin(*RC);
+ MVT RegVT = *TRI.legalclasstypes_begin(*RC);
+
+ // If the reference value type is legal and belongs to the register class,
+ // use it instead of the first legal value type. This avoids generating
+ // inaccurate load/store instructions or unnecessary type extensions and
+ // truncations.
+ if (TLI.isTypeLegal(RefValueVT) &&
+ llvm::is_contained(llvm::make_range(TRI.legalclasstypes_begin(*RC),
+ TRI.legalclasstypes_end(*RC)),
+ RefValueVT.SimpleTy))
+ RegVT = RefValueVT.SimpleTy;
if (OpInfo.ConstraintVT != MVT::Other && RegVT != MVT::Untyped) {
// If this is an FP operand in an integer register (or visa versa), or more
diff --git a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
index 9be54a746cacd..4db68f28b3c0a 100644
--- a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
@@ -81,6 +81,23 @@ define bfloat @test_fadd(bfloat %0, bfloat %1) {
ret bfloat %3
}
+define bfloat @test_fadd_inlineasm(bfloat %0, bfloat %1) nounwind {
+; SM90-LABEL: test_fadd_inlineasm(
+; SM90: {
+; SM90-NEXT: .reg .b16 %rs<4>;
+; SM90-EMPTY:
+; SM90-NEXT: // %bb.0:
+; SM90-NEXT: ld.param.b16 %rs2, [test_fadd_inlineasm_param_0];
+; SM90-NEXT: ld.param.b16 %rs3, [test_fadd_inlineasm_param_1];
+; SM90-NEXT: // begin inline asm
+; SM90-NEXT: add.rn.bf16 %rs1, %rs2, %rs3
+; SM90-NEXT: // end inline asm
+; SM90-NEXT: st.param.b16 [func_retval0], %rs1;
+; SM90-NEXT: ret;
+ %3 = tail call bfloat asm "add.rn.bf16 $0, $1, $2", "=h,h,h"(bfloat %0, bfloat %1) nounwind
+ ret bfloat %3
+}
+
define bfloat @test_fsub(bfloat %0, bfloat %1) {
; SM70-LABEL: test_fsub(
; SM70: {
|
llvm::is_contained(llvm::make_range(TRI.legalclasstypes_begin(*RC), | ||
TRI.legalclasstypes_end(*RC)), | ||
RefValueVT.SimpleTy)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is reinventing RC->isTypeLegalForClass. It should also be redundant to check isTypeLegal. I'd assume this looks something like
if (!isTypeLegalForClass())
RegVT = *TRI.legalclasstypes_begin(*RC)
@@ -9738,16 +9738,27 @@ getRegistersForValue(SelectionDAG &DAG, const SDLoc &DL, | |||
// register class, find it. | |||
unsigned AssignedReg; | |||
const TargetRegisterClass *RC; | |||
const MVT RefValueVT = RefOpInfo.ConstraintVT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep using ConstraintVT directly?
Unfortunately, my PR breaks dozens of lit tests across 5 targets: I looked into one of the failures: # %bb.0:
fmv.d.x fa5, a0
fmv.x.d a0, fa5
#APP
mv a0, a0
#NO_APP
fmv.d.x fa5, a0
fmv.x.d a0, fa5
ret This happens because of RISCV64's handling of Other cases, like some in AMDGPU, require some instruction reordering. And there are even more issues I haven't fully investigated yet. My current thinking: this original behavior has been around for a long time, and public target maintainers either:
So, I think this work might need to be put on hold for now. What do you think? @arsenm . BTW, thanks for your valueable comments! |
Most codegen changes require yak shaving in other targets. There's no "hold for now", you push through them or this change will never happen. |
Summary
This patch improves value type selection for inline assembly when a specific register class is involved. It avoids generating imprecise load/store instructions and unnecessary type extensions or truncations. Previously, the first legal value type was always selected, but this patch chooses a more accurate type when possible.
Example (NVPTX)
In the NVPTX target, a new
bf16
addition test previously generatedld.param.u16
instead of the expectedld.param.b16
, due to always selecting the first value type in the list:Impact
This is a target-independent improvement and may benefit other targets in similar scenarios.