Skip to content

[SelectionDAG] Improve value type selection for inline asm within selected register class #135097

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

dlee992
Copy link

@dlee992 dlee992 commented Apr 9, 2025

Summary

This patch improves value type selection for inline assembly when a specific register class is involved. It avoids generating imprecise load/store instructions and unnecessary type extensions or truncations. Previously, the first legal value type was always selected, but this patch chooses a more accurate type when possible.

Example (NVPTX)

In the NVPTX target, a new bf16 addition test previously generated ld.param.u16 instead of the expected ld.param.b16, due to always selecting the first value type in the list:

def Int16Regs : NVPTXRegClass<[i16, f16, bf16], 16, (add (sequence "RS%u", 0, 4))>;

Impact

This is a target-independent improvement and may benefit other targets in similar scenarios.

Copy link

github-actions bot commented Apr 9, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added backend:NVPTX llvm:SelectionDAG SelectionDAGISel as well labels Apr 9, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 9, 2025

@llvm/pr-subscribers-backend-nvptx

@llvm/pr-subscribers-llvm-selectiondag

Author: Da Li (李达) (dlee992)

Changes

Summary

This patch improves value type selection for inline assembly when a specific register class is involved. It avoids generating imprecise load/store instructions and unnecessary type extensions or truncations. Previously, the first legal value type was always selected, but this patch chooses a more accurate type when possible.

Example (NVPTX)

In the NVPTX target, a new bf16 addition test previously generated ld.param.u16 instead of the expected ld.param.b16, due to always selecting the first value type in the list:

def Int16Regs : NVPTXRegClass&lt;[i16, f16, bf16], 16, (add (sequence "RS%u", 0, 4))&gt;;

Impact

This is a target-independent improvement and may benefit other targets in similar scenarios.


Full diff: https://github.com/llvm/llvm-project/pull/135097.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+13-2)
  • (modified) llvm/test/CodeGen/NVPTX/bf16-instructions.ll (+17)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 89793c30f3710..528c8bd648332 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -9738,8 +9738,9 @@ getRegistersForValue(SelectionDAG &DAG, const SDLoc &DL,
   // register class, find it.
   unsigned AssignedReg;
   const TargetRegisterClass *RC;
+  const MVT RefValueVT = RefOpInfo.ConstraintVT;
   std::tie(AssignedReg, RC) = TLI.getRegForInlineAsmConstraint(
-      &TRI, RefOpInfo.ConstraintCode, RefOpInfo.ConstraintVT);
+      &TRI, RefOpInfo.ConstraintCode, RefValueVT);
   // RC is unset only on failure. Return immediately.
   if (!RC)
     return std::nullopt;
@@ -9747,7 +9748,17 @@ getRegistersForValue(SelectionDAG &DAG, const SDLoc &DL,
   // Get the actual register value type.  This is important, because the user
   // may have asked for (e.g.) the AX register in i32 type.  We need to
   // remember that AX is actually i16 to get the right extension.
-  const MVT RegVT = *TRI.legalclasstypes_begin(*RC);
+  MVT RegVT = *TRI.legalclasstypes_begin(*RC);
+
+  // If the reference value type is legal and belongs to the register class,
+  // use it instead of the first legal value type. This avoids generating
+  // inaccurate load/store instructions or unnecessary type extensions and
+  // truncations.
+  if (TLI.isTypeLegal(RefValueVT) &&
+      llvm::is_contained(llvm::make_range(TRI.legalclasstypes_begin(*RC),
+                                          TRI.legalclasstypes_end(*RC)),
+                         RefValueVT.SimpleTy))
+    RegVT = RefValueVT.SimpleTy;
 
   if (OpInfo.ConstraintVT != MVT::Other && RegVT != MVT::Untyped) {
     // If this is an FP operand in an integer register (or visa versa), or more
diff --git a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
index 9be54a746cacd..4db68f28b3c0a 100644
--- a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
@@ -81,6 +81,23 @@ define bfloat @test_fadd(bfloat %0, bfloat %1) {
   ret bfloat %3
 }
 
+define bfloat @test_fadd_inlineasm(bfloat %0, bfloat %1) nounwind {
+; SM90-LABEL: test_fadd_inlineasm(
+; SM90:       {
+; SM90-NEXT:    .reg .b16 %rs<4>;
+; SM90-EMPTY:
+; SM90-NEXT:  // %bb.0:
+; SM90-NEXT:    ld.param.b16 %rs2, [test_fadd_inlineasm_param_0];
+; SM90-NEXT:    ld.param.b16 %rs3, [test_fadd_inlineasm_param_1];
+; SM90-NEXT:    // begin inline asm
+; SM90-NEXT:    add.rn.bf16 %rs1, %rs2, %rs3
+; SM90-NEXT:    // end inline asm
+; SM90-NEXT:    st.param.b16 [func_retval0], %rs1;
+; SM90-NEXT:    ret;
+  %3 = tail call bfloat asm "add.rn.bf16 $0, $1, $2", "=h,h,h"(bfloat %0, bfloat %1) nounwind
+  ret bfloat %3
+}
+
 define bfloat @test_fsub(bfloat %0, bfloat %1) {
 ; SM70-LABEL: test_fsub(
 ; SM70:       {

Comment on lines +9758 to +9760
llvm::is_contained(llvm::make_range(TRI.legalclasstypes_begin(*RC),
TRI.legalclasstypes_end(*RC)),
RefValueVT.SimpleTy))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is reinventing RC->isTypeLegalForClass. It should also be redundant to check isTypeLegal. I'd assume this looks something like

if (!isTypeLegalForClass()) 
  RegVT = *TRI.legalclasstypes_begin(*RC)

@@ -9738,16 +9738,27 @@ getRegistersForValue(SelectionDAG &DAG, const SDLoc &DL,
// register class, find it.
unsigned AssignedReg;
const TargetRegisterClass *RC;
const MVT RefValueVT = RefOpInfo.ConstraintVT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep using ConstraintVT directly?

@dlee992
Copy link
Author

dlee992 commented Apr 10, 2025

Unfortunately, my PR breaks dozens of lit tests across 5 targets:
https://buildkite.com/llvm-project/github-pull-requests/builds/167049

I looked into one of the failures: LLVM.CodeGen/RISCV/inline-asm-d-constraint-f.ll, specifically the constraint_gpr function. My PR causes it to emit unnecessary fmv (bit-wise float-int move) instructions:

# %bb.0:
    fmv.d.x fa5, a0
    fmv.x.d a0, fa5
    #APP
    mv      a0, a0
    #NO_APP
    fmv.d.x fa5, a0
    fmv.x.d a0, fa5
    ret

This happens because of RISCV64's handling of ZDINX on RV64 — it intentionally allows f64 in GPRs and uses the first value type in the VTList for the register class.

Other cases, like some in AMDGPU, require some instruction reordering. And there are even more issues I haven't fully investigated yet.

My current thinking: this original behavior has been around for a long time, and public target maintainers either:

  • explicitly worked around it by tuning their RegisterInfo definitions and backend logic, or
  • simply haven’t encountered it yet due to a lack of inline asm test coverage.

So, I think this work might need to be put on hold for now. What do you think? @arsenm . BTW, thanks for your valueable comments!

@arsenm
Copy link
Contributor

arsenm commented Apr 10, 2025

So, I think this work might need to be put on hold for now. What do you think? @arsenm . BTW, thanks for your valueable comments!

Most codegen changes require yak shaving in other targets. There's no "hold for now", you push through them or this change will never happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:NVPTX llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants