Skip to content

[LoongArch] Try to widen shuffle mask #136081

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 28, 2025
Merged

Conversation

tangaac
Copy link
Contributor

@tangaac tangaac commented Apr 17, 2025

No description provided.

@llvmbot
Copy link
Member

llvmbot commented Apr 17, 2025

@llvm/pr-subscribers-backend-loongarch

Author: None (tangaac)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/136081.diff

1 Files Affected:

  • (modified) llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp (+37)
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index f72b55e1d175c..913f96195aa59 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -31,6 +31,10 @@
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/KnownBits.h"
 #include "llvm/Support/MathExtras.h"
+#include <llvm/ADT/ArrayRef.h>
+#include <llvm/ADT/SmallVector.h>
+#include <llvm/Analysis/VectorUtils.h>
+#include <llvm/CodeGenTypes/MachineValueType.h>
 
 using namespace llvm;
 
@@ -1671,6 +1675,36 @@ static SDValue lower256BitShuffle(const SDLoc &DL, ArrayRef<int> Mask, MVT VT,
 
   return SDValue();
 }
+// Widen element type to get a new mask value (if possible).
+// For example:
+//  shufflevector <4 x i32> %a, <4 x i32> %b,
+//                <4 x i32> <i32 6, i32 7, i32 2, i32 3>
+// is equivalent to:
+//  shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 3, i32 1>
+// can be lowered to:
+//  VPACKOD_D vr0, vr0, vr1
+static SDValue widenShuffleMask(const SDLoc &DL, ArrayRef<int> Mask, MVT VT,
+                                SDValue V1, SDValue V2, SelectionDAG &DAG) {
+  unsigned EltBits = VT.getScalarSizeInBits();
+
+  if (EltBits > 32 || EltBits == 1)
+    return SDValue();
+
+  SmallVector<int, 8> NewMask;
+  if (widenShuffleMaskElts(Mask, NewMask)) {
+    MVT NewEltVT = VT.isFloatingPoint() ? MVT::getFloatingPointVT(EltBits * 2)
+                                        : MVT::getIntegerVT(EltBits * 2);
+    MVT NewVT = MVT::getVectorVT(NewEltVT, VT.getVectorNumElements() / 2);
+    if (DAG.getTargetLoweringInfo().isTypeLegal(NewVT)) {
+      SDValue NewV1 = DAG.getBitcast(NewVT, V1);
+      SDValue NewV2 = DAG.getBitcast(NewVT, V2);
+      return DAG.getBitcast(
+          VT, DAG.getVectorShuffle(NewVT, DL, NewV1, NewV2, NewMask));
+    }
+  }
+
+  return SDValue();
+}
 
 SDValue LoongArchTargetLowering::lowerVECTOR_SHUFFLE(SDValue Op,
                                                      SelectionDAG &DAG) const {
@@ -1705,6 +1739,9 @@ SDValue LoongArchTargetLowering::lowerVECTOR_SHUFFLE(SDValue Op,
     return DAG.getVectorShuffle(VT, DL, V1, V2, NewMask);
   }
 
+  if (SDValue NewShuffle = widenShuffleMask(DL, OrigMask, VT, V1, V2, DAG))
+    return NewShuffle;
+
   // Check for illegal shuffle mask element index values.
   int MaskUpperLimit = OrigMask.size() * (V2IsUndef ? 1 : 2);
   (void)MaskUpperLimit;

@tangaac tangaac force-pushed the widen-shuffle-mask branch 3 times, most recently from 331fbc3 to 58431c5 Compare April 17, 2025 07:27
@tangaac
Copy link
Contributor Author

tangaac commented Apr 17, 2025

Here are the files optimized by this pr on llvm-test-suite:
lsx tangaac/loong-opt-cov-ts@7b237f8
lasx tangaac/loong-opt-cov-ts@4c7a15c

@@ -6,7 +6,8 @@ define <32 x i8> @widen_shuffle_mask_v32i8_to_v16i16(<32 x i8> %a, <32 x i8> %b)
; CHECK: # %bb.0:
; CHECK-NEXT: pcalau12i $a0, %pc_hi20(.LCPI0_0)
; CHECK-NEXT: xvld $xr2, $a0, %pc_lo12(.LCPI0_0)
; CHECK-NEXT: xvshuf.b $xr0, $xr1, $xr0, $xr2
; CHECK-NEXT: xvshuf.h $xr2, $xr1, $xr0
; CHECK-NEXT: xvori.b $xr0, $xr2, 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the xvori.b instruction redundant in this context? Since it effectively acts as a no-op copy from $xr2 to $xr0, it might be possible to eliminate it entirely by adjusting the destination of the preceding xvshuf.h instruction. If that's safe to do here, consider folding this into the current patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the xvori.b instruction redundant in this context? Since it effectively acts as a no-op copy from $xr2 to $xr0, it might be possible to eliminate it entirely by adjusting the destination of the preceding xvshuf.h instruction. If that's safe to do here, consider folding this into the current patch.

This test uses vector register to pass parameters, so xvori.b is necessary.
The loss of xvori.b is acceptable, we widen shuffle mask before lower it to vshuf, so it still has chance to be lowerd to other shuffle instructions.

v32i8 -> v16i16 ---- can't be widenen anymore ----> xvshuf.h 
                ---- can be widened to v8i32 ----> v8i32 -> xvpackev.w

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Just to clarify - are you assuming $xr0 is used as a return value here? If so, would the following be equivalent?

; CHECK-NEXT:    pcalau12i $a0, %pc_hi20(.LCPI0_0)
; CHECK-NEXT:    xvld $xr2, $a0, %pc_lo12(.LCPI0_0)
; CHECK-NEXT:    xvshuf.h $xr0, $xr1, $xr0

Under the native ABI, the 256-bit vector return values typically go through the stack, not $xr registers. If we're confident xvori.b isn't needed, it might be worth double-checking how this behaves with native ABI return conventions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, xvori.b is indeed redundant.
May we should use pointer instead vector in these lasx ir tests.

define void @widen_shuffle_mask_v32i8_to_v16i16(ptr %a, ptr %b, ptr %c) {
; CHECK-LABEL: widen_shuffle_mask_v32i8_to_v16i16:
; CHECK:       # %bb.0:
; CHECK-NEXT:    xvld	$xr1, $a0, 0
; CHECK-NEXT:    xvld	$xr2, $a1, 0
; CHECK-NEXT:    pcalau12i	$a0, %pc_hi20(.LCPI0_0)
; CHECK-NEXT:    xvld $xr0, $a0, %pc_lo12(.LCPI0_0)
; CHECK-NEXT:    xvshuf.h $xr0, $xr2, $xr1
; CHECK-NEXT:    ret
    %1 = load <32 x i8>, ptr %a
    %2 = load <32 x i8>, ptr %b
    %r = shufflevector <32 x i8> %1, <32 x i8> %2, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 32, i32 33, i32 34, i32 35, i32 24, i32 25, i32 26, i32 27, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 60, i32 61, i32 30, i32 31>
    store <32 x i8> %r, ptr %c
    ret void
}

Copy link
Member

@heiher heiher Apr 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see -- since the xd operand in xvshuf.h serves as both input and output, the use of xvori.b is indeed necessary. Looks good to me.

@tangaac tangaac merged commit 34845ac into llvm:main Apr 28, 2025
11 checks passed
@tangaac tangaac deleted the widen-shuffle-mask branch April 28, 2025 06:26
jyli0116 pushed a commit to jyli0116/llvm-project that referenced this pull request Apr 28, 2025
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request May 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants