-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[LoongArch] Try to widen shuffle mask #136081
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-loongarch Author: None (tangaac) ChangesFull diff: https://github.com/llvm/llvm-project/pull/136081.diff 1 Files Affected:
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index f72b55e1d175c..913f96195aa59 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -31,6 +31,10 @@
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/KnownBits.h"
#include "llvm/Support/MathExtras.h"
+#include <llvm/ADT/ArrayRef.h>
+#include <llvm/ADT/SmallVector.h>
+#include <llvm/Analysis/VectorUtils.h>
+#include <llvm/CodeGenTypes/MachineValueType.h>
using namespace llvm;
@@ -1671,6 +1675,36 @@ static SDValue lower256BitShuffle(const SDLoc &DL, ArrayRef<int> Mask, MVT VT,
return SDValue();
}
+// Widen element type to get a new mask value (if possible).
+// For example:
+// shufflevector <4 x i32> %a, <4 x i32> %b,
+// <4 x i32> <i32 6, i32 7, i32 2, i32 3>
+// is equivalent to:
+// shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 3, i32 1>
+// can be lowered to:
+// VPACKOD_D vr0, vr0, vr1
+static SDValue widenShuffleMask(const SDLoc &DL, ArrayRef<int> Mask, MVT VT,
+ SDValue V1, SDValue V2, SelectionDAG &DAG) {
+ unsigned EltBits = VT.getScalarSizeInBits();
+
+ if (EltBits > 32 || EltBits == 1)
+ return SDValue();
+
+ SmallVector<int, 8> NewMask;
+ if (widenShuffleMaskElts(Mask, NewMask)) {
+ MVT NewEltVT = VT.isFloatingPoint() ? MVT::getFloatingPointVT(EltBits * 2)
+ : MVT::getIntegerVT(EltBits * 2);
+ MVT NewVT = MVT::getVectorVT(NewEltVT, VT.getVectorNumElements() / 2);
+ if (DAG.getTargetLoweringInfo().isTypeLegal(NewVT)) {
+ SDValue NewV1 = DAG.getBitcast(NewVT, V1);
+ SDValue NewV2 = DAG.getBitcast(NewVT, V2);
+ return DAG.getBitcast(
+ VT, DAG.getVectorShuffle(NewVT, DL, NewV1, NewV2, NewMask));
+ }
+ }
+
+ return SDValue();
+}
SDValue LoongArchTargetLowering::lowerVECTOR_SHUFFLE(SDValue Op,
SelectionDAG &DAG) const {
@@ -1705,6 +1739,9 @@ SDValue LoongArchTargetLowering::lowerVECTOR_SHUFFLE(SDValue Op,
return DAG.getVectorShuffle(VT, DL, V1, V2, NewMask);
}
+ if (SDValue NewShuffle = widenShuffleMask(DL, OrigMask, VT, V1, V2, DAG))
+ return NewShuffle;
+
// Check for illegal shuffle mask element index values.
int MaskUpperLimit = OrigMask.size() * (V2IsUndef ? 1 : 2);
(void)MaskUpperLimit;
|
331fbc3
to
58431c5
Compare
Here are the files optimized by this pr on llvm-test-suite: |
58431c5
to
18c72db
Compare
@@ -6,7 +6,8 @@ define <32 x i8> @widen_shuffle_mask_v32i8_to_v16i16(<32 x i8> %a, <32 x i8> %b) | |||
; CHECK: # %bb.0: | |||
; CHECK-NEXT: pcalau12i $a0, %pc_hi20(.LCPI0_0) | |||
; CHECK-NEXT: xvld $xr2, $a0, %pc_lo12(.LCPI0_0) | |||
; CHECK-NEXT: xvshuf.b $xr0, $xr1, $xr0, $xr2 | |||
; CHECK-NEXT: xvshuf.h $xr2, $xr1, $xr0 | |||
; CHECK-NEXT: xvori.b $xr0, $xr2, 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the xvori.b
instruction redundant in this context? Since it effectively acts as a no-op copy from $xr2
to $xr0
, it might be possible to eliminate it entirely by adjusting the destination of the preceding xvshuf.h
instruction. If that's safe to do here, consider folding this into the current patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the
xvori.b
instruction redundant in this context? Since it effectively acts as a no-op copy from$xr2
to$xr0
, it might be possible to eliminate it entirely by adjusting the destination of the precedingxvshuf.h
instruction. If that's safe to do here, consider folding this into the current patch.
This test uses vector register to pass parameters, so xvori.b
is necessary.
The loss of xvori.b
is acceptable, we widen shuffle mask before lower it to vshuf
, so it still has chance to be lowerd to other shuffle instructions.
v32i8 -> v16i16 ---- can't be widenen anymore ----> xvshuf.h
---- can be widened to v8i32 ----> v8i32 -> xvpackev.w
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Just to clarify - are you assuming $xr0
is used as a return value here? If so, would the following be equivalent?
; CHECK-NEXT: pcalau12i $a0, %pc_hi20(.LCPI0_0)
; CHECK-NEXT: xvld $xr2, $a0, %pc_lo12(.LCPI0_0)
; CHECK-NEXT: xvshuf.h $xr0, $xr1, $xr0
Under the native ABI, the 256-bit vector return values typically go through the stack, not $xr
registers. If we're confident xvori.b
isn't needed, it might be worth double-checking how this behaves with native ABI return conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, xvori.b
is indeed redundant.
May we should use pointer instead vector in these lasx ir tests.
define void @widen_shuffle_mask_v32i8_to_v16i16(ptr %a, ptr %b, ptr %c) {
; CHECK-LABEL: widen_shuffle_mask_v32i8_to_v16i16:
; CHECK: # %bb.0:
; CHECK-NEXT: xvld $xr1, $a0, 0
; CHECK-NEXT: xvld $xr2, $a1, 0
; CHECK-NEXT: pcalau12i $a0, %pc_hi20(.LCPI0_0)
; CHECK-NEXT: xvld $xr0, $a0, %pc_lo12(.LCPI0_0)
; CHECK-NEXT: xvshuf.h $xr0, $xr2, $xr1
; CHECK-NEXT: ret
%1 = load <32 x i8>, ptr %a
%2 = load <32 x i8>, ptr %b
%r = shufflevector <32 x i8> %1, <32 x i8> %2, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 32, i32 33, i32 34, i32 35, i32 24, i32 25, i32 26, i32 27, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 60, i32 61, i32 30, i32 31>
store <32 x i8> %r, ptr %c
ret void
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see -- since the xd
operand in xvshuf.h
serves as both input and output, the use of xvori.b
is indeed necessary. Looks good to me.
No description provided.