-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[X86] Merge insertsubvector(load(p0),load_subv(p0),hi) -> subvbroadcast(p0) if either load is oneuse #128857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[X86] Merge insertsubvector(load(p0),load_subv(p0),hi) -> subvbroadcast(p0) if either load is oneuse #128857
Conversation
…st(p0) if either load is oneuse This fold is currently limited to cases where the load_subv(p0) has oneuse, but its beneficial if either load has oneuse and will be replaced. Yes another yak shave for llvm#122671
@llvm/pr-subscribers-backend-x86 Author: Simon Pilgrim (RKSimon) ChangesThis fold is currently limited to cases where the load_subv(p0) has oneuse, but its beneficial if either load has oneuse and will be replaced. Yes another yak shave for #122671 Full diff: https://github.com/llvm/llvm-project/pull/128857.diff 2 Files Affected:
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 84aaf86550842..2791021ee0f10 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -58562,8 +58562,9 @@ static SDValue combineINSERT_SUBVECTOR(SDNode *N, SelectionDAG &DAG,
// If we're splatting the lower half subvector of a full vector load into the
// upper half, attempt to create a subvector broadcast.
- if (IdxVal == (OpVT.getVectorNumElements() / 2) && SubVec.hasOneUse() &&
- Vec.getValueSizeInBits() == (2 * SubVec.getValueSizeInBits())) {
+ if (IdxVal == (OpVT.getVectorNumElements() / 2) &&
+ Vec.getValueSizeInBits() == (2 * SubVec.getValueSizeInBits()) &&
+ (Vec.hasOneUse() || SubVec.hasOneUse())) {
auto *VecLd = dyn_cast<LoadSDNode>(Vec);
auto *SubLd = dyn_cast<LoadSDNode>(SubVec);
if (VecLd && SubLd &&
diff --git a/llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-7.ll b/llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-7.ll
index c1dba071b4353..b6cbddd667d8e 100644
--- a/llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-7.ll
+++ b/llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-7.ll
@@ -10210,11 +10210,9 @@ define void @store_i8_stride7_vf64(ptr %in.vecptr0, ptr %in.vecptr1, ptr %in.vec
; AVX512BW-NEXT: vpshufb %ymm28, %ymm18, %ymm23
; AVX512BW-NEXT: vporq %ymm2, %ymm23, %ymm2
; AVX512BW-NEXT: vinserti64x4 $1, %ymm2, %zmm1, %zmm1
-; AVX512BW-NEXT: vmovdqa64 (%rdx), %zmm2
-; AVX512BW-NEXT: vinserti64x4 $1, %ymm14, %zmm2, %zmm2
+; AVX512BW-NEXT: vbroadcasti64x4 {{.*#+}} zmm2 = mem[0,1,2,3,0,1,2,3]
; AVX512BW-NEXT: vpshufb %zmm20, %zmm2, %zmm2
-; AVX512BW-NEXT: vmovdqa64 (%rcx), %zmm20
-; AVX512BW-NEXT: vinserti64x4 $1, %ymm15, %zmm20, %zmm20
+; AVX512BW-NEXT: vbroadcasti64x4 {{.*#+}} zmm20 = mem[0,1,2,3,0,1,2,3]
; AVX512BW-NEXT: vpshufb %zmm22, %zmm20, %zmm20
; AVX512BW-NEXT: vporq %zmm2, %zmm20, %zmm2
; AVX512BW-NEXT: vpermq {{.*#+}} zmm1 = zmm1[2,3,2,3,6,7,6,7]
@@ -10816,11 +10814,9 @@ define void @store_i8_stride7_vf64(ptr %in.vecptr0, ptr %in.vecptr1, ptr %in.vec
; AVX512DQ-BW-NEXT: vpshufb %ymm28, %ymm18, %ymm23
; AVX512DQ-BW-NEXT: vporq %ymm2, %ymm23, %ymm2
; AVX512DQ-BW-NEXT: vinserti64x4 $1, %ymm2, %zmm1, %zmm1
-; AVX512DQ-BW-NEXT: vmovdqa64 (%rdx), %zmm2
-; AVX512DQ-BW-NEXT: vinserti64x4 $1, %ymm14, %zmm2, %zmm2
+; AVX512DQ-BW-NEXT: vbroadcasti64x4 {{.*#+}} zmm2 = mem[0,1,2,3,0,1,2,3]
; AVX512DQ-BW-NEXT: vpshufb %zmm20, %zmm2, %zmm2
-; AVX512DQ-BW-NEXT: vmovdqa64 (%rcx), %zmm20
-; AVX512DQ-BW-NEXT: vinserti64x4 $1, %ymm15, %zmm20, %zmm20
+; AVX512DQ-BW-NEXT: vbroadcasti64x4 {{.*#+}} zmm20 = mem[0,1,2,3,0,1,2,3]
; AVX512DQ-BW-NEXT: vpshufb %zmm22, %zmm20, %zmm20
; AVX512DQ-BW-NEXT: vporq %zmm2, %zmm20, %zmm2
; AVX512DQ-BW-NEXT: vpermq {{.*#+}} zmm1 = zmm1[2,3,2,3,6,7,6,7]
|
Vec.getValueSizeInBits() == (2 * SubVec.getValueSizeInBits())) { | ||
if (IdxVal == (OpVT.getVectorNumElements() / 2) && | ||
Vec.getValueSizeInBits() == (2 * SubVec.getValueSizeInBits()) && | ||
(Vec.hasOneUse() || SubVec.hasOneUse())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Vec
have oneuse matter? Can't the loaded value be used more than once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me see if I can get this to always replace uses of subvec - then we can get rid of the oneuse checks entirely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The oneuse checks are still preventing some regressions due to shouldReduceLoadWidth being so permissive, but I've at least ensured that any other uses of SubVec are replaced now.
SDValue NewSubVec = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, SubVecVT, | ||
BcastLd, DAG.getVectorIdxConstant(0, dl)); | ||
DCI.CombineTo(SubLd, NewSubVec, BcastLd.getValue(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only required when SubVec
has multiple user. Do we need to add a condition check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need the check, oneuse cases will be handled fine, but it might slightly improve perf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its easier if we always call it, just in case there's weird cases like only Subvec's chain is still in use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…st(p0) if either load is oneuse (llvm#128857) This fold is currently limited to cases where the load_subv(p0) has oneuse, but its beneficial if either load has oneuse and will be replaced. Yet another yak shave for llvm#122671
This fold is currently limited to cases where the load_subv(p0) has oneuse, but its beneficial if either load has oneuse and will be replaced.
Yet another yak shave for #122671