[SeparateConstOffsetFromGEP] Preserve inbounds flag based on ValueTracking and NUW #130617
Conversation
This stack of pull requests is managed by Graphite.
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-llvm-transforms

Author: Fabian Ritter (ritter-x2a)

Changes: If we know that the initial GEP was inbounds, and we change it to a sequence of GEPs from the same base pointer where every offset is non-negative, then the new GEPs are inbounds.

For SWDEV-516125.

Full diff: https://github.com/llvm/llvm-project/pull/130617.diff

4 Files Affected:
- llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
- llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/preserve-inbounds.ll
- llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep-and-gvn.ll
- llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep.ll
diff --git a/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp b/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
index 138a71ce79cef..070afdf0752f4 100644
--- a/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
+++ b/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
@@ -1052,6 +1052,8 @@ bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
}
}
+ bool MayRecoverInbounds = AccumulativeByteOffset >= 0 && GEP->isInBounds();
+
// Remove the constant offset in each sequential index. The resultant GEP
// computes the variadic base.
// Notice that we don't remove struct field indices here. If LowerGEP is
@@ -1079,6 +1081,8 @@ bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
// and the old index if they are not used.
RecursivelyDeleteTriviallyDeadInstructions(UserChainTail);
RecursivelyDeleteTriviallyDeadInstructions(OldIdx);
+ MayRecoverInbounds =
+ MayRecoverInbounds && computeKnownBits(NewIdx, *DL).isNonNegative();
}
}
}
@@ -1100,11 +1104,15 @@ bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
// address with silently-wrapping two's complement arithmetic".
// Therefore, the final code will be a semantically equivalent.
//
- // TODO(jingyue): do some range analysis to keep as many inbounds as
- // possible. GEPs with inbounds are more friendly to alias analysis.
- // TODO(gep_nowrap): Preserve nuw at least.
- auto NewGEPFlags = GEPNoWrapFlags::none();
- GEP->setNoWrapFlags(GEPNoWrapFlags::none());
+ // If the initial GEP was inbounds and all variable indices and the
+ // accumulated offsets are non-negative, they can be added in any order and
+ // the intermediate results are in bounds. So, we can preserve the inbounds
+ // flag for both GEPs. GEPs with inbounds are more friendly to alias analysis.
+ //
+ // TODO(gep_nowrap): Preserve nuw?
+ auto NewGEPFlags =
+ MayRecoverInbounds ? GEPNoWrapFlags::inBounds() : GEPNoWrapFlags::none();
+ GEP->setNoWrapFlags(NewGEPFlags);
// Lowers a GEP to either GEPs with a single index or arithmetic operations.
if (LowerGEP) {
diff --git a/llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/preserve-inbounds.ll b/llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/preserve-inbounds.ll
index 877de38776839..91b5bc874c154 100644
--- a/llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/preserve-inbounds.ll
+++ b/llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/preserve-inbounds.ll
@@ -24,3 +24,26 @@ entry:
store float %3, ptr %arrayidx.dst, align 4
ret void
}
+
+; All offsets must be positive, so inbounds can be preserved.
+define void @must_be_inbounds(ptr %dst, ptr %src, i32 %i) {
+; CHECK-LABEL: @must_be_inbounds(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[I_PROM:%.*]] = zext i32 [[I:%.*]] to i64
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds float, ptr [[SRC:%.*]], i64 [[I_PROM]]
+; CHECK-NEXT: [[ARRAYIDX_SRC2:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 4
+; CHECK-NEXT: [[TMP1:%.*]] = load float, ptr [[ARRAYIDX_SRC2]], align 4
+; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, ptr [[DST:%.*]], i64 [[I_PROM]]
+; CHECK-NEXT: [[ARRAYIDX_DST4:%.*]] = getelementptr inbounds i8, ptr [[TMP2]], i64 4
+; CHECK-NEXT: store float [[TMP1]], ptr [[ARRAYIDX_DST4]], align 4
+; CHECK-NEXT: ret void
+;
+entry:
+ %i.prom = zext i32 %i to i64
+ %idx = add nsw i64 %i.prom, 1
+ %arrayidx.src = getelementptr inbounds float, ptr %src, i64 %idx
+ %3 = load float, ptr %arrayidx.src, align 4
+ %arrayidx.dst = getelementptr inbounds float, ptr %dst, i64 %idx
+ store float %3, ptr %arrayidx.dst, align 4
+ ret void
+}
diff --git a/llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep-and-gvn.ll b/llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep-and-gvn.ll
index 9a73feb2c4b5c..4474585bf9b06 100644
--- a/llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep-and-gvn.ll
+++ b/llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep-and-gvn.ll
@@ -157,19 +157,19 @@ define void @sum_of_array3(i32 %x, i32 %y, ptr nocapture %output) {
; IR-NEXT: .preheader:
; IR-NEXT: [[TMP0:%.*]] = zext i32 [[Y]] to i64
; IR-NEXT: [[TMP1:%.*]] = zext i32 [[X]] to i64
-; IR-NEXT: [[TMP2:%.*]] = getelementptr [32 x [32 x float]], ptr addrspace(3) @array, i64 0, i64 [[TMP1]], i64 [[TMP0]]
+; IR-NEXT: [[TMP2:%.*]] = getelementptr inbounds [32 x [32 x float]], ptr addrspace(3) @array, i64 0, i64 [[TMP1]], i64 [[TMP0]]
; IR-NEXT: [[TMP3:%.*]] = addrspacecast ptr addrspace(3) [[TMP2]] to ptr
; IR-NEXT: [[TMP4:%.*]] = load float, ptr [[TMP3]], align 4
; IR-NEXT: [[TMP5:%.*]] = fadd float [[TMP4]], 0.000000e+00
-; IR-NEXT: [[TMP6:%.*]] = getelementptr i8, ptr addrspace(3) [[TMP2]], i64 4
+; IR-NEXT: [[TMP6:%.*]] = getelementptr inbounds i8, ptr addrspace(3) [[TMP2]], i64 4
; IR-NEXT: [[TMP7:%.*]] = addrspacecast ptr addrspace(3) [[TMP6]] to ptr
; IR-NEXT: [[TMP8:%.*]] = load float, ptr [[TMP7]], align 4
; IR-NEXT: [[TMP9:%.*]] = fadd float [[TMP5]], [[TMP8]]
-; IR-NEXT: [[TMP10:%.*]] = getelementptr i8, ptr addrspace(3) [[TMP2]], i64 128
+; IR-NEXT: [[TMP10:%.*]] = getelementptr inbounds i8, ptr addrspace(3) [[TMP2]], i64 128
; IR-NEXT: [[TMP11:%.*]] = addrspacecast ptr addrspace(3) [[TMP10]] to ptr
; IR-NEXT: [[TMP12:%.*]] = load float, ptr [[TMP11]], align 4
; IR-NEXT: [[TMP13:%.*]] = fadd float [[TMP9]], [[TMP12]]
-; IR-NEXT: [[TMP14:%.*]] = getelementptr i8, ptr addrspace(3) [[TMP2]], i64 132
+; IR-NEXT: [[TMP14:%.*]] = getelementptr inbounds i8, ptr addrspace(3) [[TMP2]], i64 132
; IR-NEXT: [[TMP15:%.*]] = addrspacecast ptr addrspace(3) [[TMP14]] to ptr
; IR-NEXT: [[TMP16:%.*]] = load float, ptr [[TMP15]], align 4
; IR-NEXT: [[TMP17:%.*]] = fadd float [[TMP13]], [[TMP16]]
@@ -224,19 +224,19 @@ define void @sum_of_array4(i32 %x, i32 %y, ptr nocapture %output) {
; IR-NEXT: .preheader:
; IR-NEXT: [[TMP0:%.*]] = zext i32 [[Y]] to i64
; IR-NEXT: [[TMP1:%.*]] = zext i32 [[X]] to i64
-; IR-NEXT: [[TMP2:%.*]] = getelementptr [32 x [32 x float]], ptr addrspace(3) @array, i64 0, i64 [[TMP1]], i64 [[TMP0]]
+; IR-NEXT: [[TMP2:%.*]] = getelementptr inbounds [32 x [32 x float]], ptr addrspace(3) @array, i64 0, i64 [[TMP1]], i64 [[TMP0]]
; IR-NEXT: [[TMP3:%.*]] = addrspacecast ptr addrspace(3) [[TMP2]] to ptr
; IR-NEXT: [[TMP4:%.*]] = load float, ptr [[TMP3]], align 4
; IR-NEXT: [[TMP5:%.*]] = fadd float [[TMP4]], 0.000000e+00
-; IR-NEXT: [[TMP6:%.*]] = getelementptr i8, ptr addrspace(3) [[TMP2]], i64 4
+; IR-NEXT: [[TMP6:%.*]] = getelementptr inbounds i8, ptr addrspace(3) [[TMP2]], i64 4
; IR-NEXT: [[TMP7:%.*]] = addrspacecast ptr addrspace(3) [[TMP6]] to ptr
; IR-NEXT: [[TMP8:%.*]] = load float, ptr [[TMP7]], align 4
; IR-NEXT: [[TMP9:%.*]] = fadd float [[TMP5]], [[TMP8]]
-; IR-NEXT: [[TMP10:%.*]] = getelementptr i8, ptr addrspace(3) [[TMP2]], i64 128
+; IR-NEXT: [[TMP10:%.*]] = getelementptr inbounds i8, ptr addrspace(3) [[TMP2]], i64 128
; IR-NEXT: [[TMP11:%.*]] = addrspacecast ptr addrspace(3) [[TMP10]] to ptr
; IR-NEXT: [[TMP12:%.*]] = load float, ptr [[TMP11]], align 4
; IR-NEXT: [[TMP13:%.*]] = fadd float [[TMP9]], [[TMP12]]
-; IR-NEXT: [[TMP14:%.*]] = getelementptr i8, ptr addrspace(3) [[TMP2]], i64 132
+; IR-NEXT: [[TMP14:%.*]] = getelementptr inbounds i8, ptr addrspace(3) [[TMP2]], i64 132
; IR-NEXT: [[TMP15:%.*]] = addrspacecast ptr addrspace(3) [[TMP14]] to ptr
; IR-NEXT: [[TMP16:%.*]] = load float, ptr [[TMP15]], align 4
; IR-NEXT: [[TMP17:%.*]] = fadd float [[TMP13]], [[TMP16]]
diff --git a/llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep.ll b/llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep.ll
index 77b3434f4f159..da04a6e979425 100644
--- a/llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep.ll
+++ b/llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep.ll
@@ -372,8 +372,8 @@ define ptr @trunk_explicit(ptr %ptr, i64 %idx) {
; CHECK-LABEL: define ptr @trunk_explicit(
; CHECK-SAME: ptr [[PTR:%.*]], i64 [[IDX:%.*]]) {
; CHECK-NEXT: entry:
-; CHECK-NEXT: [[TMP0:%.*]] = getelementptr [[STRUCT0:%.*]], ptr [[PTR]], i64 0, i32 3, i64 [[IDX]], i32 1
-; CHECK-NEXT: [[PTR21:%.*]] = getelementptr i8, ptr [[TMP0]], i64 3216
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds [[STRUCT0:%.*]], ptr [[PTR]], i64 0, i32 3, i64 [[IDX]], i32 1
+; CHECK-NEXT: [[PTR21:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 3216
; CHECK-NEXT: ret ptr [[PTR21]]
;
entry:
@@ -389,8 +389,8 @@ define ptr @trunk_long_idx(ptr %ptr, i64 %idx) {
; CHECK-LABEL: define ptr @trunk_long_idx(
; CHECK-SAME: ptr [[PTR:%.*]], i64 [[IDX:%.*]]) {
; CHECK-NEXT: entry:
-; CHECK-NEXT: [[TMP0:%.*]] = getelementptr [[STRUCT0:%.*]], ptr [[PTR]], i64 0, i32 3, i64 [[IDX]], i32 1
-; CHECK-NEXT: [[PTR21:%.*]] = getelementptr i8, ptr [[TMP0]], i64 3216
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds [[STRUCT0:%.*]], ptr [[PTR]], i64 0, i32 3, i64 [[IDX]], i32 1
+; CHECK-NEXT: [[PTR21:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 3216
; CHECK-NEXT: ret ptr [[PTR21]]
;
entry:
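In IR terms, the new behavior is the one exercised by the @must_be_inbounds test above. A distilled before/after sketch (function and value names here are illustrative, not taken from the patch):

; Before: one inbounds GEP; the index is a zext (hence provably non-negative)
; plus a positive constant.
define ptr @sketch_before(ptr %p, i32 %i) {
  %i.ext = zext i32 %i to i64
  %idx = add nsw i64 %i.ext, 1
  %q = getelementptr inbounds float, ptr %p, i64 %idx
  ret ptr %q
}

; After: the constant offset is split off. Both GEPs keep inbounds because the
; original GEP was inbounds, computeKnownBits shows the remaining index is
; non-negative, and the accumulated byte offset (+4) is non-negative.
define ptr @sketch_after(ptr %p, i32 %i) {
  %i.ext = zext i32 %i to i64
  %base = getelementptr inbounds float, ptr %p, i64 %i.ext
  %q = getelementptr inbounds i8, ptr %base, i64 4
  ret ptr %q
}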
@llvm/pr-subscribers-backend-nvptx — Author: Fabian Ritter (ritter-x2a) — (same summary and diff as above)
Started a llvm-compile-time-tracker run to check for compile time impact.
All within +/-0.02% (but I'm not sure if SeparateConstOffsetFromGEP even runs as part of these benchmarks).
It doesn't run amdgpu anything, so no.
// the intermediate results are in bounds. So, we can preserve the inbounds
// flag for both GEPs. GEPs with inbounds are more friendly to alias analysis.
//
// TODO(gep_nowrap): Preserve nuw?
FWIW, this TODO is relevant. If the GEP is inbounds nuw and the add is nuw you can preserve inbounds nuw, without additional knowledge. Don't know whether that is valuable for your use case or not.
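For illustration, the situation described would look like this — a hypothetical example using the GEP no-wrap flag syntax of recent LLVM, not code from the patch:

; If the add and the GEP are both nuw, splitting cannot make any intermediate
; offset wrap, so inbounds nuw can be kept on both result GEPs without any
; range analysis:
define ptr @nuw_case(ptr %p, i64 %i) {
  %idx = add nuw i64 %i, 1
  %q = getelementptr inbounds nuw float, ptr %p, i64 %idx
  ret ptr %q
}
; ...could be split into:
;   %base = getelementptr inbounds nuw float, ptr %p, i64 %i
;   %q = getelementptr inbounds nuw i8, ptr %base, i64 4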
The most recent commit preserves more flags based on nuw. All test cases in the updated preserve-inbounds.ll were validated by alive2 (no timeouts or memouts were hit, with -tv-smt-to=1500000 -tv-smt-max-mem=80000 --tv-disable-undef-input).
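For contrast, here is a hypothetical case (made up for illustration, not from the test suite) where neither rule applies, so the split GEPs must drop their flags:

; The accumulated byte offset (-4) is negative and the add is not nuw, so the
; intermediate result %p + 4*%i may lie outside the object even when the final
; address %p + 4*(%i - 1) is in bounds:
define ptr @flags_lost(ptr %p, i64 %i) {
  %idx = add nsw i64 %i, -1
  %q = getelementptr inbounds float, ptr %p, i64 %idx
  ret ptr %q
}
; After splitting, both GEPs are emitted without inbounds:
;   %base = getelementptr float, ptr %p, i64 %i
;   %q = getelementptr i8, ptr %base, i64 -4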
Ping. The PR now also makes use of and preserves NUW flags.
/// A helper function to check if reassociating through an entry in the user
/// chain would invalidate the GEP's nuw flag.
static bool allowsPreservingNUW(User *U) {
  assert(isa<BinaryOperator>(U) || isa<CastInst>(U) || isa<ConstantInt>(U));
A ConstantInt can't be a user. I'd also remove this assert and turn it into a conservatively correct return false at the end. It confusingly partially overlaps with the conditions you actually check for here. If you want the assert, put the unreachable at the end here.
A ConstantInt can't be a user.
This statement sounds reasonable, but the inheritance diagram disagrees: https://llvm.org/doxygen/classllvm_1_1ConstantInt.html
The assertion does fail if the last disjunct is removed.
Anyway, I'll look into turning it into a conservative return instead.
Done.
ConstantInt being a User is a wart, but that doesn't imply that a ConstantInt can use an operator.
@@ -778,17 +780,45 @@ Value *ConstantOffsetExtractor::removeConstOffset(unsigned ChainIndex) {
   return NewBO;
 }

+/// A helper function to check if reassociating through an entry in the user
+/// chain would invalidate the GEP's nuw flag.
+static bool allowsPreservingNUW(User *U) {
Suggested change:
-static bool allowsPreservingNUW(User *U) {
+static bool allowsPreservingNUW(const User &U) {
I agree with adding the const (done), but I'd rather not use a reference here since that would complicate the dyn_casts in the callee and the all_of with a SmallVector<User*,...> at the call site.
…cking If we know that the initial GEP was inbounds, and we change it to a sequence of GEPs from the same base pointer where every offset is non-negative, then the new GEPs are inbounds. For SWDEV-516125.
If we know that the initial GEP was inbounds, and we change it to a sequence of GEPs from the same base pointer where every offset is non-negative, then the new GEPs are inbounds.

We can also preserve inbounds if the inbounds GEP and the involved additions are NUW.

For SWDEV-516125.