Skip to content

Commit 03e6535

Browse files
[CIR][ABI][AArch64][Lowering] Fix calls for struct types > 128 bits (#1335)
In [PR#1074](#1074) we introduced calls for struct types > 128 bits, but there's is an issue here. [This](https://github.com/llvm/clangir/blob/3e17e7b9404e1a28bf33bdd5943f4a208134d479/clang/lib/CIR/Dialect/Transforms/TargetLowering/LowerFunction.cpp#L1169) is meant to be a `memcpy` of the alloca instead of directly passing the alloca, just like in the [OG](https://github.com/llvm/clangir/blob/3e17e7b9404e1a28bf33bdd5943f4a208134d479/clang/lib/CodeGen/CGCall.cpp#L5323). The PR was meant to use a `memcpy` and later handle cases where we don't need the `memcpy`. For example, running the following code snippet `tmp.c` using `bin/clang tmp.c -o tmp -Xclang -fclangir -Xclang -fclangir-call-conv-lowering --target=aarch64-none-linux-gnu`: ``` #include <stdio.h> typedef struct { int a, b, c, d, e; } S; void change(S s) { s.a = 10; } void foo(void) { S s; s.a = 9; change(s); printf("%d\n", s.a); } int main(void) { foo(); return 0; } ``` gives 10 instead of 9, because we pass the pointer instead of a copy. Relevant part of the OG LLVM output: ``` @foo() %s = alloca %struct.S, align 4 %byval-temp = alloca %struct.S, align 4 %a = getelementptr inbounds nuw %struct.S, ptr %s, i32 0, i32 0 store i32 9, ptr %a, align 4 call void @llvm.memcpy.p0.p0.i64(ptr align 4 %byval-temp, ptr align 4 %s, i64 20, i1 false) call void @change(ptr noundef %byval-temp) ``` Current LLVM output through CIR: ``` @foo() %1 = alloca %struct.S, i64 1, align 4 %2 = getelementptr %struct.S, ptr %1, i32 0, i32 0 store i32 9, ptr %2, align 4 %3 = load %struct.S, ptr %1, align 4 call void @change(ptr %1) ``` So, there should be a memcpy. This PR fixes this, and adds a comment/note for the future cases where we need to check if the copy is not needed. I have also updated the old test with structs having size > 128.
1 parent 6dd6d82 commit 03e6535

File tree

2 files changed

+21
-11
lines changed

2 files changed

+21
-11
lines changed

clang/lib/CIR/Dialect/Transforms/TargetLowering/LowerFunction.cpp

+7-1
Original file line numberDiff line numberDiff line change
@@ -1166,7 +1166,13 @@ mlir::Value LowerFunction::rewriteCallOp(const LowerFunctionInfo &CallInfo,
11661166
if (::cir::MissingFeatures::undef())
11671167
cir_cconv_unreachable("NYI");
11681168

1169-
IRCallArgs[FirstIRArg] = alloca;
1169+
// TODO(cir): add check for cases where we don't need the memcpy
1170+
auto tmpAlloca = createTmpAlloca(
1171+
*this, alloca.getLoc(),
1172+
mlir::cast<PointerType>(alloca.getType()).getPointee());
1173+
auto tySize = LM.getDataLayout().getTypeAllocSize(I->getType());
1174+
createMemCpy(*this, tmpAlloca, alloca, tySize.getFixedValue());
1175+
IRCallArgs[FirstIRArg] = tmpAlloca;
11701176

11711177
// NOTE(cir): Skipping Emissions, lifetime markers.
11721178

clang/test/CIR/CallConvLowering/AArch64/aarch64-cc-structs.c

+14-10
Original file line numberDiff line numberDiff line change
@@ -171,20 +171,24 @@ GT_128 get_gt_128(GT_128 s) {
171171
}
172172

173173
// CHECK: cir.func no_proto @call_and_get_gt_128(%arg0: !cir.ptr<!ty_GT_128_>
174-
// CHECK: %[[#V0:]] = cir.alloca !ty_GT_128_, !cir.ptr<!ty_GT_128_>, {{.*}} {alignment = 8 : i64}
175-
// CHECK: %[[#V1:]] = cir.alloca !ty_GT_128_, !cir.ptr<!ty_GT_128_>, {{.*}} {alignment = 8 : i64}
176-
// CHECK: cir.call @get_gt_128(%[[#V1]], %arg0) : (!cir.ptr<!ty_GT_128_>, !cir.ptr<!ty_GT_128_>) -> ()
177-
// CHECK: %[[#V2:]] = cir.load %[[#V1]] : !cir.ptr<!ty_GT_128_>, !ty_GT_128_
178-
// CHECK: cir.store %[[#V2]], %[[#V0]] : !ty_GT_128_, !cir.ptr<!ty_GT_128_>
174+
// CHECK: %[[#V0:]] = cir.alloca !ty_GT_128_, !cir.ptr<!ty_GT_128_>, ["tmp"] {alignment = 8 : i64}
175+
// CHECK: %[[#V1:]] = cir.load %arg0 : !cir.ptr<!ty_GT_128_>, !ty_GT_128_
176+
// CHECK: %[[#V2:]] = cir.alloca !ty_GT_128_, !cir.ptr<!ty_GT_128_>, [""] {alignment = 8 : i64}
177+
// CHECK: %[[#V3:]] = cir.alloca !ty_GT_128_, !cir.ptr<!ty_GT_128_>, ["tmp"] {alignment = 8 : i64}
178+
// CHECK: %[[#V4:]] = cir.cast(bitcast, %arg0 : !cir.ptr<!ty_GT_128_>), !cir.ptr<!void>
179+
// CHECK: %[[#V5:]] = cir.cast(bitcast, %[[#V3]] : !cir.ptr<!ty_GT_128_>), !cir.ptr<!void>
180+
// CHECK: %[[#V6:]] = cir.const #cir.int<24> : !u64i
181+
// CHECK: cir.libc.memcpy %[[#V6]] bytes from %[[#V4]] to %[[#V5]] : !u64i, !cir.ptr<!void> -> !cir.ptr<!void>
182+
// CHECK: cir.call @get_gt_128(%[[#V2]], %[[#V3]]) : (!cir.ptr<!ty_GT_128_>, !cir.ptr<!ty_GT_128_>) -> ()
179183
// CHECK: cir.return
180184

181185
// LLVM: void @call_and_get_gt_128(ptr %[[#V0:]])
182186
// LLVM: %[[#V2:]] = alloca %struct.GT_128, i64 1, align 8
183-
// LLVM: %[[#V3:]] = alloca %struct.GT_128, i64 1, align 8
184-
// LLVM: call void @get_gt_128(ptr %[[#V3]], ptr %[[#V0]])
185-
// LLVM: %[[#V4:]] = load %struct.GT_128, ptr %[[#V3]], align 8
186-
// LLVM: store %struct.GT_128 %[[#V4]], ptr %[[#V2]], align 8
187-
// LLVM: ret void
187+
// LLVM: %[[#V3:]] = load %struct.GT_128, ptr %[[#V0]], align 8
188+
// LLVM: %[[#V4:]] = alloca %struct.GT_128, i64 1, align 8
189+
// LLVM: %[[#V5:]] = alloca %struct.GT_128, i64 1, align 8
190+
// LLVM: call void @llvm.memcpy.p0.p0.i64(ptr %[[#V5]], ptr %[[#V0]], i64 24, i1 false)
191+
// LLVM: call void @get_gt_128(ptr %[[#V4]], ptr %[[#V5]])
188192
GT_128 call_and_get_gt_128() {
189193
GT_128 s;
190194
s = get_gt_128(s);

0 commit comments

Comments
 (0)