-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CIR][CodeGen] Add initial support for __cxa_rethrow #1290
base: main
Are you sure you want to change the base?
Conversation
Forgot to git add in cb0cb34
…lvm#971) The llvm's intrinsic `llvm.is.fpclass` is used to support multiple float point builtins: https://clang.llvm.org/docs/LanguageExtensions.html#builtin-isfpclass > The `__builtin_isfpclass()` builtin is a generalization of functions > isnan, isinf, isfinite and some others defined by the C standard. It tests > if the floating-point value, specified by the first argument, falls into > any of data classes, specified by the second argument. I meant to support this by creating IntrinsicCallOp directly. But I can't make it due to llvm#480 since the return type of the intrinsic will mismatch. So I have to create a new Op for it. But I feel it might not be too bad. At least it is more explicit and more expressive.
Re llvm#958 > Consider the following code snippet `tmp.c`: > ``` > typedef struct { > int a, b; > } S; > > void foo(S s) {} > ``` > Running `bin/clang tmp.c -fclangir -Xclang -emit-llvm -Xclang -fclangir-call-conv-lowering -S -o -`, we get: > ``` > loc(fused["tmp.c":5:1, "tmp.c":5:16]): error: 'llvm.bitcast' op result #0 must be LLVM-compatible non-aggregate type, but got '!llvm.struct<"struct.S", (i32, i32)>' > ``` > We can also produce a similar error from this: > ``` > typedef struct { > int a, b; > } S; > > S init() { S s; return s; } > ``` > gives: > ``` > loc(fused["tmp.c":5:17, "tmp.c":5:24]): error: 'llvm.bitcast' op operand #0 must be LLVM-compatible non-aggregate type, but got '!llvm.struct<"struct.S", (i32, i32)>' > ``` > I've traced the errors back to `lib/CIR/Dialect/Transforms/TargetLowering/LowerFunction.cpp` in `LowerFunction::buildAggregateStore`, `castReturnValue`, and `buildAggregateBitcast`. > > `withElementType(SrcTy)` is currently commented out/ignored in `LowerFunction.cpp`, but it is important. > > This PR adds/fixes this and updates one test. I thought [about it](llvm#958 (comment)) and I understand adding `cir.bitcast` to circumvent the CIR checks, but I am not sure how we can ignore/drop the bitcast while lowering. I think we can just make the CIR casts correct. I have added a number of lowering tests to verify that the CIR is lowered properly. cc: @sitio-couto @bcardosolopes.
…s for source code global and local vars (llvm#1001) Now CIR supports annotations for both globals and locals. They all should just use the same set of annotation related globals including file name string, annotation name string, and arguments. This PR makes sure this is the case. FYI: for the test case we have, OG generates [ code ](https://godbolt.org/z/Ev5Ycoqj1), pretty much the same code except annotation variable names. This would fix the crash like > error: redefinition of symbol named '.str.annotation' > fatal error: error in backend: The pass manager failed to lower CIR to LLVMIR dialect!
Previously we didn't generate the index for array correct. The previous test is incorrect already: globals-neg-index-array.c ``` // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-cir %s -o %t.cir // RUN: FileCheck --input-file=%t.cir %s // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fclangir -emit-llvm %s -o %t.ll // RUN: FileCheck --check-prefix=LLVM --input-file=%t.ll %s // RUN: %clang_cc1 -x c++ -triple x86_64-unknown-linux-gnu -emit-cir %s -o %t.cir // RUN: FileCheck --input-file=%t.cir %s // RUN: %clang_cc1 -x c++ -triple x86_64-unknown-linux-gnu -fclangir -emit-llvm %s -o %t.ll // RUN: FileCheck --check-prefix=LLVM --input-file=%t.ll %s struct __attribute__((packed)) PackedStruct { char a1; char a2; char a3; }; struct PackedStruct packed[10]; char *packed_element = &(packed[-2].a3); // CHECK: cir.global external @PACKED = #cir.zero : !cir.array<!ty_PackedStruct x 10> {alignment = 16 : i64} loc(#loc5) // CHECK: cir.global external @packed_element = #cir.global_view<@PACKED, [-2 : i32, 2 : i32]> // LLVM: @PACKED = global [10 x %struct.PackedStruct] zeroinitializer + // LLVM: @packed_element = global ptr getelementptr inbounds ([10 x %struct.PackedStruct], ptr @PACKED, i32 0, i32 -2, i32 2) - // LLVM: @packed_element = global ptr getelementptr inbounds ([10 x %struct.PackedStruct], ptr @PACKED, i32 -2, i32 2) ``` Compile it with `-fclangir -S`, we got: ``` packed: .zero 30 packed_element: .quad packed-54 ``` but the traditional pipeline shows (https://godbolt.org/z/eTj96EP1E): ``` packed: .zero 30 packed_element: .quad packed-4 ``` this may be a simple mismatch.
…lvm#1009) The MLIR docs at https://mlir.llvm.org/docs/DefiningDialects/Operations/#literals specify that "An empty literal `` may be used to remove a space that is inserted implicitly after certain literal elements", so I inserted one before the `right` literal to remove the extra space that was being printed. Oddly, the bug is also fixed by inserting an empty literal _after_ the `left` literal, which leads me to believe that tablegen is inserting an implicit space after the `left` literal.
Close llvm#522 This solves the issue we can't handle `case` in nested scopes and we can't handle if the switch body is not a compound statement. The core idea of the patch is to introduce the `cir.case` operation to the language. Then we can get the cases by traversing the body of the `cir.switch` operation easily instead of counting the regions and the attributes. Every `cir.case` operation has a region and now the `cir.switch` has only one region too. But to make the analysis and optimizations easier, I add a new concept `simple form` here. That a simple `cir.switch` operation is: all the `cir.case` operation owned by the `cir.switch` lives in the top level blocks of the `cir.switch` region and there is no other operations except the ending `cir.yield`. This solves the previous `simplified for common-case` vs `general solution` discussion in llvm#522. After implemented this, I feel the correct answer to it is, we want a general solution for constructing and lowering the operations but we like simple and common case for analysis and optimizations. We just mixed the different phases. For other semantics, see `CIROps.td`. For lowering, we can make it generally by lower the cases one by one and finally lower the switch itself. Although this patch has 1000+ lines of changes, I feel it is relatively neat especially it erases some odd behaviors before. Tested with Spec2017's C benchmarks except 500.perlbench_r.
…pes (llvm#1004) This PR adds a support for return values of a struct type. There are two cases that are not covered by this PR and will be added later.
…1005) This PR adds several copy-pasted lines and a small test and now var args seems to work in the calling convention pass
Reviewers: smeenai Reviewed By: smeenai Pull Request: llvm#1022
We diverge from CodeGen here by delaying the function emission that happens for a global variable. However, due to situations where a global can be emitted while building out a function the old CGF might not be invalid. So we need to store it here just in case. Reviewers: bcardosolopes, smeenai Reviewed By: smeenai Pull Request: llvm#1023
This was declared but never implemented. Upon first usage in a later commit this fails to link. Reviewers: bcardosolopes, smeenai Reviewed By: smeenai Pull Request: llvm#1024
…tion lowering pass (llvm#1003) This PR adds initial function pointers support for the calling convention lowering pass. This is a suggestion, so any other ideas are welcome. Several ideas was described in the llvm#995 and basically what I'm trying to do is to generate a clean CIR code without additional `bitcast` operations for function pointers and without mix of lowered and initial function types. #### Problem Looks like we can not just lower the function type and cast the value since too many operations are involved. For instance, for the next simple code: ``` typedef struct { int a; } S; typedef int (*myfptr)(S); int foo(S s) { return 42 + s.a; } void bar() { myfptr a = foo; } ``` we get the next CIR for the function `bar` , before the calling convention lowering pass: ``` cir.func no_proto @bar() extra(#fn_attr) { %0 = cir.alloca !cir.ptr<!cir.func<!s32i (!ty_S)>>, !cir.ptr<!cir.ptr<!cir.func<!s32i (!ty_S)>>>, ["a", init] %1 = cir.get_global @foo : !cir.ptr<!cir.func<!s32i (!ty_S)>> cir.store %1, %0 : !cir.ptr<!cir.func<!s32i (!ty_S)>>, !cir.ptr<!cir.ptr<!cir.func<!s32i (!ty_S)>>> cir.return } ``` As one can see, first three operations depend on the function type. Once `foo` is lowered, we need to fix `GetGlobalOp`: otherwise the code will fail with the verification error since actual `foo` type (lowered) differs from the one currently expected by the `GetGlobalOp`. First idea would just rewrite only the `GetGlobalOp` and insert a bitcast after, so both `AllocaOp` and `StoreOp` would work witth proper types. Once the code will be more complex, we will need to take care about possible use cases, e.g. if we use arrays, we will need to track array accesses to it as well in order to insert this bitcast every time the array element is needed. One workaround I can think of: we fix the `GetGlobalOp` type and cast from the lowered type to the initial, and cast back before the actual call happens - but it doesn't sound as a good and clean approach (from my point of view, of course). So I suggest to use type converter and rewrite any operation that may deal with function pointers and make sure it has a proper type, and we don't have any unlowered function type in the program after the calling convention lowering pass. #### Implementation I added lowering for `AllocaOp`, `GetGlobalOp`, and split the lowering for `FuncOp` (former `CallConvLoweringPattern`) and lower `CallOp` separately. Frankly speaking, I tried to implement a pattern for each operation, but for some reasons the tests are not passed for windows and macOs in this case - something weird happens inside `applyPatternsAndFold` function. I suspect it's due to two different rewriters used - one in the `LoweringModule` and one in the mentioned function. So I decided to follow the same approach as it's done for the `LoweringPrepare` pass and don't involve this complex rewriting framework. Next I will add a type converter for the struct type, patterns for `ConstantOp` (for const arrays and `GlobalViewAttr`) In the end of the day we'll have (at least I hope so) a clean CIR code without any bitcasts for function pointers. cc @sitio-couto @bcardosolopes
We had some incorrect logic when creating functions and getting their address which resulted in spurious "definition with the same mangled name" errors. Fix that logic to match original CodeGen, which also fixes these errors. It's expected that decls can appear in the deferred decl list multiple times, and CodeGen has to guard against that. In the case that triggered the error, both `CIRGenerator::HandleInlineFunctionDefinition` and CIRGenModule were deferring the declaration. Something else I discovered here is that we emit these functions in the opposite order as regular CodeGen: https://godbolt.org/z/4PrKG7h9b. That might be a meaningful difference worth investigating further. Fixes llvm#991
…intrinsics (llvm#1020) In this PR, also changed `buildNeonShiftVector` to allow it generates negative shift values. When the shift value is negative, the shift amount vector is not used in any ShiftOp of IR (as they don't need sign to know shift direction), instead, it is just input argument to shift intrinsic function call.
…vm#1028) This PR fixes the notorious double whitespaces introduced by visibility attribute, for `cir.func` only. It uses "leading whitespace" for every print. And the printing of visibility attr is properly guarded by a check of `!isDefault()`. Double whitespaces in test files are removed.
This patch introduces support for the abs family of built-in functions (abs, labs, llabs).
Add lowering prepare logic to lower stores to cir.copy. This bring LLVM lowering closer to OG and turns out the rest of the compiler understands memcpys better and generate better assembly code for at least arm64 and x86_64. Note that current lowering to memcpy is only using i32 intrinsic version, this PR does not touch that code and that will be addressed in another PR.
…16` and `vdivh_f16` (llvm#1258) Lowering: - `vaddh_f16` - `vsubh_f16` - `vmulh_f16` - `vdivh_f16`
…1259) Lowering `vshld_n_s64` and `vshld_n_u64`
Lowering `vshrd_n_s64`
Lowering Neon `vset_lane_f64` References: [Clang CGBuiltin Implementation](https://github.com/llvm/clangir/blob/2b1a638ea07ca10c5727ea835bfbe17b881175cc/clang/lib/CodeGen/CGBuiltin.cpp#L12342) [Builtin definition](https://developer.arm.com/architectures/instruction-sets/intrinsics/vset_lane_f64)
…llvm#1265) [Neon definiton](https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=vmaxv_s8) [OG implementation](https://github.com/llvm/clangir/blob/04d7dcfb2582753f3eccbf01ec900d60297cbf4b/clang/lib/CodeGen/CGBuiltin.cpp#L13202) Implementation in this PR is different from OG as 1. avoided code duplication by extracting out the common pattern 2. avoided using i32 as return type of the intrinsic call, so eliminated the need for casting result of the intrinsic call. This way of OG's implementation is quite unnecessary IMHO, this is MAX, not ADD or MUL. After all, using the expected type as return type of intrinsic call produces [the same ASM code](https://godbolt.org/z/3nKG7fxPb).
I continue to use `csmith` and catch run time bags. Now it's time to fix the layout for the const structs. There is a divergence between const structs generated by CIR and the original codegen. And this PR makes one more step to eliminate it. There are cases where the extra padding is required - and here is a fix for some of them. I did not write extra tests, since the fixes in the existing already covers the code I added. The point is that now the layout for all of these structs in the LLVM IR with and without CIR is the same.
Class `CIRGenFunction` contained three identical functions that converted from a Clang AST type (`clang::QualType`) to a ClangIR type (`mlir::Type`): `convertType`, `ConvertType`, and `getCIRType`. This embarrassment of duplication needed to be fixed, along with cleaning up other functions that convert from Clang types to ClangIR types. The three functions `CIRGenFunction::ConvertType`, `CIRGenFunction::convertType`, and `CIRGenFunction::getCIRType` were combined into a single function `CIRGenFunction::convertType`. Other functions were renamed as follows: - `CIRGenTypes::ConvertType` to `CIRGenTypes::convertType` - `CIRGenTypes::ConvertFunctionTypeInternal` to `CIRGenTypes::convertFunctionTypeInternal` - `CIRGenModule::getCIRType` to `CIRGenModule::convertType` - `ConstExprEmitter::ConvertType` to `ConstExprEmitter::convertType` - `ScalarExprEmitter::ConvertType` to `ScalarExprEmitter::convertType` Many cases of `getTypes().convertType(t)` and `getTypes().convertTypeForMem(t)` were changed to just `convertType(t)` and `convertTypeForMem(t)`, respectively, because the forwarding functions in `CIRGenModule` and `CIRGenFunction` make the explicit call to `getTypes()` unnecessary.
Reland previously reverted attempt now that this passes ASANified `ninja check-clang-cir`. Original message: We are missing cleanups all around, more incremental progress towards fixing that. This is supposed to be NFC intended, but we have to start changing some bits in order to properly match cleanup bits in OG. Start tagging places with more MissingFeatures to allow us to incrementally improve the situation.
…#1262) This patch adds support for the following GCC function attributes: - `__attribute__((const))` - `__attribute__((pure))` The side effect information is attached to the call operations during CIRGen. During LLVM lowering, these information is consumed to further emit appropriate LLVM metadata on LLVM call instructions.
This patch fixes the LLVM project tests workflow on Linux. Two changes were needed. Firstly, some commands need to be performed with sudo now that the container executes as a non-root user. Second, we needed to change from `ubuntu-latest` to `ubuntu-22.04` as `ubuntu-latest` not defaults to `ubuntu-24.04` which causes `setup-python` to install a python executable linked against a newer version of glibc that is not found on ubuntu 22.04, which causes failures when CMake cannot execute the python interpreter that it finds. (cherry picked from commit a759176)
…m#1249) C/C++ functions returning void had an explicit !cir.void return type while not having any returned value, which was breaking a lot of MLIR invariants when the CIR dialect is used in a greater context, for example with the inliner. Now, a C/C++ function returning void has no return type and no return values, which does not break the MLIR invariant about the same number of return types and returned values. This change does not keeps the same parsing/pretty-printed syntax as before for compatibility like in llvm#1203 because it requires some new features from the MLIR parser infrastructure itself, which is not great. This uses an optional type for function return type. The default MLIR parser for optional parameters requires an optional anchor we do not have in the syntax, so use a custom FuncType parser to handle the optional return type.
Some passes not declared with TableGen did not have descriptions.
This PR adds a support for for default arguments
…#1283) Corresponding [OG code](https://github.com/llvm/clangir/blob/ef20d053b3d78c9d4c135e2811b303b7e5016d30/clang/lib/CodeGen/CGExprConstant.cpp#L846). [OG generated code here](https://godbolt.org/z/x6q333dMn), one notable diff is we're missing `inrange` which is reported in [issue 886 ](llvm#886). For now, I'm still using GlobalViewAttr to implement it so we can move things fast. But it might be worth considering approach [Comments in issue 258](llvm#258), especially we could incoporate [inrange info](llvm#886) to the attribute suggested there.
This PR adds CIRGen and LLVM lowering support for the following language features related to pointers to data members: - Comparisons between pointers to data members. - Casting from pointers to data members to boolean. - Reinterpret casts between pointers to data members.
This PR updates the `#cir.global_view` attribute and make it accept integer types as its result type.
…vm#1280) Resolves llvm#1266 After change: ```llvm %1 = alloca ptr, i64 1, align 8 store i32 1, ptr @g_arr, align 4 store i32 2, ptr getelementptr (i32, ptr @g_arr, i64 1), align 4 store i32 3, ptr getelementptr (i32, ptr @g_arr, i64 2), align 4 %2 = load i32, ptr @g, align 4 store i32 %2, ptr getelementptr (i32, ptr @g_arr, i64 3), align 4 store ptr getelementptr (i32, ptr getelementptr (i32, ptr @g_arr, i64 3), i64 1), ptr %1, align 8 ```
This does not change anything in practice, work in that direction should come next. We also want this to not affect existing tests to isolate upcoming changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good, some minor change needed.
There are quite a number of differences: phi in the CIR version VS loading from %exn.slot in the OG, having multiple landing pads, etc.
Yes, we use block arguments for the slots, and don't unique landing pads (yet). We might stay with the former but fix the later.
The CIR version still seems reasonable to me, mostly because currently we are unable to replicate the exact behavior of the OG codegen. Again, I am very open to more discussions and suggestions here)
I think it's similar enough!
// UnreachableOp in-place causes us to have two terminators which isn't | ||
// valid CIR. So, to make this valid CIR the UnreachableOp is created in a | ||
// separate scope. | ||
builder.create<cir::ScopeOp>(Fn.getLoc(), /*scopeBuilder=*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of creating the scope, you should just add the UnreachableOp directly and make sure during tryOp termination that it won't put a cir.yeld if there's already a terminator in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wasn't as trivial as I thought. The case I have used in the test is easy (kind of) to handle this way.
Consider we have something like this:
void foo() {
int r = 1;
try {
throw;
S s;
} catch (...) {
++r;
}
}
where the throw
comes before the rest of the body of the try block, or even weirder cases with throw
and operations alternating in the try block.
What can we do in this case?) Do we ignore the remaining operations in the body of the try after inserting the first UnreachableOp
? (I think the OG does this, but its a bit tricky to replicate in CIR).
I tried to think of some workarounds, like using different blocks, but the scoping seemed to get the best results. I would appreciate some suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to eliminate unreachable blocks / code during CIRGen, best course of action is to create a new block after it and keep going. If for some reason it is too hard to pull that (which I'd like to see some explanations to justify), ignoring what comes next would be the solution (as long as it comes with a plan on how to tackle that incrementally thereafter_
We just went over a rebase against upstream, apologies for the churn but please update your branch for this PR and force-push! |
This PR adds an initial support for
__cxa_rethrow
, and one test that produces a rethrow.I am very open to suggestions regarding this PR, because I'm still a bit unsure if this replicates the original codegen properly. For example, using the test added, the OG CodeGen produces:
and the proposed CIR equivalent produces:
There are quite a number of differences:
phi
in the CIR version VS loading from%exn.slot
in the OG, having multiple landing pads, etc.The CIR version still seems reasonable to me, mostly because currently we are unable to replicate the exact behavior of the OG codegen. Again, I am very open to more discussions and suggestions here)