Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CIR] Lower nested local constant alloca #1261

Open
wants to merge 2,199 commits into
base: main
Choose a base branch
from

Conversation

Lancern
Copy link
Member

@Lancern Lancern commented Dec 29, 2024

This PR adds support for lowering local constants in nested scopes, including those in nested loops.

For those constant allocas in non-loop inner scopes, this PR keeps their constant flags during alloca hoisting. LLVM lowering would correctly emit necessary invariant metadata for those allocas.

For those constant allocas in a loop, this PR introduces a new operation cir.invariant_group that marks the beginning of the lifetime of the constant objects. This operation is put at the location of the alloca operation before hoisting them. This PR updates LLVM lowering to emit the necessary invariant metadata when loading and storing through such pointers.

This PR takes care of the special case where the constant alloca represents a variable declared in the condition part of a while loop. In such a case, this PR removes the constant flag on the alloca operation when hositing them.

ghehg and others added 30 commits November 22, 2024 18:31
…lvm#971)

The llvm's intrinsic `llvm.is.fpclass` is used to support multiple float
point builtins:
https://clang.llvm.org/docs/LanguageExtensions.html#builtin-isfpclass

> The `__builtin_isfpclass()` builtin is a generalization of functions
> isnan, isinf, isfinite and some others defined by the C standard. It
tests
> if the floating-point value, specified by the first argument, falls
into
> any of data classes, specified by the second argument.

I meant to support this by creating IntrinsicCallOp directly. But I
can't make it due to llvm#480 since
the return type of the intrinsic will mismatch. So I have to create a
new Op for it. But I feel it might not be too bad. At least it is more
explicit and more expressive.
Re llvm#958 

> Consider the following code snippet `tmp.c`: 
> ```
> typedef struct {
>   int a, b;
> } S;
> 
> void foo(S s) {}
> ```
> Running `bin/clang tmp.c -fclangir -Xclang -emit-llvm -Xclang
-fclangir-call-conv-lowering -S -o -`, we get:
> ```
> loc(fused["tmp.c":5:1, "tmp.c":5:16]): error: 'llvm.bitcast' op result
#0 must be LLVM-compatible non-aggregate type, but got
'!llvm.struct<"struct.S", (i32, i32)>'
> ```
> We can also produce a similar error from this: 
> ```
> typedef struct {
>   int a, b;
> } S;
> 
> S init() { S s; return s; }
> ```
> gives:
> ```
> loc(fused["tmp.c":5:17, "tmp.c":5:24]): error: 'llvm.bitcast' op
operand #0 must be LLVM-compatible non-aggregate type, but got
'!llvm.struct<"struct.S", (i32, i32)>'
> ```
> I've traced the errors back to
`lib/CIR/Dialect/Transforms/TargetLowering/LowerFunction.cpp` in
`LowerFunction::buildAggregateStore`, `castReturnValue`, and
`buildAggregateBitcast`.
> 
> `withElementType(SrcTy)` is currently commented out/ignored in
`LowerFunction.cpp`, but it is important.
> 
> This PR adds/fixes this and updates one test.

I thought [about
it](llvm#958 (comment))
and I understand adding `cir.bitcast` to circumvent the CIR checks, but
I am not sure how we can ignore/drop the bitcast while lowering. I think
we can just make the CIR casts correct.

I have added a number of lowering tests to verify that the CIR is
lowered properly.

cc: @sitio-couto @bcardosolopes.
…s for source code global and local vars (llvm#1001)

Now CIR supports annotations for both globals and locals. They all
should just use the same set of annotation related globals including
file name string, annotation name string, and arguments. This PR makes
sure this is the case.

FYI: for the test case we have, OG generates [ code
](https://godbolt.org/z/Ev5Ycoqj1), pretty much the same code except
annotation variable names.
This would fix the crash like
> error: redefinition of symbol named '.str.annotation'
> fatal error: error in backend: The pass manager failed to lower CIR to
LLVMIR dialect!
Previously we didn't generate the index for array correct.

The previous test is incorrect already:  globals-neg-index-array.c

```
// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-cir %s -o %t.cir
// RUN: FileCheck --input-file=%t.cir %s
// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fclangir -emit-llvm %s -o %t.ll
// RUN: FileCheck --check-prefix=LLVM --input-file=%t.ll %s
// RUN: %clang_cc1 -x c++ -triple x86_64-unknown-linux-gnu -emit-cir %s -o %t.cir
// RUN: FileCheck --input-file=%t.cir %s
// RUN: %clang_cc1 -x c++ -triple x86_64-unknown-linux-gnu -fclangir -emit-llvm %s -o %t.ll
// RUN: FileCheck --check-prefix=LLVM --input-file=%t.ll %s

struct __attribute__((packed)) PackedStruct {
    char a1;
    char a2;
    char a3;
};
struct PackedStruct packed[10];
char *packed_element = &(packed[-2].a3);
// CHECK: cir.global  external @PACKED = #cir.zero : !cir.array<!ty_PackedStruct x 10> {alignment = 16 : i64} loc(#loc5)
// CHECK: cir.global  external @packed_element = #cir.global_view<@PACKED, [-2 : i32, 2 : i32]>
// LLVM: @PACKED = global [10 x %struct.PackedStruct] zeroinitializer
+ // LLVM: @packed_element = global ptr getelementptr inbounds ([10 x %struct.PackedStruct], ptr @PACKED, i32 0, i32 -2, i32 2)
- // LLVM: @packed_element = global ptr getelementptr inbounds ([10 x %struct.PackedStruct], ptr @PACKED,  i32 -2, i32 2)
```

Compile it with `-fclangir -S`, we got:

```
packed:
        .zero   30

packed_element:
	.quad	packed-54
```

but the traditional pipeline shows (https://godbolt.org/z/eTj96EP1E):

```
packed:
        .zero   30

packed_element:
        .quad   packed-4
```

this may be a simple mismatch.
…lvm#1009)

The MLIR docs at https://mlir.llvm.org/docs/DefiningDialects/Operations/#literals
specify that "An empty literal `` may be used to remove a space that is
inserted implicitly after certain literal elements", so I inserted one
before the `right` literal to remove the extra space that was being
printed. Oddly, the bug is also fixed by inserting an empty literal
_after_ the `left` literal, which leads me to believe that tablegen is
inserting an implicit space after the `left` literal.
Close llvm#522

This solves the issue we can't handle `case` in nested scopes and we
can't handle if the switch body is not a compound statement.

The core idea of the patch is to introduce the `cir.case` operation to
the language. Then we can get the cases by traversing the body of the
`cir.switch` operation easily instead of counting the regions and the
attributes.

Every `cir.case` operation has a region and now the `cir.switch` has
only one region too. But to make the analysis and optimizations easier,
I add a new concept `simple form` here. That a simple `cir.switch`
operation is: all the `cir.case` operation owned by the `cir.switch`
lives in the top level blocks of the `cir.switch` region and there is no
other operations except the ending `cir.yield`. This solves the previous
`simplified for common-case` vs `general solution` discussion in
llvm#522. After implemented this, I
feel the correct answer to it is, we want a general solution for
constructing and lowering the operations but we like simple and common
case for analysis and optimizations. We just mixed the different phases.

For other semantics, see `CIROps.td`.

For lowering, we can make it generally by lower the cases one by one and
finally lower the switch itself.

Although this patch has 1000+ lines of changes, I feel it is relatively
neat especially it erases some odd behaviors before.

Tested with Spec2017's C benchmarks except 500.perlbench_r.
…pes (llvm#1004)

This PR adds a support for return values of a struct type. There are two
cases that are not covered by this PR and will be added later.
…1005)

This PR adds several copy-pasted lines and a small test and now var args
seems to work in the calling convention pass
Reviewers: smeenai

Reviewed By: smeenai

Pull Request: llvm#1022
We diverge from CodeGen here by delaying the function emission that
happens for a global variable. However, due to situations where a global
can be emitted while building out a function the old CGF might not be
invalid. So we need to store it here just in case.

Reviewers: bcardosolopes, smeenai

Reviewed By: smeenai

Pull Request: llvm#1023
This was declared but never implemented. Upon first usage in a later
commit this fails to link.

Reviewers: bcardosolopes, smeenai

Reviewed By: smeenai

Pull Request: llvm#1024
…tion lowering pass (llvm#1003)

This PR adds initial function pointers support for the calling
convention lowering pass. This is a suggestion, so any other ideas are
welcome.

Several ideas was described in the llvm#995 and basically what I'm trying to
do is to generate a clean CIR code without additional `bitcast`
operations for function pointers and without mix of lowered and initial
function types.

#### Problem
Looks like we can not just lower the function type and cast the value
since too many operations are involved. For instance, for the
next simple code: 
```
typedef struct {
  int a;
} S;

typedef int (*myfptr)(S);

int foo(S s) { return 42 + s.a; }

void bar() {
  myfptr a = foo;
}
```
we get the next CIR for the function `bar` , before the calling
convention lowering pass:
```
cir.func no_proto  @bar() extra(#fn_attr) {
    %0 = cir.alloca !cir.ptr<!cir.func<!s32i (!ty_S)>>, !cir.ptr<!cir.ptr<!cir.func<!s32i (!ty_S)>>>, ["a", init]
    %1 = cir.get_global @foo : !cir.ptr<!cir.func<!s32i (!ty_S)>> 
    cir.store %1, %0 : !cir.ptr<!cir.func<!s32i (!ty_S)>>, !cir.ptr<!cir.ptr<!cir.func<!s32i (!ty_S)>>>
    cir.return 
  } 
```
As one can see, first three operations depend on the function type. Once
`foo` is lowered, we need to fix `GetGlobalOp`:
otherwise the code will fail with the verification error since actual
`foo` type (lowered) differs from the one currently expected by the
`GetGlobalOp`.
First idea would just rewrite only the `GetGlobalOp` and insert a
bitcast after, so both `AllocaOp` and `StoreOp` would work witth proper
types.

Once the code will be more complex, we will need to take care about
possible use cases, e.g. if we use arrays, we will need to track array
accesses to it as well in order to insert this bitcast every time the
array element is needed.

One workaround I can think of: we fix the `GetGlobalOp` type and cast
from the lowered type to the initial, and cast back before the actual
call happens - but it doesn't sound as a good and clean approach (from
my point of view, of course).

So I suggest to use type converter and rewrite any operation that may
deal with function pointers and make sure it has a proper type, and we
don't have any unlowered function type in the program after the calling
convention lowering pass.

#### Implementation
I added lowering for `AllocaOp`, `GetGlobalOp`, and split the lowering
for `FuncOp` (former `CallConvLoweringPattern`) and lower `CallOp`
separately.
Frankly speaking, I tried to implement a pattern for each operation, but
for some reasons the tests are not passed for
windows and macOs in this case - something weird happens inside
`applyPatternsAndFold` function. I suspect it's due to two different
rewriters used - one in the `LoweringModule` and one in the mentioned
function.
So I decided to follow the same approach as it's done for the
`LoweringPrepare` pass and don't involve this complex rewriting
framework.

Next I will add a type converter for the struct type, patterns for
`ConstantOp` (for const arrays and `GlobalViewAttr`)
In the end of the day we'll have (at least I hope so) a clean CIR code
without any bitcasts for function pointers.

cc @sitio-couto  @bcardosolopes
We had some incorrect logic when creating functions and getting their
address which resulted in spurious "definition with the same mangled
name" errors. Fix that logic to match original CodeGen, which also fixes
these errors.

It's expected that decls can appear in the deferred decl list multiple
times, and CodeGen has to guard against that. In the case that triggered
the error, both `CIRGenerator::HandleInlineFunctionDefinition` and
CIRGenModule were deferring the declaration.

Something else I discovered here is that we emit these functions in the
opposite order as regular CodeGen: https://godbolt.org/z/4PrKG7h9b.
That might be a meaningful difference worth investigating further.

Fixes llvm#991
…intrinsics (llvm#1020)

In this PR, also changed `buildNeonShiftVector` to allow it generates
negative shift values. When the shift value is negative, the shift
amount vector is not used in any ShiftOp of IR (as they don't need sign
to know shift direction), instead, it is just input argument to shift
intrinsic function call.
…vm#1028)

This PR fixes the notorious double whitespaces introduced by visibility
attribute, for `cir.func` only.

It uses "leading whitespace" for every print. And the printing of
visibility attr is properly guarded by a check of `!isDefault()`.

Double whitespaces in test files are removed.
AmrDeveloper and others added 26 commits January 6, 2025 07:08
…16` and `vdivh_f16` (llvm#1258)

Lowering:

- `vaddh_f16`
- `vsubh_f16`
- `vmulh_f16`
- `vdivh_f16`
…llvm#1265)

[Neon
definiton](https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=vmaxv_s8)
[OG
implementation](https://github.com/llvm/clangir/blob/04d7dcfb2582753f3eccbf01ec900d60297cbf4b/clang/lib/CodeGen/CGBuiltin.cpp#L13202)
Implementation in this PR is different from OG as 
1. avoided code duplication by extracting out the common pattern
2. avoided using i32 as return type of the intrinsic call, so eliminated
the need for casting result of the intrinsic call. This way of OG's
implementation is quite unnecessary IMHO, this is MAX, not ADD or MUL.
After all, using the expected type as return type of intrinsic call
produces [the same ASM code](https://godbolt.org/z/3nKG7fxPb).
I continue to use `csmith` and catch run time bags. Now it's time to fix
the layout for the const structs.

There is a divergence between const structs generated by CIR and the
original codegen. And this PR makes one more step to eliminate it. There
are cases where the extra padding is required - and here is a fix for
some of them. I did not write extra tests, since the fixes in the
existing already covers the code I added. The point is that now the
layout for all of these structs in the LLVM IR with and without CIR is
the same.
Class `CIRGenFunction` contained three identical functions that
converted from a Clang AST type (`clang::QualType`) to a ClangIR type
(`mlir::Type`): `convertType`, `ConvertType`, and `getCIRType`. This
embarrassment of duplication needed to be fixed, along with cleaning up
other functions that convert from Clang types to ClangIR types.

The three functions `CIRGenFunction::ConvertType`,
`CIRGenFunction::convertType`, and `CIRGenFunction::getCIRType` were
combined into a single function `CIRGenFunction::convertType`. Other
functions were renamed as follows:
- `CIRGenTypes::ConvertType` to `CIRGenTypes::convertType`
- `CIRGenTypes::ConvertFunctionTypeInternal` to
`CIRGenTypes::convertFunctionTypeInternal`
- `CIRGenModule::getCIRType` to `CIRGenModule::convertType`
- `ConstExprEmitter::ConvertType` to `ConstExprEmitter::convertType`
- `ScalarExprEmitter::ConvertType` to `ScalarExprEmitter::convertType`

Many cases of `getTypes().convertType(t)` and
`getTypes().convertTypeForMem(t)` were changed to just `convertType(t)`
and `convertTypeForMem(t)`, respectively, because the forwarding
functions in `CIRGenModule` and `CIRGenFunction` make the explicit call
to `getTypes()` unnecessary.
Reland previously reverted attempt now that this passes ASANified `ninja
check-clang-cir`.

Original message:
We are missing cleanups all around, more incremental progress towards fixing
that. This is supposed to be NFC intended, but we have to start changing some
bits in order to properly match cleanup bits in OG.

Start tagging places with more MissingFeatures to allow us to incrementally
improve the situation.
…#1262)

This patch adds support for the following GCC function attributes:

  - `__attribute__((const))`
  - `__attribute__((pure))`

The side effect information is attached to the call operations during
CIRGen. During LLVM lowering, these information is consumed to further
emit appropriate LLVM metadata on LLVM call instructions.
This patch fixes the LLVM project tests workflow on Linux. Two changes
were needed. Firstly, some commands need to be performed with sudo now
that the container executes as a non-root user. Second, we needed to
change from `ubuntu-latest` to `ubuntu-22.04` as `ubuntu-latest` not
defaults to `ubuntu-24.04` which causes `setup-python` to install a
python executable linked against a newer version of glibc that is not
found on ubuntu 22.04, which causes failures when CMake cannot execute
the python interpreter that it finds.

(cherry picked from commit a759176)
…m#1249)

C/C++ functions returning void had an explicit !cir.void return type
while not
having any returned value, which was breaking a lot of MLIR invariants
when the
CIR dialect is used in a greater context, for example with the inliner.

Now, a C/C++ function returning void has no return type and no return
values,
which does not break the MLIR invariant about the same number of return
types
and returned values.

This change does not keeps the same parsing/pretty-printed syntax as
before for
compatibility like in llvm#1203 because
it
requires some new features from the MLIR parser infrastructure itself,
which is
not great.

This uses an optional type for function return type.

The default MLIR parser for optional parameters requires an optional
anchor we
do not have in the syntax, so use a custom FuncType parser to handle the
optional
return type.
Some passes not declared with TableGen did not have descriptions.
This PR adds a support for for default arguments
…#1283)

Corresponding [OG
code](https://github.com/llvm/clangir/blob/ef20d053b3d78c9d4c135e2811b303b7e5016d30/clang/lib/CodeGen/CGExprConstant.cpp#L846).
[OG generated code here](https://godbolt.org/z/x6q333dMn), one notable
diff is we're missing `inrange` which is reported in [issue 886
](llvm#886).
For now, I'm still using GlobalViewAttr to implement it so we can move
things fast.
But it might be worth considering approach [Comments in issue
258](llvm#258), especially we could
incoporate [inrange info](llvm#886) to
the attribute suggested there.
This PR adds CIRGen and LLVM lowering support for the following language
features related to pointers to data members:

  - Comparisons between pointers to data members.
  - Casting from pointers to data members to boolean.
  - Reinterpret casts between pointers to data members.
This PR updates the `#cir.global_view` attribute and make it accept
integer types as its result type.
…vm#1280)

Resolves llvm#1266

After change:

```llvm
%1 = alloca ptr, i64 1, align 8
  store i32 1, ptr @g_arr, align 4
  store i32 2, ptr getelementptr (i32, ptr @g_arr, i64 1), align 4
  store i32 3, ptr getelementptr (i32, ptr @g_arr, i64 2), align 4
  %2 = load i32, ptr @g, align 4
  store i32 %2, ptr getelementptr (i32, ptr @g_arr, i64 3), align 4
  store ptr getelementptr (i32, ptr getelementptr (i32, ptr @g_arr, i64 3), i64 1), ptr %1, align 8
```
This does not change anything in practice, work in that direction should come
next. We also want this to not affect existing tests to isolate upcoming
changes.
This patch adds support for lowering local constants in nested scopes, including
those in nested loops.

For those constant allocas in non-loop inner scopes, this patch keeps their
constant flags during alloca hoisting. LLVM lowering would correctly emit
necessary invariant metadata for those allocas.

For those constant allocas in a loop, this patch introduces a new operation
`cir.invariant_group` that marks the beginning of the lifetime of the constant
objects. This operation is put at the location of the alloca operation before
hoisting them. This patch updates LLVM lowering to emit the necessary invariant
metadata when loading and storing through such pointers.

This patch takes care of the special case where the constant alloca represents
a variable declared in the condition part of a while loop. In such a case, this
patch removes the constant flag on the alloca operation when hositing them.
@Lancern Lancern force-pushed the nested-local-const-lowering branch from 0a13412 to 7a7d9cf Compare January 24, 2025 08:02
@Lancern
Copy link
Member Author

Lancern commented Jan 24, 2025

Rebased onto the latest main.

@bcardosolopes
Copy link
Member

Sorry I missed this, taking a closer look next week!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.