[CIR][CodeGen] Const structs with bitfields #412

gitoleg · 2024-01-22T13:51:54Z

This PR adds a support for const structs with bitfields.
Now only global structs are supported, the support of the local ones can be added more or less easily - there is one ugly thing need to be done though)

So .. what is all about.
First of all - as usually, I'm sorry for the big PR. But it's hard to break it down to peaces. The good news is that in the same time it's a copy-pasta from the original codegen, no surprises here. Basically, the most hard place to read is ConstantAggregateBuilder::addBits copied with minimum of changes.

The main problem - and frankly speaking I have no idea why it's done this way in the original codegen - is that the data layout is different for such structures, I mean literally another type is used. For instance, the code:

struct T {
  int X : 15;
  int Y : 6;
  unsigned Z : 9;
  int W;
};

struct T GV = { 1, 5, 256, -1};

is represented in LLVM IR (with no CIR enabled) as:

%struct.T = type { i32, i32 }
%struct.Inner = type { i8, i32 }

@GV = dso_local global { i8, i8, i8, i8, i32 } ...

i.e. the global var GV is looks like a struct of single bytes (up to the last field, which is not a btfield).
And my guess is that we want to have the same behavior in CIR. So we do.

The main problem is that we have to treat the same data differently - and this is why one additional bitcast is needed when we create a global var. Actually, there was a comment there - and I really wonder where it came from. But anyways, I don't really like this and don't see any good workaround here. Well, maybe we may add a kind of map in order to store the correspondence between types and do a bitcast more wisely. The same is true for the const structs with bitfields defined locally.

github-actions · 2024-01-22T13:54:26Z

✅ With the latest revision this PR passed the C/C++ code formatter.

bcardosolopes · 2024-01-30T01:05:06Z

@lanza just rebased and this has the side effect of breaking GH PR workflow, making it impossible to review, apologies. Please rebase!

gitoleg · 2024-01-31T09:08:35Z

@lanza just rebased and this has the side effect of breaking GH PR workflow, making it impossible to review, apologies. Please rebase!

@bcardosolopes done!

bcardosolopes · 2024-01-31T19:33:57Z

Thanks for continuing work on this area!

i.e. the global var GV is looks like a struct of single bytes (up to the last field, which is not a btfield). And my guess is that we want to have the same behavior in CIR. So we do.

I haven't looked at the patch yet, going for some high level questions first. Is it possible to have something that looks sane in CIR out of CIRGen and only breaks down to something like this as part of LoweringPrepare?

The main problem is that we have to treat the same data differently - and this is why one additional bitcast is needed when we create a global var. Actually, there was a comment there - and I really wonder where it came from. But anyways, I don't really like this and don't see any good workaround here. Well, maybe we may add a kind of map in order to store the correspondence between types and do a bitcast more wisely. The same is true for the const structs with bitfields defined locally.

Gotcha, can you point me to the comment in question? I probably need to understand this better before giving you feedback

clang/lib/CIR/CodeGen/CIRGenExpr.cpp

gitoleg · 2024-02-01T14:22:38Z

Is it possible to have something that looks sane in CIR out of CIRGen and only breaks down to something like this as part of LoweringPrepare?

I will think about it

bcardosolopes · 2024-02-02T02:24:58Z

I will think about it

It's also fine if that comes as a follow up PR too!

gitoleg · 2024-02-02T15:48:53Z

@bcardosolopes
Well, so far I don't see any good way to postpone this annoying stuff until the lowering prepare phase. I'm not happy because of it either, and frankly speaking I have no idea why it's done this way in the original codegen, but anyway.

Previously we added CIR operations for bit fields, but the point is, the operations will be created after (in CIRGenExpr), i.e. we can't easily add something new to them here, in the CIRGenExprConst.
Also, I would say it will be hard to repeat the same bit manipulations in the lowering prepare, even if we could. Here we have a pretty straightforward approach - given a constant, we create a new struct type for it, and later create all the operations using this type. If we delayed it to the lowering prepare, we would go another way: check every constant, check it has bit fields, iterate over the whole program, find all get/set bitfield and get_member operations and replace them with the same ops but with using of the new type .. and the list of potentially affected operations may change in the future, if new ops will be add to CIR.

May be we could find another solution, but everything that comes to my head right now looks quite dirty.

So... what do you think?

bcardosolopes

Ok, this seems like a case that we should care about raising only to the point where we have a use case for this, complexity doesn't see worth without one. Last remaining piece is the assert I mentioned in previous review and I'll just merge it.

gitoleg · 2024-02-05T15:29:19Z

@bcardosolopes
added !

  if (VD->getTLSKind() != VarDecl::TLS_None)
    llvm_unrecheable("NYI");

bcardosolopes

LGTM

@bar

The second part of the job started in #412 , now about local structures. As it was mentioned previously, sometimes the layout for structures with bit fields inited with constants differ from the originally created in `CIRRecordLayoutBuilder` and it cause `storeOp` verification fail due to different structure type was used to allocation. This PR fix it. An example: ``` typedef struct { int a : 4; int b : 5; int c; } D; void bar () { D d = {1,2,3}; } ``` Well, I can't say I'm proud of these changes - it seems like a type safety violation, but looks like it's the best we can do here. The original codegen doesn't have this problem at all, there is just a `memcpy` there, I provide LLVM IR just for reference: ``` %struct.D = type { i16, i32 } @__const.bar.d = private unnamed_addr constant { i8, i8, i32 } { i8 33, i8 0, i32 3 }, align 4 ; Function Attrs: noinline nounwind optnone uwtable define dso_local void @bar() #0 { entry: %d = alloca %struct.D, align 4 call void @llvm.memcpy.p0.p0.i64(ptr align 4 %d, ptr align 4 @__const.bar.d, i64 8, i1 false) ret void } ```

@gv

This PR adds a support for const structs with bitfields. Now only global structs are supported, the support of the local ones can be added more or less easily - there is one ugly thing need to be done though) So .. what is all about. First of all - as usually, I'm sorry for the big PR. But it's hard to break it down to peaces. The good news is that in the same time it's a copy-pasta from the original codegen, no surprises here. Basically, the most hard place to read is `ConstantAggregateBuilder::addBits` copied with minimum of changes. The main problem - and frankly speaking I have no idea why it's done this way in the original codegen - is that the data layout is different for such structures, I mean literally another type is used. For instance, the code: ``` struct T { int X : 15; int Y : 6; unsigned Z : 9; int W; }; struct T GV = { 1, 5, 256, -1}; ``` is represented in LLVM IR (with no CIR enabled) as: ``` %struct.T = type { i32, i32 } %struct.Inner = type { i8, i32 } @gv = dso_local global { i8, i8, i8, i8, i32 } ... ``` i.e. the global var `GV` is looks like a struct of single bytes (up to the last field, which is not a btfield). And my guess is that we want to have the same behavior in CIR. So we do. The main problem is that we have to treat the same data differently - and this is why one additional `bitcast` is needed when we create a global var. Actually, there was a comment there - and I really wonder where it came from. But anyways, I don't really like this and don't see any good workaround here. Well, maybe we may add a kind of map in order to store the correspondence between types and do a bitcast more wisely. The same is true for the const structs with bitfields defined locally.

@bar

The second part of the job started in #412 , now about local structures. As it was mentioned previously, sometimes the layout for structures with bit fields inited with constants differ from the originally created in `CIRRecordLayoutBuilder` and it cause `storeOp` verification fail due to different structure type was used to allocation. This PR fix it. An example: ``` typedef struct { int a : 4; int b : 5; int c; } D; void bar () { D d = {1,2,3}; } ``` Well, I can't say I'm proud of these changes - it seems like a type safety violation, but looks like it's the best we can do here. The original codegen doesn't have this problem at all, there is just a `memcpy` there, I provide LLVM IR just for reference: ``` %struct.D = type { i16, i32 } @__const.bar.d = private unnamed_addr constant { i8, i8, i32 } { i8 33, i8 0, i32 3 }, align 4 ; Function Attrs: noinline nounwind optnone uwtable define dso_local void @bar() #0 { entry: %d = alloca %struct.D, align 4 call void @llvm.memcpy.p0.p0.i64(ptr align 4 %d, ptr align 4 @__const.bar.d, i64 8, i1 false) ret void } ```

@gv

This PR adds a support for const structs with bitfields. Now only global structs are supported, the support of the local ones can be added more or less easily - there is one ugly thing need to be done though) So .. what is all about. First of all - as usually, I'm sorry for the big PR. But it's hard to break it down to peaces. The good news is that in the same time it's a copy-pasta from the original codegen, no surprises here. Basically, the most hard place to read is `ConstantAggregateBuilder::addBits` copied with minimum of changes. The main problem - and frankly speaking I have no idea why it's done this way in the original codegen - is that the data layout is different for such structures, I mean literally another type is used. For instance, the code: ``` struct T { int X : 15; int Y : 6; unsigned Z : 9; int W; }; struct T GV = { 1, 5, 256, -1}; ``` is represented in LLVM IR (with no CIR enabled) as: ``` %struct.T = type { i32, i32 } %struct.Inner = type { i8, i32 } @gv = dso_local global { i8, i8, i8, i8, i32 } ... ``` i.e. the global var `GV` is looks like a struct of single bytes (up to the last field, which is not a btfield). And my guess is that we want to have the same behavior in CIR. So we do. The main problem is that we have to treat the same data differently - and this is why one additional `bitcast` is needed when we create a global var. Actually, there was a comment there - and I really wonder where it came from. But anyways, I don't really like this and don't see any good workaround here. Well, maybe we may add a kind of map in order to store the correspondence between types and do a bitcast more wisely. The same is true for the const structs with bitfields defined locally.

@bar

The second part of the job started in llvm#412 , now about local structures. As it was mentioned previously, sometimes the layout for structures with bit fields inited with constants differ from the originally created in `CIRRecordLayoutBuilder` and it cause `storeOp` verification fail due to different structure type was used to allocation. This PR fix it. An example: ``` typedef struct { int a : 4; int b : 5; int c; } D; void bar () { D d = {1,2,3}; } ``` Well, I can't say I'm proud of these changes - it seems like a type safety violation, but looks like it's the best we can do here. The original codegen doesn't have this problem at all, there is just a `memcpy` there, I provide LLVM IR just for reference: ``` %struct.D = type { i16, i32 } @__const.bar.d = private unnamed_addr constant { i8, i8, i32 } { i8 33, i8 0, i32 3 }, align 4 ; Function Attrs: noinline nounwind optnone uwtable define dso_local void @bar() #0 { entry: %d = alloca %struct.D, align 4 call void @llvm.memcpy.p0.p0.i64(ptr align 4 %d, ptr align 4 @__const.bar.d, i64 8, i1 false) ret void } ```

@gv

This PR adds a support for const structs with bitfields. Now only global structs are supported, the support of the local ones can be added more or less easily - there is one ugly thing need to be done though) So .. what is all about. First of all - as usually, I'm sorry for the big PR. But it's hard to break it down to peaces. The good news is that in the same time it's a copy-pasta from the original codegen, no surprises here. Basically, the most hard place to read is `ConstantAggregateBuilder::addBits` copied with minimum of changes. The main problem - and frankly speaking I have no idea why it's done this way in the original codegen - is that the data layout is different for such structures, I mean literally another type is used. For instance, the code: ``` struct T { int X : 15; int Y : 6; unsigned Z : 9; int W; }; struct T GV = { 1, 5, 256, -1}; ``` is represented in LLVM IR (with no CIR enabled) as: ``` %struct.T = type { i32, i32 } %struct.Inner = type { i8, i32 } @gv = dso_local global { i8, i8, i8, i8, i32 } ... ``` i.e. the global var `GV` is looks like a struct of single bytes (up to the last field, which is not a btfield). And my guess is that we want to have the same behavior in CIR. So we do. The main problem is that we have to treat the same data differently - and this is why one additional `bitcast` is needed when we create a global var. Actually, there was a comment there - and I really wonder where it came from. But anyways, I don't really like this and don't see any good workaround here. Well, maybe we may add a kind of map in order to store the correspondence between types and do a bitcast more wisely. The same is true for the const structs with bitfields defined locally.

@bar

The second part of the job started in #412 , now about local structures. As it was mentioned previously, sometimes the layout for structures with bit fields inited with constants differ from the originally created in `CIRRecordLayoutBuilder` and it cause `storeOp` verification fail due to different structure type was used to allocation. This PR fix it. An example: ``` typedef struct { int a : 4; int b : 5; int c; } D; void bar () { D d = {1,2,3}; } ``` Well, I can't say I'm proud of these changes - it seems like a type safety violation, but looks like it's the best we can do here. The original codegen doesn't have this problem at all, there is just a `memcpy` there, I provide LLVM IR just for reference: ``` %struct.D = type { i16, i32 } @__const.bar.d = private unnamed_addr constant { i8, i8, i32 } { i8 33, i8 0, i32 3 }, align 4 ; Function Attrs: noinline nounwind optnone uwtable define dso_local void @bar() #0 { entry: %d = alloca %struct.D, align 4 call void @llvm.memcpy.p0.p0.i64(ptr align 4 %d, ptr align 4 @__const.bar.d, i64 8, i1 false) ret void } ```

@gv

This PR adds a support for const structs with bitfields. Now only global structs are supported, the support of the local ones can be added more or less easily - there is one ugly thing need to be done though) So .. what is all about. First of all - as usually, I'm sorry for the big PR. But it's hard to break it down to peaces. The good news is that in the same time it's a copy-pasta from the original codegen, no surprises here. Basically, the most hard place to read is `ConstantAggregateBuilder::addBits` copied with minimum of changes. The main problem - and frankly speaking I have no idea why it's done this way in the original codegen - is that the data layout is different for such structures, I mean literally another type is used. For instance, the code: ``` struct T { int X : 15; int Y : 6; unsigned Z : 9; int W; }; struct T GV = { 1, 5, 256, -1}; ``` is represented in LLVM IR (with no CIR enabled) as: ``` %struct.T = type { i32, i32 } %struct.Inner = type { i8, i32 } @gv = dso_local global { i8, i8, i8, i8, i32 } ... ``` i.e. the global var `GV` is looks like a struct of single bytes (up to the last field, which is not a btfield). And my guess is that we want to have the same behavior in CIR. So we do. The main problem is that we have to treat the same data differently - and this is why one additional `bitcast` is needed when we create a global var. Actually, there was a comment there - and I really wonder where it came from. But anyways, I don't really like this and don't see any good workaround here. Well, maybe we may add a kind of map in order to store the correspondence between types and do a bitcast more wisely. The same is true for the const structs with bitfields defined locally.

@bar

The second part of the job started in #412 , now about local structures. As it was mentioned previously, sometimes the layout for structures with bit fields inited with constants differ from the originally created in `CIRRecordLayoutBuilder` and it cause `storeOp` verification fail due to different structure type was used to allocation. This PR fix it. An example: ``` typedef struct { int a : 4; int b : 5; int c; } D; void bar () { D d = {1,2,3}; } ``` Well, I can't say I'm proud of these changes - it seems like a type safety violation, but looks like it's the best we can do here. The original codegen doesn't have this problem at all, there is just a `memcpy` there, I provide LLVM IR just for reference: ``` %struct.D = type { i16, i32 } @__const.bar.d = private unnamed_addr constant { i8, i8, i32 } { i8 33, i8 0, i32 3 }, align 4 ; Function Attrs: noinline nounwind optnone uwtable define dso_local void @bar() #0 { entry: %d = alloca %struct.D, align 4 call void @llvm.memcpy.p0.p0.i64(ptr align 4 %d, ptr align 4 @__const.bar.d, i64 8, i1 false) ret void } ```

@gv

This PR adds a support for const structs with bitfields. Now only global structs are supported, the support of the local ones can be added more or less easily - there is one ugly thing need to be done though) So .. what is all about. First of all - as usually, I'm sorry for the big PR. But it's hard to break it down to peaces. The good news is that in the same time it's a copy-pasta from the original codegen, no surprises here. Basically, the most hard place to read is `ConstantAggregateBuilder::addBits` copied with minimum of changes. The main problem - and frankly speaking I have no idea why it's done this way in the original codegen - is that the data layout is different for such structures, I mean literally another type is used. For instance, the code: ``` struct T { int X : 15; int Y : 6; unsigned Z : 9; int W; }; struct T GV = { 1, 5, 256, -1}; ``` is represented in LLVM IR (with no CIR enabled) as: ``` %struct.T = type { i32, i32 } %struct.Inner = type { i8, i32 } @gv = dso_local global { i8, i8, i8, i8, i32 } ... ``` i.e. the global var `GV` is looks like a struct of single bytes (up to the last field, which is not a btfield). And my guess is that we want to have the same behavior in CIR. So we do. The main problem is that we have to treat the same data differently - and this is why one additional `bitcast` is needed when we create a global var. Actually, there was a comment there - and I really wonder where it came from. But anyways, I don't really like this and don't see any good workaround here. Well, maybe we may add a kind of map in order to store the correspondence between types and do a bitcast more wisely. The same is true for the const structs with bitfields defined locally.

@bar

The second part of the job started in llvm#412 , now about local structures. As it was mentioned previously, sometimes the layout for structures with bit fields inited with constants differ from the originally created in `CIRRecordLayoutBuilder` and it cause `storeOp` verification fail due to different structure type was used to allocation. This PR fix it. An example: ``` typedef struct { int a : 4; int b : 5; int c; } D; void bar () { D d = {1,2,3}; } ``` Well, I can't say I'm proud of these changes - it seems like a type safety violation, but looks like it's the best we can do here. The original codegen doesn't have this problem at all, there is just a `memcpy` there, I provide LLVM IR just for reference: ``` %struct.D = type { i16, i32 } @__const.bar.d = private unnamed_addr constant { i8, i8, i32 } { i8 33, i8 0, i32 3 }, align 4 ; Function Attrs: noinline nounwind optnone uwtable define dso_local void @bar() #0 { entry: %d = alloca %struct.D, align 4 call void @llvm.memcpy.p0.p0.i64(ptr align 4 %d, ptr align 4 @__const.bar.d, i64 8, i1 false) ret void } ```

@gv

This PR adds a support for const structs with bitfields. Now only global structs are supported, the support of the local ones can be added more or less easily - there is one ugly thing need to be done though) So .. what is all about. First of all - as usually, I'm sorry for the big PR. But it's hard to break it down to peaces. The good news is that in the same time it's a copy-pasta from the original codegen, no surprises here. Basically, the most hard place to read is `ConstantAggregateBuilder::addBits` copied with minimum of changes. The main problem - and frankly speaking I have no idea why it's done this way in the original codegen - is that the data layout is different for such structures, I mean literally another type is used. For instance, the code: ``` struct T { int X : 15; int Y : 6; unsigned Z : 9; int W; }; struct T GV = { 1, 5, 256, -1}; ``` is represented in LLVM IR (with no CIR enabled) as: ``` %struct.T = type { i32, i32 } %struct.Inner = type { i8, i32 } @gv = dso_local global { i8, i8, i8, i8, i32 } ... ``` i.e. the global var `GV` is looks like a struct of single bytes (up to the last field, which is not a btfield). And my guess is that we want to have the same behavior in CIR. So we do. The main problem is that we have to treat the same data differently - and this is why one additional `bitcast` is needed when we create a global var. Actually, there was a comment there - and I really wonder where it came from. But anyways, I don't really like this and don't see any good workaround here. Well, maybe we may add a kind of map in order to store the correspondence between types and do a bitcast more wisely. The same is true for the const structs with bitfields defined locally.

@bar

The second part of the job started in #412 , now about local structures. As it was mentioned previously, sometimes the layout for structures with bit fields inited with constants differ from the originally created in `CIRRecordLayoutBuilder` and it cause `storeOp` verification fail due to different structure type was used to allocation. This PR fix it. An example: ``` typedef struct { int a : 4; int b : 5; int c; } D; void bar () { D d = {1,2,3}; } ``` Well, I can't say I'm proud of these changes - it seems like a type safety violation, but looks like it's the best we can do here. The original codegen doesn't have this problem at all, there is just a `memcpy` there, I provide LLVM IR just for reference: ``` %struct.D = type { i16, i32 } @__const.bar.d = private unnamed_addr constant { i8, i8, i32 } { i8 33, i8 0, i32 3 }, align 4 ; Function Attrs: noinline nounwind optnone uwtable define dso_local void @bar() #0 { entry: %d = alloca %struct.D, align 4 call void @llvm.memcpy.p0.p0.i64(ptr align 4 %d, ptr align 4 @__const.bar.d, i64 8, i1 false) ret void } ```

@gv

This PR adds a support for const structs with bitfields. Now only global structs are supported, the support of the local ones can be added more or less easily - there is one ugly thing need to be done though) So .. what is all about. First of all - as usually, I'm sorry for the big PR. But it's hard to break it down to peaces. The good news is that in the same time it's a copy-pasta from the original codegen, no surprises here. Basically, the most hard place to read is `ConstantAggregateBuilder::addBits` copied with minimum of changes. The main problem - and frankly speaking I have no idea why it's done this way in the original codegen - is that the data layout is different for such structures, I mean literally another type is used. For instance, the code: ``` struct T { int X : 15; int Y : 6; unsigned Z : 9; int W; }; struct T GV = { 1, 5, 256, -1}; ``` is represented in LLVM IR (with no CIR enabled) as: ``` %struct.T = type { i32, i32 } %struct.Inner = type { i8, i32 } @gv = dso_local global { i8, i8, i8, i8, i32 } ... ``` i.e. the global var `GV` is looks like a struct of single bytes (up to the last field, which is not a btfield). And my guess is that we want to have the same behavior in CIR. So we do. The main problem is that we have to treat the same data differently - and this is why one additional `bitcast` is needed when we create a global var. Actually, there was a comment there - and I really wonder where it came from. But anyways, I don't really like this and don't see any good workaround here. Well, maybe we may add a kind of map in order to store the correspondence between types and do a bitcast more wisely. The same is true for the const structs with bitfields defined locally.

@bar

The second part of the job started in llvm#412 , now about local structures. As it was mentioned previously, sometimes the layout for structures with bit fields inited with constants differ from the originally created in `CIRRecordLayoutBuilder` and it cause `storeOp` verification fail due to different structure type was used to allocation. This PR fix it. An example: ``` typedef struct { int a : 4; int b : 5; int c; } D; void bar () { D d = {1,2,3}; } ``` Well, I can't say I'm proud of these changes - it seems like a type safety violation, but looks like it's the best we can do here. The original codegen doesn't have this problem at all, there is just a `memcpy` there, I provide LLVM IR just for reference: ``` %struct.D = type { i16, i32 } @__const.bar.d = private unnamed_addr constant { i8, i8, i32 } { i8 33, i8 0, i32 3 }, align 4 ; Function Attrs: noinline nounwind optnone uwtable define dso_local void @bar() #0 { entry: %d = alloca %struct.D, align 4 call void @llvm.memcpy.p0.p0.i64(ptr align 4 %d, ptr align 4 @__const.bar.d, i64 8, i1 false) ret void } ```

@gv

This PR adds a support for const structs with bitfields. Now only global structs are supported, the support of the local ones can be added more or less easily - there is one ugly thing need to be done though) So .. what is all about. First of all - as usually, I'm sorry for the big PR. But it's hard to break it down to peaces. The good news is that in the same time it's a copy-pasta from the original codegen, no surprises here. Basically, the most hard place to read is `ConstantAggregateBuilder::addBits` copied with minimum of changes. The main problem - and frankly speaking I have no idea why it's done this way in the original codegen - is that the data layout is different for such structures, I mean literally another type is used. For instance, the code: ``` struct T { int X : 15; int Y : 6; unsigned Z : 9; int W; }; struct T GV = { 1, 5, 256, -1}; ``` is represented in LLVM IR (with no CIR enabled) as: ``` %struct.T = type { i32, i32 } %struct.Inner = type { i8, i32 } @gv = dso_local global { i8, i8, i8, i8, i32 } ... ``` i.e. the global var `GV` is looks like a struct of single bytes (up to the last field, which is not a btfield). And my guess is that we want to have the same behavior in CIR. So we do. The main problem is that we have to treat the same data differently - and this is why one additional `bitcast` is needed when we create a global var. Actually, there was a comment there - and I really wonder where it came from. But anyways, I don't really like this and don't see any good workaround here. Well, maybe we may add a kind of map in order to store the correspondence between types and do a bitcast more wisely. The same is true for the const structs with bitfields defined locally.

@bar

The second part of the job started in llvm#412 , now about local structures. As it was mentioned previously, sometimes the layout for structures with bit fields inited with constants differ from the originally created in `CIRRecordLayoutBuilder` and it cause `storeOp` verification fail due to different structure type was used to allocation. This PR fix it. An example: ``` typedef struct { int a : 4; int b : 5; int c; } D; void bar () { D d = {1,2,3}; } ``` Well, I can't say I'm proud of these changes - it seems like a type safety violation, but looks like it's the best we can do here. The original codegen doesn't have this problem at all, there is just a `memcpy` there, I provide LLVM IR just for reference: ``` %struct.D = type { i16, i32 } @__const.bar.d = private unnamed_addr constant { i8, i8, i32 } { i8 33, i8 0, i32 3 }, align 4 ; Function Attrs: noinline nounwind optnone uwtable define dso_local void @bar() #0 { entry: %d = alloca %struct.D, align 4 call void @llvm.memcpy.p0.p0.i64(ptr align 4 %d, ptr align 4 @__const.bar.d, i64 8, i1 false) ret void } ```

@gv

This PR adds a support for const structs with bitfields. Now only global structs are supported, the support of the local ones can be added more or less easily - there is one ugly thing need to be done though) So .. what is all about. First of all - as usually, I'm sorry for the big PR. But it's hard to break it down to peaces. The good news is that in the same time it's a copy-pasta from the original codegen, no surprises here. Basically, the most hard place to read is `ConstantAggregateBuilder::addBits` copied with minimum of changes. The main problem - and frankly speaking I have no idea why it's done this way in the original codegen - is that the data layout is different for such structures, I mean literally another type is used. For instance, the code: ``` struct T { int X : 15; int Y : 6; unsigned Z : 9; int W; }; struct T GV = { 1, 5, 256, -1}; ``` is represented in LLVM IR (with no CIR enabled) as: ``` %struct.T = type { i32, i32 } %struct.Inner = type { i8, i32 } @gv = dso_local global { i8, i8, i8, i8, i32 } ... ``` i.e. the global var `GV` is looks like a struct of single bytes (up to the last field, which is not a btfield). And my guess is that we want to have the same behavior in CIR. So we do. The main problem is that we have to treat the same data differently - and this is why one additional `bitcast` is needed when we create a global var. Actually, there was a comment there - and I really wonder where it came from. But anyways, I don't really like this and don't see any good workaround here. Well, maybe we may add a kind of map in order to store the correspondence between types and do a bitcast more wisely. The same is true for the const structs with bitfields defined locally.

@bar

The second part of the job started in llvm#412 , now about local structures. As it was mentioned previously, sometimes the layout for structures with bit fields inited with constants differ from the originally created in `CIRRecordLayoutBuilder` and it cause `storeOp` verification fail due to different structure type was used to allocation. This PR fix it. An example: ``` typedef struct { int a : 4; int b : 5; int c; } D; void bar () { D d = {1,2,3}; } ``` Well, I can't say I'm proud of these changes - it seems like a type safety violation, but looks like it's the best we can do here. The original codegen doesn't have this problem at all, there is just a `memcpy` there, I provide LLVM IR just for reference: ``` %struct.D = type { i16, i32 } @__const.bar.d = private unnamed_addr constant { i8, i8, i32 } { i8 33, i8 0, i32 3 }, align 4 ; Function Attrs: noinline nounwind optnone uwtable define dso_local void @bar() #0 { entry: %d = alloca %struct.D, align 4 call void @llvm.memcpy.p0.p0.i64(ptr align 4 %d, ptr align 4 @__const.bar.d, i64 8, i1 false) ret void } ```

@gv

This PR adds a support for const structs with bitfields. Now only global structs are supported, the support of the local ones can be added more or less easily - there is one ugly thing need to be done though) So .. what is all about. First of all - as usually, I'm sorry for the big PR. But it's hard to break it down to peaces. The good news is that in the same time it's a copy-pasta from the original codegen, no surprises here. Basically, the most hard place to read is `ConstantAggregateBuilder::addBits` copied with minimum of changes. The main problem - and frankly speaking I have no idea why it's done this way in the original codegen - is that the data layout is different for such structures, I mean literally another type is used. For instance, the code: ``` struct T { int X : 15; int Y : 6; unsigned Z : 9; int W; }; struct T GV = { 1, 5, 256, -1}; ``` is represented in LLVM IR (with no CIR enabled) as: ``` %struct.T = type { i32, i32 } %struct.Inner = type { i8, i32 } @gv = dso_local global { i8, i8, i8, i8, i32 } ... ``` i.e. the global var `GV` is looks like a struct of single bytes (up to the last field, which is not a btfield). And my guess is that we want to have the same behavior in CIR. So we do. The main problem is that we have to treat the same data differently - and this is why one additional `bitcast` is needed when we create a global var. Actually, there was a comment there - and I really wonder where it came from. But anyways, I don't really like this and don't see any good workaround here. Well, maybe we may add a kind of map in order to store the correspondence between types and do a bitcast more wisely. The same is true for the const structs with bitfields defined locally.

@bar

The second part of the job started in #412 , now about local structures. As it was mentioned previously, sometimes the layout for structures with bit fields inited with constants differ from the originally created in `CIRRecordLayoutBuilder` and it cause `storeOp` verification fail due to different structure type was used to allocation. This PR fix it. An example: ``` typedef struct { int a : 4; int b : 5; int c; } D; void bar () { D d = {1,2,3}; } ``` Well, I can't say I'm proud of these changes - it seems like a type safety violation, but looks like it's the best we can do here. The original codegen doesn't have this problem at all, there is just a `memcpy` there, I provide LLVM IR just for reference: ``` %struct.D = type { i16, i32 } @__const.bar.d = private unnamed_addr constant { i8, i8, i32 } { i8 33, i8 0, i32 3 }, align 4 ; Function Attrs: noinline nounwind optnone uwtable define dso_local void @bar() #0 { entry: %d = alloca %struct.D, align 4 call void @llvm.memcpy.p0.p0.i64(ptr align 4 %d, ptr align 4 @__const.bar.d, i64 8, i1 false) ret void } ```

@gv

This PR adds a support for const structs with bitfields. Now only global structs are supported, the support of the local ones can be added more or less easily - there is one ugly thing need to be done though) So .. what is all about. First of all - as usually, I'm sorry for the big PR. But it's hard to break it down to peaces. The good news is that in the same time it's a copy-pasta from the original codegen, no surprises here. Basically, the most hard place to read is `ConstantAggregateBuilder::addBits` copied with minimum of changes. The main problem - and frankly speaking I have no idea why it's done this way in the original codegen - is that the data layout is different for such structures, I mean literally another type is used. For instance, the code: ``` struct T { int X : 15; int Y : 6; unsigned Z : 9; int W; }; struct T GV = { 1, 5, 256, -1}; ``` is represented in LLVM IR (with no CIR enabled) as: ``` %struct.T = type { i32, i32 } %struct.Inner = type { i8, i32 } @gv = dso_local global { i8, i8, i8, i8, i32 } ... ``` i.e. the global var `GV` is looks like a struct of single bytes (up to the last field, which is not a btfield). And my guess is that we want to have the same behavior in CIR. So we do. The main problem is that we have to treat the same data differently - and this is why one additional `bitcast` is needed when we create a global var. Actually, there was a comment there - and I really wonder where it came from. But anyways, I don't really like this and don't see any good workaround here. Well, maybe we may add a kind of map in order to store the correspondence between types and do a bitcast more wisely. The same is true for the const structs with bitfields defined locally.

@bar

The second part of the job started in #412 , now about local structures. As it was mentioned previously, sometimes the layout for structures with bit fields inited with constants differ from the originally created in `CIRRecordLayoutBuilder` and it cause `storeOp` verification fail due to different structure type was used to allocation. This PR fix it. An example: ``` typedef struct { int a : 4; int b : 5; int c; } D; void bar () { D d = {1,2,3}; } ``` Well, I can't say I'm proud of these changes - it seems like a type safety violation, but looks like it's the best we can do here. The original codegen doesn't have this problem at all, there is just a `memcpy` there, I provide LLVM IR just for reference: ``` %struct.D = type { i16, i32 } @__const.bar.d = private unnamed_addr constant { i8, i8, i32 } { i8 33, i8 0, i32 3 }, align 4 ; Function Attrs: noinline nounwind optnone uwtable define dso_local void @bar() #0 { entry: %d = alloca %struct.D, align 4 call void @llvm.memcpy.p0.p0.i64(ptr align 4 %d, ptr align 4 @__const.bar.d, i64 8, i1 false) ret void } ```

lanza force-pushed the main branch 2 times, most recently from 4e069c6 to 79d4dc7 Compare January 29, 2024 23:34

gitoleg added 18 commits January 31, 2024 11:50

WIVP where V stands for very

0e0c2bb

at least something works

2522d55

commented dbg

7bd8ed8

wip

8d43854

wip, added zero init in CIRRecordLayoutBuilder

0909c2e

removed debug info

ec4b430

strane fix, but it works

24d89de

adds todo-comment

7f12e3b

fix output struct type

94e62e0

wip

32c03b1

removes padding

ca4996a

fixes todo-s

316de49

adds bitcast for the case of globals

73f64e5

fix one more unreachable case

e5a8b56

updated comment in CIRGenModule

6f881b5

adds tests, fixes padding

f616b09

clang-format applied

e298b59

minor fix

c9e0238

gitoleg force-pushed the const-struct-bitfields branch from fca1a43 to c9e0238 Compare January 31, 2024 08:54

gitoleg commented Feb 1, 2024

View reviewed changes

clang/lib/CIR/CodeGen/CIRGenExpr.cpp Show resolved Hide resolved

merged upstream/main

aed5317

bcardosolopes approved these changes Feb 2, 2024

View reviewed changes

gitoleg added 2 commits February 5, 2024 18:15

added a missed check and NYIed it

022c410

Merge remote-tracking branch 'upstream/main' into const-struct-bitfields

559a39f

bcardosolopes approved these changes Feb 5, 2024

View reviewed changes

bcardosolopes merged commit fdf178a into llvm:main Feb 5, 2024

gitoleg mentioned this pull request Feb 14, 2024

[CIR][CodeGen] Locally inited structures with bitfields #463

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[CIR][CodeGen] Const structs with bitfields #412

[CIR][CodeGen] Const structs with bitfields #412

Uh oh!

gitoleg commented Jan 22, 2024

Uh oh!

github-actions bot commented Jan 22, 2024 •

edited

Loading

Uh oh!

bcardosolopes commented Jan 30, 2024

Uh oh!

gitoleg commented Jan 31, 2024

Uh oh!

bcardosolopes commented Jan 31, 2024

Uh oh!

Uh oh!

gitoleg commented Feb 1, 2024

Uh oh!

bcardosolopes commented Feb 2, 2024

Uh oh!

gitoleg commented Feb 2, 2024

Uh oh!

bcardosolopes left a comment

Uh oh!

gitoleg commented Feb 5, 2024

Uh oh!

bcardosolopes left a comment

Uh oh!

Uh oh!

[CIR][CodeGen] Const structs with bitfields #412

[CIR][CodeGen] Const structs with bitfields #412

Uh oh!

Conversation

gitoleg commented Jan 22, 2024

Uh oh!

github-actions bot commented Jan 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bcardosolopes commented Jan 30, 2024

Uh oh!

gitoleg commented Jan 31, 2024

Uh oh!

bcardosolopes commented Jan 31, 2024

Uh oh!

Uh oh!

gitoleg commented Feb 1, 2024

Uh oh!

bcardosolopes commented Feb 2, 2024

Uh oh!

gitoleg commented Feb 2, 2024

Uh oh!

bcardosolopes left a comment

Choose a reason for hiding this comment

Uh oh!

gitoleg commented Feb 5, 2024

Uh oh!

bcardosolopes left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Jan 22, 2024 •

edited

Loading