A GCC plugin to dump the final layout of a struct and all types it references.
I started this project to support my Linux Kernel MicroPython port - I wanted to have an easy, Pythonic way to access kernel structures. That's why even the dump format is Python :)
This project consists of the GCC plugin itself under gcc_plugin, and of Python helpers under python that
act as "data accessors" using the plugin output.
You can easily use the plugin without the accessors, though: they served my specific purpose, but the plugin is quite useful by itself.
Just hit make.
You can build in debug mode with make DEBUG=1; you'll get debugging information printed to stderr
(basically the internal GCC tree object of every field processed).
There's a test_struct struct in tests/test_struct.c. This struct exploits many of the peculiarities allowed in
struct definitions. You can check it out, then hit make run to dump that weird struct, and see how different
fields ended up in the generated dump.
To dump a specific struct my_struct from a specific file myfile.c:
$ gcc -fplugin=./struct_layout.so -fplugin-arg-struct_layout-output=layout.txt -fplugin-arg-struct_layout-struct=my_struct myfile.c -c
You'll have your results in layout.txt.
You can omit -fplugin-arg-struct_layout-struct to dump all defined structs instead (all structs defined in your C
file, and all structs defined in all included headers).
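If you drive the plugin from a script, the command line above can be assembled programmatically. The following is a minimal sketch, not part of this project: build_gcc_command is a hypothetical helper name, and it only uses the flags shown above. It builds the argument list without running anything.

```python
import shlex


def build_gcc_command(source, output, struct=None, plugin="./struct_layout.so"):
    """Return the gcc argv for one plugin run (hypothetical helper, nothing is executed)."""
    args = [
        "gcc",
        f"-fplugin={plugin}",
        f"-fplugin-arg-struct_layout-output={output}",
    ]
    # Omitting the struct argument makes the plugin dump all defined structs.
    if struct is not None:
        args.append(f"-fplugin-arg-struct_layout-struct={struct}")
    args += [source, "-c"]
    return args


cmd = build_gcc_command("myfile.c", "layout.txt", struct="my_struct")
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run(cmd, check=True) would execute it, assuming gcc and the built plugin are available.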
Output is printed as Python objects, for easier handling later.
A dictionary is printed, with Struct objects created for each struct / union. There's no distinction between structs
and unions in this respect - unions will simply have different offsets for their fields.
The object holds the name and size of the struct/union, plus a dictionary of the fields.
The dictionary maps field names to tuples of (offset, field type). For unions, the offset is always 0.
The objects & field types are defined in python/fields.py.
All types have a total_size attribute, with their total size in bits. Other
attributes vary between field types:
- Scalar - scalars; they also have their basic type, like int or char or unsigned long int, and a boolean sign field (True signed / False unsigned).
- Bitfield - used for bitfields; these have the number of bits they occupy and a sign field.
- StructField - struct/union fields; these have the struct name they are referencing. If the field is based on an anonymous struct, then its Struct object itself is given.
- Pointer - for all types of pointers; these have their "pointee" type, which may be e.g. Scalar or another Pointer.
- Void - void type, for example in void *. This has size 0.
- Function - pointee type in case of function pointers. This has size 0.
- Array - for arrays; these have the number of elements and the type of each element (similar to the pointee type of Pointer).
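Since offsets and sizes are expressed in bits, a common first step is converting them to bytes and spotting padding. The sketch below uses simplified stand-in classes, not the real ones from python/fields.py (their attribute names and argument order here are assumptions), and a hand-written layout for a hypothetical struct pair { char tag; int value; } as it would typically be laid out on x86-64.

```python
from collections import namedtuple

# Stand-ins for illustration only; the real classes live in python/fields.py
# and may differ in attribute names and constructor arguments.
Struct = namedtuple("Struct", "name total_size fields")
Scalar = namedtuple("Scalar", "total_size type signed")


def byte_layout(struct):
    """Return (name, byte offset, size in bits) per field, ordered by offset."""
    rows = [
        (name, bit_offset // 8, ftype.total_size)
        for name, (bit_offset, ftype) in struct.fields.items()
    ]
    return sorted(rows, key=lambda row: row[1])


def padding_bits(struct):
    """Sum the gaps between consecutive fields and after the last field."""
    end = 0
    padding = 0
    for bit_offset, ftype in sorted(struct.fields.values(), key=lambda f: f[0]):
        if bit_offset > end:
            padding += bit_offset - end
        end = max(end, bit_offset + ftype.total_size)
    return padding + (struct.total_size - end)


# Hand-written layout (hypothetical example, not plugin output).
pair = Struct("pair", 64, {
    "tag": (0, Scalar(8, "char", True)),
    "value": (32, Scalar(32, "int", True)),
})

print(byte_layout(pair))   # [('tag', 0, 8), ('value', 4, 32)]
print(padding_bits(pair))  # 24
```

Note that bitfields need not start on a byte boundary, so the integer division in byte_layout is only meaningful for byte-aligned fields.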
For example, struct s { int x; unsigned char y; void *p; }; on my x86-64 evaluates to:
structs = {
    's': Struct('s', 128, {
        'x': (0, Scalar(32, 'int', True)),
        'y': (32, Scalar(8, 'unsigned char', False)),
        'p': (64, Pointer(64, Void())),
    }),
}
As I said, I originally intended this for Linux so it must be easy to generate the structs here :)
To generate for a specific struct:
$ python linux/dump_structs.py layout.txt --struct task_struct --header linux/sched.h
You can set the KDIR environment variable to run against a specific kernel tree (by default, runs against your local).
$ KDIR=/path/to/kernel python dump_struct.py ...
To dump all structs (based on a set of headers I've collected in include_all.c) you can run:
$ python linux/dump_structs.py all.txt
When including headers to dump their defined types, you may see some structs missing from the
output (although they are fully defined in the headers).
Apparently GCC doesn't complete the processing of structs that have only a typedef name until
they are used at least once (structs of the format typedef struct { ... } ..;).
I didn't verify it in GCC's code though.
Thus, the emitted event for finished types is not generated for them, and the plugin doesn't know of them.
A quick workaround for this problem: define a dummy, named struct referencing the types you want
in the dummy .c file you're handing to GCC.
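The workaround can be automated by generating the dummy file. This is a minimal sketch under assumptions: dummy_source, dummy_force_layout, my_types.h and point_t are all hypothetical names, and using the typedef'd types as by-value members is one way of "referencing" them that I have not verified against GCC's behavior.

```python
def dummy_source(headers, typedef_names):
    """Build C source for a named struct that uses each typedef'd type once."""
    lines = [f"#include <{header}>" for header in headers]
    lines.append("struct dummy_force_layout {")
    for index, name in enumerate(typedef_names):
        # One by-value member per type (assumption: this counts as a use).
        lines.append(f"    {name} member_{index};")
    lines.append("};")
    return "\n".join(lines) + "\n"


source = dummy_source(["my_types.h"], ["point_t"])
print(source)
```

Write the returned text to the .c file you hand to GCC, then run the plugin on it as usual.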
Paired with the structs generated by the plugin, the accessors allow very convenient handling of structured data in Python code.
Basically, you need to provide the base memory accessors (functions that read/write through a u8/u16/u32/u64 pointer)
and the accessors handle the rest (fields, pointers, arrays, bitfields, signedness, ...).
You can see how test_accessor.py does it.
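To illustrate the idea (this is not the project's actual accessor API), here is a sketch of base memory accessors backed by a bytearray, plus a helper that reads a scalar field given its bit offset, size and signedness. BytesMemory, to_signed and read_scalar are hypothetical names, and little-endian byte order is assumed. The offsets match the struct s example above.

```python
import struct


class BytesMemory:
    """Base memory accessor over a bytearray; addresses are offsets into it."""

    # Little-endian unsigned formats for u8/u16/u32/u64 (byte order is an assumption).
    _FORMATS = {8: "<B", 16: "<H", 32: "<I", 64: "<Q"}

    def __init__(self, data):
        self.data = data

    def read(self, addr, bits):
        return struct.unpack_from(self._FORMATS[bits], self.data, addr)[0]

    def write(self, addr, bits, value):
        struct.pack_into(self._FORMATS[bits], self.data, addr, value)


def to_signed(value, bits):
    """Reinterpret an unsigned value of the given width as two's complement."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def read_scalar(mem, base, bit_offset, bits, signed):
    """Read a byte-aligned scalar field located bit_offset bits past base."""
    raw = mem.read(base + bit_offset // 8, bits)
    return to_signed(raw, bits) if signed else raw


mem = BytesMemory(bytearray(16))   # struct s is 128 bits
mem.write(0, 32, 0xFFFFFFFE)       # field 'x' at bit offset 0
mem.write(4, 8, 200)               # field 'y' at bit offset 32

print(read_scalar(mem, 0, 0, 32, signed=True))    # -2
print(read_scalar(mem, 0, 32, 8, signed=False))   # 200
```

For real use, the read and write methods would wrap whatever mechanism reaches the target memory; the field logic on top stays the same.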
This was tested on GCC 7.4.0, GCC 9.2.0, GCC 10.2.0. Oh, and Python 3, of course.
$ make test