Possible refactoring of InputSection hierarchy #57714

I've been thinking about how our InputSection hierarchy is not ideal. I'm not ready to put a diff out for it just yet, so here are some thoughts. Basically, the problem is that we are conflating how sections are represented with how they are used.

To recap, right now we have 3 types of InputSections.

1. `ConcatInputSection`s: As the name suggests, these are concatenated together for output (i.e. the "how it is used"). However, they are also the only way to represent variable-size subsections that can contain relocations pointing to other sections (i.e. the "how it is represented").
2. `WordLiteralInputSection`s: These efficiently represent fixed-size subsections whose size is a power of 2. However (as "Literal" implies), they cannot contain relocations, i.e. they must be leaf nodes in the graph of subsections.
3. `CStringInputSection`s: These represent variable-size subsections that each contain a null-terminated string. They support string hash -> output offset lookups. These InputSections must also be leaf nodes.
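To make the "representation" half concrete, here is a rough sketch of what the two leaf kinds buy us. `WordLiteralSketch` and `CStringSketch` are hypothetical, heavily simplified stand-ins, not lld-macho's actual data layout: the fixed-size kind can map an input offset to its subsection with a shift because the entry size is a power of 2, while the C-string kind keeps a hash -> output offset table for deduplicated strings.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified models of the two literal (leaf) section kinds.
// These are illustrations, not lld-macho's actual data layout.

// Fixed-size literals: every subsection is (1 << log2EntSize) bytes, so the
// subsection containing a given input offset is found with a shift.
struct WordLiteralSketch {
  unsigned log2EntSize; // e.g. 3 for 8-byte literals
  std::vector<uint8_t> data;

  size_t entryIndex(uint64_t inputOffset) const {
    return static_cast<size_t>(inputOffset >> log2EntSize);
  }
};

// Variable-size C strings: unique strings are laid out back to back, and a
// string's output offset is looked up by its hash.
struct CStringSketch {
  std::vector<std::string> pieces; // each piece excludes its '\0'
  std::unordered_map<size_t, uint64_t> hashToOutputOffset;

  void assignOutputOffsets() {
    uint64_t off = 0;
    for (const std::string &s : pieces) {
      // Simplification: keyed on the hash alone. Real code must also compare
      // the string contents when two different strings collide.
      size_t h = std::hash<std::string>{}(s);
      if (hashToOutputOffset.emplace(h, off).second)
        off += s.size() + 1; // +1 for the null terminator
    }
  }

  uint64_t outputOffsetOf(const std::string &s) const {
    return hashToOutputOffset.at(std::hash<std::string>{}(s));
  }
};
```

With pieces `{"foo", "bar", "foo"}`, the duplicate `"foo"` collapses onto offset 0 and `"bar"` lands at offset 4. Neither structure has anywhere to hang relocations, which is exactly why both kinds are leaves.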

Notably lacking is an InputSection type that can efficiently represent fixed-size subsections while also being an internal node. This would be useful for `__objc_selrefs`, `__objc_classrefs`, and `__cfstring` (at least). Right now we represent them as ConcatInputSections, which works but is inefficient: allocating a 128-byte ConcatInputSection for every 8-byte selector reference is wasteful.
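A minimal sketch of the missing piece, assuming a design where one object covers the whole section and relocations stay in a single offset-sorted vector. `FixedSizeInputSectionSketch` and the `Reloc` fields are hypothetical names for illustration. Because every entry is `entSize` bytes, the relocations for entry `i` are just the slice with offsets in `[i * entSize, (i + 1) * entSize)`, so the section can still act as an internal node without allocating one ConcatInputSection per entry.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical relocation record; a real one would point at a Symbol or
// InputSection rather than carry an integer id.
struct Reloc {
  uint64_t offset;   // offset within the input section
  uint32_t targetId; // stand-in for the relocation target
};

// Hypothetical fixed-size *internal* section (e.g. __objc_selrefs, where each
// 8-byte entry has one relocation pointing at a selector string).
struct FixedSizeInputSectionSketch {
  uint32_t entSize;
  uint64_t size;
  std::vector<Reloc> relocs; // must be sorted by offset

  uint64_t numEntries() const { return size / entSize; }

  // Binary-search the sorted relocations for those inside entry i.
  std::vector<Reloc> relocsForEntry(uint64_t i) const {
    uint64_t begin = i * entSize, end = begin + entSize;
    auto first = std::partition_point(relocs.begin(), relocs.end(),
        [&](const Reloc &r) { return r.offset < begin; });
    auto last = std::partition_point(first, relocs.end(),
        [&](const Reloc &r) { return r.offset < end; });
    return std::vector<Reloc>(first, last);
  }
};
```

The per-entry cost drops to the relocation record itself plus whatever per-entry state (e.g. a liveness bit) we decide to keep alongside it.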

At the same time, I'm wary of growing the complexity of the class hierarchy. Right now we `dyn_cast` InputSections in a number of places, but primarily to distinguish ConcatInputSections from the other two, so that we can correctly handle the fact that they are internal nodes. We don't usually need to distinguish dynamically between `WordLiteralInputSection`s and `CStringInputSection`s. Similarly, I anticipate that we'll rarely need to distinguish fixed-size non-literal sections from variable-size ones.
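As an illustration of that dispatch pattern, here is a hypothetical mark-live walk over simplified classes with LLVM-style RTTI (a kind tag plus a static `classof`). `llvm::dyn_cast` is imitated by a tiny template so the sketch compiles on its own; the class layouts are assumptions, not lld-macho's. The only question the walk asks is "is this an internal node?"; which literal kind a leaf is never comes up (`WordLiteralInputSection` is omitted for brevity and would behave like `CStringInputSection` here).

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified base with an LLVM-style kind tag.
struct InputSection {
  enum Kind { ConcatKind, CStringLiteralKind, WordLiteralKind };
  const Kind kind;
  bool live = false;
  explicit InputSection(Kind k) : kind(k) {}
  virtual ~InputSection() = default;
};

struct ConcatInputSection : InputSection {
  std::vector<InputSection *> relocTargets; // only internal nodes have these
  ConcatInputSection() : InputSection(ConcatKind) {}
  static bool classof(const InputSection *s) { return s->kind == ConcatKind; }
};

struct CStringInputSection : InputSection {
  CStringInputSection() : InputSection(CStringLiteralKind) {}
  static bool classof(const InputSection *s) {
    return s->kind == CStringLiteralKind;
  }
};

// Minimal stand-in for llvm::dyn_cast: check the tag, then downcast.
template <typename To> To *dyn_cast(InputSection *s) {
  return To::classof(s) ? static_cast<To *>(s) : nullptr;
}

// Mark everything reachable from the roots. Leaves are marked but never
// expanded, because they cannot contain relocations.
inline void markLive(std::vector<InputSection *> worklist) {
  while (!worklist.empty()) {
    InputSection *s = worklist.back();
    worklist.pop_back();
    if (s->live)
      continue;
    s->live = true;
    if (auto *concat = dyn_cast<ConcatInputSection>(s))
      for (InputSection *target : concat->relocTargets)
        worklist.push_back(target);
  }
}
```

Adding fixed-size and variable-size internal kinds would not change this loop's shape; it would only need a single "is non-literal" test that covers both.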

With that in mind, here's what I think an ideal refactored class hierarchy would look like:
* `InputSection` gets renamed to `InputSectionBase` (mirroring LLD-ELF).
* `ConcatInputSection` gets renamed to `NonLiteralInputSection` (or just `InputSection` for short), extending `InputSectionBase`.
* We then have `FixedSizeInputSection` and `VariableSizedInputSection` classes that extend our `(NonLiteral)InputSection`.
* We also introduce a `LiteralInputSection` that extends `InputSectionBase`. It is further extended into `WordLiteralInputSection` (maybe `FixedSizeLiteralInputSection` would be a better name?) and `CStringInputSection`. I'm sure `FixedSizeLiteralInputSection` and `FixedSizeInputSection` would share a lot of implementation details, though, so we can factor some of that out into a `FixedSizeMixin` class.
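A skeleton of the proposed shape, to check that it hangs together. Only the class names come from the list above; everything inside them is a placeholder. In particular, making `FixedSizeMixin` a plain second base class (rather than, say, a CRTP template) is an assumption, and `entryIndex`/`entryOffset` are hypothetical helpers standing in for the shared fixed-size logic.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical skeleton of the proposed hierarchy; bodies are placeholders.
class InputSectionBase {
public:
  virtual ~InputSectionBase() = default;
};

// Formerly ConcatInputSection: may contain relocations (internal node).
class NonLiteralInputSection : public InputSectionBase {};
using InputSection = NonLiteralInputSection; // the proposed short name

// Leaf nodes: never contain relocations.
class LiteralInputSection : public InputSectionBase {};

// Shared bookkeeping for any section whose subsections all have one size.
class FixedSizeMixin {
public:
  explicit FixedSizeMixin(uint32_t size) : entSize(size) {}
  uint64_t entryIndex(uint64_t offset) const { return offset / entSize; }
  uint64_t entryOffset(uint64_t index) const { return index * entSize; }

protected:
  uint32_t entSize;
};

class FixedSizeInputSection : public InputSection, public FixedSizeMixin {
public:
  explicit FixedSizeInputSection(uint32_t size) : FixedSizeMixin(size) {}
};

class VariableSizedInputSection : public InputSection {};

class WordLiteralInputSection : public LiteralInputSection,
                                public FixedSizeMixin {
public:
  explicit WordLiteralInputSection(uint32_t size) : FixedSizeMixin(size) {}
};

class CStringInputSection : public LiteralInputSection {};

// The internal-node / leaf split is now visible in the static types.
static_assert(std::is_base_of<InputSection, FixedSizeInputSection>::value, "");
static_assert(std::is_base_of<LiteralInputSection, WordLiteralInputSection>::value, "");
static_assert(!std::is_base_of<InputSection, CStringInputSection>::value, "");
```

Since `FixedSizeMixin` does not derive from `InputSectionBase`, each concrete class still has exactly one path to the base, so upcasts stay unambiguous.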

I'm wondering if we should enable `dyn_cast`ing to every class in this hierarchy, or only to `(NonLiteral)InputSection` and `LiteralInputSection`. Too much manual type dispatch in the code is a nightmare, so we shouldn't make it too easy to write. At the same time, having LLVM-style RTTI that doesn't truly mirror the actual class hierarchy seems a bit sketchy...
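For reference, supporting casts at both levels is cheap if the kind tags are ordered so each intermediate class owns a contiguous range (the approach LLVM's RTTI guide describes for deeper hierarchies). The sketch below is hypothetical: it shows only `FixedSizeInputSection` and `CStringInputSection` as concrete leaves, and `isa` is a local stand-in for `llvm::isa`. One caveat relevant to the "only enable the mid-level classes" option: simply leaving out a leaf's `classof` does not disable the cast, because name lookup then finds the inherited parent `classof`, which silently accepts sibling kinds.

```cpp
#include <cassert>

// Hypothetical kind tags, ordered so each intermediate class owns a
// contiguous range of values.
enum class Kind {
  FixedSize,    // first NonLiteral kind
  VariableSize, // last NonLiteral kind
  WordLiteral,  // first Literal kind
  CString,      // last Literal kind
};

struct InputSectionBase {
  const Kind kind;

protected:
  explicit InputSectionBase(Kind k) : kind(k) {}
};

struct NonLiteralInputSection : InputSectionBase {
  static bool classof(const InputSectionBase *s) {
    return s->kind >= Kind::FixedSize && s->kind <= Kind::VariableSize;
  }

protected:
  explicit NonLiteralInputSection(Kind k) : InputSectionBase(k) {}
};

struct FixedSizeInputSection : NonLiteralInputSection {
  FixedSizeInputSection() : NonLiteralInputSection(Kind::FixedSize) {}
  // Caution: deleting this line would NOT forbid isa<FixedSizeInputSection>;
  // the inherited range check would be used and also accept VariableSize.
  static bool classof(const InputSectionBase *s) {
    return s->kind == Kind::FixedSize;
  }
};

struct LiteralInputSection : InputSectionBase {
  static bool classof(const InputSectionBase *s) {
    return s->kind >= Kind::WordLiteral && s->kind <= Kind::CString;
  }

protected:
  explicit LiteralInputSection(Kind k) : InputSectionBase(k) {}
};

struct CStringInputSection : LiteralInputSection {
  CStringInputSection() : LiteralInputSection(Kind::CString) {}
  static bool classof(const InputSectionBase *s) {
    return s->kind == Kind::CString;
  }
};

// Minimal stand-in for llvm::isa.
template <typename To> bool isa(const InputSectionBase *s) {
  return To::classof(s);
}
```

So if we do want to restrict casting to the two mid-level classes, it probably has to be enforced explicitly (e.g. a deleted or private `classof` on the leaves) rather than by omission.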

All that said, this seems like a pretty big refactoring for an unclear perf win right now, so I'm inclined to punt on it. I just wanted to record my thoughts and gather feedback.
