Possible refactoring of InputSection hierarchy #57714

Open
@int3

I've been thinking about how our InputSection hierarchy is not ideal, but I'm not ready to put a diff out for it just yet, so here are some thoughts. Basically, the problem is that we are conflating how sections are represented with how they are used.

To recap, right now we have 3 types of InputSections.

  1. ConcatInputSections: As the name suggests, these are concatenated together for output (i.e. the "how it is used"). However they are also the only way to handle variable-size subsections that can contain relocations pointing to other sections...
  2. WordLiteralInputSections: These efficiently represent fixed-sized subsections whose size is some power of 2. However (as "Literal" implies) they cannot contain relocations, i.e. they must be leaf nodes in the graph of subsections.
  3. CStringInputSections: These represent variable-sized subsections that each contain a null-terminated string. They support string hash -> output offset lookups. These InputSections must also be leaf nodes.
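For readers less familiar with the code, here is a rough sketch of the current shape. It is not the actual LLD source: the class bodies are empty stand-ins, the `Kind` enumerator names and the `dynCast` / `mayHaveRelocs` helpers are made up for illustration, and the RTTI is hand-rolled so that it compiles without LLVM headers.

```cpp
#include <cassert>

// Simplified stand-in for the current base class (hypothetical layout).
class InputSection {
public:
  enum Kind { ConcatKind, WordLiteralKind, CStringLiteralKind };
  Kind kind() const { return sectionKind; }

protected:
  explicit InputSection(Kind k) : sectionKind(k) {}

private:
  Kind sectionKind;
};

// Variable-size, may contain relocations (internal node).
struct ConcatInputSection : InputSection {
  ConcatInputSection() : InputSection(ConcatKind) {}
  static bool classof(const InputSection *s) { return s->kind() == ConcatKind; }
};

// Fixed-size literals, no relocations (leaf node).
struct WordLiteralInputSection : InputSection {
  WordLiteralInputSection() : InputSection(WordLiteralKind) {}
  static bool classof(const InputSection *s) {
    return s->kind() == WordLiteralKind;
  }
};

// Null-terminated strings, no relocations (leaf node).
struct CStringInputSection : InputSection {
  CStringInputSection() : InputSection(CStringLiteralKind) {}
  static bool classof(const InputSection *s) {
    return s->kind() == CStringLiteralKind;
  }
};

// Hand-rolled analogue of a checked downcast driven by classof.
template <class To> const To *dynCast(const InputSection *s) {
  assert(s && "dynCast on a null section");
  return To::classof(s) ? static_cast<const To *>(s) : nullptr;
}

// Hypothetical helper: today only ConcatInputSections can be internal nodes.
inline bool mayHaveRelocs(const InputSection *s) {
  return dynCast<ConcatInputSection>(s) != nullptr;
}
```

The point of the sketch is that "can have relocations" is currently answered by asking "is it a ConcatInputSection?", which is the conflation described above.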

Notably lacking is an InputSection that can efficiently represent fixed-sized subsections whilst being an internal node. This would be useful for __objc_selrefs, __objc_classrefs, and __cfstring (at least). Right now we represent them as ConcatInputSections, which is fine, just inefficient. Allocating a 128-byte ConcatInputSection for every 8-byte selector reference is not ideal.

At the same time, I'm wary of growing the complexity of the class hierarchy. Right now we dyn_cast InputSections in a number of places, but primarily we do this to distinguish ConcatInputSections from the other two, so that we can correctly handle the fact that they are internal nodes. We don't usually need to dynamically distinguish between WordLiteralInputSections and CStringInputSections. Similarly, I anticipate that we'll rarely need to distinguish fixed-sized non-literal sections from the variable-sized ones.

All that said, here's what I think an ideal refactored class hierarchy would look like:

  • InputSection gets renamed to InputSectionBase (mirroring LLD-ELF)
  • ConcatInputSection gets renamed to NonLiteralInputSection (or just InputSection for short), extending InputSectionBase
  • We then have FixedSizeInputSection and VariableSizedInputSection classes that extend our (NonLiteral)InputSection
  • We also introduce a LiteralInputSection that extends InputSectionBase. That is further extended into WordLiteralInputSection (maybe FixedSizeLiteralInputSection would be a better name?) and CStringInputSection. I'm sure FixedSizeLiteralInputSection and FixedSizeInputSection would share a lot of implementation details, though, so we can factor out some of that into a FixedSizeMixin class.
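To make the shape of that list concrete, here is one way it could be sketched. This is only an illustration under assumptions: the fields `totalSize` and `pieceSize` and the methods `numPieces` / `pieceOffset` are hypothetical, and expressing FixedSizeMixin as a template over its base class is just one possible way to share the fixed-size logic between the two branches.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

struct InputSectionBase {
  uint64_t totalSize = 0; // hypothetical: bytes of input data
};

// "NonLiteral" branch: may contain relocations (internal nodes).
struct InputSection : InputSectionBase {};

// Literal branch: no relocations (leaf nodes).
struct LiteralInputSection : InputSectionBase {};

// Shared fixed-size bookkeeping, layered on top of either branch.
template <class Base> struct FixedSizeMixin : Base {
  uint64_t pieceSize = 0; // hypothetical: size of each subsection

  uint64_t numPieces() const {
    assert(pieceSize != 0 && "pieceSize must be set");
    return this->totalSize / pieceSize;
  }

  // Offset of the i-th subsection; no per-subsection object is needed.
  uint64_t pieceOffset(uint64_t i) const {
    assert(i < numPieces() && "piece index out of range");
    return i * pieceSize;
  }
};

struct FixedSizeInputSection : FixedSizeMixin<InputSection> {};
struct VariableSizedInputSection : InputSection {};
struct WordLiteralInputSection : FixedSizeMixin<LiteralInputSection> {};
struct CStringInputSection : LiteralInputSection {};

// The mixin keeps each class on the intended side of the hierarchy.
static_assert(std::is_base_of<InputSection, FixedSizeInputSection>::value,
              "fixed-size non-literal sections are InputSections");
static_assert(
    std::is_base_of<LiteralInputSection, WordLiteralInputSection>::value,
    "word literals are LiteralInputSections");
static_assert(
    !std::is_base_of<InputSection, WordLiteralInputSection>::value,
    "word literals are not non-literal InputSections");
```

With something like this, a section of 8-byte selector references would be a single FixedSizeInputSection with `pieceSize = 8` instead of one object per reference.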

I'm wondering if we should enable dyn_casting to every class in this hierarchy, or if we should only enable it for (NonLiteral)InputSection and LiteralInputSection. Too much manual type dispatch in the code is a nightmare, so we shouldn't make it too easy for people to write it. At the same time, having the LLVM RTTI not truly mirror the actual class hierarchy seems a bit sketchy...
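A sketch of the "only the two intermediate classes" option, again with hand-rolled RTTI rather than the real LLVM casting headers. The `Kind` enumerators, the `LastNonLiteralKind` marker, and the `dynCast` helper are assumptions for illustration. One detail this surfaces: in this sketch, a leaf class that simply omits `classof` would inherit its parent's, so a cast to the leaf would accept any sibling. Explicitly deleting `classof` in the leaves turns such a cast into a compile error instead.

```cpp
#include <cassert>

struct InputSectionBase {
  enum Kind {
    // Non-literal kinds (internal nodes).
    FixedSizeKind,
    VariableSizedKind,
    LastNonLiteralKind = VariableSizedKind,
    // Literal kinds (leaf nodes).
    WordLiteralKind,
    CStringKind,
  };
  explicit InputSectionBase(Kind k) : kind(k) {}
  const Kind kind;
};

// Intermediate classes answer classof with a range check over Kind.
struct InputSection : InputSectionBase {
  explicit InputSection(Kind k) : InputSectionBase(k) {}
  static bool classof(const InputSectionBase *s) {
    return s->kind <= LastNonLiteralKind;
  }
};

struct LiteralInputSection : InputSectionBase {
  explicit LiteralInputSection(Kind k) : InputSectionBase(k) {}
  static bool classof(const InputSectionBase *s) {
    return s->kind > LastNonLiteralKind;
  }
};

// Leaf classes opt out of casting: dynCast<FixedSizeInputSection>(...)
// fails to compile rather than silently using InputSection::classof.
struct FixedSizeInputSection : InputSection {
  FixedSizeInputSection() : InputSection(FixedSizeKind) {}
  static bool classof(const InputSectionBase *) = delete;
};

struct VariableSizedInputSection : InputSection {
  VariableSizedInputSection() : InputSection(VariableSizedKind) {}
  static bool classof(const InputSectionBase *) = delete;
};

struct WordLiteralInputSection : LiteralInputSection {
  WordLiteralInputSection() : LiteralInputSection(WordLiteralKind) {}
  static bool classof(const InputSectionBase *) = delete;
};

struct CStringInputSection : LiteralInputSection {
  CStringInputSection() : LiteralInputSection(CStringKind) {}
  static bool classof(const InputSectionBase *) = delete;
};

// Hand-rolled analogue of a checked downcast driven by classof.
template <class To> const To *dynCast(const InputSectionBase *s) {
  assert(s && "dynCast on a null section");
  return To::classof(s) ? static_cast<const To *>(s) : nullptr;
}
```

Enabling casts to every class would just mean replacing each deleted `classof` with an equality check on its own `Kind`, so the choice is cheap to revisit later.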

All that said, this seems like a pretty big refactoring for an unclear perf win right now, so I'm inclined to punt on it. Just wanted to record my thoughts and gather feedback.
