Embedding R_RISCV_RELAX into other relocations

Object files for RISC-V contain a lot of `R_RISCV_RELAX` relocations, and the number is astonishing. If my calculation is correct, when I build the mold linker as a release binary for RISC-V, `R_RISCV_RELAX` relocation records occupy more than 10% of the total size of the input object files (the input object files total 132 MiB and contain 615,542 `R_RISCV_RELAX` relocations, so those relocations occupy 615,542 * 24 = 14,773,008 bytes). That's way more than I expected.
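The arithmetic can be double-checked with a few lines of C. This is only a sketch: the helper names are made up for illustration, and the inputs are the figures quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* One RV64 relocation record: r_offset, r_info and r_addend, 8 bytes each. */
enum { RELA64_SIZE = 24 };

/* Bytes occupied by `count` relocation records (hypothetical helper). */
uint64_t reloc_bytes(uint64_t count) {
  return count * RELA64_SIZE;
}

/* Share of `total_bytes` taken by `count` records,
   in hundredths of a percent (1067 means 10.67%). */
uint64_t reloc_share(uint64_t count, uint64_t total_bytes) {
  assert(total_bytes > 0);
  return reloc_bytes(count) * 10000 / total_bytes;
}
```

With 615,542 records and 132 MiB of input, `reloc_share` comes out at 1067, i.e. about 10.67%.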

(Can someone verify this with their own binaries? I believe my calculation is correct, but I still can't quite believe it because the number is so big.)

We do care about object file size. Large object files not only take up more disk space but also slow down the build because of slow I/O.

`R_RISCV_RELAX` essentially conveys a single bit of information: that the relocation pointing to the same place is relaxable. But on disk, each `R_RISCV_RELAX` occupies 24 _bytes_ on RV64 because it is represented as an independent relocation record. That's 192x data bloat (24 bytes are 192 bits, used to carry one bit). It feels like we should do something to improve the situation. Does anyone have ideas on how to improve it?

As a starting point, I'd like to propose the following scheme to embed the `R_RISCV_RELAX` bit into the relocation record.

**Proposal**

RV64's relocation record looks like this:

```
struct {
  u64 r_offset;
  u32 r_type;
  u32 r_sym;
  i64 r_addend;
};
```

where `r_type` represents the relocation type. The maximum relocation type is currently limited to 256, and with all possible future extensions, it is hard to imagine that we'd need more than 2^16 distinct relocation types (I believe we'll never have more than 1024 relocation types, but I'm erring on the side of caution). That gives us the opportunity to redefine the upper half of the `r_type` field as follows:

```
struct {
  u64 r_offset;
  u16 r_type;
  u16 reserved : 15;
  u16 relaxable : 1;
  u32 r_sym;
  i64 r_addend;
};
```

The `relaxable` bit is set if a relocation is relaxable. With this scheme, every `R_RISCV_RELAX` relocation can be folded into the adjacent relocation it applies to.
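To make the idea concrete, here is a minimal sketch of how a tool might fold existing `R_RISCV_RELAX` records into their neighbors. It makes several assumptions that are not part of the proposal text: the `relaxable` bit is taken to be the top bit of the 32-bit type word (bit-field layout is implementation-defined, so the struct above does not pin this down), a `R_RISCV_RELAX` record is assumed to directly follow the relocation it annotates at the same `r_offset`, and all names other than `R_RISCV_RELAX` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define R_RISCV_RELAX 51          /* value from the RISC-V psABI */
#define RELAXABLE_BIT (1u << 31)  /* assumption: top bit of the 32-bit type word */
#define TYPE_MASK     0xffffu     /* proposed 16-bit r_type */

typedef struct {
  uint64_t r_offset;
  uint64_t r_info;    /* ELF64: symbol in the high 32 bits, type word in the low 32 */
  int64_t  r_addend;
} Rela64;

uint32_t rela_type(const Rela64 *r) { return (uint32_t)r->r_info & TYPE_MASK; }
uint32_t rela_sym(const Rela64 *r)  { return (uint32_t)(r->r_info >> 32); }
int rela_relaxable(const Rela64 *r) {
  return ((uint32_t)r->r_info & RELAXABLE_BIT) != 0;
}

/* Fold every R_RISCV_RELAX into the preceding record at the same offset.
   Works in place and returns the new number of records. */
size_t fold_relax(Rela64 *rels, size_t n) {
  assert(rels != NULL || n == 0);
  size_t out = 0;
  for (size_t i = 0; i < n; i++) {
    int is_relax = rela_type(&rels[i]) == R_RISCV_RELAX;
    if (is_relax && out > 0 && rels[out - 1].r_offset == rels[i].r_offset) {
      rels[out - 1].r_info |= RELAXABLE_BIT;  /* absorb the marker */
      continue;                               /* and drop the RELAX record */
    }
    rels[out++] = rels[i];
  }
  return out;
}
```

A `R_RISCV_RELAX` record with no matching predecessor is left untouched in this sketch, so no information is lost.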

If an object file containing the new relocation records is consumed by a tool that doesn't understand the new format, the tool will report an "unknown relocation type" error because the `relaxable` bit will be interpreted as part of `r_type`. So we need to implement support in the linker and other ELF-consuming tools first, wait a few years, and only then turn it on in the assembler/compiler.
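The failure mode can be sketched as follows. The bit position and the `MAX_KNOWN_TYPE` bound are assumptions for illustration, and all function names are hypothetical; a real tool would check against its own table of known relocation types.

```c
#include <assert.h>
#include <stdint.h>

#define RELAXABLE_BIT  (1u << 31)  /* assumption: top bit of the 32-bit type word */
#define MAX_KNOWN_TYPE 255u        /* hypothetical bound checked by an old tool */

/* Build an r_info value in the proposed format. */
uint64_t make_info(uint32_t sym, uint32_t type, int relaxable) {
  assert(type <= 0xffffu);
  uint32_t word = type | (relaxable ? RELAXABLE_BIT : 0);
  return ((uint64_t)sym << 32) | word;
}

/* An old tool reads all 32 low bits as the relocation type. */
uint32_t legacy_type(uint64_t r_info) { return (uint32_t)r_info; }

/* A new tool masks off everything above the 16-bit r_type. */
uint32_t new_type(uint64_t r_info) { return (uint32_t)r_info & 0xffffu; }

/* Returns nonzero if the old tool would recognize the type. */
int legacy_accepts(uint64_t r_info) {
  return legacy_type(r_info) <= MAX_KNOWN_TYPE;
}
```

For type 19 with the bit set, the old reader sees `0x80000013` and rejects it, while the new reader still sees 19. The breakage is therefore loud rather than silent.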

There are a few precedents for using the `r_type` field in a similar way. For example, SPARC stores a 24-bit immediate value in the most significant 24 bits of `r_type` if the relocation type is `R_SPARC_OLO10`. MIPS allows a single relocation record to contain up to 3 relocation types. In binutils, such relocations are first preprocessed and represented as two or three relocation records internally.
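For comparison, the SPARC split can be sketched like this. It is modeled on the `ELF64_R_TYPE_ID` / `ELF64_R_TYPE_DATA` accessors found in SPARC ELF headers, but the function names here are hypothetical and the data field is treated as unsigned for simplicity, which may differ from what a given toolchain does.

```c
#include <assert.h>
#include <stdint.h>

#define R_SPARC_OLO10 33

/* Low 8 bits of the type word: the relocation type proper. */
uint32_t sparc_type_id(uint64_t r_info) {
  return (uint32_t)(r_info & 0xff);
}

/* Upper 24 bits of the type word: extra data carried by the record. */
uint32_t sparc_type_data(uint64_t r_info) {
  return (uint32_t)((r_info >> 8) & 0xffffff);
}

/* Pack symbol index, 24-bit data and 8-bit type into one r_info value. */
uint64_t sparc_make_info(uint32_t sym, uint32_t data, uint32_t type) {
  assert(data <= 0xffffff && type <= 0xff);
  return ((uint64_t)sym << 32) | ((uint64_t)data << 8) | type;
}
```

The proposal above follows the same pattern, with a 16-bit type and a single flag bit in place of SPARC's 8-bit type and 24-bit data.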

So, what do you think about this?
