Description
Came up in #5935
ICU4X has test-c-tiny and test-js-tiny to show how far codesize can be optimized.
These are incremental, applying optimization on top of optimization to slowly reduce codesize. This shows a nice progression, but it is not helpful when understanding what the effect of each optimization is in isolation.
I think this is an important function of such a benchmark: many of these techniques are not uniformly available and impose additional constraints upon the build: some require nightly, some require paired Rust/Clang versions, some force build-std, some require a particular C compiler, some reduce debuggability, and so on.
Furthermore, a lot of these optimizations build on top of each other: using a release build will of course help LTO be more effective (percent-wise).
Providing numbers for every combination is going to be a lot of work and likely an overwhelming amount of data. However, I think what we could do is identify a list of optimizations that are potentially relevant but not necessarily always possible, and then provide numbers for:
- plain release build
- release build with each of these optimizations individually applied
  - for optimizations that build on each other (e.g. `-Clinker-plugin-lto` needs LTO), apply their dependencies too
- release build with all but one of these optimizations applied
  - similar setup for dependent optimizations: remove both
- release build with all optimizations applied
This would both give us an idea of the immediate wins of individual optimizations, and how they cumulatively work together.
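As a rough illustration of the configurations listed above, here is a minimal sketch that enumerates them from a dependency table. The optimization names and the `DEPENDS_ON` table are placeholders chosen for the example (only the "linker-plugin-lto needs LTO" edge comes from this issue), and the helper functions are hypothetical, not part of any existing ICU4X tooling.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical optimization names and dependency edges, for illustration only.
# Key: optimization; value: the optimizations it directly requires.
DEPENDS_ON: Dict[str, Set[str]] = {
    "lto": set(),
    "linker-plugin-lto": {"lto"},
    "gc-sections": set(),
    "panic-abort": set(),
}


def with_dependencies(opt: str, depends_on: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Return `opt` plus everything it transitively requires."""
    result: Set[str] = set()
    stack = [opt]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(depends_on[current])
    return frozenset(result)


def with_dependents(opt: str, depends_on: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Return `opt` plus everything that transitively requires it."""
    return frozenset(
        other for other in depends_on
        if opt in with_dependencies(other, depends_on)
    )


def build_matrix(
    depends_on: Dict[str, Set[str]],
) -> List[Tuple[str, FrozenSet[str]]]:
    """List (label, enabled optimizations) pairs for the proposed builds."""
    everything = frozenset(depends_on)
    matrix = [("baseline", frozenset())]
    # Each optimization on its own, pulling in whatever it depends on.
    for opt in sorted(depends_on):
        matrix.append((f"only {opt}", with_dependencies(opt, depends_on)))
    # Everything except one optimization, also dropping what depends on it.
    for opt in sorted(depends_on):
        removed = with_dependents(opt, depends_on)
        matrix.append((f"all but {opt}", everything - removed))
    matrix.append(("all", everything))
    return matrix


if __name__ == "__main__":
    for label, enabled in build_matrix(DEPENDS_ON):
        print(f"{label}: {', '.join(sorted(enabled)) or '(none)'}")
```

With N optimizations this yields 2N + 2 configurations rather than 2^N, which is what keeps the amount of data manageable.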
The optimizations I can identify are:
- LTO (off, on, thin, it seems like thin gives us the best perf?)
- `-Clinker-plugin-lto`
- `--gc-sections`
- `--strip-all`
- panic=abort
- panic-abort std
- panic-immediate-abort std
- one-step vs two-step clang
- use of lld (?)
- inclusion of debug symbols in the first place (same as strip? unclear)
This list might be larger than necessary, so we could merge some entries if desired. I might also be missing something. I didn't include Rust debug vs release here because I don't think debug build codesize numbers really mean much, and I can't think of a use case for caring about those numbers.
Thoughts? @sffc