Optimized codesize benchmarks do not clearly show the power of individual optimizations

Came up in https://github.com/unicode-org/icu4x/issues/5935

ICU4X has test-c-tiny and test-js-tiny to show how far codesize can be optimized.

These are incremental, applying optimization on top of optimization to gradually reduce codesize. This shows a nice progression, but it does not help in understanding the effect of each optimization in isolation.

I think this is an important function of such a benchmark: many of these techniques are not uniformly available and impose additional constraints on the build. Some require nightly, some require paired Rust/Clang versions, some force build-std, some require a particular C compiler, some reduce debuggability, and so on.

Furthermore, a lot of these optimizations build on top of each other: using a release build will of course help LTO be more effective (percent-wise).

Providing numbers for every combination is going to be a lot of work and likely an overwhelming amount of data. However, I think what we could do is identify a list of optimizations that are _potentially_ relevant but not necessarily always possible, and then provide numbers for:

 - plain release build
 - release build with each of these optimizations individually applied
   - for optimizations that build on each other (e.g. `-Clinker-plugin-lto` needs LTO), apply their dependencies too
 - release build with _all but one_ of these optimizations applied
   - similar setup for dependent optimizations: remove both
 - release build with all optimizations applied


This would both give us an idea of the immediate wins of individual optimizations, and how they cumulatively work together.
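
To make the proposed set of builds concrete, here is a rough sketch of how the matrix could be enumerated. It is only an illustration: the `Opt` struct, the helper names, and the short optimization list are all hypothetical, and I am assuming that "remove both" means that dropping an optimization also drops anything that depends on it.

```rust
use std::collections::BTreeSet;

/// Hypothetical entry: an optimization name plus the one it depends on, if any.
struct Opt {
    name: &'static str,
    requires: Option<&'static str>,
}

// Illustrative subset of the optimizations discussed in this issue.
const OPTS: &[Opt] = &[
    Opt { name: "lto", requires: None },
    Opt { name: "linker-plugin-lto", requires: Some("lto") },
    Opt { name: "gc-sections", requires: None },
    Opt { name: "panic-abort", requires: None },
    Opt { name: "panic-abort-std", requires: Some("panic-abort") },
];

/// `name` plus everything it transitively requires.
fn with_deps(name: &str) -> BTreeSet<&'static str> {
    let mut set = BTreeSet::new();
    let mut cur = OPTS.iter().find(|o| o.name == name);
    while let Some(o) = cur {
        set.insert(o.name);
        cur = o.requires.and_then(|r| OPTS.iter().find(|o| o.name == r));
    }
    set
}

/// Every optimization except `name` and anything that transitively requires it.
fn all_but(name: &str) -> BTreeSet<&'static str> {
    OPTS.iter()
        .filter(|o| !with_deps(o.name).contains(name))
        .map(|o| o.name)
        .collect()
}

/// Labelled configurations: baseline, each alone, all-but-one, and all.
fn build_matrix() -> Vec<(String, BTreeSet<&'static str>)> {
    let mut m = vec![("baseline".to_string(), BTreeSet::new())];
    for o in OPTS {
        m.push((format!("only:{}", o.name), with_deps(o.name)));
    }
    for o in OPTS {
        m.push((format!("all-but:{}", o.name), all_but(o.name)));
    }
    m.push(("all".to_string(), OPTS.iter().map(|o| o.name).collect()));
    m
}

fn main() {
    for (label, opts) in build_matrix() {
        println!("{label}: {opts:?}");
    }
}
```

With N optimizations this yields 2N + 2 builds instead of 2^N, which seems like a manageable amount of data.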

The optimizations I can identify are:


 - LTO (off, on, thin; it seems like thin gives us the best perf?)
   - `-Clinker-plugin-lto`
 - `--gc-sections`
   - `--strip-all`
 - panic=abort
   - panic-abort std
     - panic-immediate-abort std
 - one-step vs two-step clang
 - use of lld (?)
 - inclusion of debug symbols in the first place (same as strip? unclear)

This list might be larger than necessary, so we could merge some entries if desired. I might also be missing something. I didn't include Rust debug vs release here because I don't think debug build codesize numbers really mean much, and I can't think of a use case for caring about those numbers.

Thoughts? @sffc 

Optimized codesize benchmarks do not clearly show the power of individual optimizations #5945

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Optimized codesize benchmarks do not clearly show the power of individual optimizations #5945

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions