We currently offer only minimal assistance with analysis on the comparison page:
- Noisy benchmarks are marked as such (though, confusingly, if even one benchmark/incr-cache-state pair is noisy, the entire benchmark gets marked as noisy).
- Changes above ±0.2% are highlighted in light green or red, and changes above ±1.0% in darker green or red.
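A minimal Rust sketch of the current fixed-threshold highlighting described above. The `Highlight` enum and `highlight` function are hypothetical names, not the comparison page's actual code, and the sketch assumes that a negative change (e.g. fewer instructions) is an improvement and therefore green.

```rust
/// Hypothetical highlight levels mirroring the current comparison page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Highlight {
    None,
    LightGreen,
    DarkGreen,
    LightRed,
    DarkRed,
}

/// Classify a relative change (in percent) using the fixed ±0.2% / ±1.0%
/// thresholds. Assumption: a negative change is an improvement (green).
fn highlight(pct_change: f64) -> Highlight {
    let magnitude = pct_change.abs();
    let improvement = pct_change < 0.0;
    if magnitude > 1.0 {
        if improvement { Highlight::DarkGreen } else { Highlight::DarkRed }
    } else if magnitude > 0.2 {
        if improvement { Highlight::LightGreen } else { Highlight::LightRed }
    } else {
        Highlight::None
    }
}

fn main() {
    // The benchmark's noise level plays no role in this classification.
    for pct in [0.1, -0.5, 0.5, -2.0, 3.0] {
        println!("{:+.1}% -> {:?}", pct, highlight(pct));
    }
}
```

Because the thresholds are fixed, a 0.5% change is flagged the same way whether the benchmark's normal run-to-run variation is 0.05% or 2%.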
Now that we have an automated understanding of benchmark noisiness, we can do a lot better. Here are some ideas:
- Flatten the benchmark data so that all benchmark/incr-cache-state pairs are shown in the same table, without needing to expand each benchmark.
- By default, filter out all "non-significant" changes. We first need to decide what counts as significant; we have several candidate definitions.
- Make the green/red highlighting also depend on whether the benchmark is noisy.
- Better expose in what way a particular benchmark is noisy (i.e., whether it is genuinely noisy or just highly variable).
- Show statistics of how many benchmarks showed significant changes in each direction.
- Show the correlation between cache state/profile and significant changes (e.g., if all the significant changes were in optimized builds, we should surface this).
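One way the flattening, filtering, and summary ideas above could fit together, sketched in Rust under stated assumptions: each flattened row (one benchmark/profile/cache-state combination) is assumed to carry a per-row `noise_threshold` produced by the automated noise analysis, and a change counts as significant only if its magnitude exceeds that threshold. This is just one of the possible significance definitions; all type, field, and function names are hypothetical, and the sample rows are made up.

```rust
use std::collections::BTreeMap;

/// One flattened row: a single benchmark/profile/cache-state combination.
#[derive(Debug)]
struct Row {
    benchmark: String,
    profile: String,
    cache_state: String,
    /// Relative change in percent (assumed: negative = improvement).
    pct_change: f64,
    /// Assumed output of the noise analysis: the smallest |pct_change|
    /// this row should treat as significant.
    noise_threshold: f64,
}

impl Row {
    fn is_significant(&self) -> bool {
        self.pct_change.abs() > self.noise_threshold
    }
}

#[derive(Debug, Default, PartialEq, Eq)]
struct Summary {
    improvements: usize,
    regressions: usize,
}

impl Summary {
    fn record(&mut self, pct_change: f64) {
        if pct_change < 0.0 {
            self.improvements += 1;
        } else {
            self.regressions += 1;
        }
    }
}

/// Drop non-significant rows, then tally the rest by direction, both overall
/// and per profile (to surface patterns like "all regressions are in opt").
fn summarize(rows: &[Row]) -> (Vec<&Row>, Summary, BTreeMap<String, Summary>) {
    let significant: Vec<&Row> = rows.iter().filter(|r| r.is_significant()).collect();
    let mut total = Summary::default();
    let mut by_profile: BTreeMap<String, Summary> = BTreeMap::new();
    for r in &significant {
        total.record(r.pct_change);
        by_profile.entry(r.profile.clone()).or_default().record(r.pct_change);
    }
    (significant, total, by_profile)
}

/// Convenience constructor for the illustrative rows below.
fn row(benchmark: &str, profile: &str, cache_state: &str, pct_change: f64, noise_threshold: f64) -> Row {
    Row {
        benchmark: benchmark.to_string(),
        profile: profile.to_string(),
        cache_state: cache_state.to_string(),
        pct_change,
        noise_threshold,
    }
}

fn main() {
    // Made-up illustrative rows, not real measurements.
    let rows = vec![
        row("bench-a", "opt", "full", -1.5, 0.5),
        row("bench-a", "debug", "full", 0.3, 0.5), // within noise: hidden
        row("bench-b", "opt", "incr-full", 2.0, 1.0),
        row("bench-b", "debug", "incr-full", 0.4, 0.1), // quiet benchmark: small but significant
    ];
    let (significant, total, by_profile) = summarize(&rows);
    for r in &significant {
        println!("{} {} {} {:+.1}%", r.benchmark, r.profile, r.cache_state, r.pct_change);
    }
    println!("total: {:?}", total);
    for (profile, summary) in &by_profile {
        println!("{}: {:?}", profile, summary);
    }
}
```

With per-row thresholds, a small change on a quiet benchmark can be shown while a larger change on a noisy one is hidden; the same threshold could also drive noise-aware green/red highlighting.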