Could we automatically detect/report performance regressions? #285
Description
Recently we've been surprised by multiple major performance regressions, and we're finding that it's very challenging to identify the root cause(s) after the fact. It would be really useful if yjit-metrics could help us automatically flag potential regressions.
I know this is not easy or trivial to implement because there can often be false positives due to the inherent noise in measurements. I was thinking that since we have error bars on benchmarks, it might be possible to have a criterion such as, for example:
If the average time for the current run of a benchmark is above the previous average for that benchmark, and the error bars for the two runs don't overlap, then flag a potential regression.
We could also have an adjustable threshold so that we require a certain gap between the error bars before reporting a regression (some multiple of the largest or smallest of the two stddevs? e.g. only report a slowdown if the gap between the error bars is greater than 0.5 * min(stddev1, stddev2)). We could tune this threshold to reduce the probability of false positives.
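To make the idea concrete, here's a rough sketch of what that check could look like. The names and the shape of the per-run stats are just placeholders for illustration, not the actual yjit-metrics result format:

```python
# Minimal sketch of the proposed check, assuming each run is summarized as a
# mean and a standard deviation (RunStats is a hypothetical structure, not the
# actual yjit-metrics data format).
from dataclasses import dataclass

@dataclass
class RunStats:
    mean: float    # mean wall-clock time for the benchmark, in seconds
    stddev: float  # standard deviation across iterations

def is_potential_regression(prev: RunStats, curr: RunStats, threshold: float = 0.5) -> bool:
    """Flag a regression when the current run is slower and the error bars
    (mean +/- stddev) are separated by more than threshold * min(stddev)."""
    if curr.mean <= prev.mean:
        return False  # current run is not slower, nothing to report
    # Gap between the top of the previous error bar and the bottom of the current one.
    gap = (curr.mean - curr.stddev) - (prev.mean + prev.stddev)
    if gap <= 0:
        return False  # error bars overlap, too noisy to call
    return gap > threshold * min(prev.stddev, curr.stddev)

# Example: the current run is noticeably slower and the error bars are well separated.
print(is_potential_regression(RunStats(1.00, 0.02), RunStats(1.10, 0.02)))  # True
```

The `threshold` parameter is the tunable knob mentioned above; raising it trades sensitivity to small regressions for fewer false positives.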
This system wouldn't be foolproof, and it may not detect very small regressions, but I think it could still be helpful. For example, if a microbenchmark suddenly slows down by 5-10%, it would automatically get flagged. Currently we rarely ever look at our microbenchmarks, so these things can go completely undetected... But we could have a microbenchmark for object allocation, for example, and get automatically notified if object allocation performance takes a big hit. @XrXr
If one or more regressions are detected, a message could be posted in the benchmark CI Slack channel, and the bot could do an @here so that people in the channel are notified, or tag specific members of the YJIT team.
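The notification side could be as simple as posting to a Slack incoming webhook. A rough sketch, assuming a webhook is configured for the channel (the `SLACK_WEBHOOK_URL` environment variable and the message text are placeholders, not an existing yjit-metrics setup):

```python
# Minimal sketch of the notification step using a Slack incoming webhook.
import json
import os
import urllib.request

def notify_slack(regressions: list[str]) -> None:
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]  # hypothetical configuration
    # "<!here>" is Slack's markup for an @here mention in webhook messages.
    text = "<!here> Potential performance regression(s) detected: " + ", ".join(regressions)
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Tagging specific YJIT team members instead of @here would just mean substituting their Slack member IDs into the message text.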