
Could we automatically detect/report performance regressions? #285

Open
@maximecb

Description

Recently we've been surprised by multiple major performance regressions, and we're finding that it's very challenging to identify the root cause(s) after the fact. It would be really useful if yjit-metrics could help us automatically flag potential regressions.

I know this is not easy or trivial to implement, because the inherent noise in measurements can produce false positives. Since we have error bars on benchmarks, though, it might be possible to use a criterion such as the following:

If the average time for the current run of a benchmark is above the previous average for that benchmark, and the error bars of the two runs don't overlap, flag a potential regression.

We could also add an adjustable threshold that requires a certain gap between the error bars before we report a regression, expressed as a multiple of the larger or smaller of the two standard deviations. For example, we could report a slowdown only if the gap between the error bars exceeds 0.5 * min(stddev1, stddev2). We could tune this criterion to reduce the probability of false positives.
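A minimal Python sketch of the criterion above follows. It assumes that each run is summarized by its mean time and standard deviation and that an error bar spans mean ± one stddev. The actual error bars in yjit-metrics may be computed differently. `RunStats`, `is_potential_regression`, and `gap_factor` are hypothetical names. With `gap_factor=0.0`, the check reduces to "slower, and the error bars don't overlap". The default of 0.5 matches the example threshold.

```python
import statistics
from dataclasses import dataclass


@dataclass
class RunStats:
    """Summary of one benchmark run (hypothetical type).

    Assumption: the error bar is mean +/- one standard deviation.
    """
    mean: float
    stddev: float

    @classmethod
    def from_samples(cls, times: list[float]) -> "RunStats":
        # statistics.stdev() is the sample stddev and needs at least two samples.
        return cls(statistics.mean(times), statistics.stdev(times))


def is_potential_regression(prev: RunStats, curr: RunStats, gap_factor: float = 0.5) -> bool:
    """Flag a slowdown when the current run is slower and the bottom of its
    error bar sits above the top of the previous run's error bar by more than
    gap_factor * min(prev.stddev, curr.stddev).

    With gap_factor=0.0 this is simply "the error bars don't overlap".
    """
    if curr.mean <= prev.mean:
        return False  # same speed or faster: not a regression
    # Distance between the bottom of the current bar and the top of the previous bar;
    # negative when the bars overlap.
    gap = (curr.mean - curr.stddev) - (prev.mean + prev.stddev)
    threshold = gap_factor * min(prev.stddev, curr.stddev)
    return gap > threshold
```

For instance, a microbenchmark going from 10.0 ± 0.1 ms to 10.8 ± 0.1 ms (8% slower) would be flagged. The same means with ± 0.5 ms error bars would not be flagged, because the bars overlap. Raising `gap_factor` trades sensitivity for fewer false positives.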

This system wouldn't be foolproof, and it may not detect very small regressions, but I think it could still be helpful. For example, if a microbenchmark suddenly slows down by 5-10%, it would be flagged automatically. Currently we rarely look at our microbenchmarks, so slowdowns like this can go completely undetected. We could, for instance, have a microbenchmark for object allocation and be notified automatically if object allocation performance drops significantly. @XrXr

If one or more regressions are detected, the bot could post a message in the benchmark CI Slack channel. It could either use @here to notify everyone in the channel or tag specific members of the YJIT team.
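A rough sketch of the notification step follows. It assumes that the bot posts through a Slack incoming webhook, which accepts a JSON body with a `text` field. In Slack message markup, `<!here>` is how an @here mention is written. The webhook URL is deployment-specific, and `build_slack_payload` and `notify` are hypothetical helpers. This sketch is not how yjit-metrics' CI currently reports results.

```python
import json
import urllib.request


def build_slack_payload(regressed: list[str], mention: str = "<!here>") -> dict:
    """Build an incoming-webhook payload listing flagged benchmarks.

    "<!here>" is Slack markup for @here; a user mention such as "<@U0123456>"
    (hypothetical ID) could be passed instead to tag a specific person.
    """
    lines = [f"{mention} Potential performance regressions detected:"]
    lines += [f"- {name}" for name in regressed]
    return {"text": "\n".join(lines)}


def notify(webhook_url: str, regressed: list[str]) -> bool:
    """POST a message to the webhook if anything regressed; return whether one was sent."""
    if not regressed:
        return False  # nothing flagged, so stay quiet
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(build_slack_payload(regressed)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return True
```

Only benchmarks that pass the regression check would be passed in as `regressed`. An empty list sends nothing, so the channel is pinged only when something is actually flagged.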

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
