Implement initial regression detection #383

Merged
rwstauner merged 20 commits into main from rwstauner/regressions on Feb 25, 2025

Conversation

rwstauner
Contributor

@rwstauner rwstauner commented Jan 27, 2025

This sets up regression detection for the ratio_in_yjit metric, based on this idea:

Over the last 30-60 days, what was the lowest value? If we get a new value that’s 0.5 * stddev below the “floor” twice in a row, flag it?
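
A minimal sketch of the "floor" idea quoted above, for illustration only. The method names, the sample data, and the use of a population standard deviation are assumptions, not the code in this PR:

```ruby
# Hypothetical sketch of the "floor" idea; names and data are illustrative.

# Population standard deviation of an array of numbers.
def stddev(values)
  mean = values.sum / values.size.to_f
  Math.sqrt(values.sum { |v| (v - mean)**2 } / values.size)
end

# history: values from the look-back window (e.g. the last 30-60 days)
# recent:  newest values, oldest first
# Returns true when the last `required` values all sit more than
# `factor * stddev` below the lowest value seen in the history.
def below_floor?(history, recent, factor: 0.5, required: 2)
  return false if recent.size < required

  threshold = history.min - factor * stddev(history)
  recent.last(required).all? { |v| v < threshold }
end

history = [99.81, 99.80, 99.82, 99.81, 99.80]
below_floor?(history, [99.81, 99.48]) # one low value: not flagged
below_floor?(history, [99.48, 99.48]) # two low values in a row: flagged
```

Requiring two consecutive low values trades a day of latency for fewer false alarms from a single noisy run.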

I included the data that goes into the analysis to help us fine tune the algorithm and to make it easier to decide if a notification is worth investigating.

This is more or less what the Slack notification would look like:
[image]

refs #366

@rwstauner rwstauner self-assigned this Jan 27, 2025
@rwstauner rwstauner changed the title Implement regression detection Implement initial regression detection Jan 27, 2025
@maximecb
Contributor

Nice! Code changes look relatively compact too 👌

It looks for streaks to identify stable values.

Neat! I don't fully understand how the streaks are computed though, is it some average +/- some delta percentage?

Small details wrt the slack notification:

When showing regressions, it would be nice to use the same capitalization formatting as the stat name, e.g. ratio_in_yjit. It might also be nice to show a percentage first, e.g. -2.1%, from xxx% to yyy% (otherwise you kinda have to mentally calculate the percentage).
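
A small sketch of the suggested "percentage first" format. The helper name `format_change` is hypothetical, and it assumes the leading figure is the relative change between the two values:

```ruby
# Hypothetical helper: put the signed relative change first, then both values.
def format_change(from, to)
  change = (to - from) / from.to_f * 100.0
  format("%+.1f%%, from %.2f%% to %.2f%%", change, from, to)
end

puts format_change(99.81, 99.48)
# prints "-0.3%, from 99.81% to 99.48%"
```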

@rwstauner
Contributor Author

This shows it would have caught the regression last Sep (#366):

$ bin/analysis --benchmarks=railsbench build/raw-benchmark-data.prod/raw_benchmark_data/x86_64/2024-09
ratio_in_yjit x86_64_yjit_stats
 railsbench streaks: [[99.7, 2], [99.81, 4], [99.7, 6], [99.48, 16]]
            highest_streak_value: 99.81
            longest_streak: [99.48, 16]
            geomean: 99.58991325387635
            regression: dropped 0.33% from 99.81 to 99.48
$ bin/analysis --benchmarks=railsbench build/raw-benchmark-data.prod/raw_benchmark_data/x86_64/2024-10
ratio_in_yjit x86_64_yjit_stats
 railsbench streaks: [[99.48, 13], [99.6, 1], [99.57, 2], [99.6, 14]]
            highest_streak_value: 99.6
            longest_streak: [99.6, 14]
            geomean: 99.54598300131984
$ bin/analysis --benchmarks=railsbench build/raw-benchmark-data.prod/raw_benchmark_data/x86_64/2024-11
ratio_in_yjit x86_64_yjit_stats
 railsbench streaks: [[99.6, 26], [99.82, 4]]
            highest_streak_value: 99.82
            longest_streak: [99.6, 26]
            geomean: 99.62930529510919

and in late November the number goes back up.
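
One way to read the output above, sketched under assumptions: a streak is a run of consecutive identical values, and a regression is reported only when the longest streak comes after the highest-valued streak and has a lower value. That rule is inferred from the three sample runs (it flags September but not October or November); it is not confirmed to be what bin/analysis does, and the method names are hypothetical:

```ruby
# Collapse a series into [value, run_length] pairs of consecutive equal values.
def streaks(values)
  values.chunk_while { |a, b| a == b }.map { |run| [run.first, run.size] }
end

# Assumed rule: report a drop when the longest streak follows the
# highest-valued streak and has a lower value. Returns nil otherwise.
def streak_regression(streaks)
  highest_index = streaks.each_index.max_by { |i| streaks[i][0] }
  longest_index = streaks.each_index.max_by { |i| streaks[i][1] }
  highest = streaks[highest_index][0]
  longest = streaks[longest_index][0]
  return nil unless longest_index > highest_index && longest < highest

  format("dropped %.2f%% from %s to %s", highest - longest, highest, longest)
end

september = [[99.7, 2], [99.81, 4], [99.7, 6], [99.48, 16]]
november  = [[99.6, 26], [99.82, 4]]
streak_regression(september) # "dropped 0.33% from 99.81 to 99.48"
streak_regression(november)  # nil, the higher value arrives after the long streak
```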

@rwstauner rwstauner force-pushed the rwstauner/regressions branch from dd429a6 to 73557a1 Compare February 18, 2025 16:28
Contributor

@maximecb maximecb left a comment

I will let you decide when this should be merged and how it should be tested 👍

We now have data that is interesting besides the streaks.
This allows running the analysis for a finer grained result set than
just "per month" (having to specify the month's subfolder).
With stddev * 0.01 this command
  bin/analysis --before=2025-01-08 --benchmarks=railsbench,lobsters build/raw-benchmark-data.prod/raw_benchmark_data/x86_64
finds
  regression: 99.82 is 0.00% below mean 99.82
which looks silly.

With stddev * 0.02 that data set does not trigger.

This value still flags some regressions from last September:
  bin/analysis --before=2024-09-17 --benchmarks=railsbench,lobsters build/raw-benchmark-data.prod/raw_benchmark_data/x86_64
ratio_in_yjit x86_64_yjit_stats
   lobsters regression: 98.91 is 0.21% below mean 99.12
 railsbench regression: 99.48 is 0.28% below mean 99.76
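
A rough sketch of the kind of check these messages suggest: flag the latest value when it falls below the mean by more than a small multiple of the standard deviation. The method name, the population standard deviation, the sample values, and reporting the gap as an absolute difference are all assumptions, not the implementation in this PR:

```ruby
# Hypothetical check: flag `latest` when it is more than
# `factor * stddev` below the mean of `values`.
def mean_regression(values, latest, factor: 0.02)
  mean = values.sum / values.size.to_f
  sd = Math.sqrt(values.sum { |v| (v - mean)**2 } / values.size)
  return nil unless latest < mean - sd * factor

  # The gap is reported as an absolute difference (assumption).
  format("%.2f is %.2f%% below mean %.2f", latest, mean - latest, mean)
end

mean_regression([99.76, 99.80, 99.72], 99.48)
# "99.48 is 0.28% below mean 99.76"
```

With a very small factor the threshold sits almost on the mean, so a value that is only marginally lower can be flagged and then print as "0.00% below".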
@rwstauner rwstauner merged commit 4b8e32e into main Feb 25, 2025
2 checks passed
@rwstauner rwstauner deleted the rwstauner/regressions branch February 25, 2025 00:01