Add non-blocking PR benchmark guard in manifold.yml (base vs head perfTest) #1680
AnshulPatil2005 wants to merge 11 commits into
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##           master    #1680      +/-   ##
==========================================
+ Coverage   94.87%   95.05%   +0.18%
==========================================
  Files          36       37       +1
  Lines        8305     8337      +32
==========================================
+ Hits         7879     7925      +46
+ Misses        426      412      -14
b963cd9 to 217463d
Artifact upload uses if: always(), so benchmark outputs are available even if earlier steps fail. If base/head data is invalid (e.g., different run counts, missing run files, or empty parsed samples), should we fail the benchmark_guard job?
No. If a perf drop only produces a warning, these other failures should also only produce warnings. Out of curiosity, where will these warnings show up?
Got it,
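On the question of where non-blocking warnings surface, one option is sketched below. It assumes the guard is driven by a Python script (the PR does not say so) and uses two GitHub Actions mechanisms: the `::warning` workflow command, which creates an annotation on the run, and the file named by the `GITHUB_STEP_SUMMARY` environment variable, which feeds the job summary. The helper names `emit_warning` and `append_summary` are hypothetical.

```python
import os


def emit_warning(message: str, title: str = "Benchmark guard") -> str:
    """Print a GitHub Actions warning annotation and return the printed line.

    Hypothetical helper: a warning annotation does not change the job's exit
    status, so the guard stays non-blocking.
    """
    # Workflow commands are single-line, so percent-encode %, CR and LF.
    # The default title contains no ':' or ',' and needs no extra escaping.
    escaped = (
        message.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")
    )
    line = f"::warning title={title}::{escaped}"
    print(line)
    return line


def append_summary(markdown: str) -> bool:
    """Append text to the job summary file, if the runner provides one."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        # Not running under GitHub Actions (e.g. a local dry run).
        return False
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(markdown + "\n")
    return True
```

With this approach the same condition (a perf drop or invalid base/head data) can produce both an annotation and a line in the summary without failing the job.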
I switched the primary comparison metric from the median of run means to the mean of run means, since it should better reflect overall timing differences across repeated runs. The median of run means stays in the summary as a secondary context metric.
Sounds reasonable. @pca006132, you've been doing more on the perf testing side of things; do you have time to review this?
pca006132
left a comment
I don't understand which mean you are talking about.
By run means, are you running each single benchmark multiple times and taking their mean? And is the mean of run means the mean across benchmarks? If yes, I don't think you need the final mean; just compare the mean of each benchmark.
Yes, you understand it correctly. Honestly, I was confused about the comparison approach; I will compare the per-benchmark means instead.
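The per-benchmark comparison suggested above could be sketched as follows. This is an illustration, not the PR's script: it assumes each run has already been parsed into a mapping from a benchmark label to seconds, and the labels `"a"` and `"b"` and the function names are hypothetical.

```python
from statistics import mean


def per_benchmark_means(runs):
    """Average each benchmark across repeated runs.

    runs: list of dicts, one per run, mapping benchmark label -> seconds.
    """
    samples = {}
    for run in runs:
        for name, seconds in run.items():
            samples.setdefault(name, []).append(seconds)
    return {name: mean(values) for name, values in samples.items()}


def compare_benchmarks(base_runs, head_runs):
    """Return {label: (base_mean, head_mean, pct_change)} per benchmark.

    Only benchmarks present on both sides are compared; no overall mean
    across benchmarks is computed.
    """
    base = per_benchmark_means(base_runs)
    head = per_benchmark_means(head_runs)
    report = {}
    for name in sorted(base.keys() & head.keys()):
        delta = head[name] - base[name]
        pct = 100.0 * delta / base[name] if base[name] > 0 else 0.0
        report[name] = (base[name], head[name], pct)
    return report


# Two runs per side, two hypothetical benchmarks.
base_runs = [{"a": 1.0, "b": 2.0}, {"a": 3.0, "b": 2.0}]
head_runs = [{"a": 3.0, "b": 2.0}, {"a": 3.0, "b": 2.0}]
report = compare_benchmarks(base_runs, head_runs)
```

Keeping the result per benchmark means a large regression in one case is not averaged away by unchanged cases.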
pca006132
left a comment
Can we use a cached build from master? I remember GitHub Actions has a somewhat complicated cache setup, and some configurations may have security concerns, so we may need to be a bit careful about that.
Yes, good point. I am also looking for a safe cache strategy for base/master builds that avoids cross-branch cache poisoning and the other issues you mentioned. This could help in the other sanitizer lane PR as well.
Edit: this job currently takes around 4 minutes, while the overall CI critical path is much longer, so I don't think we should add a cache to this.
Summary
This PR adds an initial PR benchmark guard to CI for performance visibility during review.
It runs a lightweight benchmark comparison between the PR base and head commits on the same runner to reduce cross-runner noise.
working
wt-base from github.event.pull_request.base.sha
wt-head from github.event.pull_request.head.sha
Run extras/perfTest for each variant (base, head) with the same settings.
benchmark used -> perfTest
Repeat runs (PERF_REPEATS=3) and save outputs under:
bench/base/run*.txt
bench/head/run*.txt
reporting
Parse timings (time = ... sec) from both sides.
Compute per-run means and compare using mean of run means (primary); median of run means is also reported for context.
Warning thresholds: REGRESSION_WARN_PCT=20, REGRESSION_WARN_ABS_MS=10
Summary written to GITHUB_STEP_SUMMARY.
Outputs uploaded as an artifact (pr-benchmark-guard).
expected follow-up
Calibrate thresholds and repeats using data.
Improve output accordingly if needed (case-level clarity, summary polish).
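As a rough illustration of the reporting step described in this summary, the sketch below extracts `time = ... sec` values from a run's output and checks them against the two thresholds. Several points are assumptions rather than facts from the PR: the sample output line, the regular expression, the function names, and the rule that a warning requires both the percentage and the absolute threshold to be exceeded.

```python
import re

# Matches "time = <number> sec"; the exact perfTest output format is assumed.
TIME_RE = re.compile(r"time\s*=\s*([0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?)\s*sec")


def parse_times(text):
    """Return every timing (in seconds) found in one run's output."""
    return [float(m.group(1)) for m in TIME_RE.finditer(text)]


def is_regression(base_sec, head_sec, warn_pct=20.0, warn_abs_ms=10.0):
    """True if head is slower than base by more than both thresholds.

    Assumption: both conditions must hold, so tiny benchmarks with noisy
    percentages and large benchmarks with small relative drift stay quiet.
    """
    if base_sec <= 0:
        return False
    delta_sec = head_sec - base_sec
    pct = 100.0 * delta_sec / base_sec
    return pct > warn_pct and delta_sec * 1000.0 > warn_abs_ms


# Hypothetical output lines, for illustration only.
sample = "nTri = 512, time = 0.25 sec\nnTri = 2048, time = 1.5 sec\n"
times = parse_times(sample)
```

An empty list from `parse_times` corresponds to the "empty parsed samples" case discussed in the conversation, which should produce a warning rather than a failure.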
ref - opencax/GSoC#114 (comment)