Skip to content

add non-blocking PR benchmark guard in manifold.yml (base vs head perfTest)#1680

Open
AnshulPatil2005 wants to merge 11 commits into
elalish:masterfrom
AnshulPatil2005:ci/pr-benchmark-guard
Open

add non-blocking PR benchmark guard in manifold.yml (base vs head perfTest)#1680
AnshulPatil2005 wants to merge 11 commits into
elalish:masterfrom
AnshulPatil2005:ci/pr-benchmark-guard

Conversation

@AnshulPatil2005
Copy link
Copy Markdown
Contributor

@AnshulPatil2005 AnshulPatil2005 commented Apr 27, 2026

Summary

This PR adds an initial PR benchmark guard to CI for performance visibility during review.
It runs a lightweight benchmark comparison between:

  • the PR base commit, and
  • the PR head commit,
    on the same runner to reduce cross-runner noise.

working

  1. Checkout repository with full history.
  2. Create two git worktrees:
    • wt-base from github.event.pull_request.base.sha
    • wt-head from github.event.pull_request.head.sha
  3. Build and run extras/perfTest for each variant (base, head) with the same settings.
    benchmark used -> perfTest
  4. Repeat runs (PERF_REPEATS=3) and save outputs under:
    • bench/base/run*.txt
    • bench/head/run*.txt

reporting

  • Parse timing lines (time = ... sec) from both sides.
    Compute per-run means and compare using mean of run means (primary); median of run means is also reported for context.
  • Emit a warning when both thresholds are exceeded:
    • REGRESSION_WARN_PCT=20
    • REGRESSION_WARN_ABS_MS=10
  • Append markdown summary to GITHUB_STEP_SUMMARY.
  • Upload raw outputs and JSON summary as artifact (pr-benchmark-guard).

expected follow-up

Calibrate thresholds and repeats using data.
Improve output accordingly if needed (case-level clarity, summary polish).

ref - opencax/GSoC#114 (comment)

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.05%. Comparing base (8b173db) to head (44e1489).
⚠️ Report is 34 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1680      +/-   ##
==========================================
+ Coverage   94.87%   95.05%   +0.18%     
==========================================
  Files          36       37       +1     
  Lines        8305     8337      +32     
==========================================
+ Hits         7879     7925      +46     
+ Misses        426      412      -14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@AnshulPatil2005 AnshulPatil2005 force-pushed the ci/pr-benchmark-guard branch from b963cd9 to 217463d Compare May 2, 2026 14:43
@AnshulPatil2005
Copy link
Copy Markdown
Contributor Author

artifact upload uses if: always() so benchmark outputs are available even if earlier steps fail,
added an cleanup step to remove wt-base/wt-head worktrees after the job which was missing

if base/head data is invalid (e.g., different run counts, missing run files, or empty parsed samples), should we fail the benchmark_guard job?

@elalish
Copy link
Copy Markdown
Owner

elalish commented May 3, 2026

if base/head data is invalid (e.g., different run counts, missing run files, or empty parsed samples), should we fail the benchmark_guard job?

No, if it's only a warning when the perf drops, it should also only be a warning if these other things fail. Out of curiosity, where will these warnings show up?

@AnshulPatil2005
Copy link
Copy Markdown
Contributor Author

AnshulPatil2005 commented May 4, 2026

result.json
summary.md

if base/head data is invalid (e.g., different run counts, missing run files, or empty parsed samples), should we fail the benchmark_guard job?

No, if it's only a warning when the perf drops, it should also only be a warning if these other things fail. Out of curiosity, where will these warnings show up?

Got it,
For visibility: currently warnings show up as GitHub Actions in the benchmark_guard job logs (::warning::) and also in the uploaded bench artifacts (summary.md/result.json).
here is the example of the last output attached

@AnshulPatil2005 AnshulPatil2005 marked this pull request as ready for review May 24, 2026 13:23
@AnshulPatil2005
Copy link
Copy Markdown
Contributor Author

I switched the primary comparison metric from median of run means to mean of run means, since it should better reflect overall timing differences across repeated runs, while still keeping median of run means in the summary as a secondary context metric.
Am i going in the right direction here ?should there be any specific change ? @elalish @pca006132

@elalish
Copy link
Copy Markdown
Owner

elalish commented May 24, 2026

Sounds reasonable - @pca006132 you've been doing more on the perf testing side of things, do you have time to review this?

Copy link
Copy Markdown
Collaborator

@pca006132 pca006132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what mean you are talking about.

By run means, are you running each single benchmark multiple times and taking their mean? And mean of run means is taking the mean across benchmarks? If yes, I don't think you need the final mean, just compare the mean of each benchmark.

Comment thread .github/workflows/manifold.yml Outdated
Comment thread .github/workflows/manifold.yml Outdated
Comment thread .github/workflows/manifold.yml Outdated
@AnshulPatil2005
Copy link
Copy Markdown
Contributor Author

AnshulPatil2005 commented May 25, 2026

I don't understand what mean you are talking about.

By run means, are you running each single benchmark multiple times and taking their mean? And mean of run means is taking the mean across benchmarks? If yes, I don't think you need the final mean, just compare the mean of each benchmark.

Yes you are understanding it right, honestly I was confused with the approach for the comparison ; I will do the each benchmark mean comparison

Copy link
Copy Markdown
Collaborator

@pca006132 pca006132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use cached build from master? I remember GitHub action has some complicated cache setup, some may have security concerns, so we may need to be a bit careful about that.

Comment thread .github/workflows/manifold.yml Outdated
Comment thread .github/workflows/manifold.yml Outdated
@AnshulPatil2005
Copy link
Copy Markdown
Contributor Author

AnshulPatil2005 commented May 28, 2026

Can we use cached build from master? I remember GitHub action has some complicated cache setup, some may have security concerns, so we may need to be a bit careful about that.

Yes, good point. I am also finding a safe cache strategy for base/master builds to avoid cross-branch/cache-poisoning issues and other issues as you mentioned.

This could help in other sanitizer lane pr as-well

edit :this job currently takes around 4 minutes while the overall CI critical path is much longer so i dont think we should add a cache to this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants