
Add continuous benchmarking workflow #2385

Merged
ProfFan merged 1 commit into develop from fan/benchmark_action on Feb 4, 2026

ProfFan (Collaborator) commented Feb 3, 2026

Summary

This PR adds a safe, two-tier benchmark pipeline for timeSFMBAL and enables PR performance reporting without relying on third-party benchmark actions. Note that this will not work until the PR is merged, because GitHub only allows a workflow_dispatch workflow to be triggered once the workflow file exists on the default branch.

A demo can be seen at ProfFan#14

Sample:

timeSFMBAL benchmark

  • Head: b48c5f0896925232d9fb283a033c446272aecf5c
  • Base: 38f71129194d85b4d4c50a5109d1218948a2578b
| Runner | Metric | Base (s) | Head (s) | Delta (s) | Change |
|---|---|---|---|---|---|
| linux-arm64 | timeSFMBAL/dubrovnik-16-22106-pre.txt/MultifrontalCholesky | 2.068122 | 2.135078 | +0.066957 | +3.24% |
| linux-arm64 | timeSFMBAL/dubrovnik-16-22106-pre.txt/MultifrontalSolver | 1.225473 | 1.254750 | +0.029278 | +2.39% |
| linux-x64 | timeSFMBAL/dubrovnik-16-22106-pre.txt/MultifrontalCholesky | 2.170932 | 2.222575 | +0.051643 | +2.38% |
| linux-x64 | timeSFMBAL/dubrovnik-16-22106-pre.txt/MultifrontalSolver | 1.616237 | 1.639115 | +0.022878 | +1.42% |
| macos-arm64 | timeSFMBAL/dubrovnik-16-22106-pre.txt/MultifrontalCholesky | N/A | 2.256323 | N/A | N/A |
| macos-arm64 | timeSFMBAL/dubrovnik-16-22106-pre.txt/MultifrontalSolver | N/A | 1.469075 | N/A | N/A |

Missing base benchmark cache for: macos-arm64.

Worker runs

| Role | Runner | SHA | Conclusion |
|---|---|---|---|
| head | linux-x64 | b48c5f0896925232d9fb283a033c446272aecf5c | success |
| base | linux-x64 | 38f71129194d85b4d4c50a5109d1218948a2578b | success |
| head | linux-arm64 | b48c5f0896925232d9fb283a033c446272aecf5c | success |
| base | linux-arm64 | 38f71129194d85b4d4c50a5109d1218948a2578b | success |
| head | macos-arm64 | b48c5f0896925232d9fb283a033c446272aecf5c | success |
| base | macos-arm64 | 38f71129194d85b4d4c50a5109d1218948a2578b | timed_out |

What Changed

1) timeSFMBAL JSON benchmark output

  • Extended timing/timeSFMBAL.cpp with a benchmark JSON mode:
    • --benchmark-action-json <output_file>
  • JSON output contains per-metric entries for:
    • MultifrontalCholesky
    • MultifrontalSolver
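The PR does not spell out the JSON schema. Given the flag name, a plausible assumption is the array-of-entries shape used by the github-action-benchmark `customSmallerIsBetter` format (`name`, `unit`, `value`), with names like the `timeSFMBAL/<dataset>/<solver>` strings in the sample report. The minimal Python sketch below parses such a file; the payload and the `load_metrics` helper are hypothetical, not the actual output of timing/timeSFMBAL.cpp.

```python
import json
from typing import Dict

# Hypothetical payload in the assumed customSmallerIsBetter-style shape:
# a JSON array of {"name", "unit", "value"} objects, one per metric.
EXAMPLE_PAYLOAD = """
[
  {"name": "timeSFMBAL/dubrovnik-16-22106-pre.txt/MultifrontalCholesky",
   "unit": "s", "value": 2.135078},
  {"name": "timeSFMBAL/dubrovnik-16-22106-pre.txt/MultifrontalSolver",
   "unit": "s", "value": 1.254750}
]
"""


def load_metrics(text: str) -> Dict[str, float]:
    """Map metric name -> seconds from an (assumed) benchmark JSON array."""
    entries = json.loads(text)
    metrics: Dict[str, float] = {}
    for entry in entries:
        # Only seconds are expected; reject other units rather than guess a conversion.
        if entry.get("unit", "s") != "s":
            raise ValueError(f"unexpected unit for {entry['name']}: {entry['unit']}")
        metrics[entry["name"]] = float(entry["value"])
    return metrics


if __name__ == "__main__":
    for name, seconds in load_metrics(EXAMPLE_PAYLOAD).items():
        print(f"{name}: {seconds:.6f} s")
```

Keying the result by the full metric name keeps the runner-independent part of the identity in one string, so base and head files can be matched with a plain dictionary lookup.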

2) New benchmark comparison script

  • Added .github/scripts/compare_time_sfmbal_benchmarks.py
  • Compares base and head JSON results for each runner (OS/arch), computes the delta and percent change per metric, and generates Markdown suitable for PR comments.
  • Handles missing base data gracefully (reports N/A and notes which runners are missing a base cache).
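To make the comparison concrete, here is a minimal Python sketch of the delta/percent computation and the N/A handling, assuming per-runner results have already been loaded into `{metric name: seconds}` dictionaries, with `None` for a runner whose base cache is missing. The function names are hypothetical; this is not the contents of compare_time_sfmbal_benchmarks.py.

```python
from typing import Dict, List, Optional, Tuple

# runner label -> {metric name -> seconds}; None marks a missing cache entry.
Results = Dict[str, Optional[Dict[str, float]]]


def compare(base: float, head: float) -> Tuple[float, float]:
    """Return (delta in seconds, percent change relative to base)."""
    delta = head - base
    return delta, 100.0 * delta / base


def markdown_report(base: Results, head: Results) -> str:
    """Render head-vs-base timings as a Markdown table, one row per runner/metric."""
    lines: List[str] = [
        "| Runner | Metric | Base (s) | Head (s) | Delta (s) | Change |",
        "|---|---|---|---|---|---|",
    ]
    missing: List[str] = []
    for runner in sorted(head):
        head_metrics = head[runner] or {}
        base_metrics = base.get(runner)
        if base_metrics is None:
            missing.append(runner)
        for metric in sorted(head_metrics):
            h = head_metrics[metric]
            b = (base_metrics or {}).get(metric)
            if b is None:
                # No base value: report N/A instead of failing the whole comment.
                lines.append(f"| {runner} | {metric} | N/A | {h:.6f} | N/A | N/A |")
            else:
                delta, pct = compare(b, h)
                lines.append(f"| {runner} | {metric} | {b:.6f} | {h:.6f} "
                             f"| {delta:+.6f} | {pct:+.2f}% |")
    if missing:
        lines.append("")
        lines.append(f"Missing base benchmark cache for: {', '.join(missing)}.")
    return "\n".join(lines)
```

With the rounded sample values (2.068122 s base, 2.135078 s head), `compare` yields a change of +3.24%, matching the table; the last digit of the delta can differ from the sample because the real report is presumably computed from unrounded timings.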

3) Safe split workflow design

  • Added unprivileged worker workflow:
    • .github/workflows/time-sfmbal-benchmark-runner.yml
    • Trigger: workflow_dispatch only
    • Permission: read-only contents
    • Runs benchmark for a specific runner/commit and caches JSON as:
      • timeSFMBAL-benchmark-v3-<os>-<arch>-<sha>
  • Updated orchestrator workflow:
    • .github/workflows/time-sfmbal-benchmark.yml
    • Triggers:
      • pull_request on opened
      • workflow_dispatch with pr_number
    • Dispatches worker runs for head and base across all configured runners, waits for them to complete, restores the cached JSON files, generates the Markdown report, and posts or updates the PR comment.
  • /bench trigger workflow:
    • .github/workflows/time-sfmbal-benchmark-trigger.yml
    • When /bench is commented on a PR, dispatches the orchestrator.
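The orchestrator's fan-out can be sketched as follows: one worker run per (role, runner) pair, each identified by a cache key following the documented `timeSFMBAL-benchmark-v3-<os>-<arch>-<sha>` pattern. The runner list is taken from the sample report, and the `WorkerRun`/`plan_worker_runs` names are hypothetical. The real orchestration lives in the workflow YAML, where each run would be started through a `workflow_dispatch` event (for example via the REST endpoint `POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches`).

```python
from dataclasses import dataclass
from typing import List, Sequence

# Runner labels as they appear in the sample report (assumed); the real list
# lives in the workflow configuration and may differ.
RUNNERS = ("linux-x64", "linux-arm64", "macos-arm64")
CACHE_PREFIX = "timeSFMBAL-benchmark-v3"


@dataclass(frozen=True)
class WorkerRun:
    role: str    # "head" or "base"
    runner: str  # "<os>-<arch>", e.g. "linux-x64"
    sha: str     # commit to benchmark

    @property
    def cache_key(self) -> str:
        # Follows the documented pattern timeSFMBAL-benchmark-v3-<os>-<arch>-<sha>.
        return f"{CACHE_PREFIX}-{self.runner}-{self.sha}"


def plan_worker_runs(head_sha: str, base_sha: str,
                     runners: Sequence[str] = RUNNERS) -> List[WorkerRun]:
    """Return one unprivileged worker run per (role, runner) pair, head before base."""
    return [
        WorkerRun(role, runner, sha)
        for runner in runners
        for role, sha in (("head", head_sha), ("base", base_sha))
    ]


if __name__ == "__main__":
    head = "b48c5f0896925232d9fb283a033c446272aecf5c"
    base = "38f71129194d85b4d4c50a5109d1218948a2578b"
    for run in plan_worker_runs(head, base):
        print(run.role, run.runner, run.cache_key)
```

Because each key pins both the runner and the commit, the orchestrator can restore exactly the JSON a given worker produced, and a worker that times out simply leaves its key absent, which shows up in the report as N/A (as with the macos-arm64 base run in the sample).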

4) Dataset handling in worker runs

  • Worker now uses the BAL dataset archive:
    • https://grail.cs.washington.edu/projects/bal/data/dubrovnik/problem-16-22106-pre.txt.bz2
  • Flow:
    1. Restore problem-16-22106-pre.txt.bz2 from the cache
    2. On a cache miss, download it and save it to the cache
    3. Extract it into the repo data directory as:
      • examples/Data/dubrovnik-16-22106-pre.txt
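In the workflow, this flow is presumably implemented with a cache action plus a shell download and decompression step; the Python sketch below only illustrates the same restore/download/extract logic. The function names and the `cache_dir` layout are assumptions.

```python
import bz2
import shutil
import urllib.request
from pathlib import Path

DATASET_URL = (
    "https://grail.cs.washington.edu/projects/bal/data/dubrovnik/"
    "problem-16-22106-pre.txt.bz2"
)


def ensure_archive(cache_dir: Path, url: str = DATASET_URL) -> Path:
    """Return the cached archive, downloading it only on a cache miss."""
    archive = cache_dir / url.rsplit("/", 1)[-1]
    if not archive.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        tmp = archive.parent / (archive.name + ".part")
        with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)
        # Rename only after a complete download so a partial file is never reused.
        tmp.replace(archive)
    return archive


def extract_dataset(archive: Path, dest: Path) -> Path:
    """Decompress the .bz2 archive to dest (e.g. examples/Data/dubrovnik-16-22106-pre.txt)."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with bz2.open(archive, "rb") as src, open(dest, "wb") as out:
        shutil.copyfileobj(src, out)
    return dest
```

Caching the compressed archive rather than the extracted file keeps the cache entry small; extraction is cheap and is redone on every worker run.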

Why

  • Limits privileged operations to orchestration/commenting; benchmark execution remains unprivileged.
  • Improves reproducibility with explicit per-OS/arch/SHA benchmark caches.
  • Supports PR-open and manual /bench flows with consistent reporting.

ProfFan requested a review from dellaert on February 3, 2026 17:48
github-actions bot commented Feb 3, 2026

timeSFMBAL benchmark

  • Head: 38f71129194d85b4d4c50a5109d1218948a2578b
  • Base: bf9913ad68056bce406daac6cdc883e3b0da83bd

No head benchmark results were found.

Worker runs

| Role | Runner | SHA | Conclusion |
|---|---|---|---|

dellaert (Member) left a comment


This looks great to me. I didn’t do a deep dive review, but I figure it’s better to get it up and running and improve over time, rather than being a stickler for details right now.

It would be nice to document what is going on in a README somewhere (I have a markdown file in the .github directory for this). The (AI?) PR comment is great, but it will be forgotten, so maybe just adapt the PR comment into that README file?

Gold856 (Contributor) commented Feb 4, 2026

Pretty nifty! I'm curious, though, whether and how this accounts for variability in GitHub Actions runners. We see job completion times vary quite a bit (even for non-cached jobs), so I wonder whether we'll end up measuring noise. Perhaps running the base and head benchmarks on the same runner instance would be better?

ProfFan merged commit 57bc310 into develop on Feb 4, 2026
46 of 48 checks passed
ProfFan deleted the fan/benchmark_action branch on February 4, 2026 00:55
ProfFan (Collaborator, Author) commented Feb 4, 2026

@Gold856 Most of the job runtime is the build itself, and I believe the actual benchmark variance is under about 5%.

Also, we can easily rerun the benchmark with the /bench command.

@dellaert I'll add a README in the next PR.
