Add continuous benchmarking workflow by ProfFan · Pull Request #2385 · borglab/gtsam

ProfFan · 2026-02-03T17:48:49Z

Summary

This PR adds a safe, two-tier benchmark pipeline for timeSFMBAL and enables PR performance reporting without third-party benchmark actions. Note that it will not work until merged, because a workflow triggered by `workflow_dispatch` can only be invoked once its file exists on the default branch.

A demo can be seen at ProfFan#14

Sample:

timeSFMBAL benchmark

Head: b48c5f0896925232d9fb283a033c446272aecf5c
Base: 38f71129194d85b4d4c50a5109d1218948a2578b

Runner	Metric	Base (s)	Head (s)	Delta (s)	Change
linux-arm64	`timeSFMBAL/dubrovnik-16-22106-pre.txt/MultifrontalCholesky`	2.068122	2.135078	+0.066957	+3.24%
linux-arm64	`timeSFMBAL/dubrovnik-16-22106-pre.txt/MultifrontalSolver`	1.225473	1.254750	+0.029278	+2.39%
linux-x64	`timeSFMBAL/dubrovnik-16-22106-pre.txt/MultifrontalCholesky`	2.170932	2.222575	+0.051643	+2.38%
linux-x64	`timeSFMBAL/dubrovnik-16-22106-pre.txt/MultifrontalSolver`	1.616237	1.639115	+0.022878	+1.42%
macos-arm64	`timeSFMBAL/dubrovnik-16-22106-pre.txt/MultifrontalCholesky`	N/A	2.256323	N/A	N/A
macos-arm64	`timeSFMBAL/dubrovnik-16-22106-pre.txt/MultifrontalSolver`	N/A	1.469075	N/A	N/A

Missing base benchmark cache for: macos-arm64.

Worker runs

Role	Runner	SHA	Conclusion
head	linux-x64	`b48c5f0896925232d9fb283a033c446272aecf5c`	success
base	linux-x64	`38f71129194d85b4d4c50a5109d1218948a2578b`	success
head	linux-arm64	`b48c5f0896925232d9fb283a033c446272aecf5c`	success
base	linux-arm64	`38f71129194d85b4d4c50a5109d1218948a2578b`	success
head	macos-arm64	`b48c5f0896925232d9fb283a033c446272aecf5c`	success
base	macos-arm64	`38f71129194d85b4d4c50a5109d1218948a2578b`	timed_out

What Changed

1) `timeSFMBAL` JSON benchmark output

Extended timing/timeSFMBAL.cpp with a benchmark JSON mode:
- --benchmark-action-json <output_file>
JSON output contains per-metric entries for:
- MultifrontalCholesky
- MultifrontalSolver
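
The PR text does not show the JSON schema itself. As a rough sketch, assuming the file is a list of entries with `name`, `unit`, and `value` fields (an assumption suggested by the flag name, not confirmed here), a consumer could load it like this. The `load_metrics` helper and the `SAMPLE` payload are hypothetical; the values are the linux-x64 head timings from the sample table above.

```python
import json


def load_metrics(text):
    """Parse benchmark JSON text into {metric name: seconds}.

    Assumed schema (not confirmed by the PR): a list of objects with
    "name", "unit", and "value" keys, with timings reported in seconds.
    """
    metrics = {}
    for entry in json.loads(text):
        if entry.get("unit") != "s":
            raise ValueError(f"unexpected unit for {entry.get('name')!r}")
        metrics[entry["name"]] = float(entry["value"])
    return metrics


# Hypothetical payload built from the sample table's linux-x64 head values.
SAMPLE = """[
  {"name": "timeSFMBAL/dubrovnik-16-22106-pre.txt/MultifrontalCholesky",
   "unit": "s", "value": 2.222575},
  {"name": "timeSFMBAL/dubrovnik-16-22106-pre.txt/MultifrontalSolver",
   "unit": "s", "value": 1.639115}
]"""

if __name__ == "__main__":
    for name, seconds in load_metrics(SAMPLE).items():
        print(f"{name}: {seconds:.6f} s")
```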

2) New benchmark comparison script

Added .github/scripts/compare_time_sfmbal_benchmarks.py
Compares per-runner JSON results (OS/arch), computes deltas/percent change, and generates markdown suitable for PR comments.
Handles missing base data gracefully (reports N/A and missing base cache notes).
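
To illustrate the comparison step, here is a minimal sketch of the delta and percent-change arithmetic with N/A handling for a missing base. It is not the contents of compare_time_sfmbal_benchmarks.py; the function names `compare` and `render` and the pipe-table layout are assumptions made for the example.

```python
def compare(base, head):
    """Return markdown table rows comparing head to base timings in seconds.

    base and head map metric name -> seconds. A metric that is missing
    from base (or has a zero base time) is reported as N/A instead of
    raising an error.
    """
    rows = []
    for name in sorted(head):
        h = head[name]
        b = base.get(name)
        if b is None or b == 0:
            rows.append(f"| `{name}` | N/A | {h:.6f} | N/A | N/A |")
            continue
        delta = h - b
        percent = delta / b * 100
        rows.append(
            f"| `{name}` | {b:.6f} | {h:.6f} | {delta:+.6f} | {percent:+.2f}% |"
        )
    return rows


def render(base, head):
    """Build the full markdown table as a single string."""
    header = [
        "| Metric | Base (s) | Head (s) | Delta (s) | Change |",
        "| --- | --- | --- | --- | --- |",
    ]
    return "\n".join(header + compare(base, head))


if __name__ == "__main__":
    # linux-x64 MultifrontalSolver timings from the sample table.
    print(render({"MultifrontalSolver": 1.616237},
                 {"MultifrontalSolver": 1.639115}))
```

The percent change is computed relative to the base time, which matches the sample table: (1.639115 - 1.616237) / 1.616237 is roughly +1.42%.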

3) Safe split workflow design

Added unprivileged worker workflow:
- .github/workflows/time-sfmbal-benchmark-runner.yml
- Trigger: workflow_dispatch only
- Permission: read-only contents
- Runs benchmark for a specific runner/commit and caches JSON as:
  - timeSFMBAL-benchmark-v3-<os>-<arch>-<sha>
Updated orchestrator workflow:
- .github/workflows/time-sfmbal-benchmark.yml
- Triggers:
  - pull_request on opened
  - workflow_dispatch with pr_number
- Dispatches worker runs for head/base across all configured runners, waits for completion, restores cached JSONs, generates markdown, and posts/updates PR comment.
Added /bench trigger workflow:
- .github/workflows/time-sfmbal-benchmark-trigger.yml
- On a /bench comment on a PR, dispatches the orchestrator.
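
The orchestration described above can be pictured as a head/base by runner matrix, with one cache key per worker run. The sketch below only plans that matrix; it does not call the GitHub API. It assumes the runner labels from the sample report (`linux-x64`, `linux-arm64`, `macos-arm64`) stand in for the `<os>-<arch>` part of the cache key, and `plan_dispatches` is a hypothetical helper name.

```python
# Runner labels as they appear in the sample report; assumed to be the
# "<os>-<arch>" part of the cache key.
RUNNERS = ["linux-x64", "linux-arm64", "macos-arm64"]


def cache_key(runner, sha):
    """Build the cache key following timeSFMBAL-benchmark-v3-<os>-<arch>-<sha>."""
    return f"timeSFMBAL-benchmark-v3-{runner}-{sha}"


def plan_dispatches(head_sha, base_sha, runners=RUNNERS):
    """List (role, runner, sha, cache_key) for every worker run to dispatch."""
    plan = []
    for runner in runners:
        for role, sha in (("head", head_sha), ("base", base_sha)):
            plan.append((role, runner, sha, cache_key(runner, sha)))
    return plan


if __name__ == "__main__":
    for row in plan_dispatches(
        "b48c5f0896925232d9fb283a033c446272aecf5c",
        "38f71129194d85b4d4c50a5109d1218948a2578b",
    ):
        print(row)
```

Because the key includes the commit SHA, a base result cached for one PR can be reused by any later run that compares against the same base commit on the same runner.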

4) Dataset handling in worker runs

Worker now uses BAL dataset archive:
- https://grail.cs.washington.edu/projects/bal/data/dubrovnik/problem-16-22106-pre.txt.bz2
Flow:
1. Restore problem-16-22106-pre.txt.bz2 from cache
2. Download+cache on miss
3. Extract into repo data dir as:
  - examples/Data/dubrovnik-16-22106-pre.txt
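
For local experimentation, the restore, download-on-miss, and extract flow can be approximated in a few lines. This is a sketch and not the workflow's actual steps: the `prepare_dataset` and `download` helpers and the cache directory layout are assumptions; only the URL and file names come from the description above.

```python
import bz2
import shutil
import urllib.request
from pathlib import Path

BAL_URL = (
    "https://grail.cs.washington.edu/projects/bal/data/dubrovnik/"
    "problem-16-22106-pre.txt.bz2"
)


def download(url, dest):
    """Fetch url and write the response body to dest."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


def prepare_dataset(cache_dir, data_dir, fetch=download):
    """Reuse the cached archive if present, else fetch it, then extract it.

    Returns the path of the extracted dataset file.
    """
    cache_dir, data_dir = Path(cache_dir), Path(data_dir)
    archive = cache_dir / "problem-16-22106-pre.txt.bz2"
    if not archive.exists():  # cache miss: download and keep the archive
        cache_dir.mkdir(parents=True, exist_ok=True)
        fetch(BAL_URL, archive)
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / "dubrovnik-16-22106-pre.txt"
    with bz2.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


if __name__ == "__main__":
    # Hypothetical local cache directory; the data dir matches the PR text.
    print(prepare_dataset(".bal-cache", "examples/Data"))
```

The `fetch` parameter exists so the download can be replaced in tests or on machines without network access.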

Why

Limits privileged operations to orchestration/commenting; benchmark execution remains unprivileged.
Improves reproducibility with explicit per-OS/arch/SHA benchmark caches.
Supports PR-open and manual /bench flows with consistent reporting.

github-actions · 2026-02-03T18:02:05Z

timeSFMBAL benchmark

Head: 38f71129194d85b4d4c50a5109d1218948a2578b
Base: bf9913ad68056bce406daac6cdc883e3b0da83bd

No head benchmark results were found.

Worker runs

Role	Runner	SHA	Conclusion

dellaert

This looks great to me. I didn’t do a deep dive review, but I figure it’s better to get it up and running and improve over time, rather than being a stickler for details right now.

What would be nice to add is a write-up in a README somewhere (I have a markdown file in the .github directory for this) explaining what is going on. The (AI?) PR comment is great, but it will be forgotten, so maybe just adapt the PR comment for that README file?

Gold856 · 2026-02-04T00:06:25Z

Pretty nifty! I'm curious though if/how this accounts for variability in GitHub Actions runners. We see that jobs can vary quite a bit in time to complete (even just the non-cached jobs), so I'm curious if we're going to end up measuring noise. Perhaps running them in the same runner instance would be better?

ProfFan · 2026-02-04T01:01:41Z

@Gold856 The job runtime is mainly the build process, and I believe the actual perf variance is under 5%.

Also we can easily /bench again with the command.

@dellaert I'll add a README in the next PR.

Add continuous benchmarking workflow

38f7112

ProfFan requested a review from dellaert February 3, 2026 17:48

dellaert approved these changes Feb 3, 2026

ProfFan merged commit 57bc310 into develop Feb 4, 2026
46 of 48 checks passed

ProfFan deleted the fan/benchmark_action branch February 4, 2026 00:55
