ci: parallelize benchmark tests across 3 runners by SebTardif · Pull Request #54 · attune-io/attune

SebTardif · 2026-05-26T21:07:27Z

Split the single Benchmark Tests job into a matrix of 3 parallel jobs, one per package (controller, metrics, recommendation).

Before: 1 runner, ~10 min (all 20 benchmarks sequential)
After: 3 runners in parallel, wall-clock = max(heaviest shard)

| Shard | Package | Benchmarks | Expected weight |
|---|---|---|---|
| controller | `internal/controller/...` | 9 (incl. ManyWorkloads/ManyPolicies up to 1000) | Heavy (~5-6 min) |
| metrics | `internal/metrics/...` | 4 | Light (~2 min) |
| recommendation | `internal/recommendation/...` | 6 | Medium (~3 min) |
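
As a rough check of the wall-clock claim, the sketch below compares the sum of the shard estimates (one runner) with their maximum (one runner per shard). The minute values are assumptions read off the "Expected weight" column (5.5 is the midpoint of ~5-6), not measurements, and the `shard` type and function names are illustrative only.

```go
package main

import "fmt"

// shard pairs a benchmark shard with an ASSUMED runtime estimate in minutes.
type shard struct {
	name    string
	minutes float64
}

// sequentialTime models a single runner: shards run back to back.
func sequentialTime(shards []shard) float64 {
	total := 0.0
	for _, s := range shards {
		total += s.minutes
	}
	return total
}

// parallelTime models one runner per shard: the slowest shard sets the wall-clock.
func parallelTime(shards []shard) float64 {
	slowest := 0.0
	for _, s := range shards {
		if s.minutes > slowest {
			slowest = s.minutes
		}
	}
	return slowest
}

func main() {
	// Estimates taken from the table; they are rough guesses, not timings.
	shards := []shard{
		{"controller", 5.5},
		{"metrics", 2},
		{"recommendation", 3},
	}
	seq, par := sequentialTime(shards), parallelTime(shards)
	fmt.Printf("sequential: %.1f min, parallel: %.1f min, speedup: %.1fx\n", seq, par, seq/par)
}
```

Under these assumed numbers the controller shard dominates, so any further gain has to come from splitting that shard rather than the lighter ones.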

Each shard has its own baseline cache key (bench-baseline-<pkg>-...) for independent benchstat comparison. The first merge to main after this PR will establish per-package baselines.

ci-gate depends on test-bench, which with a matrix strategy already waits for all 3 shards to complete. No changes needed there.

Split the single Benchmark Tests job into a matrix of 3 parallel jobs, one per package (controller, metrics, recommendation). This reduces wall-clock time from ~10 minutes (sum of all) to ~max(controller), roughly a 2-3x speedup. Each shard maintains its own baseline cache for benchstat comparison.

Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>

Further split the controller benchmark shard (8m41s) into two parallel jobs using -bench regex filtering:

- controller-core: fast benchmarks (BuildPrometheusQuery, Reconcile, ComputeRecommendations) -- ~1 min expected
- controller-scale: scale benchmarks (ManyWorkloads, ManyPolicies, ConcurrentPolicies up to 1000) -- the heavy tail

Total shards: 4 (controller-core, controller-scale, metrics, recommendation)
Expected wall-clock: max(controller-scale) instead of sum(all).

Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>
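
A minimal sketch of how such a regex split can be sanity-checked before it goes into CI. The `-bench` flag takes an unanchored regular expression, and Go's regexp syntax has no negative lookahead, so the fast shard has to list its benchmarks explicitly and anchor them. The patterns and the full benchmark names below are assumptions for illustration (only the short names appear in the commit message); the actual workflow may use different ones.

```go
package main

import (
	"fmt"
	"regexp"
)

// shardPatterns maps each shard to a HYPOTHETICAL -bench style pattern.
// The core pattern is anchored so that "BenchmarkReconcile" does not also
// match "BenchmarkReconcile_ManyWorkloads".
var shardPatterns = []struct {
	shard string
	re    *regexp.Regexp
}{
	{"controller-core", regexp.MustCompile(`^Benchmark(BuildPrometheusQuery|Reconcile|ComputeRecommendations)$`)},
	{"controller-scale", regexp.MustCompile(`ManyWorkloads|ManyPolicies|ConcurrentPolicies`)},
}

// shardFor returns the first shard whose pattern matches name, or "" if no
// shard would run it (a benchmark silently dropped by the split).
func shardFor(name string) string {
	for _, p := range shardPatterns {
		if p.re.MatchString(name) {
			return p.shard
		}
	}
	return ""
}

func main() {
	// Assumed full benchmark names, built from the short names in the commit.
	names := []string{
		"BenchmarkBuildPrometheusQuery",
		"BenchmarkReconcile",
		"BenchmarkComputeRecommendations",
		"BenchmarkReconcile_ManyWorkloads",
		"BenchmarkReconcile_ManyPolicies",
		"BenchmarkReconcile_ConcurrentPolicies",
	}
	for _, n := range names {
		fmt.Printf("%-40s -> %q\n", n, shardFor(n))
	}
}
```

An empty shard name in the output would indicate a benchmark that neither pattern selects, which is the main risk of filtering by name instead of by package.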

Further split controller-scale (8m06s) into two parallel shards:

- controller-workloads: BenchmarkReconcile_ManyWorkloads
- controller-policies: BenchmarkReconcile_ManyPolicies + ConcurrentPolicies

Reduce max scale from {10,100,500,1000} to {10,50,100,250}. 250 catches the same O(n^2) regressions at 1/12th the cost of 1000 (1.6s vs 20s per iteration locally).

Total benchmark shards: 5 (was 4 after PR #54).

Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>

SebTardif added 2 commits May 26, 2026 14:07

SebTardif merged commit 7fdd44a into main May 26, 2026
29 checks passed

SebTardif mentioned this pull request May 26, 2026

ci: split controller-scale into 2 shards and reduce max scale #55

Merged

SebTardif merged 2 commits into main from ci/parallel-benchmarks