# Commit 26f4bfc

Add benchmark suite with GitHub Actions workflow

Parent: d18c51f

File tree: 15 files changed, +868 −2 lines


### .github/workflows/benchmarks.yml

52 additions, 0 deletions (new file)

```yaml
name: benchmarks

on:
  workflow_dispatch:
  schedule:
    - cron: "0 8 * * 1"

permissions:
  contents: read

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  benchmark:
    runs-on: ubuntu-latest
    timeout-minutes: 20

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Bun
        uses: oven-sh/setup-bun@v2
        with:
          bun-version: 1.3.1

      - name: Cache dependencies
        uses: actions/cache@v4
        with:
          path: |
            ~/.bun/install/cache
            node_modules
          key: ${{ runner.os }}-bun-bench-${{ hashFiles('bun.lock') }}
          restore-keys: |
            ${{ runner.os }}-bun-bench-

      - name: Install dependencies
        run: bun install --frozen-lockfile

      - name: Run benchmarks
        run: bun run bench:ci

      - name: Upload benchmark artifacts
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: bench-results

      - name: Publish benchmark summary
        run: cat bench-results/summary.md >> "$GITHUB_STEP_SUMMARY"
```
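The workflow's `bun run bench:ci` step invokes a package script that is not part of this diff. A hypothetical sketch of what the `package.json` entries might look like (the script names come from this commit's README; the command bodies, the `build` step, and the pipe into `bench/report.ts` are assumptions):

```json
{
  "scripts": {
    "bench": "bun run build && bun bench/index.ts",
    "bench:ci": "bun run build && bun bench/index.ts --json | bun bench/report.ts"
  }
}
```

The `--json` flag and `bench/index.ts` entry point do appear later in this commit, so only the wiring around them is guessed here.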

### .gitignore

1 addition, 0 deletions

```diff
@@ -4,6 +4,7 @@ node_modules
 # output
 out
 dist
+bench-results
 *.tgz

 # code coverage
```

### bench/README.md

36 additions, 0 deletions (new file)

````markdown
# Benchmarks

The benchmark suite tracks relative performance trends for `tryharder` without turning noisy microbenchmarks into a required PR gate.

## Commands

```bash
bun run bench
bun run bench:ci
```

`bun run bench` builds the package and prints human-readable `mitata` output.

`bun run bench:ci` builds the package, emits structured benchmark JSON, and writes these artifacts:

- `bench-results/latest.json`
- `bench-results/summary.md`

## Benchmark discipline

- Benchmarks import from `dist`, not `src`, so measurements match the published package surface.
- Cases stay deterministic and timer-free. Do not use `sleep`, real timeout expiry, or cancellation races in this suite.
- Reuse task graphs and builder fixtures where setup can stay outside the measured loop.
- Route results through the shared sink in `bench/shared.ts` so the runtime cannot optimize work away.
- Run on the same Bun version when comparing history. This repo pins benchmark runs to `bun@1.3.1`.
- Treat results as trend signals. Do not infer product-level latency from these microbenchmarks.

## How to read these benchmarks

- Treat the suite as an overhead tracker for `tryharder`, not as an end-to-end application latency test.
- Compare like-for-like cases over time. The most useful regression signals are usually `run/function/sync-success`, `runSync/function/success`, `run/object/mapped-error`, `all/two-independent-sync-tasks`, `allSettled/two-successful-tasks`, and `flow/immediate-exit`.
- Use the direct baselines to understand scale, but do not over-index on huge ratios against `baseline/direct-sync-call`. That case is so small that tiny absolute changes can create very large relative multipliers.
- Prefer absolute changes in `ns/iter` or `us/iter` when reading results. A `+200 ns` regression on a hot-path benchmark is usually more meaningful than a percentage quoted without context.
- Read policy benchmarks as incremental overhead on top of execution. `wrap/runSync/success`, `signal/runSync/success`, `timeout/run/success-no-expiry`, and `retry/runSync/succeeds-on-third-attempt` show the cost of enabling those features even when they do not fail. The async control cases `signal/run/async-success-no-abort`, `timeout/run/async-success-no-expiry`, and `signal-timeout/run/async-success-no-abort-no-expiry` are the cases to watch when evaluating `resolveWithAbort()` changes.
- Read orchestration benchmarks as framework cost for very small graphs. `all`, `allSettled`, and `flow` are expected to be much slower than `Promise.all` in these tiny cases because they are doing dependency tracking, cancellation wiring, and result shaping. Compare unused-feature cases with exercised-feature cases like `all/two-independent-sync-tasks-with-signal`, `all/two-independent-sync-tasks-with-disposer`, `allSettled/two-successful-tasks-with-disposer`, and `flow/two-node-dependency-then-exit` to see where fixed setup cost is going.
- Only compare history across runs that use the same Bun version, machine class, and benchmark suite version. Cross-machine numbers are not reliable enough for regression calls.
````

### bench/__tests__/report.test.ts

156 additions, 0 deletions (new file)

```typescript
import { describe, expect, it } from "bun:test"
import {
  normalizeBenchmarkPayload,
  parseRawBenchmarkPayload,
  renderBenchmarkSummary,
} from "../report"

describe("benchmark reporting", () => {
  it("normalizes benchmark payloads into artifact output", () => {
    const payload = parseRawBenchmarkPayload(
      JSON.stringify({
        groups: {
          "all/two-independent-sync-tasks": "orchestration",
          "runSync/function/success": "core",
        },
        results: {
          benchmarks: [
            {
              runs: [
                {
                  name: "runSync/function/success",
                  stats: {
                    avg: 125,
                    samples: [100, 125, 150],
                  },
                },
              ],
            },
            {
              runs: [
                {
                  name: "all/two-independent-sync-tasks",
                  stats: {
                    avg: 250,
                    samples: [200, 250],
                  },
                },
              ],
            },
          ],
          context: {
            cpu: {
              name: "Test CPU",
            },
            version: "1.3.1",
          },
        },
        suiteVersion: 1,
      })
    )

    const artifact = normalizeBenchmarkPayload(payload, {
      arch: "arm64",
      date: "2026-03-07T00:00:00.000Z",
      gitSha: "abc123",
      platform: "darwin",
    })

    expect(artifact).toEqual({
      cases: [
        {
          avgNs: 125,
          group: "core",
          hz: 8_000_000,
          name: "runSync/function/success",
          samples: 3,
        },
        {
          avgNs: 250,
          group: "orchestration",
          hz: 4_000_000,
          name: "all/two-independent-sync-tasks",
          samples: 2,
        },
      ],
      meta: {
        arch: "arm64",
        bunVersion: "1.3.1",
        cpuModel: "Test CPU",
        date: "2026-03-07T00:00:00.000Z",
        gitSha: "abc123",
        platform: "darwin",
        suiteVersion: 1,
      },
    })
  })

  it("throws when benchmark fields are missing", () => {
    expect(() =>
      normalizeBenchmarkPayload({
        groups: {},
        results: {
          benchmarks: [
            {
              runs: [
                {
                  stats: {
                    avg: 125,
                    samples: [100],
                  },
                },
              ],
            },
          ],
        },
      })
    ).toThrow("Benchmark run is missing name")
  })

  it("supports empty benchmark results", () => {
    const artifact = normalizeBenchmarkPayload(
      {
        groups: {},
        results: {
          benchmarks: [],
          context: {},
        },
      },
      {
        arch: "arm64",
        bunVersion: "1.3.1",
        date: "2026-03-07T00:00:00.000Z",
        gitSha: "abc123",
        platform: "darwin",
      }
    )

    expect(artifact.cases).toEqual([])
    expect(renderBenchmarkSummary(artifact)).toContain("No benchmark cases were produced.")
  })

  it("throws on non-finite numeric values", () => {
    expect(() =>
      normalizeBenchmarkPayload({
        groups: {
          "runSync/function/success": "core",
        },
        results: {
          benchmarks: [
            {
              runs: [
                {
                  name: "runSync/function/success",
                  stats: {
                    avg: Number.POSITIVE_INFINITY,
                    samples: [100],
                  },
                },
              ],
            },
          ],
        },
      })
    ).toThrow("Benchmark runSync/function/success is missing stats.avg")
  })
})
```

### bench/constants.ts

1 addition, 0 deletions (new file)

```typescript
export const BENCHMARK_SUITE_VERSION = 2
```

### bench/index.ts

49 additions, 0 deletions (new file)

```typescript
import { run } from "mitata"
import { BENCHMARK_SUITE_VERSION } from "./constants"
import { getBenchmarkGroups } from "./shared"
import { registerCoreBenchmarks } from "./suites/core.bench"
import { registerOrchestrationBenchmarks } from "./suites/orchestration.bench"
import { registerPoliciesBenchmarks } from "./suites/policies.bench"

registerCoreBenchmarks()
registerPoliciesBenchmarks()
registerOrchestrationBenchmarks()

const suppressPrint = () => null
const isJson = process.argv.includes("--json")
const results = await run({
  colors: !isJson,
  format: isJson ? "quiet" : "mitata",
  print: isJson ? suppressPrint : undefined,
  throw: true,
})

if (isJson) {
  process.stdout.write(
    `${JSON.stringify({
      groups: getBenchmarkGroups(),
      results: {
        benchmarks: results.benchmarks.map((trial) => ({
          runs: trial.runs.map((run) => ({
            error: run.error,
            name: run.name,
            stats:
              run.stats === undefined
                ? undefined
                : {
                    avg: run.stats.avg,
                    samples: run.stats.samples.length,
                  },
          })),
        })),
        context: {
          cpu: {
            name: results.context.cpu.name,
          },
          version: Bun.version,
        },
      },
      suiteVersion: BENCHMARK_SUITE_VERSION,
    })}\n`
  )
}
```
