community(benchmarks): add component benchmark results from Intel i3-6006U Linux (#2065)

Om-Rohilla · Om Rohilla · web-flow · commit 9ee584ed7d27 · 2026-06-25T12:06:41.000+03:00
* community(benchmarks): add component benchmark results from Intel i3-6006U

  Ran all four component benchmark scripts on my local Linux machine (Intel i3-6006U, 4 cores, 3.7 GB RAM, Python 3.14.5) and recorded the output. Added a Community-submitted benchmarks section to docs/benchmarks/BENCHMARKS.md with actual measured numbers.

  Contributes to #787.

* docs(benchmarks): address review feedback on community submission

  - Add blank lines between metadata fields so they render as separate paragraphs (Sourcery)
  - Expand command column to full uv invocations, e.g. 'uv run python benchmarks/bench_orchestrator.py' (Sourcery/Copilot)
  - Clarify Python 3.14.5 is a pre-release build; add note that stable 3.12/3.13 should give lower startup latency (Sourcery/CodeRabbit)
  - Include full distro kernel string: 6.17.0-35-generic (CodeRabbit)

---------

Co-authored-by: Om Rohilla <om@contributor>
diff --git a/docs/benchmarks/BENCHMARKS.md b/docs/benchmarks/BENCHMARKS.md
@@ -129,3 +129,35 @@ uv run python benchmarks/bench_startup.py
 Benchmarks measure scheduling efficiency, not code quality. A fast wrong answer is still wrong. Bernstein's janitor and quality gates ensure the output is correct before it lands - which adds overhead but saves you from debugging agent mistakes.
 
 The real metric that matters: **how much of your day do you save?** If a single agent would take 4 hours on your backlog and Bernstein finishes it in 2.5 hours with verified output, you got back 1.5 hours. That compounds across every run.
+
+---
+
+## Community-submitted benchmarks
+
+Real runs from the community. Submit yours via [issue #787](https://github.com/sipyourdrink-ltd/bernstein/issues/787) or open a PR adding a section like the one below.
+
+### Component benchmarks — Intel i3-6006U, Linux, Python 3.14 (pre-release)
+
+**Hardware:** Intel Core i3-6006U @ 2.00GHz, 2 cores / 4 threads, 3.7 GB RAM, Ubuntu (kernel 6.17.0-35-generic), SSD
+
+**Bernstein version:** v2.7.0
+
+**Python:** 3.14.5 (pre-release dev build, inside project venv — `uv venv`)
+
+**Submitted by:** [@Om-Rohilla](https://github.com/Om-Rohilla)
+
+| Benchmark | Result | Command |
+|-----------|--------|---------|
+| Orchestrator tick latency (100-task backlog) — avg | 5.89 ms | `uv run python benchmarks/bench_orchestrator.py` |
+| Orchestrator tick latency (100-task backlog) — max | 7.35 ms | `uv run python benchmarks/bench_orchestrator.py` |
+| Task store: creations | 251.83 tasks/sec | `uv run python benchmarks/bench_task_store.py` |
+| Task store: claims | 253.38 tasks/sec | `uv run python benchmarks/bench_task_store.py` |
+| Task store: completions | 162.19 tasks/sec | `uv run python benchmarks/bench_task_store.py` |
+| Task store: flush latency (buffer=1) | 3.32 ms | `uv run python benchmarks/bench_task_store.py` |
+| Quality gate verify_task — 1 signal | 0.038 ms | `uv run python benchmarks/bench_quality_gates.py` |
+| Quality gate verify_task — 10 signals | 0.231 ms | `uv run python benchmarks/bench_quality_gates.py` |
+| Quality gate verify_task — 50 signals | 1.134 ms | `uv run python benchmarks/bench_quality_gates.py` |
+| Quality gate verify_task — 100 signals | 1.915 ms | `uv run python benchmarks/bench_quality_gates.py` |
+| Startup latency (avg, 5 runs) | 3048.61 ms | `uv run python benchmarks/bench_startup.py` |
+
+**Notes:** Low-end consumer laptop (budget i3 from 2016, 3.7 GB RAM). Startup latency (about 3.05 s averaged over 5 runs) is higher than expected, likely because of cold import overhead on the Python 3.14 pre-release build; stable Python 3.12/3.13 should start faster. Orchestrator tick latency and task store throughput look normal for this hardware class. Quality gate cost scales near-linearly with signal count (about 0.02 ms per signal from 10 to 100 signals), as the docs describe.
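
The near-linear scaling claim in the notes can be checked directly from the four quality gate rows in the table. The following is a minimal sketch, assuming an ordinary least-squares fit over the submitted numbers is an acceptable test of linearity. The names `per_signal_cost`, `linear_fit`, and `r_squared` are hypothetical helpers for illustration and are not part of Bernstein's benchmark scripts.

```python
# Quality gate timings copied from the submitted table: (signals, milliseconds).
MEASURED = [(1, 0.038), (10, 0.231), (50, 1.134), (100, 1.915)]


def per_signal_cost(points):
    """Return (signals, ms per signal) for each measured size."""
    return [(n, ms / n) for n, ms in points]


def linear_fit(points):
    """Ordinary least-squares fit of ms = slope * n + intercept."""
    count = len(points)
    mean_n = sum(n for n, _ in points) / count
    mean_ms = sum(ms for _, ms in points) / count
    cov = sum((n - mean_n) * (ms - mean_ms) for n, ms in points)
    var = sum((n - mean_n) ** 2 for n, _ in points)
    slope = cov / var
    return slope, mean_ms - slope * mean_n


def r_squared(points, slope, intercept):
    """Coefficient of determination for the fitted line (1.0 is a perfect fit)."""
    mean_ms = sum(ms for _, ms in points) / len(points)
    ss_res = sum((ms - (slope * n + intercept)) ** 2 for n, ms in points)
    ss_tot = sum((ms - mean_ms) ** 2 for _, ms in points)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    slope, intercept = linear_fit(MEASURED)
    for n, cost in per_signal_cost(MEASURED):
        print(f"{n:>3} signals: {cost:.4f} ms per signal")
    print(f"fit: {slope:.4f} ms per signal, {intercept:.4f} ms fixed overhead")
    print(f"R^2: {r_squared(MEASURED, slope, intercept):.4f}")
```

An R² close to 1 supports the near-linear reading. A positive intercept would point to a fixed per-call overhead, which would also explain why the single-signal run costs more per signal than the larger runs. With only four data points this is a sanity check, not a rigorous scaling analysis.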