community(benchmarks): add component benchmark results from Intel i3-6006U Linux (#2065)
* community(benchmarks): add component benchmark results from Intel i3-6006U
Ran all four component benchmark scripts on my local Linux machine
(Intel i3-6006U, 2 cores / 4 threads, 3.7 GB RAM, Python 3.14.5) and recorded
the output. Added a Community-submitted benchmarks section to
docs/benchmarks/BENCHMARKS.md with actual measured numbers.
Contributes to #787.
* docs(benchmarks): address review feedback on community submission
- Add blank lines between metadata fields so they render as separate
paragraphs (Sourcery)
- Expand command column to full uv invocations, e.g.
'uv run python benchmarks/bench_orchestrator.py' (Sourcery/Copilot)
- Clarify Python 3.14.5 is a pre-release build; add note that stable
3.12/3.13 should give lower startup latency (Sourcery/CodeRabbit)
- Include full distro kernel string: 6.17.0-35-generic (CodeRabbit)
---------
Co-authored-by: Om Rohilla <om@contributor>
docs/benchmarks/BENCHMARKS.md: 32 additions & 0 deletions
@@ -129,3 +129,35 @@ uv run python benchmarks/bench_startup.py
Benchmarks measure scheduling efficiency, not code quality. A fast wrong answer is still wrong. Bernstein's janitor and quality gates ensure the output is correct before it lands - which adds overhead but saves you from debugging agent mistakes.

The real metric that matters: **how much of your day do you save?** If a single agent would take 4 hours on your backlog and Bernstein finishes it in 2.5 hours with verified output, you got back 1.5 hours. That compounds across every run.

---

## Community-submitted benchmarks

Real runs from the community. Submit yours via [issue #787](https://github.com/sipyourdrink-ltd/bernstein/issues/787) or open a PR adding a row here.
| Task store: flush latency (buffer=1) | 3.32 ms |`uv run python benchmarks/bench_task_store.py`|
| Quality gate verify_task — 1 signal | 0.038 ms |`uv run python benchmarks/bench_quality_gates.py`|
| Quality gate verify_task — 10 signals | 0.231 ms |`uv run python benchmarks/bench_quality_gates.py`|
| Quality gate verify_task — 50 signals | 1.134 ms |`uv run python benchmarks/bench_quality_gates.py`|
| Quality gate verify_task — 100 signals | 1.915 ms |`uv run python benchmarks/bench_quality_gates.py`|
| Startup latency (avg, 5 runs) | 3048.61 ms |`uv run python benchmarks/bench_startup.py`|
**Notes:** Low-end consumer laptop (budget i3, 2016 generation, 3.7 GB RAM). Startup latency is higher than expected, likely because of cold import overhead on a Python 3.14 pre-release build; expect lower values on stable Python 3.12/3.13. Orchestrator tick and task store throughput look normal for this hardware class. Quality gate scaling is near-linear, as the docs describe: roughly 0.02 ms per signal from 10 to 100 signals.
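One way to test the cold-import hypothesis is to time a bare interpreter launch on the same machine and compare it with the startup figure in the table. The sketch below is an assumption-laden illustration, not the contents of `benchmarks/bench_startup.py`: `average_startup_ms` is a hypothetical helper that averages wall-clock time over a few subprocess runs.

```python
import subprocess
import sys
import time


def average_startup_ms(cmd: list[str], runs: int = 5) -> float:
    """Return the mean wall-clock time in ms to run cmd to completion."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command exits with a non-zero status.
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append((time.perf_counter() - start) * 1000.0)
    return sum(samples) / len(samples)


if __name__ == "__main__":
    # Baseline: a bare interpreter launch with no project imports.
    bare = average_startup_ms([sys.executable, "-c", "pass"])
    print(f"bare interpreter: {bare:.2f} ms")
```

If the bare launch is a small fraction of the measured startup latency, most of the time is going into imports or initialization rather than the interpreter itself. CPython's `-X importtime` option prints a per-module import timing breakdown and can help narrow that down.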