better bench#2394
Conversation
kevmoo
commented
May 1, 2026
- plan!
- benchmark better crazy unreviewed
Package publishing
Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation. |
PR Health
License Headers
|
| Files |
|---|
| pkgs/benchmark_harness/example/slow_bench.dart |
All source files should start with a license header.
Unrelated files missing license headers
| Files |
|---|
| pkgs/bazel_worker/benchmark/benchmark.dart |
| pkgs/coverage/lib/src/coverage_options.dart |
| pkgs/html/example/main.dart |
| pkgs/pubspec_parse/test/git_uri_test.dart |
| pkgs/watcher/test/custom_watcher_factory_test.dart |
This check can be disabled by tagging the PR with skip-license-check.
Unused Dependencies ✔️
| Package | Status |
|---|---|
| benchmark_harness | ✔️ All dependencies utilized correctly. |
For details on how to fix these, see dependency_validator.
This check can be disabled by tagging the PR with skip-unused-dependencies-check.
Changelog Entry ❗
| Package | Changed Files |
|---|---|
| package:benchmark_harness | pkgs/benchmark_harness/README.md pkgs/benchmark_harness/bin/bench.dart pkgs/benchmark_harness/lib/benchmark_harness.dart pkgs/benchmark_harness/lib/src/async_benchmark_base.dart pkgs/benchmark_harness/lib/src/bench_command/bench_options.dart pkgs/benchmark_harness/lib/src/bench_command/compile_and_run.dart pkgs/benchmark_harness/lib/src/bench_command/wrapper_helper.dart pkgs/benchmark_harness/lib/src/benchmark.dart pkgs/benchmark_harness/lib/src/benchmark_base.dart pkgs/benchmark_harness/lib/src/blackhole.dart pkgs/benchmark_harness/lib/src/kbssd_math.dart pkgs/benchmark_harness/lib/src/logger.dart pkgs/benchmark_harness/lib/src/logger_web.dart pkgs/benchmark_harness/lib/src/model/benchmark_result_models.dart pkgs/benchmark_harness/lib/src/model/dart_environment.dart pkgs/benchmark_harness/lib/src/report.dart pkgs/benchmark_harness/lib/src/result.dart pkgs/benchmark_harness/lib/src/runner.dart pkgs/benchmark_harness/lib/src/score_emitter.dart pkgs/benchmark_harness/pubspec.yaml |
Changes to files need to be accounted for in their respective changelogs.
This check can be disabled by tagging the PR with skip-changelog-check.
Coverage ⚠️
| File | Coverage |
|---|---|
| pkgs/benchmark_harness/bin/bench.dart | 💔 Not covered |
| pkgs/benchmark_harness/example/list_sum_variations.dart | 💔 Not covered |
| pkgs/benchmark_harness/example/modern_example.dart | 💔 Not covered |
| pkgs/benchmark_harness/example/slow_bench.dart | 💔 Not covered |
| pkgs/benchmark_harness/example/wrapper_example.dart | 💔 Not covered |
| pkgs/benchmark_harness/lib/benchmark_harness.dart | 💔 Not covered |
| pkgs/benchmark_harness/lib/src/async_benchmark_base.dart | 💔 64 % ⬇️ 27 % |
| pkgs/benchmark_harness/lib/src/bench_command/bench_options.dart | 💚 97 % ⬆️ 2 % |
| pkgs/benchmark_harness/lib/src/bench_command/compile_and_run.dart | 💔 73 % ⬇️ 20 % |
| pkgs/benchmark_harness/lib/src/bench_command/wrapper_helper.dart | 💚 40 % |
| pkgs/benchmark_harness/lib/src/benchmark.dart | 💚 73 % |
| pkgs/benchmark_harness/lib/src/benchmark_base.dart | 💔 76 % ⬇️ 10 % |
| pkgs/benchmark_harness/lib/src/blackhole.dart | 💚 67 % |
| pkgs/benchmark_harness/lib/src/kbssd_math.dart | 💚 100 % |
| pkgs/benchmark_harness/lib/src/logger.dart | 💚 67 % |
| pkgs/benchmark_harness/lib/src/logger_web.dart | 💔 Not covered |
| pkgs/benchmark_harness/lib/src/model/benchmark_result_models.dart | 💚 100 % |
| pkgs/benchmark_harness/lib/src/model/dart_environment.dart | 💚 50 % |
| pkgs/benchmark_harness/lib/src/report.dart | 💚 94 % |
| pkgs/benchmark_harness/lib/src/result.dart | 💚 100 % |
| pkgs/benchmark_harness/lib/src/runner.dart | 💚 83 % |
| pkgs/benchmark_harness/lib/src/score_emitter.dart | 💚 63 % ⬆️ 275 % |
This check for test coverage is informational (issues shown here will not fail the PR).
This check can be disabled by tagging the PR with skip-coverage-check.
Breaking changes ⚠️
| Package | Change | Current Version | New Version | Needed Version | Looking good? |
|---|---|---|---|---|---|
| benchmark_harness | Non-Breaking | 2.4.0 | 2.4.0 | 2.5.0 Got "2.4.0" expected >= "2.5.0" (non-breaking changes) |
This check can be disabled by tagging the PR with skip-breaking-check.
API leaks ✔️
The following packages contain symbols visible in the public API, but not exported by the library. Export these symbols or remove them from your publicly visible API.
| Package | Leaked API symbol | Leaking sources |
|---|
This check can be disabled by tagging the PR with skip-leaking-check.
There was a problem hiding this comment.
Code Review
This pull request modernizes the benchmark_harness package by introducing an adaptive benchmarking engine, a new compositional API for defining benchmarks and variants, and enhanced reporting features such as JSON output and cross-platform comparison tables. The changes also include the integration of package:stats for more robust statistical analysis. Feedback suggests improving code maintainability by utilizing existing functionality in the stats package, ensuring consistent column ordering in the comparison tables, and adhering to the style guide by providing documentation for all public members.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request modernizes the benchmark_harness package by introducing a compositional API (Benchmark and BenchmarkVariant), a noise-resilient benchmarking engine using Kernel-Based Steady-State Detection (KBSSD), and a CLI tool for cross-platform execution (JIT, AOT, JS, WASM). It also adds a zero-cost blackhole utility to prevent dead-code elimination and provides extensive documentation and testing. Feedback includes suggestions to improve the robustness of asynchronous benchmark detection and casting, recommendations to use proper subcommand parsing for the validate command, and advice on making JSON output parsing more resilient to non-JSON diagnostic text.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request modernizes the benchmark_harness package by introducing a noise-resilient Kernel-Based Steady-State Detection (KBSSD) engine, a new compositional API (Benchmark and BenchmarkVariant), and a robust CLI tool for cross-platform benchmarking (JIT, AOT, JS, WASM). The changes include sophisticated statistical analysis using package:stats, adaptive calibration to handle environmental noise, and process-level isolation for accurate measurements. Feedback focuses on improving the robustness of the CLI tool, particularly regarding JSON output integrity, error propagation in isolate-mode, and more reliable platform detection for web targets. Additionally, some public members require documentation to comply with the repository style guide.
| '\nWarning: Failed to run benchmark for ${mode.name.toUpperCase()}:', | ||
| ); | ||
| print(e); | ||
| failures[mode] = e; |
There was a problem hiding this comment.
When the --json flag is active, printing warnings or error messages directly to stdout will corrupt the resulting JSON output, making it unparseable for automated tools or CI pipelines. These diagnostic messages should be directed to stderr instead.
| '\nWarning: Failed to run benchmark for ${mode.name.toUpperCase()}:', | |
| ); | |
| print(e); | |
| failures[mode] = e; | |
| stderr.writeln( | |
| '\nWarning: Failed to run benchmark for ${mode.name.toUpperCase()}:', | |
| ); | |
| stderr.writeln(e); |
References
- When building foundational components, prioritize robust, future-proof implementations over simpler ones that may introduce subtle bugs or rely on unintended behavior.
| errorPort.listen((error) { | ||
| if (error is List && error.length >= 2) { | ||
| stderr.writeln('Isolate Error: ${error[0]}'); | ||
| stderr.writeln('${error[1]}'); | ||
| } else { | ||
| stderr.writeln('Isolate Error: $error'); | ||
| } | ||
| }); | ||
|
|
||
| await exitPort.first; | ||
| exitPort.close(); | ||
| errorPort.close(); | ||
| return null; | ||
| } |
There was a problem hiding this comment.
The _IsolateRunner implementation does not correctly propagate execution failures. The errorPort listener merely prints to stderr and does not signal the orchestrator that the run failed. Since _runImpl returns null regardless of whether an error occurred in the isolate, the caller in compileAndRun will treat the failure as a successful run with no results, failing to populate the failures map. This makes CI stability guards ineffective for isolate-mode runs. Consider using a Completer to wait for either the exitPort or errorPort to signal completion or failure.
References
- When building foundational components, prioritize robust, future-proof implementations over simpler ones that may introduce subtle bugs or rely on unintended behavior.
| }) : assert(maxSamples >= 4, 'maxSamples must be at least 4'), | ||
| assert(windowSize >= 2, 'windowSize must be at least 2'), | ||
| assert(stabilityRequired > 0, 'stabilityRequired must be positive'); |
There was a problem hiding this comment.
The minimum constraints for maxSamples and windowSize are too low for the KBSSD logic to function correctly. If maxSamples is set to a small value (e.g., 4) and windowSize is 2, the cold buffer filling phase (line 157) will consume all available samples, and the convergence loop (line 181) will never execute. This results in the benchmark always being reported as unstable (isStable: false) because the converged flag remains false.
References
- When building foundational components, prioritize robust, future-proof implementations over simpler ones that may introduce subtle bugs or rely on unintended behavior.
| ); | ||
| } | ||
| if (elapsedMicros < 10 || elapsedMicros > 200000) { | ||
| const isWeb = bool.fromEnvironment('dart.library.js_interop'); |
There was a problem hiding this comment.
The check bool.fromEnvironment('dart.library.js_interop') is incorrect for detecting a web environment. This environment variable is not automatically defined by Dart compilers. On web targets where timer resolution is coarse, elapsedMicros can be 0, and if isWeb is incorrectly false, the runner will throw a CalibrationException. Consider using a more reliable detection method, such as checking if 0.0 is int or using conditional imports to define a platform constant in the logger files.
References
- When building foundational components, prioritize robust, future-proof implementations over simpler ones that may introduce subtle bugs or rely on unintended behavior.
| final isValidate = args.contains('validate'); | ||
| final cleanArgs = List<String>.from(args)..remove('validate'); | ||
| final result = _parserForBenchOptions.parse(cleanArgs); |
There was a problem hiding this comment.
Stripping the validate argument by searching and removing it from the args list is brittle. This approach can lead to incorrect behavior if validate is provided as a value for an option (e.g., --target validate.dart). It is recommended to use ArgParser.addCommand to handle subcommands properly or to restrict the check to specific positional arguments.
References
- When building foundational components, prioritize robust, future-proof implementations over simpler ones that may introduce subtle bugs or rely on unintended behavior.
| } | ||
|
|
||
| /// A comparison between two benchmark results. | ||
| class BenchmarkComparison { |
There was a problem hiding this comment.
The public class BenchmarkComparison is missing documentation. According to the repository style guide, all public members should have documentation that explains the "why".
References
- At least all public members should have documentation, answering the why. (link)
| } | ||
| } | ||
|
|
||
| class JsonEmitter implements DetailedScoreEmitter { |
There was a problem hiding this comment.
The public class JsonEmitter is missing documentation. According to the repository style guide, all public members should have documentation that explains the "why".
References
- At least all public members should have documentation, answering the why. (link)