better bench by kevmoo · Pull Request #2394 · dart-lang/tools

kevmoo · 2026-05-01T18:11:16Z

plan!
benchmark better crazy unreviewed

github-actions · 2026-05-01T18:12:14Z

Package publishing

Package	Version	Status	Publish tag (post-merge)
package:bazel_worker	1.1.5	already published at pub.dev
package:benchmark_harness	2.4.0	already published at pub.dev
package:boolean_selector	2.1.2	already published at pub.dev
package:browser_launcher	1.2.0-wip	WIP (no publish necessary)
package:cli_config	0.2.1-wip	WIP (no publish necessary)
package:cli_util	0.5.1	already published at pub.dev
package:clock	1.1.3-wip	WIP (no publish necessary)
package:code_builder	4.12.0-wip	WIP (no publish necessary)
package:coverage	1.15.0	already published at pub.dev
package:csslib	1.0.2	already published at pub.dev
package:extension_discovery	2.1.0	already published at pub.dev
package:file	7.0.2-wip	WIP (no publish necessary)
package:file_testing	3.1.0-wip	WIP (no publish necessary)
package:glob	2.1.3	already published at pub.dev
package:graphs	2.4.0-wip	WIP (no publish necessary)
package:html	0.15.7-wip	WIP (no publish necessary)
package:io	1.1.0-wip	WIP (no publish necessary)
package:json_rpc_2	4.1.0	already published at pub.dev
package:markdown	7.4.0	ready to publish	`markdown-v7.4.0`
package:mime	2.1.0-wip	WIP (no publish necessary)
package:oauth2	2.0.5	already published at pub.dev
package:package_config	2.3.0-wip	WIP (no publish necessary)
package:pool	1.5.3-wip	WIP (no publish necessary)
package:process	5.0.5	(error) pubspec version (5.0.5) and changelog (5.0.6-wip) don't agree
package:pub_semver	2.2.0	already published at pub.dev
package:pubspec_parse	1.6.0-wip	WIP (no publish necessary)
package:source_map_stack_trace	2.1.3-wip	WIP (no publish necessary)
package:source_maps	0.10.14-wip	WIP (no publish necessary)
package:source_span	1.10.2	already published at pub.dev
package:sse	4.2.0	already published at pub.dev
package:stack_trace	1.12.2-wip	(error) pubspec version (1.12.2-wip) and changelog (1.12.2-dev) don't agree
package:stream_channel	2.1.4	already published at pub.dev
package:stream_transform	2.1.2-wip	WIP (no publish necessary)
package:string_scanner	1.4.2-wip	WIP (no publish necessary)
package:term_glyph	1.2.3-wip	WIP (no publish necessary)
package:test_reflective_loader	0.6.0	ready to publish	`test_reflective_loader-v0.6.0`
package:timing	1.0.2	already published at pub.dev
package:unified_analytics	8.0.15	ready to publish	`unified_analytics-v8.0.15`
package:watcher	1.2.2-wip	WIP (no publish necessary)
package:yaml	3.1.4-wip	WIP (no publish necessary)
package:yaml_edit	2.2.4	already published at pub.dev

Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation.

github-actions · 2026-05-01T18:13:02Z

PR Health

License Headers

⚠️

// Copyright (c) 2026, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.

Files
pkgs/benchmark_harness/example/slow_bench.dart

All source files should start with a license header.

Unrelated files missing license headers

Files
pkgs/bazel_worker/benchmark/benchmark.dart
pkgs/coverage/lib/src/coverage_options.dart
pkgs/html/example/main.dart
pkgs/pubspec_parse/test/git_uri_test.dart
pkgs/watcher/test/custom_watcher_factory_test.dart

This check can be disabled by tagging the PR with skip-license-check.

Unused Dependencies ✔️

Package	Status
benchmark_harness	✔️ All dependencies utilized correctly.

For details on how to fix these, see dependency_validator.

This check can be disabled by tagging the PR with skip-unused-dependencies-check.

Changelog Entry ❗

Package	Changed Files
package:benchmark_harness	pkgs/benchmark_harness/README.md pkgs/benchmark_harness/bin/bench.dart pkgs/benchmark_harness/lib/benchmark_harness.dart pkgs/benchmark_harness/lib/src/async_benchmark_base.dart pkgs/benchmark_harness/lib/src/bench_command/bench_options.dart pkgs/benchmark_harness/lib/src/bench_command/compile_and_run.dart pkgs/benchmark_harness/lib/src/bench_command/wrapper_helper.dart pkgs/benchmark_harness/lib/src/benchmark.dart pkgs/benchmark_harness/lib/src/benchmark_base.dart pkgs/benchmark_harness/lib/src/blackhole.dart pkgs/benchmark_harness/lib/src/kbssd_math.dart pkgs/benchmark_harness/lib/src/logger.dart pkgs/benchmark_harness/lib/src/logger_web.dart pkgs/benchmark_harness/lib/src/model/benchmark_result_models.dart pkgs/benchmark_harness/lib/src/model/dart_environment.dart pkgs/benchmark_harness/lib/src/report.dart pkgs/benchmark_harness/lib/src/result.dart pkgs/benchmark_harness/lib/src/runner.dart pkgs/benchmark_harness/lib/src/score_emitter.dart pkgs/benchmark_harness/pubspec.yaml

Changes to files need to be accounted for in their respective changelogs.

This check can be disabled by tagging the PR with skip-changelog-check.

Coverage

⚠️

File	Coverage
pkgs/benchmark_harness/bin/bench.dart	💔 Not covered
pkgs/benchmark_harness/example/list_sum_variations.dart	💔 Not covered
pkgs/benchmark_harness/example/modern_example.dart	💔 Not covered
pkgs/benchmark_harness/example/slow_bench.dart	💔 Not covered
pkgs/benchmark_harness/example/wrapper_example.dart	💔 Not covered
pkgs/benchmark_harness/lib/benchmark_harness.dart	💔 Not covered
pkgs/benchmark_harness/lib/src/async_benchmark_base.dart	💔 64 % ⬇️ 27 %
pkgs/benchmark_harness/lib/src/bench_command/bench_options.dart	💚 97 % ⬆️ 2 %
pkgs/benchmark_harness/lib/src/bench_command/compile_and_run.dart	💔 73 % ⬇️ 20 %
pkgs/benchmark_harness/lib/src/bench_command/wrapper_helper.dart	💚 40 %
pkgs/benchmark_harness/lib/src/benchmark.dart	💚 73 %
pkgs/benchmark_harness/lib/src/benchmark_base.dart	💔 76 % ⬇️ 10 %
pkgs/benchmark_harness/lib/src/blackhole.dart	💚 67 %
pkgs/benchmark_harness/lib/src/kbssd_math.dart	💚 100 %
pkgs/benchmark_harness/lib/src/logger.dart	💚 67 %
pkgs/benchmark_harness/lib/src/logger_web.dart	💔 Not covered
pkgs/benchmark_harness/lib/src/model/benchmark_result_models.dart	💚 100 %
pkgs/benchmark_harness/lib/src/model/dart_environment.dart	💚 50 %
pkgs/benchmark_harness/lib/src/report.dart	💚 94 %
pkgs/benchmark_harness/lib/src/result.dart	💚 100 %
pkgs/benchmark_harness/lib/src/runner.dart	💚 83 %
pkgs/benchmark_harness/lib/src/score_emitter.dart	💚 63 % ⬆️ 275 %

This check for test coverage is informational (issues shown here will not fail the PR).

This check can be disabled by tagging the PR with skip-coverage-check.

Breaking changes

⚠️

Package	Change	Current Version	New Version	Needed Version	Looking good?
benchmark_harness	Non-Breaking	2.4.0	2.4.0	2.5.0 Got "2.4.0" expected >= "2.5.0" (non-breaking changes)	⚠️

This check can be disabled by tagging the PR with skip-breaking-check.

API leaks ✔️

The following packages contain symbols visible in the public API, but not exported by the library. Export these symbols or remove them from your publicly visible API.

Package	Leaked API symbol	Leaking sources

This check can be disabled by tagging the PR with skip-leaking-check.

gemini-code-assist

Code Review

This pull request modernizes the benchmark_harness package by introducing an adaptive benchmarking engine, a new compositional API for defining benchmarks and variants, and enhanced reporting features such as JSON output and cross-platform comparison tables. The changes also include the integration of package:stats for more robust statistical analysis. Feedback suggests improving code maintainability by utilizing existing functionality in the stats package, ensuring consistent column ordering in the comparison tables, and adhering to the style guide by providing documentation for all public members.

kevmoo · 2026-05-10T03:03:54Z

/gemini review

gemini-code-assist

Code Review

This pull request modernizes the benchmark_harness package by introducing a compositional API (Benchmark and BenchmarkVariant), a noise-resilient benchmarking engine using Kernel-Based Steady-State Detection (KBSSD), and a CLI tool for cross-platform execution (JIT, AOT, JS, WASM). It also adds a zero-cost blackhole utility to prevent dead-code elimination and provides extensive documentation and testing. Feedback includes suggestions to improve the robustness of asynchronous benchmark detection and casting, recommendations to use proper subcommand parsing for the validate command, and advice on making JSON output parsing more resilient to non-JSON diagnostic text.

kevmoo · 2026-05-10T05:21:04Z

/gemini review

gemini-code-assist

Code Review

This pull request modernizes the benchmark_harness package by introducing a noise-resilient Kernel-Based Steady-State Detection (KBSSD) engine, a new compositional API (Benchmark and BenchmarkVariant), and a robust CLI tool for cross-platform benchmarking (JIT, AOT, JS, WASM). The changes include sophisticated statistical analysis using package:stats, adaptive calibration to handle environmental noise, and process-level isolation for accurate measurements. Feedback focuses on improving the robustness of the CLI tool, particularly regarding JSON output integrity, error propagation in isolate-mode, and more reliable platform detection for web targets. Additionally, some public members require documentation to comply with the repository style guide.

gemini-code-assist · 2026-05-10T05:25:11Z

+        '\nWarning: Failed to run benchmark for ${mode.name.toUpperCase()}:',
+      );
+      print(e);
+      failures[mode] = e;


When the --json flag is active, printing warnings or error messages directly to stdout will corrupt the resulting JSON output, making it unparseable for automated tools or CI pipelines. These diagnostic messages should be directed to stderr instead.

Suggested change

'\nWarning: Failed to run benchmark for ${mode.name.toUpperCase()}:',

);

print(e);

failures[mode] = e;

stderr.writeln(

'\nWarning: Failed to run benchmark for ${mode.name.toUpperCase()}:',

);

stderr.writeln(e);

References

When building foundational components, prioritize robust, future-proof implementations over simpler ones that may introduce subtle bugs or rely on unintended behavior.

gemini-code-assist · 2026-05-10T05:25:11Z

+    errorPort.listen((error) {
+      if (error is List && error.length >= 2) {
+        stderr.writeln('Isolate Error: ${error[0]}');
+        stderr.writeln('${error[1]}');
+      } else {
+        stderr.writeln('Isolate Error: $error');
+      }
+    });
+
+    await exitPort.first;
+    exitPort.close();
+    errorPort.close();
+    return null;
+  }


The _IsolateRunner implementation does not correctly propagate execution failures. The errorPort listener merely prints to stderr and does not signal the orchestrator that the run failed. Since _runImpl returns null regardless of whether an error occurred in the isolate, the caller in compileAndRun will treat the failure as a successful run with no results, failing to populate the failures map. This makes CI stability guards ineffective for isolate-mode runs. Consider using a Completer to wait for either the exitPort or errorPort to signal completion or failure.

References

When building foundational components, prioritize robust, future-proof implementations over simpler ones that may introduce subtle bugs or rely on unintended behavior.

gemini-code-assist · 2026-05-10T05:25:11Z

+  }) : assert(maxSamples >= 4, 'maxSamples must be at least 4'),
+       assert(windowSize >= 2, 'windowSize must be at least 2'),
+       assert(stabilityRequired > 0, 'stabilityRequired must be positive');


The minimum constraints for maxSamples and windowSize are too low for the KBSSD logic to function correctly. If maxSamples is set to a small value (e.g., 4) and windowSize is 2, the cold buffer filling phase (line 157) will consume all available samples, and the convergence loop (line 181) will never execute. This results in the benchmark always being reported as unstable (isStable: false) because the converged flag remains false.

References

When building foundational components, prioritize robust, future-proof implementations over simpler ones that may introduce subtle bugs or rely on unintended behavior.

gemini-code-assist · 2026-05-10T05:25:11Z

+      );
+    }
+    if (elapsedMicros < 10 || elapsedMicros > 200000) {
+      const isWeb = bool.fromEnvironment('dart.library.js_interop');


The check bool.fromEnvironment('dart.library.js_interop') is incorrect for detecting a web environment. This environment variable is not automatically defined by Dart compilers. On web targets where timer resolution is coarse, elapsedMicros can be 0, and if isWeb is incorrectly false, the runner will throw a CalibrationException. Consider using a more reliable detection method, such as checking if 0.0 is int or using conditional imports to define a platform constant in the logger files.

References

When building foundational components, prioritize robust, future-proof implementations over simpler ones that may introduce subtle bugs or rely on unintended behavior.

gemini-code-assist · 2026-05-10T05:25:11Z

+    final isValidate = args.contains('validate');
+    final cleanArgs = List<String>.from(args)..remove('validate');
+    final result = _parserForBenchOptions.parse(cleanArgs);


Stripping the validate argument by searching and removing it from the args list is brittle. This approach can lead to incorrect behavior if validate is provided as a value for an option (e.g., --target validate.dart). It is recommended to use ArgParser.addCommand to handle subcommands properly or to restrict the check to specific positional arguments.

References

When building foundational components, prioritize robust, future-proof implementations over simpler ones that may introduce subtle bugs or rely on unintended behavior.

gemini-code-assist · 2026-05-10T05:25:11Z

+}
+
+/// A comparison between two benchmark results.
+class BenchmarkComparison {


The public class BenchmarkComparison is missing documentation. According to the repository style guide, all public members should have documentation that explains the "why".

References

At least all public members should have documentation, answering the why. ^(link)

gemini-code-assist · 2026-05-10T05:25:11Z

  }
 }
+
+class JsonEmitter implements DetailedScoreEmitter {


The public class JsonEmitter is missing documentation. According to the repository style guide, all public members should have documentation that explains the "why".

References

At least all public members should have documentation, answering the why. ^(link)

modulovalue · 2026-05-13T22:42:05Z

@kevmoo FYI https://mrale.ph/blog/2021/01/21/microbenchmarking-dart-part-1.html

github-actions Bot added the package:benchmark_harness label May 1, 2026

kevmoo mentioned this pull request May 1, 2026

[benchmark_harness] add support for detailed measurement with CV% #2393

Draft

gemini-code-assist Bot reviewed May 1, 2026

View reviewed changes

kevmoo force-pushed the better_bench branch from e554ff2 to d28ccda Compare May 10, 2026 02:27

gemini-code-assist Bot reviewed May 10, 2026

View reviewed changes

github-actions Bot added the type-infra A repository infrastructure change or enhancement label May 10, 2026

gemini-code-assist Bot reviewed May 10, 2026

View reviewed changes

impl first attempt

bb026c6

kevmoo force-pushed the better_bench branch from 38dd347 to bb026c6 Compare May 10, 2026 17:35

kevmoo added 11 commits May 10, 2026 10:49

added math tests

58ca3d6

math optimized

7eae1a3

compile and run helper

49cf88c

docs cleanup

d92b9e8

TODO items

f07955f

todo done

3fe3d72

cleanup 1

1ea0566

collection literals are fun!

d994328

Dart environment sanity

22a7fc2

JSON output

e9a752f

coordination fun

48823b5

More notes

71288e7

Conversation

kevmoo commented May 1, 2026

Uh oh!

github-actions Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Package publishing

Uh oh!

github-actions Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Health

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kevmoo commented May 10, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kevmoo commented May 10, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 10, 2026

Choose a reason for hiding this comment

Uh oh!

modulovalue commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented May 1, 2026 •

edited

Loading

github-actions Bot commented May 1, 2026 •

edited

Loading