better bench #2394

Draft
kevmoo wants to merge 13 commits into main from better_bench

kevmoo (Member) commented May 1, 2026

  • plan!
  • benchmark better crazy unreviewed

github-actions Bot commented May 1, 2026

Package publishing

Package Version Status Publish tag (post-merge)
package:bazel_worker 1.1.5 already published at pub.dev
package:benchmark_harness 2.4.0 already published at pub.dev
package:boolean_selector 2.1.2 already published at pub.dev
package:browser_launcher 1.2.0-wip WIP (no publish necessary)
package:cli_config 0.2.1-wip WIP (no publish necessary)
package:cli_util 0.5.1 already published at pub.dev
package:clock 1.1.3-wip WIP (no publish necessary)
package:code_builder 4.12.0-wip WIP (no publish necessary)
package:coverage 1.15.0 already published at pub.dev
package:csslib 1.0.2 already published at pub.dev
package:extension_discovery 2.1.0 already published at pub.dev
package:file 7.0.2-wip WIP (no publish necessary)
package:file_testing 3.1.0-wip WIP (no publish necessary)
package:glob 2.1.3 already published at pub.dev
package:graphs 2.4.0-wip WIP (no publish necessary)
package:html 0.15.7-wip WIP (no publish necessary)
package:io 1.1.0-wip WIP (no publish necessary)
package:json_rpc_2 4.1.0 already published at pub.dev
package:markdown 7.4.0 ready to publish markdown-v7.4.0
package:mime 2.1.0-wip WIP (no publish necessary)
package:oauth2 2.0.5 already published at pub.dev
package:package_config 2.3.0-wip WIP (no publish necessary)
package:pool 1.5.3-wip WIP (no publish necessary)
package:process 5.0.5 (error) pubspec version (5.0.5) and changelog (5.0.6-wip) don't agree
package:pub_semver 2.2.0 already published at pub.dev
package:pubspec_parse 1.6.0-wip WIP (no publish necessary)
package:source_map_stack_trace 2.1.3-wip WIP (no publish necessary)
package:source_maps 0.10.14-wip WIP (no publish necessary)
package:source_span 1.10.2 already published at pub.dev
package:sse 4.2.0 already published at pub.dev
package:stack_trace 1.12.2-wip (error) pubspec version (1.12.2-wip) and changelog (1.12.2-dev) don't agree
package:stream_channel 2.1.4 already published at pub.dev
package:stream_transform 2.1.2-wip WIP (no publish necessary)
package:string_scanner 1.4.2-wip WIP (no publish necessary)
package:term_glyph 1.2.3-wip WIP (no publish necessary)
package:test_reflective_loader 0.6.0 ready to publish test_reflective_loader-v0.6.0
package:timing 1.0.2 already published at pub.dev
package:unified_analytics 8.0.15 ready to publish unified_analytics-v8.0.15
package:watcher 1.2.2-wip WIP (no publish necessary)
package:yaml 3.1.4-wip WIP (no publish necessary)
package:yaml_edit 2.2.4 already published at pub.dev

Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation.

github-actions Bot commented May 1, 2026

PR Health

License Headers ⚠️
// Copyright (c) 2026, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.

Files
pkgs/benchmark_harness/example/slow_bench.dart

All source files should start with a license header.

Unrelated files missing license headers
Files
pkgs/bazel_worker/benchmark/benchmark.dart
pkgs/coverage/lib/src/coverage_options.dart
pkgs/html/example/main.dart
pkgs/pubspec_parse/test/git_uri_test.dart
pkgs/watcher/test/custom_watcher_factory_test.dart

This check can be disabled by tagging the PR with skip-license-check.

Unused Dependencies ✔️
Package Status
benchmark_harness ✔️ All dependencies utilized correctly.

For details on how to fix these, see dependency_validator.

This check can be disabled by tagging the PR with skip-unused-dependencies-check.

Changelog Entry
Package Changed Files
package:benchmark_harness pkgs/benchmark_harness/README.md
pkgs/benchmark_harness/bin/bench.dart
pkgs/benchmark_harness/lib/benchmark_harness.dart
pkgs/benchmark_harness/lib/src/async_benchmark_base.dart
pkgs/benchmark_harness/lib/src/bench_command/bench_options.dart
pkgs/benchmark_harness/lib/src/bench_command/compile_and_run.dart
pkgs/benchmark_harness/lib/src/bench_command/wrapper_helper.dart
pkgs/benchmark_harness/lib/src/benchmark.dart
pkgs/benchmark_harness/lib/src/benchmark_base.dart
pkgs/benchmark_harness/lib/src/blackhole.dart
pkgs/benchmark_harness/lib/src/kbssd_math.dart
pkgs/benchmark_harness/lib/src/logger.dart
pkgs/benchmark_harness/lib/src/logger_web.dart
pkgs/benchmark_harness/lib/src/model/benchmark_result_models.dart
pkgs/benchmark_harness/lib/src/model/dart_environment.dart
pkgs/benchmark_harness/lib/src/report.dart
pkgs/benchmark_harness/lib/src/result.dart
pkgs/benchmark_harness/lib/src/runner.dart
pkgs/benchmark_harness/lib/src/score_emitter.dart
pkgs/benchmark_harness/pubspec.yaml

Changes to files need to be accounted for in their respective changelogs.

This check can be disabled by tagging the PR with skip-changelog-check.

Coverage ⚠️
File Coverage
pkgs/benchmark_harness/bin/bench.dart 💔 Not covered
pkgs/benchmark_harness/example/list_sum_variations.dart 💔 Not covered
pkgs/benchmark_harness/example/modern_example.dart 💔 Not covered
pkgs/benchmark_harness/example/slow_bench.dart 💔 Not covered
pkgs/benchmark_harness/example/wrapper_example.dart 💔 Not covered
pkgs/benchmark_harness/lib/benchmark_harness.dart 💔 Not covered
pkgs/benchmark_harness/lib/src/async_benchmark_base.dart 💔 64 % ⬇️ 27 %
pkgs/benchmark_harness/lib/src/bench_command/bench_options.dart 💚 97 % ⬆️ 2 %
pkgs/benchmark_harness/lib/src/bench_command/compile_and_run.dart 💔 73 % ⬇️ 20 %
pkgs/benchmark_harness/lib/src/bench_command/wrapper_helper.dart 💚 40 %
pkgs/benchmark_harness/lib/src/benchmark.dart 💚 73 %
pkgs/benchmark_harness/lib/src/benchmark_base.dart 💔 76 % ⬇️ 10 %
pkgs/benchmark_harness/lib/src/blackhole.dart 💚 67 %
pkgs/benchmark_harness/lib/src/kbssd_math.dart 💚 100 %
pkgs/benchmark_harness/lib/src/logger.dart 💚 67 %
pkgs/benchmark_harness/lib/src/logger_web.dart 💔 Not covered
pkgs/benchmark_harness/lib/src/model/benchmark_result_models.dart 💚 100 %
pkgs/benchmark_harness/lib/src/model/dart_environment.dart 💚 50 %
pkgs/benchmark_harness/lib/src/report.dart 💚 94 %
pkgs/benchmark_harness/lib/src/result.dart 💚 100 %
pkgs/benchmark_harness/lib/src/runner.dart 💚 83 %
pkgs/benchmark_harness/lib/src/score_emitter.dart 💚 63 % ⬆️ 275 %

This check for test coverage is informational (issues shown here will not fail the PR).

This check can be disabled by tagging the PR with skip-coverage-check.

Breaking changes ⚠️
Package Change Current Version New Version Needed Version Looking good?
benchmark_harness Non-Breaking 2.4.0 2.4.0 2.5.0 ⚠️ Got "2.4.0" expected >= "2.5.0" (non-breaking changes)

This check can be disabled by tagging the PR with skip-breaking-check.

API leaks ✔️

The following packages contain symbols visible in the public API, but not exported by the library. Export these symbols or remove them from your publicly visible API.

Package Leaked API symbol Leaking sources

This check can be disabled by tagging the PR with skip-leaking-check.

gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request modernizes the benchmark_harness package by introducing an adaptive benchmarking engine, a new compositional API for defining benchmarks and variants, and enhanced reporting features such as JSON output and cross-platform comparison tables. The changes also include the integration of package:stats for more robust statistical analysis. Feedback suggests improving code maintainability by utilizing existing functionality in the stats package, ensuring consistent column ordering in the comparison tables, and adhering to the style guide by providing documentation for all public members.

Comment thread pkgs/benchmark_harness/lib/src/bench_command/compile_and_run.dart Outdated
Comment thread pkgs/benchmark_harness/lib/src/benchmark.dart Outdated
Comment thread pkgs/benchmark_harness/lib/src/benchmark.dart
Comment thread pkgs/benchmark_harness/lib/src/result.dart
Comment thread pkgs/benchmark_harness/lib/src/runner.dart Outdated
Comment thread pkgs/benchmark_harness/lib/src/runner.dart
Comment thread pkgs/benchmark_harness/lib/src/runner.dart Outdated
kevmoo (Member, Author) commented May 10, 2026

/gemini review

gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request modernizes the benchmark_harness package by introducing a compositional API (Benchmark and BenchmarkVariant), a noise-resilient benchmarking engine using Kernel-Based Steady-State Detection (KBSSD), and a CLI tool for cross-platform execution (JIT, AOT, JS, WASM). It also adds a zero-cost blackhole utility to prevent dead-code elimination and provides extensive documentation and testing. Feedback includes suggestions to improve the robustness of asynchronous benchmark detection and casting, recommendations to use proper subcommand parsing for the validate command, and advice on making JSON output parsing more resilient to non-JSON diagnostic text.

Comment thread pkgs/benchmark_harness/lib/src/benchmark.dart Outdated
Comment thread pkgs/benchmark_harness/lib/src/benchmark.dart
Comment thread pkgs/benchmark_harness/lib/src/bench_command/bench_options.dart
Comment thread pkgs/benchmark_harness/lib/src/runner.dart Outdated
Comment thread pkgs/benchmark_harness/lib/src/bench_command/compile_and_run.dart Outdated
github-actions Bot added the type-infra label May 10, 2026
kevmoo (Member, Author) commented May 10, 2026

/gemini review

gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request modernizes the benchmark_harness package by introducing a noise-resilient Kernel-Based Steady-State Detection (KBSSD) engine, a new compositional API (Benchmark and BenchmarkVariant), and a robust CLI tool for cross-platform benchmarking (JIT, AOT, JS, WASM). The changes include sophisticated statistical analysis using package:stats, adaptive calibration to handle environmental noise, and process-level isolation for accurate measurements. Feedback focuses on improving the robustness of the CLI tool, particularly regarding JSON output integrity, error propagation in isolate-mode, and more reliable platform detection for web targets. Additionally, some public members require documentation to comply with the repository style guide.

Comment on lines +37 to +40
'\nWarning: Failed to run benchmark for ${mode.name.toUpperCase()}:',
);
print(e);
failures[mode] = e;

high

When the --json flag is active, printing warnings or error messages directly to stdout will corrupt the resulting JSON output, making it unparseable for automated tools or CI pipelines. These diagnostic messages should be directed to stderr instead.

Suggested change
- '\nWarning: Failed to run benchmark for ${mode.name.toUpperCase()}:',
- );
- print(e);
- failures[mode] = e;
+ stderr.writeln(
+ '\nWarning: Failed to run benchmark for ${mode.name.toUpperCase()}:',
+ );
+ stderr.writeln(e);
References
  1. When building foundational components, prioritize robust, future-proof implementations over simpler ones that may introduce subtle bugs or rely on unintended behavior.
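
A minimal sketch of the stdout/stderr split described in this comment, assuming the JSON report is written once at the end of the run. `encodeReport` and `warnFailure` are hypothetical names used for illustration and are not taken from the PR:

```dart
import 'dart:convert';
import 'dart:io';

/// Hypothetical helper: builds the single JSON document meant for stdout.
/// Failures are stringified so that arbitrary error objects stay encodable.
String encodeReport(
  Map<String, Object?> results,
  Map<String, Object> failures,
) {
  return jsonEncode({
    'results': results,
    'failures': {
      for (final entry in failures.entries) entry.key: '${entry.value}',
    },
  });
}

/// Hypothetical helper: diagnostics go to stderr, so anything reading stdout
/// still sees only the JSON document.
void warnFailure(String mode, Object error) {
  stderr.writeln('Warning: Failed to run benchmark for ${mode.toUpperCase()}:');
  stderr.writeln(error);
}
```

A caller would invoke `warnFailure` inside the `catch` block, record the error in its failures map, and pass that map to `encodeReport` before a single `stdout.writeln` at the end.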

Comment on lines +475 to +488
errorPort.listen((error) {
if (error is List && error.length >= 2) {
stderr.writeln('Isolate Error: ${error[0]}');
stderr.writeln('${error[1]}');
} else {
stderr.writeln('Isolate Error: $error');
}
});

await exitPort.first;
exitPort.close();
errorPort.close();
return null;
}

high

The _IsolateRunner implementation does not correctly propagate execution failures. The errorPort listener merely prints to stderr and does not signal the orchestrator that the run failed. Since _runImpl returns null regardless of whether an error occurred in the isolate, the caller in compileAndRun will treat the failure as a successful run with no results, failing to populate the failures map. This makes CI stability guards ineffective for isolate-mode runs. Consider using a Completer to wait for either the exitPort or errorPort to signal completion or failure.

References
  1. When building foundational components, prioritize robust, future-proof implementations over simpler ones that may introduce subtle bugs or rely on unintended behavior.
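
The `Completer` approach suggested here could look roughly like the sketch below. `runIsolateOrThrow` is a hypothetical name, and the sketch assumes the isolate's error message is delivered before its exit notification, which is why the first signal to arrive decides the outcome:

```dart
import 'dart:async';
import 'dart:isolate';

/// Hypothetical helper: runs [entryPoint] in a new isolate and completes with
/// an error if the isolate reports an uncaught error, instead of only logging.
Future<void> runIsolateOrThrow(
  void Function(Object? message) entryPoint,
  Object? message,
) async {
  final exitPort = ReceivePort();
  final errorPort = ReceivePort();
  final done = Completer<void>();

  errorPort.listen((error) {
    if (done.isCompleted) return;
    // Uncaught isolate errors arrive as a two-element list:
    // [error description, stack trace description].
    if (error is List && error.length >= 2) {
      done.completeError(RemoteError('${error[0]}', '${error[1]}'));
    } else {
      done.completeError(StateError('Isolate error: $error'));
    }
  });
  exitPort.listen((_) {
    // A clean exit only counts if no error was reported first.
    if (!done.isCompleted) done.complete();
  });

  try {
    await Isolate.spawn(
      entryPoint,
      message,
      onExit: exitPort.sendPort,
      onError: errorPort.sendPort,
      errorsAreFatal: true,
    );
    await done.future;
  } finally {
    exitPort.close();
    errorPort.close();
  }
}
```

With this shape the caller can wrap the call in `try`/`catch` and populate its failures map, rather than treating every isolate run as successful.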

Comment on lines +66 to +68
}) : assert(maxSamples >= 4, 'maxSamples must be at least 4'),
assert(windowSize >= 2, 'windowSize must be at least 2'),
assert(stabilityRequired > 0, 'stabilityRequired must be positive');

medium

The minimum constraints for maxSamples and windowSize are too low for the KBSSD logic to function correctly. If maxSamples is set to a small value (e.g., 4) and windowSize is 2, the cold buffer filling phase (line 157) will consume all available samples, and the convergence loop (line 181) will never execute. This results in the benchmark always being reported as unstable (isStable: false) because the converged flag remains false.

References
  1. When building foundational components, prioritize robust, future-proof implementations over simpler ones that may introduce subtle bugs or rely on unintended behavior.

);
}
if (elapsedMicros < 10 || elapsedMicros > 200000) {
const isWeb = bool.fromEnvironment('dart.library.js_interop');

medium

The check bool.fromEnvironment('dart.library.js_interop') is incorrect for detecting a web environment. This environment variable is not automatically defined by Dart compilers. On web targets where timer resolution is coarse, elapsedMicros can be 0, and if isWeb is incorrectly false, the runner will throw a CalibrationException. Consider using a more reliable detection method, such as checking if 0.0 is int or using conditional imports to define a platform constant in the logger files.

References
  1. When building foundational components, prioritize robust, future-proof implementations over simpler ones that may introduce subtle bugs or rely on unintended behavior.
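
One way to sketch the number-representation check mentioned in this comment. `hasJsNumbers` is a hypothetical name. The check relies on JavaScript-compiled Dart treating `0` and `0.0` as the same number, and it is an assumption that this is sufficient for the runner: it would not flag a WebAssembly target, where integers and doubles stay distinct.

```dart
/// True when compiled to JavaScript, where `0` and `0.0` are the same number.
///
/// On the Dart VM (JIT and AOT) ints and doubles are distinct values, so this
/// evaluates to false there.
final bool hasJsNumbers = identical(0, 0.0);
```

A conditional import that defines a platform constant, the other option named in the comment, would cover WebAssembly as well at the cost of an extra pair of files.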

Comment on lines +44 to +46
final isValidate = args.contains('validate');
final cleanArgs = List<String>.from(args)..remove('validate');
final result = _parserForBenchOptions.parse(cleanArgs);

medium

Stripping the validate argument by searching and removing it from the args list is brittle. This approach can lead to incorrect behavior if validate is provided as a value for an option (e.g., --target validate.dart). It is recommended to use ArgParser.addCommand to handle subcommands properly or to restrict the check to specific positional arguments.

References
  1. When building foundational components, prioritize robust, future-proof implementations over simpler ones that may introduce subtle bugs or rely on unintended behavior.
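
A minimal sketch of the "restrict the check to specific positional arguments" option, using only the core library. `SplitArgs` and `splitValidate` are hypothetical names. Note that `List.contains` matches whole elements, so the brittle case is an option whose value is exactly `validate`, such as `--target validate`:

```dart
/// Hypothetical holder for the outcome of splitting off a leading subcommand.
class SplitArgs {
  SplitArgs(this.isValidate, this.rest);

  /// Whether the first argument was the `validate` subcommand.
  final bool isValidate;

  /// The remaining arguments, ready to hand to the option parser.
  final List<String> rest;
}

/// Treats `validate` as a subcommand only when it is the first argument, so an
/// option value that happens to be `validate` is left untouched.
SplitArgs splitValidate(List<String> args) {
  if (args.isNotEmpty && args.first == 'validate') {
    return SplitArgs(true, args.sublist(1));
  }
  return SplitArgs(false, List<String>.of(args));
}
```

Here `splitValidate(['--target', 'validate'])` keeps both arguments and reports `isValidate` as false, whereas the `contains`/`remove` approach would drop the option's value.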

}

/// A comparison between two benchmark results.
class BenchmarkComparison {

medium

The public class BenchmarkComparison is missing documentation. According to the repository style guide, all public members should have documentation that explains the "why".

References
  1. At least all public members should have documentation, answering the why.

}
}

class JsonEmitter implements DetailedScoreEmitter {

medium

The public class JsonEmitter is missing documentation. According to the repository style guide, all public members should have documentation that explains the "why".

References
  1. At least all public members should have documentation, answering the why.

Labels

package:benchmark_harness, type-infra

2 participants