Estimate timing overhead, allow failing if it is too high #3209
Conversation
Move the linux_perf_event counters (_instructions_retired_counter and _cpu_cycles_retired_counter) from performance_test into the time_measurement singleton object. This simplifies the interface by making counter management an internal concern of time_measurement. Add enable_counters() and disable_counters() methods to time_measurement to control the hardware performance counters, and update start_run() to no longer take counter pointers as parameters.
Move the time_measurement class from the header to perf_tests.cc, making it an internal implementation detail. The extern global is replaced with a file-static instance. Add start_run() and stop_run() protected member functions to performance_test that handle enabling/disabling counters and delegating to the time_measurement singleton. This reduces the public API surface in the header. The only functions still exposed in the header are time_measurement_start_iteration/stop_iteration, which are needed by the public start_measuring_time/stop_measuring_time API.
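Taken together, the refactor might look roughly like the sketch below. This is assembled from the commit messages above, not the actual Seastar code; member and method bodies are omitted and everything not named in the commits is illustrative.

```c++
// Sketch of the refactored shape described above; names come from the
// commit messages, everything else is illustrative.
class time_measurement {
    // Counters moved in from performance_test: now an internal concern.
    linux_perf_event _instructions_retired_counter;
    linux_perf_event _cpu_cycles_retired_counter;
public:
    void enable_counters();   // turn the hardware perf counters on
    void disable_counters();  // turn them off again
    void start_run();         // no longer takes counter pointers
    // ...
};

// File-static instance in perf_tests.cc, replacing the old extern global.
static time_measurement measure_time;

class performance_test {
protected:
    void start_run();  // enables counters, delegates to measure_time
    void stop_run();   // disables counters, delegates to measure_time
    // ...
};
```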
These attributes don't make sense now that the methods are in the .cc file rather than the header.
Add tracking of start_measuring_time()/stop_measuring_time() call overhead:
- Calibrate start/stop overhead at startup by timing 1000 iterations
- Track start_stop_count in time_measurement to count timing calls per run
- Calculate overhead percentage: (start_stop_count * overhead_ns) / runtime
- Add "overhead" column to text and JSON output
- Print calibrated overhead in configuration header
- Warn to stderr when overhead exceeds threshold (default 10%)
- Add --overhead-threshold option to configure warning threshold
- Add --fail-on-high-overhead option to fail test run on high overhead

Also adds a test case that deliberately has high overhead to verify the feature works correctly.
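A minimal sketch of the calibration step described in the first bullet, assuming a hypothetical helper name and assuming the public start/stop API is what gets timed (the real implementation lives in tests/perf/perf_tests.cc and may differ):

```c++
#include <chrono>
#include <seastar/testing/perf_tests.hh>

// Hypothetical helper: time 1000 empty start/stop pairs at startup and
// derive the average per-pair cost in nanoseconds.
static double calibrate_start_stop_overhead_ns() {
    constexpr int iterations = 1000;
    auto begin = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        perf_tests::start_measuring_time();
        perf_tests::stop_measuring_time();
    }
    auto end = std::chrono::steady_clock::now();
    double total_ns = std::chrono::duration<double, std::nano>(end - begin).count();
    return total_ns / iterations; // average cost of one start/stop pair
}
```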
Force-pushed from 96b07c4 to 96ce1ce.
Pull request overview
This PR adds measurement overhead estimation to the Redpanda microbenchmark framework to identify tests where timing instrumentation consumes a significant portion of measured time, making results unreliable.
Changes:
- Refactored `time_measurement` class from header to implementation file, removing unnecessary exposure of performance counters
- Added overhead calibration at startup and tracking of `start_measuring_time()`/`stop_measuring_time()` call counts
- Introduced `overhead` column in results, plus `--overhead-threshold` and `--fail-on-high-overhead` command-line options
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/perf/perf_tests_perf.cc | Adds test case to verify overhead measurement functionality |
| tests/perf/perf_tests.cc | Implements overhead calibration, tracking, and reporting with configurable thresholds |
| tests/perf/perf-tests.md | Documents overhead column, warnings, and new command-line options |
| include/seastar/testing/perf_tests.hh | Moves time_measurement class to implementation and updates API signatures |
```c++
// Calculate overhead percentage from start/stop timing calls
double overhead_ns = rr.start_stop_count * measure_time.start_stop_overhead_ns();
double overhead_pct = ns > 0 ? (overhead_ns / ns) * 100.0 : 0.0;
r.overhead.add(overhead_pct, 1.0); // already per-run, not per-iteration
```
Copilot AI commented on Jan 20, 2026:
The second argument 1.0 to add() is unclear without context. Consider adding a comment explaining what this weight parameter represents, or use a named constant like WEIGHT_PER_RUN to improve readability.
Suggested change:

```diff
- r.overhead.add(overhead_pct, 1.0); // already per-run, not per-iteration
+ constexpr double overhead_weight_per_run = 1.0; // already per-run, not per-iteration
+ r.overhead.add(overhead_pct, overhead_weight_per_run);
```
```c++
// Check if overhead exceeds threshold and warn
double median_overhead = r.overhead.stats().med;
if (median_overhead > conf.overhead_threshold) {
    fmt::print("WARNING: test '{}' has high measurement overhead: {:.1f}% (threshold: {:.1f}%)\n",
```
Copilot AI commented on Jan 20, 2026:
The warning is printed to stdout, which may be missed in automated test runs or get lost in output. Consider using a logging framework (e.g., perf_logger.warn()) or printing to stderr to ensure visibility of this important diagnostic message.
Suggested change:

```diff
- fmt::print("WARNING: test '{}' has high measurement overhead: {:.1f}% (threshold: {:.1f}%)\n",
+ fmt::print(stderr,
+     "WARNING: test '{}' has high measurement overhead: {:.1f}% (threshold: {:.1f}%)\n",
```
The rest of the test output goes to stdout; we want this to go to the same place. stderr is already heavily used by seastar loggers, so it is common to run benchmarks with 2>/dev/null or similar redirection.
```diff
 #### Measurement overhead
-The cost of starting and stopping the timers is substantial, about 0.5μs for a start/stop pair, due to the overhead of reading the performance counters in the kernel. In the case you use manual time measurement with start/stop_measuring_time you should ensure that the timed region is substantially longer than time in order to reduce the error imposed by this overhead: for example by using a loop inside the timed region and returning the number of iterations from the test method.
+The cost of starting and stopping the timers is substantial, about 1μs for a start/stop pair, due to the overhead of reading the performance counters in the kernel. When using manual time measurement with `start_measuring_time()`/`stop_measuring_time()`, you should ensure that the timed region is substantially longer than this overhead to reduce measurement error. A common approach is to use a loop inside the timed region and return the number of iterations from the test method.
```
Copilot AI commented on Jan 20, 2026:
The documentation states the overhead is 'about 1μs' but the PR description mentions it was previously documented as '0.5μs'. Since the actual overhead is now calibrated at runtime and displayed in the output, consider rephrasing to indicate this is approximate and varies by system, e.g., 'typically around 1μs' or 'approximately 1μs on most systems'.
is "about" not good enough?
A recent "audit" of Redpanda microbenchmarks showed that many were returning essentially useless results as the overhead of reading the performance counters was much higher than the signal they are trying to measure: in several cases more than 99% of the test was spent just reading the counters.
This is a problem that can't be fought with documentation alone (though that helps), so instead we estimate the overhead and warn when it exceeds a threshold (10% by default; in practice you should aim for under 1%).
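To put illustrative numbers on it: at the ~1μs per-pair cost mentioned in the docs, a run measured at 1ms that makes 500 start/stop calls accumulates roughly 0.5ms of pure timing overhead, i.e. about 50%, five times the default threshold.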
The first commits refactor the somewhat odd way we were passing the perf counters around and including them in the header for no good reason.
Updates the docs.
Example output after this change, showing 99.2% overhead in this artificial test: