
Conversation

@travisdowns
Contributor

A recent "audit" of Redpanda microbenchmarks showed that many were returning essentially useless results, as the overhead of reading the performance counters was much higher than the signal they were trying to measure: in several cases more than 99% of the test time was spent just reading the counters.

This is a problem that can't be solved with documentation alone (though documentation helps), so instead we estimate the overhead and warn when it exceeds a threshold (10% by default; really you should aim for < 1%).

The first commits refactor the odd way we were passing the perf counters around and including them in the header for no good reason.

Updates the docs.

Example output after this change, showing 99.2% overhead in this artificial test:

```
ninja perf_tests_perf && tests/perf/perf_tests_perf -t perf_tests.test_timer_overhead
[3/3] Linking CXX executable tests/perf/perf_tests_perf
WARN  2026-01-20 00:20:46,432 seastar - Could not parse cgroups v1 file (memory/memory.limit_in_bytes).
INFO  2026-01-20 00:20:46,435 seastar - Reactor backend: io_uring
INFO  2026-01-20 00:20:46,474 [shard 20:main] io - IO queue was unable to find a suitable maximum request length, the search was cut-off early at: 16MB
INFO  2026-01-20 00:20:46,474 [shard 20:main] io - IO queue was unable to find a suitable maximum request length, the search was cut-off early at: 16MB
single run iterations:    0
single run duration:      1.000s
number of runs:           5
number of cores:          32
random seed:              2037298104
start/stop overhead:      1.023µs (1.602µs)

test                                iters            runtime     allocs      tasks       inst     cycles   overhead
perf_tests.test_timer_overhead     486000     1.03µs ± 0.07%      0.000      0.000     317.01      351.3       99.2
```

Move the linux_perf_event counters (_instructions_retired_counter and
_cpu_cycles_retired_counter) from performance_test into the
time_measurement singleton object. This simplifies the interface by
making counter management an internal concern of time_measurement.

Add enable_counters() and disable_counters() methods to time_measurement
to control the hardware performance counters, and update start_run() to
no longer take counter pointers as parameters.
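
For illustration, a minimal sketch of what the refactored interface could look like; everything beyond the names mentioned above (enable_counters(), disable_counters(), start_run(), and the two counter members) is an assumption, not the actual seastar code:

```cpp
// Illustrative sketch only; not seastar's actual implementation.
#include <cstdint>

struct linux_perf_event {            // stand-in for seastar's perf-event wrapper
    void enable() {}
    void disable() {}
    uint64_t read() const { return 0; }
};

class time_measurement {
public:
    void enable_counters() {         // counters are now an internal concern
        _instructions_retired_counter.enable();
        _cpu_cycles_retired_counter.enable();
    }
    void disable_counters() {
        _instructions_retired_counter.disable();
        _cpu_cycles_retired_counter.disable();
    }
    void start_run() {               // no counter pointers passed in anymore
        _run_start_instructions = _instructions_retired_counter.read();
        _run_start_cycles = _cpu_cycles_retired_counter.read();
    }
private:
    linux_perf_event _instructions_retired_counter;
    linux_perf_event _cpu_cycles_retired_counter;
    uint64_t _run_start_instructions = 0;
    uint64_t _run_start_cycles = 0;
};
```
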
Move the time_measurement class from the header to perf_tests.cc,
making it an internal implementation detail. The extern global is
replaced with a file-static instance.

Add start_run() and stop_run() protected member functions to
performance_test that handle enabling/disabling counters and
delegating to the time_measurement singleton. This reduces the
public API surface in the header.

The only functions still exposed in the header are
time_measurement_start_iteration/stop_iteration, which are needed
by the public start_measuring_time/stop_measuring_time API.

These attributes don't make sense now that the methods are in the .cc file rather than the header.

Add tracking of start_measuring_time()/stop_measuring_time() call overhead (a rough sketch of the approach follows below):

- Calibrate start/stop overhead at startup by timing 1000 iterations
- Track start_stop_count in time_measurement to count timing calls per run
- Calculate overhead percentage: (start_stop_count * overhead_ns) / runtime
- Add "overhead" column to text and JSON output
- Print calibrated overhead in configuration header
- Warn to stderr when overhead exceeds threshold (default 10%)
- Add --overhead-threshold option to configure warning threshold
- Add --fail-on-high-overhead option to fail test run on high overhead

Also adds a test case that deliberately has high overhead to verify the
feature works correctly.
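
A rough, self-contained sketch of the calibration and overhead math described in the list above; the function names and the stub timing calls are illustrative assumptions, not the PR's actual code:

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>

// Stand-ins for the framework's timing calls; the real ones read the
// hardware performance counters and are much more expensive.
static void start_measuring_time() {}
static void stop_measuring_time() {}

// Calibrate at startup: time a fixed number of start/stop pairs and
// take the average cost of one pair, in nanoseconds.
static double calibrate_start_stop_overhead_ns(unsigned pairs = 1000) {
    auto t0 = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < pairs; ++i) {
        start_measuring_time();
        stop_measuring_time();
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / pairs;
}

// Per-run overhead percentage: the share of the measured runtime spent
// inside the timing calls themselves.
static double overhead_percent(uint64_t start_stop_count, double overhead_ns,
                               double runtime_ns) {
    if (runtime_ns <= 0) {
        return 0.0;
    }
    return start_stop_count * overhead_ns / runtime_ns * 100.0;
}

int main() {
    double per_pair_ns = calibrate_start_stop_overhead_ns();
    // Example numbers only: a run with 486000 start/stop calls over ~0.5s.
    double pct = overhead_percent(486000, per_pair_ns, 0.5e9);
    std::printf("start/stop overhead: %.3f ns, run overhead: %.1f%%\n",
                per_pair_ns, pct);
    if (pct > 10.0) {  // default threshold, configurable via --overhead-threshold
        std::printf("WARNING: high measurement overhead\n");
    }
}
```
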
Copilot AI left a comment

Pull request overview

This PR adds measurement overhead estimation to the Redpanda microbenchmark framework to identify tests where timing instrumentation consumes a significant portion of measured time, making results unreliable.

Changes:

  • Refactored time_measurement class from header to implementation file, removing unnecessary exposure of performance counters
  • Added overhead calibration at startup and tracking of start_measuring_time()/stop_measuring_time() call counts
  • Introduced overhead column in results, --overhead-threshold and --fail-on-high-overhead command-line options

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/perf/perf_tests_perf.cc | Adds test case to verify overhead measurement functionality |
| tests/perf/perf_tests.cc | Implements overhead calibration, tracking, and reporting with configurable thresholds |
| tests/perf/perf-tests.md | Documents overhead column, warnings, and new command-line options |
| include/seastar/testing/perf_tests.hh | Moves time_measurement class to implementation and updates API signatures |


```cpp
// Calculate overhead percentage from start/stop timing calls
double overhead_ns = rr.start_stop_count * measure_time.start_stop_overhead_ns();
double overhead_pct = ns > 0 ? (overhead_ns / ns) * 100.0 : 0.0;
r.overhead.add(overhead_pct, 1.0); // already per-run, not per-iteration
```
Copilot AI Jan 20, 2026

The second argument 1.0 to add() is unclear without context. Consider adding a comment explaining what this weight parameter represents, or use a named constant like WEIGHT_PER_RUN to improve readability.

Suggested change:
```diff
-r.overhead.add(overhead_pct, 1.0); // already per-run, not per-iteration
+constexpr double overhead_weight_per_run = 1.0; // already per-run, not per-iteration
+r.overhead.add(overhead_pct, overhead_weight_per_run);
```

```cpp
// Check if overhead exceeds threshold and warn
double median_overhead = r.overhead.stats().med;
if (median_overhead > conf.overhead_threshold) {
    fmt::print("WARNING: test '{}' has high measurement overhead: {:.1f}% (threshold: {:.1f}%)\n",
```
Copilot AI Jan 20, 2026

The warning is printed to stdout, which may be missed in automated test runs or get lost in output. Consider using a logging framework (e.g., perf_logger.warn()) or printing to stderr to ensure visibility of this important diagnostic message.

Suggested change:
```diff
-fmt::print("WARNING: test '{}' has high measurement overhead: {:.1f}% (threshold: {:.1f}%)\n",
+fmt::print(stderr,
+    "WARNING: test '{}' has high measurement overhead: {:.1f}% (threshold: {:.1f}%)\n",
```

Contributor Author

The rest of the test output goes to stdout, and we want this to go to the same place. stderr is already heavily used by the seastar loggers, so it is common to run benchmarks with 2>/dev/null or a similar redirection.

#### Measurement overhead
```diff
-The cost of starting and stopping the timers is substantial, about 0.5μs for a start/stop pair, due to the overhead of reading the performance counters in the kernel. In the case you use manual time measurement with start/stop_measuring_time you should ensure that the timed region is substantially longer than time in order to reduce the error imposed by this overhead: for example by using a loop inside the timed region and returning the number of iterations from the test method.
+The cost of starting and stopping the timers is substantial, about 1μs for a start/stop pair, due to the overhead of reading the performance counters in the kernel. When using manual time measurement with `start_measuring_time()`/`stop_measuring_time()`, you should ensure that the timed region is substantially longer than this overhead to reduce measurement error. A common approach is to use a loop inside the timed region and return the number of iterations from the test method.
```
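
For example, a sketch of that looping pattern using the framework's macros; the test body and the do_some_work() helper are hypothetical, not code from this PR:

```cpp
#include <seastar/testing/perf_tests.hh>

// Hypothetical operation under test.
static int do_some_work() { return 42; }

PERF_TEST(example, amortized_timing) {
    static constexpr size_t iterations = 1000;
    perf_tests::start_measuring_time();
    for (size_t i = 0; i < iterations; ++i) {
        perf_tests::do_not_optimize(do_some_work());
    }
    perf_tests::stop_measuring_time();
    // Returning the iteration count makes the framework report per-iteration
    // time, while the ~1µs start/stop cost is paid only once per 1000 iterations.
    return iterations;
}
```
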
Copilot AI Jan 20, 2026

The documentation states the overhead is 'about 1μs' but the PR description mentions it was previously documented as '0.5μs'. Since the actual overhead is now calibrated at runtime and displayed in the output, consider rephrasing to indicate this is approximate and varies by system, e.g., 'typically around 1μs' or 'approximately 1μs on most systems'.

Contributor Author

is "about" not good enough?
