
Improve BOO benchmark stability for regression detection #2842

@rkayaith


BOO benchmark results can be noisy (deviations of up to ~30us have been observed), which makes automated regression bisection unreliable: a rerun with the default iteration count sometimes produces times within the threshold even when a real regression exists. Execution times were also reported to exhibit two "modes" (one fast, one slow).

@Max191 has a branch with two improvements:

  1. --iter-sleep flag: Adds a device sync + configurable sleep between iterations, which reduces variance from thermal/power state effects.
  2. Stddev/min/max stats for multi-dispatch runs: Currently these stats are only available for single-dispatch cases. The branch adds them for multi-dispatch by reporting stats from the longest-running dispatch, which gives a useful noise indicator for filtering bad runs.
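For discussion, here is a rough sketch of what the two changes might look like in isolation. It is not taken from the branch: `timed_iterations`, `noise_stats`, the `sync` callable, and the `iter_sleep_s` parameter are hypothetical names, and "longest-running dispatch" is assumed to mean the dispatch with the largest mean time.

```python
import statistics
import time
from typing import Callable, Dict, List


def timed_iterations(
    run_once: Callable[[], None],
    sync: Callable[[], None],
    iters: int,
    iter_sleep_s: float = 0.0,
) -> List[float]:
    """Time `run_once` `iters` times, returning wall-clock samples in microseconds.

    `sync` is a stand-in for whatever device synchronization the benchmark
    uses (hypothetical here). It is called before each iteration so leftover
    work is not charged to it, and again after so the sample covers the
    whole run.
    """
    samples_us = []
    for _ in range(iters):
        sync()
        if iter_sleep_s > 0:
            # Idle between iterations, mirroring the idea behind --iter-sleep.
            time.sleep(iter_sleep_s)
        start = time.perf_counter()
        run_once()
        sync()
        samples_us.append((time.perf_counter() - start) * 1e6)
    return samples_us


def noise_stats(per_dispatch_us: Dict[str, List[float]]) -> Dict[str, object]:
    """Report mean/stddev/min/max for the dispatch with the largest mean time.

    Assumption: "longest-running" is interpreted as "largest mean".
    """
    name = max(per_dispatch_us, key=lambda k: statistics.fmean(per_dispatch_us[k]))
    samples = per_dispatch_us[name]
    return {
        "dispatch": name,
        "mean": statistics.fmean(samples),
        # Sample stddev needs at least two points; report 0.0 otherwise.
        "stddev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
        "max": max(samples),
    }
```

A bisection script could then discard any run whose `stddev` exceeds some chosen fraction of `mean` and rerun it, rather than comparing a noisy time against the threshold.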

Plan:

  • Investigate the noise — reproduce the bimodal execution time distribution and characterize the two modes
  • Test whether the proposed changes address the issue
  • Land the changes (or an equivalent fix) based on findings
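One possible starting point for the first step is to split the per-iteration times at the largest gap between sorted samples and compare the two groups. This is only a heuristic, not a statistical test for bimodality, and `split_modes` and `describe_modes` are hypothetical helper names, not part of BOO.

```python
import statistics
from typing import Dict, List, Tuple


def split_modes(samples_us: List[float]) -> Tuple[List[float], List[float]]:
    """Split samples into (fast, slow) groups at the largest gap between sorted values."""
    if len(samples_us) < 2:
        raise ValueError("need at least two samples to split")
    ordered = sorted(samples_us)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    # Index of the first sample that belongs to the slow group.
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


def describe_modes(samples_us: List[float]) -> Dict[str, float]:
    """Summarize the two groups: their means, their separation, and how often the slow one occurs."""
    fast, slow = split_modes(samples_us)
    fast_mean = statistics.fmean(fast)
    slow_mean = statistics.fmean(slow)
    return {
        "fast_mean": fast_mean,
        "slow_mean": slow_mean,
        "separation": slow_mean - fast_mean,
        "slow_fraction": len(slow) / len(samples_us),
    }
```

If the separation is comparable to the ~30us deviation and the slow fraction changes with the sleep between iterations, that would suggest the modes are related to device state rather than to the kernel itself. A split at the largest gap will always produce two groups, even for unimodal data, so the result should be checked against a histogram before drawing conclusions.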

Context: https://xilinx.slack.com/archives/C08JKR35LRY/p1772034070912979
