BOO benchmark results can be noisy (deviations of up to ~30us have been observed), which makes automated regression bisection unreliable: reruns with the default iteration count sometimes produce times within the threshold even when a real regression exists. Execution times were also reported to fall into two "modes" (one fast, one slow).
@Max191 has a branch with two improvements:
- `--iter-sleep` flag: Adds a device sync + configurable sleep between iterations, which reduces variance from thermal/power state effects.
- Stddev/min/max stats for multi-dispatch runs: Currently these stats are only available for single-dispatch cases. The branch adds them for multi-dispatch by reporting stats from the longest-running dispatch, which gives a useful noise indicator for filtering bad runs.
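The two ideas above can be sketched roughly as follows. This is not the code from the branch, only a minimal illustration under stated assumptions: `time_iterations`, `summarize`, `longest_dispatch_stats`, `is_noisy`, and the `max_rel_stddev` threshold are hypothetical names, the `sync` callable stands in for whatever device synchronization the real tool performs, and "longest-running dispatch" is taken here to mean the dispatch with the highest mean time.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple


def time_iterations(
    fn: Callable[[], None],
    iterations: int,
    iter_sleep_s: float = 0.0,
    sync: Callable[[], None] = lambda: None,  # assumption: stand-in for a device sync
) -> List[float]:
    """Time `fn` per iteration (in microseconds), sleeping between iterations."""
    times_us = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        sync()  # wait for outstanding work before stopping the clock
        times_us.append((time.perf_counter() - start) * 1e6)
        if iter_sleep_s > 0:
            time.sleep(iter_sleep_s)  # idle gap, outside the timed region
    return times_us


def summarize(times_us: List[float]) -> Dict[str, float]:
    """Mean, sample stddev, min, and max of a list of timings."""
    return {
        "mean": statistics.fmean(times_us),
        "stddev": statistics.stdev(times_us) if len(times_us) > 1 else 0.0,
        "min": min(times_us),
        "max": max(times_us),
    }


def longest_dispatch_stats(
    per_dispatch_us: Dict[str, List[float]],
) -> Tuple[str, Dict[str, float]]:
    """Report stats for the dispatch with the highest mean time (assumed definition)."""
    name = max(per_dispatch_us, key=lambda k: statistics.fmean(per_dispatch_us[k]))
    return name, summarize(per_dispatch_us[name])


def is_noisy(stats: Dict[str, float], max_rel_stddev: float) -> bool:
    """Flag a run whose stddev exceeds a hypothetical fraction of its mean."""
    return stats["stddev"] > max_rel_stddev * stats["mean"]
```

A bisection script could use something like `is_noisy` to discard or rerun a measurement instead of comparing it against the regression threshold; the appropriate cutoff would need to be chosen from real data.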
Plan:
Context: https://xilinx.slack.com/archives/C08JKR35LRY/p1772034070912979