Stabilizing benchmark results

At the moment, `gemm_bench.py` passes `--benchmark_repetitions=3` to stabilize benchmark results. The `bench_summary_process` function defined in `bench_utils.py` then returns the mean time (according to local variable names; I'm trusting that it's correct).

In my experience, when benchmarking on AMD GPU, the single most effective step to stabilize results is passing something like `--benchmark_min_warmup_time=0.1`, which runs warm-up iterations that are discarded. Just 0.1 seconds seems to be more than enough.
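As a rough sketch of what this could look like, here is a small helper that assembles both flags in one place. The function name `stabilizing_benchmark_flags` and its parameters are hypothetical, not something that exists in `bench_utils.py` today; only the two flag names come from the discussion above.

```python
def stabilizing_benchmark_flags(repetitions=3, min_warmup_time_s=0.1):
    """Return benchmark flags that reduce run-to-run noise.

    Hypothetical helper: the name and signature are assumptions made for
    illustration. It only formats the two flags mentioned in this issue.
    """
    return [
        # Run the benchmark several times so a robust statistic can be taken.
        f"--benchmark_repetitions={repetitions}",
        # Run discarded warm-up iterations for this many seconds first.
        f"--benchmark_min_warmup_time={min_warmup_time_s}",
    ]


flags = stabilizing_benchmark_flags()
print(" ".join(flags))
```

Each benchmark script could then append the returned list to its existing command line instead of hard-coding the flags locally.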

If, after doing that, we still want to run multiple repetitions to further reduce noise, then I would suggest taking either the min or the median of the repetition latencies, not the mean. The problem with the mean as an estimator is that if one of the N input values is off by some amount, then 1/N of that error still ends up in the mean. The noise doesn't decrease fast as N increases; it only decreases as 1/N. By contrast, if we run 3 repetitions, then the median will be unaffected by one bad repetition, and the min will be unaffected by two bad repetitions.
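To make the difference concrete, here is a minimal sketch comparing the three estimators on made-up latencies. The numbers are invented for illustration (10 ms as the "true" latency, 16 ms as a noisy repetition) and are not measurements; `summarize` is a hypothetical name.

```python
import statistics


def summarize(latencies_ms):
    """Return the mean, median and min of a list of repetition latencies."""
    return {
        "mean": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "min": min(latencies_ms),
    }


# Illustrative values only: 10.0 ms is a clean run, 16.0 ms is a noisy one.
one_bad = summarize([10.0, 10.0, 16.0])
two_bad = summarize([10.0, 16.0, 16.0])

print(one_bad)  # mean is pulled up by 1/3 of the 6 ms error; median and min are not
print(two_bad)  # only the min still reports the clean 10 ms
```

With one bad repetition out of three, the mean reports 12 ms while the median and min both report 10 ms. With two bad repetitions, only the min still reports 10 ms.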

Ideally, these mechanisms should be shared through `bench_utils.py`, not local to each benchmark such as `gemm_bench.py`.

WDYT @saienduri, @kuhar ?

