Open
Description
Currently, we have an optional `WelchTTestPValueColumn`, which helps you verify that there is a statistically significant difference between benchmarks. However, it doesn't work well with the default run strategy because this strategy typically doesn't perform enough iterations, so users have to choose a satisfactory number of iterations manually. Such checks are possible, but the user experience is not good enough. We can do the following:
- Introduce an additional property in `AccuracyMode`. Let's call it `StopCriterion` (let me know if you have better ideas about naming). It will contain the logic that decides when we have enough iterations.
- Currently, we have hardcoded logic inside `EngineTargetStage`. Let's move it to a class called `StdErrStopCriterion`.
- We can introduce `WelchStopCriterion`, which will perform additional iterations until we are sure that there are enough for Welch's two-sample t-test. (Bonus: users will be able to write their own criterion.)

`StopCriterion` should be able to affect `IOrderProvider.GetExecutionOrder` and ask to run baseline benchmarks first. `EngineTargetStage.RunAuto` should get additional information about benchmarks, such as the `IsBaseline` value. The non-baseline benchmarks should get all measurements from the baseline benchmark in the corresponding group.
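
To make the proposal more concrete, here is a minimal sketch of how the two criteria could look. Everything in it is an assumption for discussion: the `IStopCriterion` interface, the `IsEnough` method, the `Stats` helper, and all thresholds and iteration limits are hypothetical and do not reproduce the current logic in `EngineTargetStage`. In particular, the Welch criterion compares the t statistic with a fixed critical value instead of computing a proper p-value from the Welch-Satterthwaite degrees of freedom.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical abstraction: decides whether the measurements collected so far are enough.
public interface IStopCriterion
{
    bool IsEnough(IReadOnlyList<double> measurements);
}

public static class Stats
{
    public static double Mean(IReadOnlyList<double> x) => x.Average();

    // Unbiased sample variance (divides by n - 1); needs at least two values.
    public static double Variance(IReadOnlyList<double> x)
    {
        double mean = Mean(x);
        return x.Sum(v => (v - mean) * (v - mean)) / (x.Count - 1);
    }
}

// Stops once the standard error of the mean is small relative to the mean.
public sealed class StdErrStopCriterion : IStopCriterion
{
    private readonly double maxRelativeError; // assumed threshold, not the real default
    private readonly int minIterations;
    private readonly int maxIterations;

    public StdErrStopCriterion(double maxRelativeError = 0.02, int minIterations = 15, int maxIterations = 100)
    {
        this.maxRelativeError = maxRelativeError;
        this.minIterations = minIterations;
        this.maxIterations = maxIterations;
    }

    public bool IsEnough(IReadOnlyList<double> measurements)
    {
        if (measurements.Count < minIterations) return false;
        if (measurements.Count >= maxIterations) return true;
        double stdErr = Math.Sqrt(Stats.Variance(measurements) / measurements.Count);
        return stdErr <= maxRelativeError * Math.Abs(Stats.Mean(measurements));
    }
}

// Keeps iterating until Welch's t statistic against the baseline exceeds a critical value.
public sealed class WelchStopCriterion : IStopCriterion
{
    private readonly IReadOnlyList<double> baseline; // all measurements of the baseline benchmark
    private readonly double criticalT;               // assumed stand-in for a p-value check
    private readonly int minIterations;
    private readonly int maxIterations;

    public WelchStopCriterion(IReadOnlyList<double> baseline, double criticalT = 3.0,
                              int minIterations = 15, int maxIterations = 100)
    {
        this.baseline = baseline;
        this.criticalT = criticalT;
        this.minIterations = minIterations;
        this.maxIterations = maxIterations;
    }

    // Welch's t statistic: difference of means divided by the unpooled standard error.
    public static double TStatistic(IReadOnlyList<double> a, IReadOnlyList<double> b)
    {
        double se = Math.Sqrt(Stats.Variance(a) / a.Count + Stats.Variance(b) / b.Count);
        return (Stats.Mean(a) - Stats.Mean(b)) / se;
    }

    public bool IsEnough(IReadOnlyList<double> measurements)
    {
        if (measurements.Count < minIterations) return false;
        if (measurements.Count >= maxIterations) return true; // give up: no clear difference
        // If both samples have zero variance and equal means, the statistic is NaN,
        // the comparison is false, and we keep iterating until maxIterations.
        return Math.Abs(TStatistic(measurements, baseline)) >= criticalT;
    }
}
```

With this shape, the engine would call `IsEnough` after each iteration, and a `WelchStopCriterion` could only be constructed once the baseline measurements exist, which is why the baseline has to run first. One design caveat: re-checking the statistic after every added iteration inflates the false-positive rate, so a real implementation would probably need a stricter threshold than a single fixed-sample test.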
Original request: https://twitter.com/AnthonyLloyd123/status/1005388154046644230