An easy way to check for statistically significant difference between benchmarks #786

Open
@AndreyAkinshin

Description

Currently, we have an optional WelchTTestPValueColumn which helps you verify that there is a statistically significant difference between benchmarks. However, it doesn't work well with the default run strategy because this strategy typically doesn't perform enough iterations. Users have to choose a satisfactory number of iterations manually. Thus, it's possible to do such checks, but the user experience is not good enough. We can do the following:

  • Introduce an additional property in AccuracyMode. Let's call it StopCriterion (let me know if you have better ideas about naming). It will contain the logic that decides when we have enough iterations.
  • Currently, this logic is hardcoded inside EngineTargetStage. Let's move it to a class called StdErrStopCriterion.
  • We can introduce WelchStopCriterion which will perform additional iterations until we are sure that there are enough measurements for Welch's two-sample t-test. (Bonus: users will be able to write their own criteria.)
  • StopCriterion should be able to affect IOrderProvider.GetExecutionOrder and request that baseline benchmarks run first.
  • EngineTargetStage.RunAuto should receive additional information about benchmarks, such as the IsBaseline value. Each non-baseline benchmark should get all measurements from the baseline benchmark in the corresponding group.
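To make the WelchStopCriterion idea concrete: the criterion could keep collecting measurements until the Welch t statistic is conclusive or an iteration cap is hit. The sketch below is a minimal Python illustration of that logic (the real implementation would live in BenchmarkDotNet's C# engine); the function names, thresholds, and the particular stop rule are all assumptions for illustration, not the project's API:

```python
import math
from statistics import mean, variance


def welch_t(xs, ys):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(xs), len(ys)
    v1, v2 = variance(xs), variance(ys)  # sample variances
    se2 = v1 / n1 + v2 / n2              # squared standard error of the mean difference
    t = (mean(xs) - mean(ys)) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


def enough_iterations(baseline, candidate, min_n=5, max_n=100, t_threshold=3.0):
    """One possible stop rule (thresholds are arbitrary placeholders):
    stop once both samples have at least min_n measurements and either
    |t| is already clearly beyond the threshold, or the iteration cap
    max_n is reached so that the engine cannot loop forever."""
    n = min(len(baseline), len(candidate))
    if n < min_n:
        return False          # too few measurements to judge anything
    if n >= max_n:
        return True           # give up collecting more data
    t, _ = welch_t(baseline, candidate)
    return abs(t) >= t_threshold
```

A user-defined criterion would follow the same shape: receive the baseline and candidate measurement lists after each iteration and return whether the engine may stop, which is why running baseline benchmarks first (as proposed above) matters.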

Original request: https://twitter.com/AnthonyLloyd123/status/1005388154046644230
