Use hypothesis tests for testing distributions instead of matching moments of the distribution #31

@envp

Pearson's chi-squared test is a more reliable way of checking whether a sequence of numbers follows a given distribution or pattern than matching moments. A test that only checks for the correct mean and variance is easy to fool: dummy values can be inserted to make almost any sequence match them.
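
To illustrate the point, here is a minimal Ruby sketch (the helper names `mean` and `variance` are hypothetical, not part of this library). It builds a sequence that takes only two distinct values yet has the mean and variance of Uniform(0, 1), so a moment-matching test would accept it:

```ruby
# Hypothetical helpers for illustration only; not part of the library.
def mean(xs)
  xs.sum / xs.size.to_f
end

# Population variance (divides by n, not n - 1).
def variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / xs.size.to_f
end

# Moments of Uniform(0, 1): mean 1/2, variance 1/12.
target_mean = 0.5
target_variance = 1.0 / 12.0

# Alternate between mean - sd and mean + sd. The sequence is clearly
# not uniform, but its first two moments match the target.
sd = Math.sqrt(target_variance)
fake = Array.new(1000) { |i| i.even? ? target_mean - sd : target_mean + sd }

mean_ok = (mean(fake) - target_mean).abs < 1e-9
variance_ok = (variance(fake) - target_variance).abs < 1e-9
distinct_values = fake.uniq.size

puts "mean matches: #{mean_ok}, variance matches: #{variance_ok}"
puts "distinct values: #{distinct_values}"
```

A goodness-of-fit test on binned counts would reject this sequence immediately, since all of the mass sits in two bins.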

However, the mean and variance must still be reproduced correctly, so those checks should not be dropped. The suggestion here is to use Pearson's chi-squared test and refactor the test cases into the following structure:

  • should pass the chi-squared test for a specific distribution, perhaps through a test helper such as the following (e.g. for testing the uniform distribution):
    • pearson_chi_squared(candidate: Distribution::Uniform.rng(0.1, 1), target: :uniform, samples: 1000), which returns the significance level of the test as a double.
  • should return correct metadata and moments of the distribution, say through a function that simulates the distribution for a specified confidence or sample size:
    • metadata_for(candidate: Distribution::Normal.rng(0.1, 1), target: :normal, confidence: 0.99, samples: 100) returns {mean: 0.1, variance: 0.96, skewness: 0.15 ... }
    • Alternatively, the return value can just be an array where entry i is moment i of the sequence.
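
As a rough sketch of what the core of such a chi-squared helper could look like in Ruby: the name `chi_squared_uniform`, its parameters, and the equal-width binning are all assumptions for illustration, not an existing API. Note that it returns the test statistic rather than the significance level proposed above, because converting the statistic to a p-value needs the chi-squared CDF, which Ruby's standard library does not provide. The statistic is instead compared against a tabulated critical value.

```ruby
# Hypothetical helper: Pearson's chi-squared statistic for samples tested
# against Uniform(low, high), using equal-width bins.
# Simplification: values outside [low, high) are folded into the edge bins.
def chi_squared_uniform(samples, low:, high:, bins: 10)
  counts = Array.new(bins, 0)
  width = (high - low) / bins.to_f
  samples.each do |x|
    idx = ((x - low) / width).floor.clamp(0, bins - 1)
    counts[idx] += 1
  end
  # Under the uniform hypothesis every bin expects the same count.
  expected = samples.size / bins.to_f
  counts.sum { |observed| (observed - expected)**2 / expected }
end

# Tabulated critical value for 9 degrees of freedom (10 bins - 1)
# at the 0.05 significance level.
critical = 16.919

# A two-valued sequence: 500 samples land in each of two bins.
two_point = Array.new(1000) { |i| i.even? ? 0.2113 : 0.7887 }
stat_two_point = chi_squared_uniform(two_point, low: 0.0, high: 1.0)

# A perfectly even sequence: 100 samples at the midpoint of each bin.
even = Array.new(1000) { |i| (i % 10) / 10.0 + 0.05 }
stat_even = chi_squared_uniform(even, low: 0.0, high: 1.0)

puts "two-point statistic: #{stat_two_point}, rejected: #{stat_two_point > critical}"
puts "even statistic: #{stat_even}, rejected: #{stat_even > critical}"
```

In a real test the samples would come from the candidate generator, and the pass threshold would depend on the chosen bin count and significance level.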

Let me know what you think about this. Right now I feel a lot of test cases are repeated. This issue would, of course, require that all the methods in README.md are already implemented, so that there is something to compare against.
