Erlang statistics module. The current stdlib in OTP is lacking a statistics module. I hope to add one sooner or later, in the meanwhile I will add stuff here.
In other words: the API will be subject to change until I feel it could be fixated.
I have two aspects of this:
- A somewhat complete set of functions that can be used as building blocks for functions to query measurements
- Sane return values that can be used as arguments directly with other functions
- Statistics:
- frequencies -
itemfreq/1 - histogram -
histogram/1,2,histogram_property/1 - median -
cmedian/1 - mean -
tmean/1,2,gmean/1,hmean/1 - standard error -
tsem/1,2 - attributes -
tvar/1,2,tstd/1,2,tmin/1,2,tmax/1,2 - skewness -
skewness/1,2 - kurtosis -
kurtosis/1,2
- frequencies -
- Correlations:
- Pearson’s product-moment correlation coefficient (Pearson's r) -
pearsonsr/1 - Point biserial correlation coefficient - TODO
- Spearman’s rank correlation - TODO
- Kendall rau rank correlation coefficient: tau a, tau b - TODO
- Pearson’s product-moment correlation coefficient (Pearson's r) -
- Regression:
- Simple -
linregress/1 - Multiple - TODO
- Polynomial regression - TODO
- Simple -
- Factorial Analysis:
- Extraction (PCA and Principal Axis) - TODO
- Rotation (Varimax, Equimax, Quartimax) - TODO
- Velicer’s MAP test - TODO
- Variance Analysis:
- One-way ANOVA - TODO
- Tests: F, T, Levene, U-Mannwhitney. - TODO
- Drop the leading 't' for all trimmed functions? Makes sense if they are the default ones.
- Rename limit option 'inf' to 'none'? 'inf' works great for positive limit but we don't want two atoms, i.e 'inf', '-inf'.
linregress/1could get a prettier name.itemfreq/1could get a prettier name.
The continuous stats function set keeps a state for values added which may be queried.
matstat:new() -> stats()- Create a new stats state.matstat:new([{'min' | 'max', number()} | 'gmean' | 'hmean']) -> stats()- Create a new stats state.matstat:add([number()] | number(), stats()) -> stats().
The following functions may also query the stats state.
- Compute the trimmed mean
tmean(stats() | [number()]) -> Mean :: float()tmean([number()], {'inf' | L :: number(), 'inf' | U :: number()}) -> Mean :: float()- Compute the geometric and harmonic mean
gmean(stats() | [number()]) -> GeometricMean :: float()hmean(stats() | [number()]) -> HarmonicMean :: float()- Compute the trimmed minimum
tmin(stats() | [number()]) -> Minimum :: number()tmin([number()], 'inf' | number()) -> Minimum :: number()- Compute the trimmed maximum
tmax(stats() | [number()]) -> Maximum :: number()tmax([number()], 'inf' | number()) -> Maximum :: number()
- Compute the trimmed variance
tvar(stats() | [number()]) -> Variance :: float()tvar([number()], {'inf' | number(), 'inf' | number()}) -> Variance :: float()
- Compute the trimmed sample standard deviation
tstd(stats() | [number()]) -> StdDev :: float()tstd([number()], {'inf' | number(), 'inf' | number()}) -> StdDev :: float()
- Compute the trimmed standard error of the mean
tsem(stats() | [number()]) -> StdErr :: float()tsem([number()], {'inf' | number(), 'inf' | number()}) -> StdErr :: float()
- Compute the moment
moment(Moment :: 1..4, stats()) -> Moment :: float()
- Compute the skewness
skewness(stats() | [number()]) -> Skewness :: float()skewness([number()], {'inf' | number(), 'inf' | number()}) -> Skewness :: float()
- Compute the kurtosis
kurtosis(stats() | [number()]) -> Kurtosis :: float()kurtosis([number()], {'inf' | number(), 'inf' | number()}) -> Kurtosis :: float()
cmedian([number()]) -> number()- Returns the computed median value from a list of numberslinregress([{ X :: number(), Y :: number()}) -> {{Slope :: number(), Intercept :: number()}, RSq :: float()}- Calculate a regression lineitemfreq([term()]) -> [{term(), integer()}]- Returns a 2D list of item frequencies. Highest frequency first.pearsonr([{number(),number()}) -> float()- Calculates a Pearson correlation coefficient (and the p-value for testing not yet impl.)histogram([number()], Nbins :: integer()) -> [{number(), integer()}]- Separates the range into several bins and returns the number of instances of a in each bin.histogram_new/3,histogram_add/2,histogram_counts/2,histogram_property/1
msn/1 -> {Mean :: float(), StdDev :: float()}- calculate mean and sampled standard deviation,
Function list from SciPy.stats Statistical Functions.
Will Implement in Prio order:
chisquare(f_obs[, f_exp, ddof])- Calculates a one-way chi square test.zmap(scores, compare[, axis, ddof])- Calculates the relative z-scores.zscore(a[, axis, ddof])- Calculates the z score of each value in the sample, relative to the sample mean and standard deviation.spearmanr(a[, b, axis])- Calculates a Spearman rank-order correlation coefficient and the p-value
Still flaky about:
- mode(a[, axis]) - Returns an array of the modal (most common) value in the passed array.
- variation(a[, axis]) - Computes the coefficient of variation, the ratio of the biased standard deviation to the mean.
- describe(a[, axis]) - Computes several descriptive statistics of the passed array.
- skewtest(a[, axis]) - Tests whether the skew is different from the normal distribution.
- kurtosistest(a[, axis]) - Tests whether a dataset has normal kurtosis
- normaltest(a[, axis]) - Tests whether a sample differs from a normal distribution.
- scoreatpercentile(a, per[, limit, ...]) - Calculate the score at the given per percentile of the sequence a.
- percentileofscore(a, score[, kind]) - The percentile rank of a score relative to a list of scores.
- cumfreq(a[, numbins, defaultreallimits, weights]) - Returns a cumulative frequency histogram, using the histogram function.
- relfreq(a[, numbins, defaultreallimits, weights]) - Returns a relative frequency histogram, using the histogram function.
obrientransform(*args)- Computes a transform on input data (any number of columns).- signaltonoise(a[, axis, ddof]) - The signal-to-noise ratio of the input data.
bayes_mvs(data[, alpha])- Bayesian confidence intervals for the mean, var, and std.- threshold(a[, threshmin, threshmax, newval]) - Clip array to a given value.
- trimboth(a, proportiontocut) - Slices off a proportion of items from both ends of an array.
- trim1(a, proportiontocut[, tail]) - Slices off a proportion of items from ONE end of the passed array
f_oneway(*args)- Performs a 1-way ANOVA.- pointbiserialr(x, y) - Calculates a point biserial correlation coefficient and the associated p-value.
kendalltau(x, y[, initial_lexsort])- Calculates Kendall’s tau, a correlation measure for ordinal data.ttest_1samp(a, popmean[, axis])- Calculates the T-test for the mean of ONE group of scores a.ttest_ind(a, b[, axis, equal_var])- Calculates the T-test for the means of TWO INDEPENDENT samples of scores.ttest_rel(a, b[, axis])- Calculates the T-test on TWO RELATED samples of scores, a and b.- kstest(rvs, cdf[, args, N, alternative, mode]) - Perform the Kolmogorov-Smirnov test for goodness of fit
ks_2samp(data1, data2)- Computes the Kolmogorov-Smirnof statistic on 2 samples.mannwhitneyu(x, y[, use_continuity])- Computes the Mann-Whitney rank test on samples x and y.- tiecorrect(rankvals) - Tie correction factor for ties in the Mann-Whitney U and
- ranksums(x, y) - Compute the Wilcoxon rank-sum statistic for two samples.
- wilcoxon(x[, y]) - Calculate the Wilcoxon signed-rank test.
kruskal(*args)- Compute the Kruskal-Wallis H-test for independent samplesfriedmanchisquare(*args)- Computes the Friedman test for repeated measurements- ansari(x, y) - Perform the Ansari-Bradley test for equal scale parameters
bartlett(*args)- Perform Bartlett’s test for equal varianceslevene(*args, **kwds)- Perform Levene test for equal variances.- shapiro(x[, a, reta]) - Perform the Shapiro-Wilk test for normality.
- anderson(x[, dist]) - Anderson-Darling test for data coming from a particular distribution
binom_test(x[, n, p])- Perform a test that the probability of success is p.fligner(*args, **kwds)- Perform Fligner’s test for equal variances.- mood(x, y) - Perform Mood’s test for equal scale parameters.
oneway(*args, **kwds)- Test for equal means in two or more samples from the normal distribution.- HdrHistogram
Look at: http://www.infoq.com/presentations/latency-pitfalls