auto chatterjee_correlation(ExecutionPolicy&& exec, const Container& u, const Container& v);

C++11:
template <typename Container>
auto chatterjee_correlation(const Container& u, const Container& v);
}
``

[heading Description]

Classical correlation coefficients such as Pearson's are useful primarily for detecting when one dataset depends linearly on another.
However, Pearson's correlation coefficient has a known weakness: even when the dependent variable has an obvious functional relationship with the independent variable, the coefficient can take on any value.
As Chatterjee says:

> Ideally, one would like a coefficient that approaches
its maximum value if and only if one variable looks more and more like a
noiseless function of the other, just as Pearson correlation is close to its maximum value if and only if one variable is close to being a noiseless linear function of the other.
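
The weakness of Pearson's coefficient described above can be illustrated with a minimal sketch. The code below uses the textbook sample formula for Pearson's correlation and evaluates it on y = x^2 at points symmetric about zero, where Y is a noiseless function of X but the coefficient vanishes. The names `pearson` and `pearson_of_parabola` are hypothetical and are not part of Boost.Math.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative only: textbook sample Pearson correlation, not the Boost.Math implementation.
double pearson(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());

    double mean_x = 0, mean_y = 0;
    for (std::size_t i = 0; i < x.size(); ++i) { mean_x += x[i]; mean_y += y[i]; }
    mean_x /= n;
    mean_y /= n;

    // Accumulate the centered cross product and the two centered sums of squares.
    double sxy = 0, sxx = 0, syy = 0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        const double dx = x[i] - mean_x;
        const double dy = y[i] - mean_y;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / std::sqrt(sxx * syy);
}

// Y = X^2 sampled at points symmetric about zero: a noiseless, nonlinear relationship.
double pearson_of_parabola()
{
    const std::vector<double> x{-2, -1, 0, 1, 2};
    std::vector<double> y;
    for (double v : x) { y.push_back(v * v); }
    return pearson(x, y);
}
```

For this symmetric sample the positive and negative cross products cancel, so `pearson_of_parabola()` is zero even though Y is completely determined by X.
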

This is the problem Chatterjee's coefficient solves.
Let X and Y be random variables, where Y is not constant, and let (X_i, Y_i), i = 0, ..., n-1, be i.i.d. samples from their joint distribution.
Rearrange these samples so that X_{(0)} < X_{(1)} < ... < X_{(n-1)}, and form the pairs (R(X_{(i)}), R(Y_{(i)})), where R denotes the rank of a value within its sample.
The Chatterjee correlation is then given by

[$../equations/chatterjee_correlation.svg]

In the limit of an infinite amount of i.i.d. data, the statistic lies in [0, 1].
For finite samples, however, the statistic may be negative.
If X and Y are independent, the limiting value is zero, and if Y is a measurable function of X, it is unity.
The computational complexity is O(n log n).

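The rank-based computation described above can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes the no-ties form of the statistic from Chatterjee's paper, xi_n = 1 - 3 * sum |r_{i+1} - r_i| / (n^2 - 1), where r_i is the rank of the i-th Y value after the pairs have been ordered by X. The names `ranks_without_ties` and `chatterjee_sketch` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of Boost.Math: 1-based ranks of y, assuming no ties.
std::vector<std::size_t> ranks_without_ties(const std::vector<double>& y)
{
    std::vector<std::size_t> order(y.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // Sorting the indices by value is the O(n log n) step.
    std::sort(order.begin(), order.end(),
              [&y](std::size_t a, std::size_t b) { return y[a] < y[b]; });

    std::vector<std::size_t> rank(y.size());
    for (std::size_t r = 0; r < order.size(); ++r) { rank[order[r]] = r + 1; }
    return rank;
}

// Sketch of the no-ties statistic; y must already be ordered by increasing x.
double chatterjee_sketch(const std::vector<double>& y_sorted_by_x)
{
    const std::size_t n = y_sorted_by_x.size();
    assert(n >= 2);
    const std::vector<std::size_t> rank = ranks_without_ties(y_sorted_by_x);

    std::size_t sum = 0;
    for (std::size_t i = 1; i < n; ++i)
    {
        // Absolute difference of consecutive ranks, written to avoid unsigned underflow.
        sum += rank[i] > rank[i - 1] ? rank[i] - rank[i - 1] : rank[i - 1] - rank[i];
    }
    return 1.0 - 3.0 * static_cast<double>(sum) / static_cast<double>(n * n - 1);
}
```

For five strictly increasing Y values the rank differences sum to 4, so the sketch gives 1 - 12/24 = 0.5 rather than 1, which reflects the finite-sample behaviour noted above. The ordering {1, 4, 2, 5, 3} gives 1 - 30/24 = -0.25, a negative value.
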
An example is given below:

    // X must already be sorted in increasing order
    std::vector<double> X{1,2,3,4,5};
    std::vector<double> Y{1,2,3,4,5};
    using boost::math::statistics::chatterjee_correlation;
    double coeff = chatterjee_correlation(X, Y);

The implementation follows [@https://arxiv.org/pdf/1909.10140.pdf Chatterjee's paper].

/Nota bene:/ If the input is an integer type, the output will be a double-precision type.

[heading Invariants]

The function expects at least two samples, a non-constant vector Y, and the same number of X values as Y values.
If Y is constant, the result is a quiet NaN.
The data set must be sorted by X values.
If there are ties in the values of X, the statistic depends on how the ties are broken, which Chatterjee's paper does at random.
The implementation does not use random numbers internally, but in that case the result is not guaranteed to be identical on different systems.

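A caller whose data is not yet ordered may want to establish these invariants before the call. The following is a minimal sketch under stated assumptions: `sort_pairs_by_x` and `is_constant` are hypothetical helpers, not part of Boost.Math, and the use of `std::stable_sort` is one possible way to make the handling of tied X values reproducible.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Boost.Math: reorder (x, y) pairs by increasing x.
void sort_pairs_by_x(std::vector<double>& x, std::vector<double>& y)
{
    if (x.size() != y.size()) { throw std::invalid_argument("x and y must have the same length"); }
    if (x.size() < 2) { throw std::invalid_argument("at least two samples are required"); }

    std::vector<std::pair<double, double>> pairs;
    pairs.reserve(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) { pairs.emplace_back(x[i], y[i]); }

    // stable_sort keeps pairs with equal x in their input order.
    std::stable_sort(pairs.begin(), pairs.end(),
                     [](const std::pair<double, double>& a, const std::pair<double, double>& b)
                     { return a.first < b.first; });

    for (std::size_t i = 0; i < pairs.size(); ++i)
    {
        x[i] = pairs[i].first;
        y[i] = pairs[i].second;
    }
}

// Hypothetical helper: true when no two neighbouring elements of y differ.
bool is_constant(const std::vector<double>& y)
{
    return std::adjacent_find(y.begin(), y.end(), std::not_equal_to<double>()) == y.end();
}
```

Checking `is_constant(Y)` beforehand lets the caller handle the constant case explicitly instead of testing the result for NaN.
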
[heading References]

* Chatterjee, Sourav. "A new coefficient of correlation." Journal of the American Statistical Association 116.536 (2021): 2009-2022.