
Commit 2223ca5

Renamed 'r' with 'pointbiserialr' in convert_effsize (#325)
* Renamed 'r' with 'pointbiserialr' in convert_effsize
* Updated changelog + unit tests
* Temp fix for bug in plot_paired
1 parent cdb5831 commit 2223ca5

6 files changed

Lines changed: 105 additions & 75 deletions


docs/changelog.rst

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ v0.6.0.dev
 **Bugfixes**
 
 - Fixed a bug where the boolean value returned by :py:func:`pingouin.anderson` was inverted. It returned True when the data was NOT coming from the tested distribution, and vice versa. `PR 308 <https://github.com/raphaelvallat/pingouin/pull/308>`_.
+- Fixed misleading documentation and ``input_type`` in the :py:func:`convert_effsize` function. When converting from a Cohen's d effect size to a correlation coefficient, the resulting correlation is **not** a Pearson correlation but instead a `point-biserial correlation <https://en.wikipedia.org/wiki/Point-biserial_correlation_coefficient>`_. To avoid any confusion, ``input_type='r'`` has been deprecated and replaced with ``input_type='pointbiserialr'``. For more details, see `issue 302 <https://github.com/raphaelvallat/pingouin/issues/302>`_.
 
 **New functions**
 
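The entry rests on the fact that a point-biserial correlation is simply Pearson's r computed between a continuous outcome and a 0/1 group indicator. The stdlib-only sketch below checks that identity against the textbook definition r_pb = (M1 - M0) / s_n * sqrt(n1 * n0) / n, where s_n is the population (ddof=0) standard deviation of all values. The helper names and the sample data are illustrative assumptions, not pingouin code.

```python
import math
from statistics import mean, pstdev


def pearson_r(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


def point_biserial_r(group0, group1):
    """Textbook point-biserial r: (M1 - M0) / s_n * sqrt(n1 * n0) / n."""
    values = list(group0) + list(group1)
    n0, n1, n = len(group0), len(group1), len(values)
    s_n = pstdev(values)  # population SD (ddof=0) of all pooled values
    return (mean(group1) - mean(group0)) / s_n * math.sqrt(n1 * n0) / n


# Made-up outcome values for two groups (group 1 is coded as 1)
group0 = [2.1, 3.4, 1.9, 2.8, 3.0]
group1 = [3.9, 4.2, 3.1, 4.8]
indicator = [0] * len(group0) + [1] * len(group1)

r_pb = point_biserial_r(group0, group1)
r_pearson = pearson_r(indicator, group0 + group1)
print(round(r_pb, 6), round(r_pearson, 6))  # both print the same value
```

Because the two numbers agree, calling the output of a d-to-r conversion a "Pearson r" is misleading: it only makes sense as a correlation with a dichotomous grouping variable.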
notebooks/03_EffectSizes.ipynb

Lines changed: 12 additions & 7 deletions

pingouin/effsize.py

Lines changed: 55 additions & 54 deletions
@@ -32,13 +32,13 @@ def compute_esci(
     Parameters
     ----------
     stat : float
-        Original effect size. Must be either a correlation coefficient or a
-        Cohen-type effect size (Cohen d or Hedges g).
+        Original effect size. Must be either a correlation coefficient or a Cohen-type effect size
+        (Cohen d or Hedges g).
     nx, ny : int
         Length of vector x and y.
     paired : bool
-        Indicates if the effect size was estimated from a paired sample.
-        This is only relevant for cohen or hedges effect size.
+        Indicates if the effect size was estimated from a paired sample. This is only relevant for
+        cohen or hedges effect size.
     eftype : string
         Effect size type. Must be "r" (correlation) or "cohen" (Cohen d or Hedges g).
     confidence : float
@@ -57,9 +57,8 @@ def compute_esci(
 
     Notes
     -----
-    To compute the parametric confidence interval around a
-    **Pearson r correlation** coefficient, one must first apply a
-    Fisher's r-to-z transformation:
+    To compute the parametric confidence interval around a **Pearson r correlation** coefficient,
+    one must first apply a Fisher's r-to-z transformation:
 
     .. math:: z = 0.5 \\cdot \\ln \\frac{1 + r}{1 - r} = \\text{arctanh}(r)
 
@@ -69,14 +68,12 @@ def compute_esci(
 
     where :math:`n` is the sample size.
 
-    The lower and upper confidence intervals - *in z-space* - are then
-    given by:
+    The lower and upper confidence intervals - *in z-space* - are then given by:
 
     .. math:: \\text{ci}_z = z \\pm \\text{crit} \\cdot \\text{SE}
 
-    where :math:`\\text{crit}` is the critical value of the normal distribution
-    corresponding to the desired confidence level (e.g. 1.96 in case of a 95%
-    confidence interval).
+    where :math:`\\text{crit}` is the critical value of the normal distribution corresponding to
+    the desired confidence level (e.g. 1.96 in case of a 95% confidence interval).
 
     These confidence intervals can then be easily converted back to *r-space*:
 
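The Fisher-z procedure described in this hunk can be sketched in a few lines of Python. The hunk elides the standard-error formula, so the textbook value SE = 1 / sqrt(n - 3) is assumed here. `pearson_ci` is a hypothetical helper, not the body of `compute_esci`, and unlike pingouin it does not round its output.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, confidence=0.95):
    """Parametric CI around Pearson r via Fisher's r-to-z transformation."""
    z = math.atanh(r)  # r-space -> z-space
    se = 1 / math.sqrt(n - 3)  # assumed standard error of z
    crit = NormalDist().inv_cdf((1 + confidence) / 2)  # ~1.96 for a 95% CI
    ci_z = (z - crit * se, z + crit * se)  # symmetric interval in z-space
    return tuple(math.tanh(v) for v in ci_z)  # back-transform to r-space


lo, hi = pearson_ci(0.5, n=30)
print(round(lo, 2), round(hi, 2))  # 0.17 0.73
```

The interval is symmetric in z-space but not in r-space: the upper bound sits closer to r = 0.5 than the lower bound because tanh compresses values near 1.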
@@ -85,10 +82,9 @@ def compute_esci(
         \\text{ci}_r = \\frac{\\exp(2 \\cdot \\text{ci}_z) - 1}
         {\\exp(2 \\cdot \\text{ci}_z) + 1} = \\text{tanh}(\\text{ci}_z)
 
-    A formula for calculating the confidence interval for a
-    **Cohen d effect size** is given by Hedges and Olkin (1985, p86).
-    If the effect size estimate from the sample is :math:`d`, then it follows a
-    T distribution with standard error:
+    A formula for calculating the confidence interval for a **Cohen d effect size** is given by
+    Hedges and Olkin (1985, p86). If the effect size estimate from the sample is :math:`d`, then
+    it follows a T distribution with standard error:
 
     .. math::
 
@@ -107,15 +103,14 @@ def compute_esci(
 
     .. math:: \\text{ci}_d = d \\pm \\text{crit} \\cdot \\text{SE}
 
-    where :math:`\\text{crit}` is the critical value of the T distribution
-    corresponding to the desired confidence level.
+    where :math:`\\text{crit}` is the critical value of the T distribution corresponding to the
+    desired confidence level.
 
     References
     ----------
     * https://en.wikipedia.org/wiki/Fisher_transformation
 
-    * Hedges, L., and Ingram Olkin. "Statistical models for meta-analysis."
-      (1985).
+    * Hedges, L., and Ingram Olkin. "Statistical models for meta-analysis." (1985).
 
     * http://www.leeds.ac.uk/educol/documents/00002182.htm
 
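For the Cohen d interval, a minimal sketch follows. The diff elides the standard-error formula, so this assumes the usual Hedges and Olkin large-sample expression SE = sqrt((nx + ny) / (nx * ny) + d^2 / (2 * (nx + ny))) for independent samples, and nx + ny - 2 degrees of freedom for the T critical value. `cohen_d_ci` is a hypothetical name, and the paired case is not covered.

```python
import math

from scipy.stats import t


def cohen_d_ci(d, nx, ny, confidence=0.95):
    """Sketch of ci_d = d +/- crit * SE for an independent-samples Cohen d."""
    # Assumed Hedges & Olkin (1985) large-sample standard error of d
    se = math.sqrt((nx + ny) / (nx * ny) + d**2 / (2 * (nx + ny)))
    # Critical value of the T distribution (df = nx + ny - 2 is an assumption)
    crit = t.ppf((1 + confidence) / 2, df=nx + ny - 2)
    return d - crit * se, d + crit * se


lo, hi = cohen_d_ci(0.5, nx=20, ny=20)
print(round(lo, 3), round(hi, 3))
```

With 20 observations per group, a moderate d = 0.5 still yields an interval that crosses zero, which is a useful reminder of how wide these intervals are for small samples.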
@@ -211,8 +206,7 @@ def compute_bootci(
     y : 1D-array, list, or None
         Second sample. Required only for bivariate functions.
     func : str or custom function
-        Function to compute the bootstrapped statistic.
-        Accepted string values are:
+        Function to compute the bootstrapped statistic. Accepted string values are:
 
         * ``'pearson'``: Pearson correlation (bivariate, paired x and y)
         * ``'spearman'``: Spearman correlation (bivariate, paired x and y)
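The ``func`` parameter names the statistic that is recomputed on each resample. As an illustration only, here is a plain percentile bootstrap for ``'pearson'`` with NumPy that resamples (x, y) pairs together. `bootstrap_ci_pearson` and its defaults are assumptions, and pingouin's own ``compute_bootci`` may default to a different interval method.

```python
import numpy as np


def bootstrap_ci_pearson(x, y, n_boot=2000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for Pearson r, resampling (x, y) pairs together."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.size
    boot = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n row indices with replacement
        boot[i] = np.corrcoef(x[idx], y[idx])[0, 1]
    alpha = (1 - confidence) / 2
    lo, hi = np.quantile(boot, [alpha, 1 - alpha])  # percentile interval
    return float(lo), float(hi)


rng = np.random.default_rng(42)
x = rng.normal(size=50)
y = 0.6 * x + rng.normal(size=50)
r = np.corrcoef(x, y)[0, 1]
lo, hi = bootstrap_ci_pearson(x, y)
print(round(r, 3), (round(lo, 3), round(hi, 3)))
```

Resampling row indices, rather than x and y independently, is what keeps the pairing intact for bivariate statistics such as a correlation.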
@@ -501,12 +495,13 @@ def convert_effsize(ef, input_type, output_type, nx=None, ny=None):
     ef : float
         Original effect size.
     input_type : string
-        Effect size type of ef. Must be ``'r'`` or ``'cohen'``.
+        Effect size type of ef. Must be ``'cohen'`` or ``'pointbiserialr'``.
     output_type : string
         Desired effect size type. Available methods are:
 
         * ``'cohen'``: Unbiased Cohen d
         * ``'hedges'``: Hedges g
+        * ``'pointbiserialr'``: Point-biserial correlation
         * ``'eta-square'``: Eta-square
         * ``'odds-ratio'``: Odds ratio
         * ``'AUC'``: Area Under the Curve
@@ -527,15 +522,17 @@ def convert_effsize(ef, input_type, output_type, nx=None, ny=None):
 
     Notes
     -----
-    The formula to convert **r** to **d** is given in [1]_:
+    The formula to convert from a `point-biserial correlation
+    <https://en.wikipedia.org/wiki/Point-biserial_correlation_coefficient>`_ **r** to **d** is
+    given in [1]_:
 
-    .. math:: d = \\frac{2r}{\\sqrt{1 - r^2}}
+    .. math:: d = \\frac{2r_{pb}}{\\sqrt{1 - r_{pb}^2}}
 
-    The formula to convert **d** to **r** is given in [2]_:
+    The formula to convert **d** to a point-biserial correlation **r** is given in [2]_:
 
     .. math::
 
-        r = \\frac{d}{\\sqrt{d^2 + \\frac{(n_x + n_y)^2 - 2(n_x + n_y)}
+        r_{pb} = \\frac{d}{\\sqrt{d^2 + \\frac{(n_x + n_y)^2 - 2(n_x + n_y)}
         {n_xn_y}}}
 
     The formula to convert **d** to :math:`\\eta^2` is given in [3]_:
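The renamed d-to-r formula is exact, not approximate, when d is the independent-samples Cohen d with a pooled (ddof=1) standard deviation: it then reproduces Pearson's r between the outcome and a 0/1 group indicator, which is precisely the point-biserial correlation. A stdlib-only check with made-up data; all helper names are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_cohen_d(x, y):
    """Independent-samples Cohen d using the pooled (ddof=1) standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(y) - mean(x)) / math.sqrt(pooled_var)


def d_to_pointbiserial(d, nx, ny):
    """McGrath and Meyer (2006) conversion from Cohen d to point-biserial r."""
    a = ((nx + ny) ** 2 - 2 * (nx + ny)) / (nx * ny)
    return d / math.sqrt(d**2 + a)


def pearson_r(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


# Made-up data with unequal group sizes; y is coded 1 in the indicator
x = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4]
y = [5.6, 6.1, 4.9, 5.8]

d = pooled_cohen_d(x, y)
r_formula = d_to_pointbiserial(d, len(x), len(y))
r_direct = pearson_r([0] * len(x) + [1] * len(y), x + y)
print(r_formula, r_direct)  # equal up to floating-point rounding
```

The identity follows from r^2 = t^2 / (t^2 + df) with t = d * sqrt(nx * ny / N) and df = N - 2, which rearranges to the formula with a = N(N - 2) / (nx * ny).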
@@ -584,35 +581,35 @@ def convert_effsize(ef, input_type, output_type, nx=None, ny=None):
     >>> pg.convert_effsize(.45, 'cohen', 'hedges', nx=10, ny=10)
     0.4309859154929578
 
-    3. Convert Pearson r to Cohen d
+    3. Convert a point-biserial correlation to Cohen d
 
-    >>> r = 0.40
-    >>> d = pg.convert_effsize(r, 'r', 'cohen')
+    >>> rpb = 0.40
+    >>> d = pg.convert_effsize(rpb, 'pointbiserialr', 'cohen')
     >>> print(d)
     0.8728715609439696
 
-    4. Reverse operation: convert Cohen d to Pearson r
+    4. Reverse operation: convert Cohen d to a point-biserial correlation
 
-    >>> pg.convert_effsize(d, 'cohen', 'r')
+    >>> pg.convert_effsize(d, 'cohen', 'pointbiserialr')
     0.4000000000000001
     """
     it = input_type.lower()
     ot = output_type.lower()
 
     # Check input and output type
-    for input in [it, ot]:
-        if not _check_eftype(input):
-            err = "Could not interpret input '{}'".format(input)
+    for inp in [it, ot]:
+        if not _check_eftype(inp):
+            err = "Could not interpret input '{}'".format(inp)
             raise ValueError(err)
-    if it not in ["r", "cohen"]:
-        raise ValueError("Input type must be 'r' or 'cohen'")
+    if it not in ["pointbiserialr", "cohen"]:
+        raise ValueError("Input type must be 'cohen' or 'pointbiserialr'")
 
     # Pass-through option
     if it == ot or ot == "none":
         return ef
 
-    # Convert r to Cohen d (Rosenthal 1994)
-    d = (2 * ef) / np.sqrt(1 - ef**2) if it == "r" else ef
+    # Convert point-biserial r to Cohen d (Rosenthal 1994)
+    d = (2 * ef) / np.sqrt(1 - ef**2) if it == "pointbiserialr" else ef
 
     # Then convert to the desired output type
     if ot == "cohen":
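Without sample sizes, the Rosenthal r-to-d formula and the d-to-r direction invert each other exactly, which is why the docstring round trip returns 0.4000000000000001. The sketch assumes that the elided fallback branch uses a = 4 (consistent with the 0.1961 value asserted in the unit tests). The r-to-d direction never uses nx and ny, so when the d-to-r step is given sample sizes (a != 4) the round trip is only approximate; that asymmetry is why the new unit test compares ``d`` and ``d_convert`` with a 0.1 tolerance rather than ``np.isclose``. Function names are hypothetical.

```python
import math


def pbr_to_d(r):
    """Point-biserial r -> Cohen d (Rosenthal 1994), as in the diff."""
    return (2 * r) / math.sqrt(1 - r**2)


def d_to_pbr(d, nx=None, ny=None):
    """Cohen d -> point-biserial r (McGrath and Meyer 2006).

    Assumption: without nx and ny, a = 4 (the equal-n, large-sample limit).
    """
    if nx is not None and ny is not None:
        a = ((nx + ny) ** 2 - 2 * (nx + ny)) / (nx * ny)
    else:
        a = 4
    return d / math.sqrt(d**2 + a)


d = pbr_to_d(0.40)
print(d)  # about 0.87287, as in the docstring example
print(d_to_pbr(d))  # back to 0.4 up to floating-point error
print(d_to_pbr(pbr_to_d(0.5), nx=100, ny=100))  # about 0.5019, not 0.5
```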
@@ -627,7 +624,7 @@ def convert_effsize(ef, input_type, output_type, nx=None, ny=None):
                 "Hedges g. Returning Cohen's d instead"
             )
             return d
-    elif ot == "r":
+    elif ot == "pointbiserialr":
         # McGrath and Meyer 2006
         if all(v is not None for v in [nx, ny]):
             a = ((nx + ny) ** 2 - 2 * (nx + ny)) / (nx * ny)
@@ -640,6 +637,12 @@ def convert_effsize(ef, input_type, output_type, nx=None, ny=None):
     elif ot == "odds-ratio":
         # Borenstein et al. 2009
         return np.exp(d * np.pi / np.sqrt(3))
+    elif ot == "r":
+        # https://github.com/raphaelvallat/pingouin/issues/302
+        raise ValueError(
+            "Using effect size 'r' in `pingouin.convert_effsize` has been deprecated. "
+            "Please use 'pointbiserialr' instead."
+        )
     else:  # ['auc']
         # Ruscio 2008
         from scipy.stats import norm
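The remaining branches convert a Cohen d to the other output types. Below is a hypothetical dispatcher sketch, checked against the d = 0.40 values asserted in the unit tests (0.0385 and 2.0658). The eta-square formula d^2 / (d^2 + 4) and the AUC formula Phi(d / sqrt(2)) are not shown in this diff and are assumptions; the ``'r'`` branch only mimics the new rejection and does not reproduce pingouin's exact message.

```python
import math
from statistics import NormalDist


def convert_from_d(d, output_type):
    """Hypothetical sketch of converting a Cohen d to other effect sizes."""
    ot = output_type.lower()
    if ot == "r":
        # Mimics the commit: the ambiguous 'r' label is rejected
        raise ValueError("'r' is no longer accepted; use 'pointbiserialr' instead.")
    if ot == "eta-square":
        return d**2 / (d**2 + 4)  # assumed formula
    if ot == "odds-ratio":
        return math.exp(d * math.pi / math.sqrt(3))  # Borenstein et al. 2009
    if ot == "auc":
        return NormalDist().cdf(d / math.sqrt(2))  # assumed formula (Ruscio 2008)
    raise ValueError(f"Unsupported output type: {output_type!r}")


for kind in ["eta-square", "odds-ratio", "AUC"]:
    print(kind, round(convert_from_d(0.40, kind), 4))
```

Raising instead of silently mapping ``'r'`` to ``'pointbiserialr'`` forces callers to acknowledge that the result is not a Pearson correlation.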
@@ -666,7 +669,8 @@ def compute_effsize(x, y, paired=False, eftype="cohen"):
         * ``'none'``: no effect size
         * ``'cohen'``: Unbiased Cohen d
         * ``'hedges'``: Hedges g
-        * ``'r'``: correlation coefficient
+        * ``'r'``: Pearson correlation coefficient
+        * ``'pointbiserialr'``: Point-biserial correlation
         * ``'eta-square'``: Eta-square
         * ``'odds-ratio'``: Odds ratio
         * ``'AUC'``: Area Under the Curve
@@ -684,8 +688,8 @@ def compute_effsize(x, y, paired=False, eftype="cohen"):
 
     Notes
     -----
-    Missing values are automatically removed from the data. If ``x`` and ``y``
-    are paired, the entire row is removed.
+    Missing values are automatically removed from the data. If ``x`` and ``y`` are paired, the
+    entire row is removed.
 
     If ``x`` and ``y`` are independent, the Cohen :math:`d` is:
 
@@ -702,22 +706,19 @@ def compute_effsize(x, y, paired=False, eftype="cohen"):
         d_{avg} = \\frac{\\overline{X} - \\overline{Y}}
         {\\sqrt{\\frac{(\\sigma_1^2 + \\sigma_2^2)}{2}}}
 
-    The Cohen’s d is a biased estimate of the population effect size,
-    especially for small samples (n < 20). It is often preferable
-    to use the corrected Hedges :math:`g` instead:
+    The Cohen's d is a biased estimate of the population effect size, especially for small samples
+    (n < 20). It is often preferable to use the corrected Hedges :math:`g` instead:
 
     .. math:: g = d \\times (1 - \\frac{3}{4(n_1 + n_2) - 9})
 
-    The common language effect size is the proportion of pairs where ``x`` is
-    higher than ``y`` (calculated with a brute-force approach where
-    each observation of ``x`` is paired to each observation of ``y``,
-    see :py:func:`pingouin.wilcoxon` for more details):
+    The common language effect size is the proportion of pairs where ``x`` is higher than ``y``
+    (calculated with a brute-force approach where each observation of ``x`` is paired to each
+    observation of ``y``, see :py:func:`pingouin.wilcoxon` for more details):
 
     .. math:: \\text{CL} = P(X > Y) + .5 \\times P(X = Y)
 
-    For other effect sizes, Pingouin will first calculate a Cohen :math:`d` and
-    then use the :py:func:`pingouin.convert_effsize` to convert to the desired
-    effect size.
+    For other effect sizes, Pingouin will first calculate a Cohen :math:`d` and then use the
+    :py:func:`pingouin.convert_effsize` to convert to the desired effect size.
 
     References
     ----------
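The Hedges correction and the brute-force common language effect size described in this hunk translate directly to code. A minimal sketch with hypothetical helper names; the Hedges value reproduces the 0.4309859154929578 shown in the ``convert_effsize`` docstring example for d = 0.45 and nx = ny = 10.

```python
def hedges_g(d, n1, n2):
    """Small-sample bias correction: g = d * (1 - 3 / (4 * (n1 + n2) - 9))."""
    return d * (1 - 3 / (4 * (n1 + n2) - 9))


def cles(x, y):
    """Brute-force common language effect size: P(X > Y) + 0.5 * P(X = Y)."""
    wins = 0.0
    for xi in x:  # pair every observation of x ...
        for yj in y:  # ... with every observation of y
            if xi > yj:
                wins += 1
            elif xi == yj:
                wins += 0.5  # ties count as half a win
    return wins / (len(x) * len(y))


print(hedges_g(0.45, 10, 10))  # matches the docstring example
print(cles([1, 2, 3], [2, 2]))  # 0.5
```

The double loop costs len(x) * len(y) comparisons, which is fine for typical sample sizes and keeps tie handling explicit.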
@@ -822,7 +823,7 @@ def compute_effsize_from_t(tval, nx=None, ny=None, N=None, eftype="cohen"):
     N : int, optional
         Total sample size (will not be used if nx and ny are specified)
     eftype : string, optional
-        desired output effect size
+        Desired output effect size.
 
     Returns
     -------
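``compute_effsize_from_t`` converts a t statistic to an effect size. Its body is not part of this diff, so the formulas below are assumptions: d = t * sqrt(1/nx + 1/ny) for two independent groups, or 2t / sqrt(N) when only the total sample size is known (which coincides with the first form for equal group sizes). The sketch checks both against a Cohen d computed directly with a pooled standard deviation; names and data are hypothetical.

```python
import math
from statistics import mean, variance


def d_from_t(tval, nx=None, ny=None, N=None):
    """Cohen d from an independent-samples t statistic (assumed formulas)."""
    if nx is not None and ny is not None:
        return tval * math.sqrt(1 / nx + 1 / ny)
    if N is not None:
        return 2 * tval / math.sqrt(N)  # assumes two groups of equal size
    raise ValueError("Provide nx and ny, or N.")


# Made-up independent samples of equal size
x = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4]
y = [5.6, 6.1, 4.9, 5.8, 5.3, 6.0]
nx, ny = len(x), len(y)
sp = math.sqrt(((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2))
tval = (mean(y) - mean(x)) / (sp * math.sqrt(1 / nx + 1 / ny))  # Student t
d_direct = (mean(y) - mean(x)) / sp  # pooled-SD Cohen d

print(d_from_t(tval, nx=nx, ny=ny), d_from_t(tval, N=nx + ny), d_direct)
```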

pingouin/tests/test_effsize.py

Lines changed: 35 additions & 14 deletions
@@ -1,8 +1,9 @@
-import pandas as pd
-import numpy as np
 import pytest
-
+import numpy as np
+import pandas as pd
 from unittest import TestCase
+from scipy.stats import pearsonr, pointbiserialr
+
 from pingouin.effsize import compute_esci, compute_effsize, compute_effsize_from_t, compute_bootci
 from pingouin.effsize import convert_effsize as cef
 
@@ -216,29 +217,48 @@ def test_compute_boot_esci(self):
 
     def test_convert_effsize(self):
         """Test function convert_effsize.
+
         Compare to https://www.psychometrica.de/effect_size.html
         """
         # Cohen d
         d = 0.40
         assert cef(d, "cohen", "none") == d
-        assert round(cef(d, "cohen", "r"), 4) == 0.1961
-        cef(d, "cohen", "r", nx=10, ny=12)  # When nx and ny are specified
-        assert np.allclose(cef(1.002549, "cohen", "r"), 0.4481248)  # R
+        assert round(cef(d, "cohen", "pointbiserialr"), 4) == 0.1961
+        cef(d, "cohen", "pointbiserialr", nx=10, ny=12)  # When nx and ny are specified
+        assert np.allclose(cef(1.002549, "cohen", "pointbiserialr"), 0.4481248)  # R
         assert round(cef(d, "cohen", "eta-square"), 4) == 0.0385
         assert round(cef(d, "cohen", "odds-ratio"), 4) == 2.0658
         cef(d, "cohen", "hedges", nx=10, ny=10)
-        cef(d, "cohen", "r")
+        cef(d, "cohen", "pointbiserialr")
         cef(d, "cohen", "hedges")
 
-        # Correlation coefficient
-        r = 0.65
-        assert cef(r, "r", "none") == r
-        assert round(cef(r, "r", "cohen"), 4) == 1.7107
-        assert np.allclose(cef(0.4481248, "r", "cohen"), 1.002549)
-        assert round(cef(r, "r", "eta-square"), 4) == 0.4225
-        assert round(cef(r, "r", "odds-ratio"), 4) == 22.2606
+        # Point-biserial correlation
+        rpb = 0.65
+        assert cef(rpb, "pointbiserialr", "none") == rpb
+        assert round(cef(rpb, "pointbiserialr", "cohen"), 4) == 1.7107
+        assert np.allclose(cef(0.4481248, "pointbiserialr", "cohen"), 1.002549)
+        assert round(cef(rpb, "pointbiserialr", "eta-square"), 4) == 0.4225
+        assert round(cef(rpb, "pointbiserialr", "odds-ratio"), 4) == 22.2606
+        # Using actual values
+        np.random.seed(42)
+        x1, y1 = np.random.multivariate_normal(mean=[1, 2], cov=[[1, 0.5], [0.5, 1]], size=100).T
+        xy1 = np.hstack((x1, y1))
+        xy1_bool = np.repeat([0, 1], 100)
+        # Let's calculate the ground-truth point-biserial correlation
+        r_biserial = pearsonr(xy1_bool, xy1)[0]  # 0.50247
+        assert np.isclose(r_biserial, pointbiserialr(xy1_bool, xy1)[0])
+        # Now the Cohen's d
+        d = abs(compute_effsize(x1, y1, paired=True, eftype="cohen"))  # 1.15651
+        # And now we can convert point-biserial r <--> d
+        r_convert = cef(abs(d), "cohen", "pointbiserialr", nx=100, ny=100)  # 0.50247
+        assert np.isclose(r_convert, r_biserial)
+        d_convert = cef(r_biserial, "pointbiserialr", "cohen", nx=100, ny=100)  # 1.162
+        assert abs(d - d_convert) < 0.1
 
         # Error
+        with pytest.raises(ValueError):
+            # DEPRECATED - https://github.com/raphaelvallat/pingouin/issues/302
+            cef(d, "cohen", "r")
         with pytest.raises(ValueError):
             cef(d, "coucou", "hibou")
         with pytest.raises(ValueError):
@@ -252,6 +272,7 @@ def test_compute_effsize(self):
         compute_effsize(x=x, y=y, eftype="odds-ratio", paired=False)
         compute_effsize(x=x, y=y, eftype="eta-square", paired=False)
         compute_effsize(x=x, y=y, eftype="cles", paired=False)
+        compute_effsize(x=x, y=y, eftype="pointbiserialr", paired=False)
         compute_effsize(x=x, y=y, eftype="none", paired=False)
         # Unequal variances
         z = np.random.normal(2.5, 3, 30)

pingouin/utils.py

Lines changed: 1 addition & 0 deletions
@@ -325,6 +325,7 @@ def _check_eftype(eftype):
         "hedges",
         "cohen",
         "r",
+        "pointbiserialr",
         "eta-square",
         "odds-ratio",
         "auc",

requirements-test.txt

Lines changed: 1 addition & 0 deletions
@@ -3,3 +3,4 @@ codecov
 pytest-cov
 openpyxl
 mpmath
+numpy<=1.23
