Skip to content

Incorrect CIs/p values for adjusted correlations (biserial, polychoric, etc.) and partial correlations #181

Open
@bwiernik

Description

@bwiernik

It looks like the .cor_test_biserial_biserial() and similar methods (e.g., polychoric) are treating the adjusted correlations as observed Pearson correlations when computing CIs/p values.

correlation:::.cor_test_biserial_biserial
function (data, x, y, continuous, binary, ci) 
{
    var_x <- .complete_variable_x(data, continuous, binary)
    var_y <- .complete_variable_y(data, continuous, binary)
    m1 <- mean(var_x[var_y == 1])
    m0 <- mean(var_x[var_y == 0])
    q <- mean(var_y)
    p <- 1 - q
    zp <- stats::dnorm(stats::qnorm(q))
    r <- (((m1 - m0) * (p * q/zp))/sd(var_x))
    p <- cor_to_p(r, n = length(var_x))
    ci_vals <- cor_to_ci(r, n = length(var_x), ci = ci)
    data.frame(Parameter1 = x, Parameter2 = y, rho = r, t = p$statistic, 
        df_error = length(var_y) - 2, p = p$p, CI_low = ci_vals$CI_low, 
        CI_high = ci_vals$CI_high, Method = "Biserial", 
        stringsAsFactors = FALSE)
}

This isn't correct--these adjustments increase sampling error (potentially substantially if the binary variable split or polychoric level distribution are very uneven). So, currently, these CIs p values will be too small. One reasonable approximate approach to handling uncertainty in these adjusted correlations is to compute an adjusted sample size n <- ((r^2 - 1)^2 + var_e)/var_e where var_e is the estimated sampling error variance for the adjusted correlation.

Metadata

Metadata

Assignees

Labels

bug 🐛Something isn't workingenhancement 💥Implemented features can be improved or revised

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions