It looks like .cor_test_biserial_biserial() and similar methods (e.g., the polychoric one) treat the adjusted correlations as if they were observed Pearson correlations when computing CIs and p values.
correlation:::.cor_test_biserial_biserial
function (data, x, y, continuous, binary, ci)
{
    var_x <- .complete_variable_x(data, continuous, binary)
    var_y <- .complete_variable_y(data, continuous, binary)
    m1 <- mean(var_x[var_y == 1])
    m0 <- mean(var_x[var_y == 0])
    q <- mean(var_y)
    p <- 1 - q
    zp <- stats::dnorm(stats::qnorm(q))
    r <- (((m1 - m0) * (p * q/zp))/sd(var_x))
    p <- cor_to_p(r, n = length(var_x))
    ci_vals <- cor_to_ci(r, n = length(var_x), ci = ci)
    data.frame(Parameter1 = x, Parameter2 = y, rho = r, t = p$statistic,
        df_error = length(var_y) - 2, p = p$p, CI_low = ci_vals$CI_low,
        CI_high = ci_vals$CI_high, Method = "Biserial",
        stringsAsFactors = FALSE)
}
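To make the problem concrete, here is a small self-contained simulation (not package code; `biserial_r` just mirrors the estimator above). With an uneven split, the empirical sampling SE of the biserial estimate comes out well above the Pearson-based SE sqrt((1 - r^2)^2 / (n - 1)) that cor_to_p()/cor_to_ci() effectively assume:

```r
# Sketch only: simulate a bivariate-normal pair, dichotomize one variable,
# and compare the empirical SE of the biserial estimate to the Pearson-based SE.
set.seed(42)

biserial_r <- function(x, y01) {
  # same estimator as in the function dumped above
  m1 <- mean(x[y01 == 1])
  m0 <- mean(x[y01 == 0])
  q  <- mean(y01)
  p  <- 1 - q
  zp <- stats::dnorm(stats::qnorm(q))
  ((m1 - m0) * (p * q / zp)) / stats::sd(x)
}

n   <- 200
rho <- 0.4
tau <- stats::qnorm(0.9)  # cut point giving roughly a 10/90 split

r_hat <- replicate(5000, {
  x <- stats::rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * stats::rnorm(n)
  biserial_r(x, as.numeric(y > tau))
})

empirical_se     <- stats::sd(r_hat)
pearson_based_se <- sqrt((1 - mean(r_hat)^2)^2 / (n - 1))

# The empirical SE should be noticeably larger than the Pearson-based one,
# so CIs built on the latter are too narrow and p values too small.
c(empirical = empirical_se, pearson_based = pearson_based_se)
```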
This isn't correct: these adjustments increase sampling error (potentially substantially if the binary variable's split or the polychoric variable's level distribution is very uneven), so the CIs are currently too narrow and the p values too small. One reasonable approximate way to handle the uncertainty in these adjusted correlations is to compute an adjusted sample size n <- ((r^2 - 1)^2 + var_e)/var_e, where var_e is the estimated sampling error variance of the adjusted correlation, and use that n when computing the CI and p value.
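A minimal sketch of what that could look like, reusing the cor_to_p()/cor_to_ci() helpers already called in the dump above (`adjusted_ci_p` and `n_adj` are illustrative names, and var_e is assumed to come from a bootstrap or an analytic approximation such as Soper's standard error for the biserial case):

```r
# Illustrative sketch only: adjusted_ci_p(), n_adj and var_e are not package names.
# var_e is the estimated sampling variance of the adjusted correlation.
adjusted_ci_p <- function(r, var_e, ci = 0.95) {
  # Effective n such that a Pearson correlation of the same size would have
  # sampling variance var_e, i.e. var(r) ~ (1 - r^2)^2 / (n - 1)
  n_adj <- ((r^2 - 1)^2 + var_e) / var_e

  p_out  <- correlation::cor_to_p(r, n = n_adj)
  ci_out <- correlation::cor_to_ci(r, n = n_adj, ci = ci)

  data.frame(
    rho      = r,
    t        = p_out$statistic,
    df_error = n_adj - 2,
    p        = p_out$p,
    CI_low   = ci_out$CI_low,
    CI_high  = ci_out$CI_high
  )
}

# e.g. with the simulated biserial estimates from the sketch above:
# adjusted_ci_p(mean(r_hat), empirical_se^2)
```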