trgt-instability test can optionally report an allele-level effect size d
in addition to the existing parametric-bootstrap p-value.
The p-value answers a significance question: how surprising is this allele under
the fitted repeat-specific instability model, given its read depth? That makes
p-values useful for calling, but not ideal for ranking alleles across different
depths. The effect size d is meant to answer a different question: how far is
this allele's instability profile from the repeat-specific instability model?
Given:
- the observed instability profile (divergence rate count vector) for an allele: y
- the parameters of the instability model: alpha
- the corresponding baseline mean instability profile: m = alpha / sum(alpha)
The tool computes the posterior over the allele's instability profile theta:
theta | y, alpha ~ Dirichlet(alpha + y)
For each posterior draw of theta, we compute a 1D Wasserstein distance to the
baseline mean profile m on the ordered model bins:
d = W1(theta, m)
The reported summary is based on Monte Carlo draws from that posterior.
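The posterior summary described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, adjacent model bins are assumed to be one unit apart (so W1 reduces to the summed absolute difference of the two cumulative distributions), and the Dirichlet draws are generated from normalized Gamma variates using only the standard library.

```python
import random
from itertools import accumulate


def w1_ordered_bins(p, q):
    """1D Wasserstein distance between two distributions on the same ordered bins.

    Assumption of this sketch: adjacent bins are one unit apart, so W1 is the
    sum of absolute differences between the two cumulative distributions.
    """
    return sum(abs(c) for c in accumulate(a - b for a, b in zip(p, q)))


def draw_dirichlet(concentration, rng):
    """Draw once from Dirichlet(concentration) via normalized Gamma variates."""
    g = [rng.gammavariate(c, 1.0) for c in concentration]
    total = sum(g)
    return [x / total for x in g]


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of pre-sorted values, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def effect_size_summary(y, alpha, n_draws=4000, seed=0):
    """Return (median, 2.5th percentile, 97.5th percentile) of d = W1(theta, m)."""
    rng = random.Random(seed)
    total_alpha = sum(alpha)
    m = [a / total_alpha for a in alpha]           # baseline mean profile
    posterior = [a + c for a, c in zip(alpha, y)]  # Dirichlet(alpha + y)
    d = sorted(
        w1_ordered_bins(draw_dirichlet(posterior, rng), m) for _ in range(n_draws)
    )
    return percentile(d, 50.0), percentile(d, 2.5), percentile(d, 97.5)


if __name__ == "__main__":
    # Toy inputs chosen for illustration only.
    alpha = [8.0, 1.5, 0.5]
    print(effect_size_summary([5, 3, 2], alpha))        # shallow allele
    print(effect_size_summary([500, 300, 200], alpha))  # same proportions, deeper
```

Running the two toy calls shows the expected qualitative behavior: the deeper allele, with the same read proportions, yields a narrower interval around its median.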
Enable effect-size reporting with:
./trgt-instability test \
--models models.gz \
--data sample.dists.txt.gz \
--report-effect-size
Control the posterior Monte Carlo depth with:
./trgt-instability test \
--models models.gz \
--data sample.dists.txt.gz \
--report-effect-size \
--n-posterior-draws 4000
- --n-sim controls the null bootstrap used for the p-value
- --n-posterior-draws controls the posterior summaries used for d
Without effect-size reporting, test emits:
trid allele_seq p_value
With --report-effect-size, it emits:
trid allele_seq p_value d_median d_ci_lower d_ci_upper
Field meanings:
- d_median: posterior median of d
- d_ci_lower: 2.5th posterior percentile
- d_ci_upper: 97.5th posterior percentile
- Larger d_median means the allele is estimated to be farther from the fitted repeat-specific baseline.
- Wider intervals indicate more posterior uncertainty, typically because the allele has fewer supporting reads.
- The p-value and d are complementary. A small p-value indicates strong evidence for excess instability. A large d indicates a larger estimated deviation from the fitted baseline instability profile.
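One way to use the two quantities together is to call alleles on the p-value and then rank the called alleles by d_median. The sketch below illustrates that workflow under several assumptions that the text above does not confirm: the output is whitespace-delimited, a header line (if present) starts with trid, and fields contain no embedded whitespace. The function names, the 0.05 threshold, and the example rows are all hypothetical and made up for illustration.

```python
COLUMNS = ["trid", "allele_seq", "p_value", "d_median", "d_ci_lower", "d_ci_upper"]


def parse_effect_size_rows(text):
    """Parse whitespace-delimited effect-size output into a list of dicts.

    Assumption: six fields per line in the COLUMNS order; a header line
    beginning with 'trid' is skipped if present.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) or fields[0] == "trid":
            continue  # skip blank or malformed lines and the header
        row = dict(zip(COLUMNS, fields))
        for key in COLUMNS[2:]:
            row[key] = float(row[key])  # numeric columns
        rows.append(row)
    return rows


def rank_called_alleles(rows, max_p=0.05):
    """Keep alleles with p_value <= max_p, sorted by d_median, largest first."""
    called = [r for r in rows if r["p_value"] <= max_p]
    return sorted(called, key=lambda r: r["d_median"], reverse=True)


if __name__ == "__main__":
    # Made-up rows for illustration; not real tool output.
    example = """\
trid allele_seq p_value d_median d_ci_lower d_ci_upper
TR_A CAGCAG 0.001 0.80 0.55 1.10
TR_B CAGCAGCAG 0.200 1.50 0.20 3.00
TR_C CAG 0.004 1.20 0.90 1.60
"""
    for row in rank_called_alleles(parse_effect_size_rows(example)):
        print(row["trid"], row["p_value"], row["d_median"])
```

In the made-up example, TR_B has the largest d_median but a wide interval and a non-significant p-value, so it is excluded before ranking; TR_C then ranks ahead of TR_A.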