Effect-Size Reporting

trgt-instability test can optionally report an allele-level effect size d in addition to the existing parametric-bootstrap p-value.

Purpose

The p-value answers a significance question: how surprising is this allele under the fitted repeat-specific instability model, given its read depth? That makes p-values useful for calling but less suited to ranking alleles sequenced at different depths, since a p-value reflects read depth as well as the size of the deviation. The effect size d answers a different question: how far is this allele's instability profile from the repeat-specific instability model?

Definitions

Given:

  • the observed instability profile y for an allele (its divergence-rate count vector)
  • the parameters alpha of the repeat-specific instability model
  • the corresponding baseline mean instability profile m = alpha / sum(alpha)

The tool computes the posterior over the allele's underlying instability profile theta:

theta | y, alpha ~ Dirichlet(alpha + y)

For each posterior draw theta, the tool computes the 1D Wasserstein distance to the baseline mean profile m on the ordered model bins:

d = W1(theta, m)

The reported summary is based on Monte Carlo draws from that posterior.
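The definitions above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the tool's implementation: the function names, the example `alpha` and `y` values, and the choice of unit spacing between adjacent bins are assumptions made here. Under unit spacing, W1 between two distributions on the same ordered bins equals the sum of absolute differences of their cumulative sums.

```python
import numpy as np

def w1_on_bins(p, q):
    """W1 between distributions on the same ordered bins (assumes unit bin spacing)."""
    # On a 1D grid, W1 is the summed absolute difference of the two CDFs.
    return np.abs(np.cumsum(p - q, axis=-1)).sum(axis=-1)

def effect_size_draws(y, alpha, n_draws=4000, seed=0):
    """Draw theta ~ Dirichlet(alpha + y) and return d = W1(theta, m) per draw."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    m = alpha / alpha.sum()                          # baseline mean profile
    theta = rng.dirichlet(alpha + y, size=n_draws)   # shape (n_draws, n_bins)
    return w1_on_bins(theta, m)

# Hypothetical model parameters and read counts, for illustration only.
alpha = np.array([8.0, 1.5, 0.5])
y = np.array([10, 15, 5])

d = effect_size_draws(y, alpha)
d_median = np.median(d)
d_ci_lower, d_ci_upper = np.percentile(d, [2.5, 97.5])
print(d_median, d_ci_lower, d_ci_upper)
```

Increasing `n_draws` reduces Monte Carlo noise in the median and percentile estimates at the cost of run time.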

CLI Usage

Enable effect-size reporting with:

./trgt-instability test \
  --models models.gz \
  --data sample.dists.txt.gz \
  --report-effect-size

Control the posterior Monte Carlo depth with:

./trgt-instability test \
  --models models.gz \
  --data sample.dists.txt.gz \
  --report-effect-size \
  --n-posterior-draws 4000

The two sampling options control different steps:

  • --n-sim controls the null bootstrap used for the p-value
  • --n-posterior-draws controls the posterior summaries used for d

Output Format

Without effect-size reporting, test emits:

trid    allele_seq    p_value

With --report-effect-size, it emits:

trid    allele_seq    p_value    d_median    d_ci_lower    d_ci_upper

Field meanings:

  • d_median: posterior median of d
  • d_ci_lower: 2.5th posterior percentile of d (lower bound of the 95% credible interval)
  • d_ci_upper: 97.5th posterior percentile of d (upper bound of the 95% credible interval)

Interpretation

  • Larger d_median means the allele is estimated to be farther from the fitted repeat-specific baseline.
  • Wider intervals indicate more posterior uncertainty, typically because the allele has fewer supporting reads.
  • The p-value and d are complementary. A small p-value indicates strong evidence for excess instability. A large d indicates a larger estimated deviation from the fitted baseline instability profile.
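One way to use the two quantities together is to call alleles on the p-value and then rank the called alleles by d_median. The sketch below assumes the output is tab-delimited with a header row matching the column names listed under Output Format; the function name, the 0.05 cutoff, and the example rows are hypothetical and not produced by the tool.

```python
import csv
import io

def rank_alleles(tsv_text, p_cutoff=0.05):
    """Keep alleles with p_value below the cutoff, sorted by d_median (largest first)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    called = [row for row in reader if float(row["p_value"]) < p_cutoff]
    return sorted(called, key=lambda row: float(row["d_median"]), reverse=True)

# Made-up rows for illustration; real values come from `trgt-instability test`.
example = (
    "trid\tallele_seq\tp_value\td_median\td_ci_lower\td_ci_upper\n"
    "locusA\tCAGCAG\t0.001\t0.40\t0.31\t0.52\n"
    "locusB\tGAAGAA\t0.300\t0.90\t0.10\t1.80\n"
    "locusC\tCTGCTG\t0.010\t1.20\t0.95\t1.47\n"
)

ranked = rank_alleles(example)
for row in ranked:
    print(row["trid"], row["p_value"], row["d_median"])
```

In this made-up example, locusB has a large d_median but a wide interval and a non-significant p-value, so it is excluded before ranking.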