Effect-Size Reporting

trgt-instability test can optionally report an allele-level effect size d in addition to the existing parametric-bootstrap p-value.

Purpose

The p-value answers a significance question: how surprising is this allele under the fitted repeat-specific instability model, given its read depth? That makes p-values useful for calling, but not ideal for ranking alleles sequenced at different depths, because significance depends on read depth as well as on the size of the deviation. The effect size d is meant to answer a different question: how far is this allele's instability profile from the repeat-specific instability model?

Definitions

Given:

the observed instability profile y for an allele (a count vector over divergence-rate bins)
the parameters alpha of the fitted repeat-specific instability model
the corresponding baseline mean instability profile m = alpha / sum(alpha)

The tool forms the posterior distribution of the allele's underlying instability profile theta:

theta | y, alpha ~ Dirichlet(alpha + y)

For each posterior draw of theta, the tool computes a 1D Wasserstein distance to the baseline mean profile m on the ordered model bins:

d = W1(theta, m)

The reported summaries (the posterior median of d and its 2.5th and 97.5th percentiles) are computed from Monte Carlo draws of theta from that posterior.
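The definitions above can be sketched in a few lines of NumPy. This is a minimal illustration, not the tool's implementation: the function names are hypothetical, the bins are assumed to be unit-spaced (so W1 reduces to the summed absolute difference of the two CDFs), and the alpha and y values are invented for the example.

```python
import numpy as np

def w1_ordered_bins(p, q):
    """W1 between two distributions on the same ordered bins.

    Assumes unit spacing between adjacent bins (an assumption of this
    sketch), in which case W1 is the sum of absolute CDF differences.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.abs(np.cumsum(p, axis=-1) - np.cumsum(q, axis=-1)).sum(axis=-1)

def effect_size_summary(y, alpha, n_draws=4000, seed=0):
    """Return (d_median, d_ci_lower, d_ci_upper) from posterior draws.

    n_draws=4000 mirrors the CLI example value; it is not necessarily
    the tool's default.
    """
    y = np.asarray(y, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    m = alpha / alpha.sum()                          # baseline mean profile
    rng = np.random.default_rng(seed)
    theta = rng.dirichlet(alpha + y, size=n_draws)   # theta | y, alpha
    d = w1_ordered_bins(theta, m)                    # one distance per draw
    lower, median, upper = np.percentile(d, [2.5, 50.0, 97.5])
    return median, lower, upper

# Invented model parameters and read counts, for illustration only.
alpha = np.array([20.0, 3.0, 1.0, 1.0])
shallow = effect_size_summary([4, 3, 2, 1], alpha)          # 10 reads
deep = effect_size_summary([400, 300, 200, 100], alpha)     # 1000 reads
print("shallow:", shallow)
print("deep:   ", deep)
```

Both example alleles have the same observed proportions, so comparing the two results shows how the interval around d tightens as read depth grows while the prior contribution of alpha fades.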

CLI Usage

Enable effect-size reporting with:

./trgt-instability test \
  --models models.gz \
  --data sample.dists.txt.gz \
  --report-effect-size

Control the posterior Monte Carlo depth with:

./trgt-instability test \
  --models models.gz \
  --data sample.dists.txt.gz \
  --report-effect-size \
  --n-posterior-draws 4000

--n-sim controls the null bootstrap used for the p-value
--n-posterior-draws controls the posterior summaries used for d

Output Format

Without effect-size reporting, test emits:

trid    allele_seq    p_value

With --report-effect-size, it emits:

trid    allele_seq    p_value    d_median    d_ci_lower    d_ci_upper

Field meanings:

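d_median: posterior median of d
d_ci_lower: lower bound of the 95% posterior interval for d (2.5th posterior percentile)
d_ci_upper: upper bound of the 95% posterior interval for d (97.5th posterior percentile)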

Interpretation

Larger d_median means the allele is estimated to be farther from the fitted repeat-specific baseline.
Wider intervals indicate more posterior uncertainty, typically because the allele has fewer supporting reads.
The p-value and d are complementary. A small p-value indicates strong evidence for excess instability. A large d indicates a larger estimated deviation from the fitted baseline instability profile.
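One way to use the two quantities together is to call alleles on the p-value and then rank the called alleles by d_median. The sketch below assumes the output is tab-separated with a header row matching the column names listed under Output Format; the function name, the 0.05 cutoff, and the example rows are all hypothetical and chosen only for illustration.

```python
import csv
import io

def rank_by_effect_size(tsv_text, p_cutoff=0.05):
    """Keep alleles with p_value <= p_cutoff, sorted by d_median (largest first)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [row for row in reader if float(row["p_value"]) <= p_cutoff]
    return sorted(kept, key=lambda row: float(row["d_median"]), reverse=True)

# Invented rows for illustration only; not real tool output.
header = ["trid", "allele_seq", "p_value", "d_median", "d_ci_lower", "d_ci_upper"]
records = [
    ["TR1", "CAGCAGCAG", "0.001", "0.12", "0.08", "0.17"],
    ["TR2", "GAAGAAGAA", "0.300", "0.40", "0.05", "0.90"],
    ["TR3", "CTGCTGCTG", "0.004", "0.55", "0.41", "0.70"],
]
example_tsv = "\n".join("\t".join(fields) for fields in [header] + records) + "\n"

for row in rank_by_effect_size(example_tsv):
    print(row["trid"], row["p_value"], row["d_median"])
```

In this toy input, the second row has the second-largest d_median but a wide interval and a non-significant p-value, so it is filtered out before ranking.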
