File Formats

Divergence records

trgt-instability divergence outputs one tab-delimited record per allele with 6 fields:

  1. trid: repeat identifier from the repeat catalog
  2. allele_index: allele index reported by TRGT
  3. allele_seq: allele consensus sequence
  4. purity: allele purity value carried through from TRGT
  5. lengths: comma-separated repeat lengths, one per read
  6. distances: comma-separated edit distances, one per read

The lengths and distances lists must have the same number of elements. Downstream commands ignore blank lines and lines starting with #.
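As a minimal sketch of the record layout above, the following Python reader splits each line into the six fields, skips blank and #-prefixed lines, and checks that lengths and distances have the same number of elements. It assumes that allele_index, lengths and distances are integers and that purity is a decimal number. DivergenceRecord and parse_divergence_lines are hypothetical names, not part of trgt-instability.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class DivergenceRecord:
    trid: str
    allele_index: int     # assumed to be an integer
    allele_seq: str
    purity: float         # assumed to be a decimal number
    lengths: List[int]    # one repeat length per read (assumed integer)
    distances: List[int]  # one edit distance per read (assumed integer)


def parse_divergence_lines(lines: Iterable[str]) -> Iterator[DivergenceRecord]:
    """Yield one DivergenceRecord per data line, skipping blank and # lines."""
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(f"line {lineno}: expected 6 fields, got {len(fields)}")
        trid, allele_index, allele_seq, purity, lengths, distances = fields
        lengths_list = [int(x) for x in lengths.split(",")]
        distances_list = [int(x) for x in distances.split(",")]
        # Both lists carry one value per read, so their sizes must match.
        if len(lengths_list) != len(distances_list):
            raise ValueError(f"line {lineno}: lengths and distances differ in size")
        yield DivergenceRecord(trid, int(allele_index), allele_seq,
                               float(purity), lengths_list, distances_list)
```

Streaming records with a generator keeps memory flat even for large divergence files.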

These per-read fields are the raw inputs from which per-read divergence rates are computed and, downstream, the allele's instability profile is built.

trgt-instability model requires divergence records to be grouped by trid, so that all records for a given trid form one contiguous block. If the same trid appears in multiple non-contiguous blocks, training fails.
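A minimal pre-flight check for the grouping requirement can be sketched as follows. It takes the trid (first field) of each data line in file order and reports the first trid that reappears after its block has ended. check_trid_grouping is a hypothetical helper; trgt-instability's own check may report the problem differently.

```python
from typing import Iterable, Optional, Set


def check_trid_grouping(trids: Iterable[str]) -> None:
    """Raise ValueError if any trid appears in more than one non-contiguous block."""
    closed: Set[str] = set()       # trids whose block has already ended
    current: Optional[str] = None  # trid of the block currently being read
    for trid in trids:
        if trid == current:
            continue  # still inside the same block
        if trid in closed:
            raise ValueError(f"trid {trid!r} appears in non-contiguous blocks")
        if current is not None:
            closed.add(current)  # the previous block is now finished
        current = trid
```

If the check fails, one way to regroup the records is a stable sort on the first column, for example `sort -s -t $'\t' -k1,1 in.tsv > grouped.tsv`, which keeps the original order of records within each trid.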

Model records

trgt-instability model emits one tab-delimited record per trid with 4 fields:

  1. trid: tandem repeat identifier
  2. bin_edges: comma-separated bin edge values
  3. alpha: comma-separated Dirichlet-multinomial parameters
  4. training_counts: count vectors retained after outlier trimming; vectors are separated by "-", and counts within each vector are comma-separated
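A minimal sketch of reading one model record follows, under two assumptions: bin_edges and alpha parse as floats and training_counts as integers, and count vectors are separated by the hyphen implied by the field description above (exposed as VECTOR_SEP so it is easy to change). ModelRecord, parse_model_line and VECTOR_SEP are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Assumption: separator between count vectors in training_counts.
VECTOR_SEP = "-"


@dataclass
class ModelRecord:
    trid: str
    bin_edges: List[float]
    alpha: List[float]                # Dirichlet-multinomial parameters
    training_counts: List[List[int]]  # count vectors kept after outlier trimming


def parse_model_line(line: str, vector_sep: str = VECTOR_SEP) -> ModelRecord:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    trid, edges, alpha, counts = fields
    # Split into vectors first, then split each vector on commas.
    vectors = ([[int(c) for c in vec.split(",")] for vec in counts.split(vector_sep)]
               if counts else [])
    return ModelRecord(
        trid=trid,
        bin_edges=[float(x) for x in edges.split(",")],
        alpha=[float(x) for x in alpha.split(",")],
        training_counts=vectors,
    )
```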

Test output records

By default, trgt-instability test emits one tab-delimited record per tested allele with 3 fields:

  1. trid
  2. allele_seq
  3. p_value: one-sided empirical p-value for excess instability from the parametric-bootstrap test
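A minimal reader for test output can be sketched as below. It accepts both the default 3-field layout and the 6-field layout produced with --report-effect-size, and converts p_value and any d_* fields to floats. Skipping blank and #-prefixed lines here is an assumption carried over from the divergence format; parse_test_output is a hypothetical name.

```python
from typing import Dict, Iterable, List

NAMES_3 = ["trid", "allele_seq", "p_value"]
NAMES_6 = NAMES_3 + ["d_median", "d_ci_lower", "d_ci_upper"]


def parse_test_output(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse 3-field (default) or 6-field (--report-effect-size) test records."""
    records: List[Dict[str, object]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):  # assumed, as for divergence files
            continue
        fields = line.split("\t")
        if len(fields) == 3:
            names = NAMES_3
        elif len(fields) == 6:
            names = NAMES_6
        else:
            raise ValueError(f"expected 3 or 6 fields, got {len(fields)}")
        rec: Dict[str, object] = dict(zip(names, fields))
        for key in names[2:]:  # p_value and any d_* fields are numeric
            rec[key] = float(fields[names.index(key)])
        records.append(rec)
    return records
```

For example, `[r for r in parse_test_output(lines) if r["p_value"] < 0.05]` keeps alleles below an illustrative 0.05 cutoff; the appropriate threshold, and any multiple-testing correction, depends on the analysis.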

When --report-effect-size is enabled, test emits 6 fields instead:

  1. trid
  2. allele_seq
  3. p_value
  4. d_median
  5. d_ci_lower
  6. d_ci_upper

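The d_* fields summarize the posterior distribution of the Wasserstein distance d between the allele-specific latent instability profile and the fitted baseline mean instability profile: d_median is its posterior median, and d_ci_lower and d_ci_upper are the lower and upper bounds of its posterior interval.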
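The description above does not state how the profiles are discretized, which Wasserstein order is used, or which interval level the d_ci_* bounds use. As an illustration only, the sketch below assumes each profile is a probability vector over shared, sorted support points (for example bin midpoints derived from bin_edges), computes the 1-D Wasserstein-1 distance as the area between the two CDFs, and summarizes posterior draws of d with a median and an equal-tailed 95% interval. w1_discrete, quantile and summarize are hypothetical names, not trgt-instability's API.

```python
from typing import List, Sequence, Tuple


def w1_discrete(p: Sequence[float], q: Sequence[float], points: Sequence[float]) -> float:
    """Wasserstein-1 distance between two distributions on the same sorted points.

    On the real line, W1 equals the area between the two CDFs.
    """
    if not (len(p) == len(q) == len(points)):
        raise ValueError("p, q and points must have the same length")
    total = 0.0
    cdf_gap = 0.0  # running value of F_p - F_q
    for i in range(len(points) - 1):
        cdf_gap += p[i] - q[i]
        total += abs(cdf_gap) * (points[i + 1] - points[i])
    return total


def quantile(sorted_xs: List[float], prob: float) -> float:
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    pos = prob * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])


def summarize(samples: Sequence[float], level: float = 0.95) -> Tuple[float, float, float]:
    """Return (median, lower, upper) of an equal-tailed interval from posterior draws."""
    if not samples:
        raise ValueError("need at least one posterior draw")
    xs = sorted(samples)
    tail = (1.0 - level) / 2.0
    return quantile(xs, 0.5), quantile(xs, tail), quantile(xs, 1.0 - tail)
```

Moving all mass from the first to the last of three unit-spaced points, `w1_discrete([1, 0, 0], [0, 0, 1], [0, 1, 2])` gives 2.0, the distance the mass travels.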