Commit 5d454b0

split_combined_wt n_mismatch propagation
1 parent 35638fd commit 5d454b0

3 files changed

Lines changed: 8 additions & 22 deletions

CLAUDE.md

Lines changed: 6 additions & 6 deletions
@@ -32,7 +32,7 @@ Always run `cargo clippy`, `cargo fmt --check`, and `cargo test` before consider
 
 ## Current Status
 
-**278 tests passing, 0 clippy warnings.** SE: 8796/8926 compare_sam.py (98.5%), 2.2% splice rate (STAR: 2.2%), 66 shared junctions, **100.0% MAPQ agreement, MAPQ inflation: 0, deflation: 0**. 127 position disagreements (ALL verified as genuine ties). 1 CIGAR-only disagree (ERR12389696.13573895, insertion placement, seed-level tie). **0 STAR-only / 0 ruSTAR-only SE reads**. PE: **8337 both-mapped** (STAR: 8390), **0 half-mapped**, 2 MAPQ inflations / 4 deflations, **98.9% per-mate position agreement**, **98.800% PE exact faithfulness** (pos+CIGAR+MAPQ+proper+NH), **0 proper-pair diffs**. Phase 17.A: `scoreSeedBest` pre-extension. Phase 17.B: per-mate seeding. Phase 17.C: STAR-faithful SCORE-GATE + mappedFilter. Phase 17.D: combined-span penalty fix + dedup ordering. Phase 17.8: `--quantMode GeneCounts`. Phase E fix (2026-04-21): mate_id-aware diagonal dedup. Phase E2 (2026-04-22): STAR-faithful combined-read seeding (97.370%→98.211%). Phase E3 (2026-04-22): combined-threshold for half-mapped fallback (98.211%→98.470%, half-mapped 311→0). Phase E4 (2026-04-22): PE-CHECK2 unconditional (98.470%→98.800%, both-mapped 8636→8337). See [ROADMAP.md](ROADMAP.md) for detailed phase tracking and [docs/](docs/) for per-phase notes.
+**278 tests passing, 0 clippy warnings.** SE: 8796/8926 compare_sam.py (98.5%), 2.2% splice rate (STAR: 2.2%), 66 shared junctions, **100.0% MAPQ agreement, MAPQ inflation: 0, deflation: 0**. 127 position disagreements (ALL verified as genuine ties). 1 CIGAR-only disagree (ERR12389696.13573895, insertion placement, seed-level tie). **0 STAR-only / 0 ruSTAR-only SE reads**. PE: **8393 both-mapped** (STAR: 8390), **0 half-mapped**, 2 MAPQ inflations / 6 deflations, **99.0% per-mate position agreement**, **98.784% PE exact faithfulness** (pos+CIGAR+MAPQ+proper+NH), **0 proper-pair diffs**. Phase 17.A: `scoreSeedBest` pre-extension. Phase 17.B: per-mate seeding. Phase 17.C: STAR-faithful SCORE-GATE + mappedFilter. Phase 17.D: combined-span penalty fix + dedup ordering. Phase 17.8: `--quantMode GeneCounts`. Phase E fix (2026-04-21): mate_id-aware diagonal dedup. Phase E2 (2026-04-22): STAR-faithful combined-read seeding (97.370%→98.211%). Phase E3 (2026-04-22): combined-threshold for half-mapped fallback (98.211%→98.470%, half-mapped 311→0). Phase E4 (2026-04-22): PE-CHECK2 unconditional (98.470%→98.800%, both-mapped 8636→8337). Phase E5 (2026-04-23): split_combined_wt n_mismatch propagation (8337→8393, −53 STAR-only pairs). See [ROADMAP.md](ROADMAP.md) for detailed phase tracking and [docs/](docs/) for per-phase notes.
 
 ## Source Layout
 
@@ -161,17 +161,17 @@ Previously listed issues now resolved:
 
 See [ROADMAP.md](ROADMAP.md) and [docs/](docs/) for full issue tracking.
 
-## PE Status (Updated 2026-04-22 — Phase E4: PE-CHECK2 unconditional)
+## PE Status (Updated 2026-04-23 — Phase E5: split_combined_wt n_mismatch propagation)
 
-**Phase E4** (PE-CHECK2 unconditional): **PE both-mapped = 8337** (STAR: 8390), **half-mapped = 0**, **98.9% per-mate position agreement**, **98.800% PE exact faithfulness** (was 98.470%). MAPQ inflations: 2 (was 4), deflations: 4 (was 34). NH diffs: 12 (was 50).
+**Phase E5** (2026-04-23, n_mismatch propagation): **PE both-mapped = 8393** (STAR: 8390), **half-mapped = 0**, **99.0% per-mate position agreement**, **98.784% PE exact faithfulness**. MAPQ inflations: 2, deflations: 6. NH diffs: 14. Root cause fixed: `split_combined_wt` was setting `n_mismatch: 0` for both mate WTs, causing `finalize_transcript`'s outer extension to use too-lenient nMMprev → over-extension → nm2 inflated → 53 STAR-only pairs rejected. Fix: propagate `wt.n_mismatch` from the combined WT to both mate WTs. 55 previously STAR-only pairs now recovered. 7 new FPs (seeding-level issue, separate from nm fix).
+
+**Phase E4** (2026-04-22, PE-CHECK2 unconditional): both-mapped 8636→8337, faithfulness 98.470%→98.800%.
 
 **Phase E3** (2026-04-22, combined-threshold half-mapped): half-mapped 311→0, faithfulness 98.211%→98.470%.
 
 **Phase E fix** (2026-04-21, mate_id-aware diagonal dedup): raised faithfulness from 93.920%→97.370%. Still present.
 
-**Phase E4 implementation**: Removed `m1_exons.len() > 1` guard from PE-CHECK2 in `split_combined_wt`. STAR applies PE-CHECK2 unconditionally (verified via debug trace: `PE-CHECK2: m1_end=11656633 m2_estEnd=11656632 REJECT=1` for single-exon mate1). Previously ruSTAR only checked for spliced mate1, allowing overlapping/short-insert pairs through.
-
-**Current PE parity**: 8337 vs STAR 8390 (53 short). The 60 STAR-only pairs are heavily soft-clipped reads (50%+ clipped) where ruSTAR can't find sufficient seeds. Not a regression from PE-CHECK2.
+**Current PE parity**: 8393 vs STAR 8390 (+3 over). 5 STAR-only mates remain (2-3 pairs). 7 new FPs (heavily-clipped reads STAR doesn't seed). Residual gap is seeding-level, not nm-level.
 
 ## Remaining Limitations (Top 5)
 
src/align/read_align.rs

Lines changed: 0 additions & 14 deletions
@@ -609,20 +609,6 @@ pub fn align_paired_read(
     // Cluster combined seeds using the combined read length
     let clusters = cluster_seeds(&combined_seeds, index, params, combined_len);
 
-    if debug_pe {
-        eprintln!(
-            "[DEBUG-PE] Combined: seeds={} clusters={}",
-            combined_seeds.len(),
-            clusters.len()
-        );
-        for (i, c) in clusters.iter().enumerate() {
-            eprintln!(
-                " cluster[{}]: rev={} chr={} seeds={}",
-                i, c.is_reverse, c.chr_idx, c.alignments.len()
-            );
-        }
-    }
-
     // Combined score threshold: use len1+len2 as denominator
     let combined_score_threshold =
         (params.out_filter_score_min_over_lread * (len1 + len2) as f64) as i32;
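The combined score threshold kept as context in this hunk can be illustrated in isolation. The following is a minimal sketch, not the project's code: the free function `combined_score_threshold` and the `usize` types for the mate lengths are assumptions made for illustration. It shows that the ratio is applied to the sum of both mate lengths and that the `as i32` cast truncates toward zero.

```rust
/// Hypothetical standalone version of the threshold expression in the hunk.
/// `min_over_lread` stands in for `params.out_filter_score_min_over_lread`;
/// the `usize` length types are an assumption.
fn combined_score_threshold(min_over_lread: f64, len1: usize, len2: usize) -> i32 {
    // The denominator is the combined read length (len1 + len2),
    // not the length of either mate alone.
    (min_over_lread * (len1 + len2) as f64) as i32
}
```

With a ratio of 0.5 and mates of 100 and 101 bases, the product is 100.5 and the cast yields 100.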

src/align/stitch.rs

Lines changed: 2 additions & 2 deletions
@@ -2406,7 +2406,7 @@ pub(crate) fn split_combined_wt(
     WorkingTranscript {
         exons: m1_exons,
         score: m1_score,
-        n_mismatch: 0,
+        n_mismatch: wt.n_mismatch,
         n_gap: 0,
         n_junction: m1_jm.len() as u32,
         junction_motifs: m1_jm,
@@ -2421,7 +2421,7 @@ pub(crate) fn split_combined_wt(
     WorkingTranscript {
         exons: m2_exons,
         score: m2_score,
-        n_mismatch: 0,
+        n_mismatch: wt.n_mismatch,
         n_gap: 0,
         n_junction: m2_jm.len() as u32,
         junction_motifs: m2_jm,
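The effect of the two changed lines can be shown in a minimal sketch. This is an illustration under stated assumptions, not the real `split_combined_wt`: the struct below is a cut-down stand-in for `WorkingTranscript` with only three fields, the `u32` type for `n_mismatch` is assumed, and `split_combined_sketch` is a hypothetical helper. It shows both mate transcripts inheriting the combined transcript's mismatch count instead of starting from zero.

```rust
/// Cut-down stand-in for the real `WorkingTranscript`; exons, junction
/// motifs and other fields are omitted, and the field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct WorkingTranscript {
    score: i32,
    n_mismatch: u32,
    n_gap: u32,
}

/// Hypothetical helper mirroring the patched behaviour: each mate's
/// transcript carries the combined transcript's `n_mismatch`.
fn split_combined_sketch(
    wt: &WorkingTranscript,
    m1_score: i32,
    m2_score: i32,
) -> (WorkingTranscript, WorkingTranscript) {
    let mate1 = WorkingTranscript {
        score: m1_score,
        n_mismatch: wt.n_mismatch, // was hard-coded to 0 before this commit
        n_gap: 0,
    };
    let mate2 = WorkingTranscript {
        score: m2_score,
        n_mismatch: wt.n_mismatch, // same propagation for mate 2
        n_gap: 0,
    };
    (mate1, mate2)
}
```

Note that both mates receive the full combined count rather than a per-mate share, which matches what the diff does.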
