You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
**Root cause fixed**: `split_working_transcript` used `wt2_junc_start = boundary_idx.min(n_junctions)` to partition junction_shifts between the two mate portions. Since the inter-mate boundary does NOT produce a junction entry, the index offset was wrong: any junctions belonging to the second portion with index < boundary_idx were silently discarded. Fix: use `wt2_junc_start = wt1_junc_end` (junction split point = number of junctions in the first portion). For the specific case boundary_idx=1 (one mate2 exon in reverse cluster), wt1_junc_end=0 but wt2_junc_start was incorrectly set to 1, causing the 145907N junction's shift (jj_r=8) to be lost. finalize_transcript's overhang check then saw empty junction_shifts and passed the spurious 139M145907N11M alignment (11 < threshold 5+8=13 → should have been rejected).
168
+
**Two fixes in Phase 17.D**:
169
+
1.`try_pair_transcripts` now computes STAR-faithful combined-span penalty: `combined_wt_score = t1.score + t2.score - p1 - p2 + combined_p`. Previously double-applied per-mate span penalties → AS tag wrong for 99.6% of PE reads. Now 3.1%.
170
+
2. Decision tree reordered to (1) position dedup → (2) score-range filter → (3) TooManyLoci → (4) quality filter. Fixes 12 half-mapped pairs.
**Problem 1**: `try_pair_transcripts` used `combined_wt_score = t1.score + t2.score`, double-applying the genomic-length penalty (each mate's finalized score already includes its own span penalty). This inflated the combined score relative to STAR's single-span formula, causing AS tag disagreements (was 99.6% of PE reads).
213
+
214
+
**Fix**: STAR computes ONE genomic-length penalty over the full PE span. Per-mate approach must undo per-mate penalties and apply the combined penalty:
`try_pair_transcripts` now takes `scorer: &AlignmentScorer` to call `scorer.genomic_length_penalty()`.
220
+
221
+
**Problem 2**: Decision tree ordering was score-range → dedup → TooManyLoci (wrong). STAR's ordering is multMapSelect → dedup → TooManyLoci. Also, dedup was running after score-range filter, meaning some duplicate pairs at the same position could escape the score-range window.
222
+
223
+
**Fix** (`src/align/read_align.rs`): Reordered to: (1) position dedup, (2) score-range filter (multMapSelect), (3) TooManyLoci check, (4) sort by score, (5) quality filter (mappedFilter).
224
+
225
+
**Result**: 278/278 tests, 0 warnings. **248 → 236 half-mapped** (12 pairs fixed by correct ordering). PE diff-AS dropped from 99.6% → 3.1%. PE both-mapped: 8767 (STAR: 8390). SE 8796/8926 maintained.
226
+
227
+
**Investigation note**: Attempted quality-filter fallback (retry with pre-score-range pool when score-range winner fails quality) to recover additional half-mapped reads. The specific root cause of remaining 236 half-mapped: STAR's combined-read DP finds correct mate2 alignment with 0 mismatches; ruSTAR's per-mate DP finds a different alignment at the same position with 8 mismatches (combined_nm=14 > outFilterMismatchNmax=10). The fallback recovers 35 pairs but introduces ~100 position regressions (pairs at wrong positions passing individual quality checks). Reverted.
228
+
229
+
---
230
+
209
231
## Phase 17.2: Coordinate-Sorted BAM — Planned
210
232
211
233
High user value. STAR outputs `--outSAMtype BAM SortedByCoordinate` natively. Options:
0 commit comments