Commit 5682ef0
fix(map): penalize indels hidden between read-overlapping MEMs
`anchored_align_score` treated any pair of adjacent chain MEMs that
overlapped in the read as a pure overlap (`score -= max_overlap * match`),
which silently absorbed an indel whenever the two MEMs sat on different
diagonals (read-overlap != ref-overlap). Raw k-mer anchors always overlap
in the read, so a small reference-side indel between two otherwise-exact
blocks was scored as a perfect match — inflating the alignment score and
keeping transcripts the read does not actually match well.
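The "different diagonals" condition can be made concrete with a small sketch. The `Mem` struct and its field names below are illustrative assumptions, not the types used in the patched file. An anchor's diagonal is `ref_start - read_start`; two adjacent anchors hide an indel exactly when their diagonals differ.

```rust
/// A maximal exact match (MEM) anchor. Hypothetical layout for illustration.
#[derive(Debug, Clone, Copy)]
pub struct Mem {
    pub read_start: i64,
    pub ref_start: i64,
    pub len: i64,
}

impl Mem {
    /// Diagonal of the anchor; constant along any ungapped alignment.
    pub fn diagonal(&self) -> i64 {
        self.ref_start - self.read_start
    }
}

/// Signed diagonal shift from `a` to `b`.
/// 0: same diagonal, so any overlap is a pure overlap.
/// Negative: the read carries extra bases (insertion relative to the reference).
/// Positive: the reference carries extra bases (deletion relative to the reference).
pub fn diagonal_shift(a: &Mem, b: &Mem) -> i64 {
    b.diagonal() - a.diagonal()
}
```

With anchors that overlap by 1 base in the read and 4 in the reference, `diagonal_shift` returns -3, which is the indel the old branch scored as a match.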
Concrete case: ERR188044.600028 vs ENST00000677249.1 has a 3 bp insertion
relative to the transcript. Two MEMs overlapped by 1 base in the read but
4 in the reference (a 3 bp diagonal jump). The old branch scored it 152
(perfect); the true optimum is 134. The inflated score cleared the
`minAlnProb` keep filter, so Rust retained the transcript while C++
(PuffAligner gap DP) and Rust's own `--fullLengthAlignment` path both
scored 134 and correctly dropped it.
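The two scores are consistent with plain affine-gap arithmetic. The sketch below assumes a 76 bp read, a match score of 2, gap open 6, gap extend 2, and a gap of length k costing `open + k * extend`. None of these values is stated in the commit message; they are simply one parameter set that reproduces both 152 and 134.

```rust
/// Scoring parameters. ASSUMED values for illustration, not taken from the commit.
pub struct Scoring {
    pub match_score: i32,
    pub gap_open: i32,
    pub gap_extend: i32,
}

/// Cost of a single gap of `len` bases: open + len * extend, or 0 for no gap.
pub fn affine_gap_penalty(s: &Scoring, len: i32) -> i32 {
    if len == 0 {
        0
    } else {
        s.gap_open + len * s.gap_extend
    }
}

/// Score of a read that matches the reference exactly except for one
/// insertion of `ins_len` read bases.
pub fn score_with_insertion(s: &Scoring, read_len: i32, ins_len: i32) -> i32 {
    (read_len - ins_len) * s.match_score - affine_gap_penalty(s, ins_len)
}
```

Under these assumptions, `score_with_insertion(&s, 76, 0)` is 152 and `score_with_insertion(&s, 76, 3)` is 134: 73 matched bases contribute 146, and the 3 bp gap costs 12.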
Fix: unify the gap/overlap handling. Compute the per-axis overlap, trim
both MEM ends back to a common boundary, and DP-align the residual:
* same diagonal -> residual empty -> gap DP returns 0 (old behavior)
* gap-separated -> DP the gap (old `if` behavior)
* shifted diagonals -> residual exposes the indel, charged an affine gap
This matches the full-length DP and PuffAligner; uni-MEM extension would
not fix it (the indel-at-boundary geometry survives extension).
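A minimal sketch of the three cases listed above, under stated assumptions: the `Mem` and `Residual` types, the function names, and the choice to trim only the head of the second MEM are all hypothetical, and the real `anchored_align_score` may split the trim differently. The sketch only resolves junctions where at least one residual side is empty; a two-sided residual needs a real gap DP over the sequences and is reported as `None`.

```rust
#[derive(Debug, Clone, Copy)]
pub struct Mem {
    pub read_start: i64,
    pub ref_start: i64,
    pub len: i64,
}

/// What is left between two adjacent MEMs after trimming away any overlap.
#[derive(Debug, PartialEq, Eq)]
pub struct Residual {
    pub trimmed: i64,  // bases removed from the head of the second MEM
    pub read_len: i64, // unaligned read bases between the MEMs
    pub ref_len: i64,  // unaligned reference bases between the MEMs
}

pub fn residual_between(a: &Mem, b: &Mem) -> Residual {
    // Signed distance from the end of `a` to the start of `b`; negative means overlap.
    let read_gap = b.read_start - (a.read_start + a.len);
    let ref_gap = b.ref_start - (a.ref_start + a.len);
    // Trim by the larger per-axis overlap so that neither axis overlaps afterwards.
    let trimmed = (-read_gap).max(-ref_gap).max(0);
    Residual {
        trimmed,
        read_len: read_gap + trimmed,
        ref_len: ref_gap + trimmed,
    }
}

/// Score adjustment at the junction, relative to scoring both MEMs in full.
/// Returns None when both residual sides are non-empty (gap DP required).
pub fn junction_adjustment(r: &Residual, m: i64, open: i64, extend: i64) -> Option<i64> {
    let trim_cost = r.trimmed * m; // trimmed bases no longer count as matches
    match (r.read_len, r.ref_len) {
        (0, 0) => Some(-trim_cost), // same diagonal: pure overlap
        (k, 0) | (0, k) => Some(-trim_cost - (open + k * extend)), // pure indel
        _ => None, // gap-separated or mixed: align the residual with a gap DP
    }
}
```

For MEMs overlapping by 1 read base and 4 reference bases, the residual is 3 read bases against 0 reference bases. With the assumed scores (match 2, open 6, extend 2) the adjustment is -20, which takes two MEMs totalling 77 bases from 154 to 134.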
Validation on a 1M-read GEUVADIS subset (matched dedup indices,
C++ --orphanChainSubThresh 0.0): C++/Rust target-set concordance rises
97.53% -> 99.14% (mean per-read Jaccard 0.99628).
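For readers unfamiliar with the metrics, here is a sketch of how per-read figures of this kind could be computed. It assumes that "concordance" means the fraction of reads whose C++ and Rust target sets are identical; the commit message does not define it, and the function names are hypothetical.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Jaccard index of two target sets; two empty sets count as identical.
pub fn jaccard<T: Eq + Hash>(a: &HashSet<T>, b: &HashSet<T>) -> f64 {
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    let inter = a.intersection(b).count() as f64;
    let union = a.union(b).count() as f64;
    inter / union
}

/// Returns (fraction of reads with identical sets, mean per-read Jaccard),
/// or None when there are no reads to compare.
pub fn summarize<T: Eq + Hash>(pairs: &[(HashSet<T>, HashSet<T>)]) -> Option<(f64, f64)> {
    if pairs.is_empty() {
        return None;
    }
    let n = pairs.len() as f64;
    let exact = pairs.iter().filter(|(a, b)| a == b).count() as f64;
    let total_jaccard: f64 = pairs.iter().map(|(a, b)| jaccard(a, b)).sum();
    Some((exact / n, total_jaccard / n))
}
```

The mean Jaccard can sit well above the exact-match fraction, because a read that disagrees on one transcript out of many still contributes a value close to 1.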
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 parent 9604e21 commit 5682ef0
1 file changed
Lines changed: 45 additions & 11 deletions
| Hunk | Original lines | New lines | Change |
|---|---|---|---|
| 1 | 223-233 | 223-242 | 11 lines removed, 20 lines added |
| 2 | (none, inserted after 629) | 639-663 | 25 lines added |