Commit 3e2f559

rob-p and claude committed
fix(map): a concordant decoy pair must not suppress a transcript orphan
The orphan-fallback rule discarded all orphan mappings whenever the fragment had ANY concordant (proper-pair) mapping, including a concordant pair to a *decoy* (genome). On a decoy-aware index this silently destroyed transcript evidence: a fragment that pairs concordantly on the genome but only orphans onto a transcript (one mate exonic, the other intronic/genomic; extremely common) had its transcript orphan dropped, leaving only the decoy pair. finalize then saw no surviving non-decoy mapping (best_valid = None) and classified the fragment as decoy-dominated. Critically, because best_valid was None, even --allowDecoyOrphans could not rescue it (that flag only gates the "transcript present but dominated" branch), so the flag was effectively a no-op for the dominant decoy-orphan case.

Fix: only a concordant *non-decoy* (transcript) pair suppresses orphans. When the only concordant pair is to a decoy, keep the transcript orphan(s) too and let the decoy-domination logic in finalize_mappings_counted adjudicate: default still drops the decoy-dominated orphan (deny), --allowDecoyOrphans now keeps it.

Default behavior is unchanged (verified byte-identical on SRR1039508 full: 21,488,979 mapped both before and after). --allowDecoyOrphans now works as intended: on a per-fragment trace of 8,966 such fragments it recovers 8,183 (was 308), matching C++ salmon's orphan handling; on the full sample the ADO rate rises 93.92% -> 94.81%.

This is the SA mirror of the sketch-mode decoy-orphan rescue.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B7JMur5DmDpECddErpi2JS
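To illustrate why the flag could not help before the fix, here is a minimal sketch of the adjudication step described above. It is not the code of finalize_mappings_counted: the `Mapping` and `Outcome` types, the `adjudicate` function, and the score comparison used to decide "dominated" are all assumptions made for illustration. Only the branch structure (no surviving non-decoy mapping versus transcript present but dominated) follows the commit message.

```rust
// Hypothetical stand-in for a mapping that survived orphan filtering.
#[derive(Clone, Debug)]
struct Mapping {
    is_decoy: bool,
    score: i32, // assumed alignment score; higher is better
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Kept(usize),    // number of transcript mappings retained
    DecoyDominated, // fragment is discarded as decoy-dominated
    Unmapped,
}

// Assumed shape of the decision: the flag is consulted only when a
// transcript mapping is present but outscored by a decoy.
fn adjudicate(raw: &[Mapping], allow_decoy_orphans: bool) -> Outcome {
    let best_valid = raw.iter().filter(|m| !m.is_decoy).map(|m| m.score).max();
    let best_decoy = raw.iter().filter(|m| m.is_decoy).map(|m| m.score).max();
    let n_valid = raw.iter().filter(|m| !m.is_decoy).count();
    match (best_valid, best_decoy) {
        // No transcript mapping survived: the flag is never consulted.
        (None, Some(_)) => Outcome::DecoyDominated,
        (None, None) => Outcome::Unmapped,
        // Transcript present but dominated: denied unless the flag is set.
        (Some(v), Some(d)) if d > v && !allow_decoy_orphans => Outcome::DecoyDominated,
        (Some(_), _) => Outcome::Kept(n_valid),
    }
}

fn main() {
    // Before the fix, only the decoy pair reached this step.
    let pre_fix = [Mapping { is_decoy: true, score: 100 }];
    // After the fix, the transcript orphan reaches it as well.
    let post_fix = [
        Mapping { is_decoy: true, score: 100 },
        Mapping { is_decoy: false, score: 50 },
    ];
    for flag in [false, true] {
        println!("flag={flag}: pre-fix {:?}, post-fix {:?}",
                 adjudicate(&pre_fix, flag), adjudicate(&post_fix, flag));
    }
}
```

Under these assumptions the pre-fix input is decoy-dominated whether or not the flag is set, while the post-fix input is kept only when the flag is set, which matches the behavior the message describes.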
1 parent a198954 commit 3e2f559

1 file changed

Lines changed: 15 additions & 6 deletions

crates/salmon-map/src/mapper.rs
@@ -295,14 +295,23 @@ pub fn map_read_pair<'idx, R: RefProvider>(
             MateStatus::SingleEnd => {}
         }
     }
-    // Orphans are a *fallback*: if this fragment has any concordant (proper-pair)
-    // mapping, discard all orphan mappings. A lone mate matching a paralog is weak
-    // evidence and would spuriously enlarge the equivalence class / leak count mass
-    // to the wrong transcript; the concordant mappings are the trustworthy signal.
-    // Orphans are kept only when the fragment has *no* concordant mapping at all.
+    // Orphans are a *fallback*: if this fragment has a concordant (proper-pair)
+    // mapping to a *transcript*, discard all orphan mappings. A lone mate matching
+    // a paralog is weak evidence and would spuriously enlarge the equivalence class
+    // / leak count mass to the wrong transcript; the concordant transcript mapping
+    // is the trustworthy signal.
+    //
+    // A concordant pair to a *decoy* must NOT suppress a transcript orphan: a
+    // fragment that pairs on the genome but only orphans onto a transcript would
+    // otherwise lose its transcript evidence entirely (the decoy pair leaves no
+    // surviving non-decoy mapping, so `best_valid` is None and even
+    // `--allowDecoyOrphans` cannot rescue it). Instead we keep both and let the
+    // decoy-domination logic in `finalize_mappings_counted` adjudicate
+    // (default: drop the decoy-dominated transcript orphan; `--allowDecoyOrphans`:
+    // keep it), matching C++ salmon's orphan handling on a decoy-aware index.
     if raw
         .iter()
-        .any(|m| matches!(m.status, MateStatus::PairedEndPaired))
+        .any(|m| matches!(m.status, MateStatus::PairedEndPaired) && !m.is_decoy)
     {
         raw.retain(|m| matches!(m.status, MateStatus::PairedEndPaired));
     }
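
The changed predicate can be exercised in isolation. The sketch below is a self-contained approximation, not the crate's code: `Mapping`, its `target` field, the `PairedEndLeft` variant, and the two `suppress_orphans_*` helpers are hypothetical stand-ins. Only `MateStatus::PairedEndPaired`, `is_decoy`, and the `any`/`retain` logic are taken from the diff.

```rust
// Hypothetical, pared-down stand-ins for the types used in mapper.rs.
#[derive(Clone, Copy, Debug, PartialEq)]
enum MateStatus {
    PairedEndPaired, // concordant (proper-pair) mapping
    PairedEndLeft,   // orphan: only one mate mapped (assumed variant name)
}

#[derive(Clone, Debug, PartialEq)]
struct Mapping {
    target: &'static str,
    status: MateStatus,
    is_decoy: bool,
}

/// Old rule: any concordant pair, decoy or not, suppresses all orphans.
fn suppress_orphans_old(raw: &mut Vec<Mapping>) {
    if raw.iter().any(|m| matches!(m.status, MateStatus::PairedEndPaired)) {
        raw.retain(|m| matches!(m.status, MateStatus::PairedEndPaired));
    }
}

/// New rule: only a concordant non-decoy (transcript) pair suppresses orphans.
fn suppress_orphans_new(raw: &mut Vec<Mapping>) {
    if raw
        .iter()
        .any(|m| matches!(m.status, MateStatus::PairedEndPaired) && !m.is_decoy)
    {
        raw.retain(|m| matches!(m.status, MateStatus::PairedEndPaired));
    }
}

/// A fragment that pairs concordantly on a decoy but only orphans onto a transcript.
fn example_fragment() -> Vec<Mapping> {
    vec![
        Mapping { target: "decoy_chr1", status: MateStatus::PairedEndPaired, is_decoy: true },
        Mapping { target: "tx_A", status: MateStatus::PairedEndLeft, is_decoy: false },
    ]
}

fn main() {
    let mut old = example_fragment();
    suppress_orphans_old(&mut old);
    let mut new = example_fragment();
    suppress_orphans_new(&mut new);
    println!("old rule keeps: {:?}", old);
    println!("new rule keeps: {:?}", new);
}
```

With the old rule only the decoy pair survives. With the new rule both mappings survive, so the transcript orphan is still present when the later decoy-domination step runs.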
