@@ -292,6 +292,63 @@ simulated references (same clipped FASTA, matched thresholds) the per-read
target-set residual is 147 Rust-superset / 50 C++-superset, unchanged by the cap
fix now that both cap on the aligned-mapping count.

## Decoy-orphan handling in selective alignment (`--allowDecoyOrphans`)

Two related fixes change the handling of a fragment that pairs concordantly on
the genome decoy but maps to a transcript only as an orphan (one mate exonic, the
other intronic or genomic), a case that is very common with a decoy-aware index:

- **A concordant decoy pair no longer suppresses a transcript orphan.** The
  orphan-fallback rule discarded *all* orphans whenever any concordant pair
  existed, including a concordant pair to a *decoy*. That destroyed the
  transcript orphan and left only the decoy pair, so the fragment was dropped as
  decoy-dominated with no surviving non-decoy mapping. `--allowDecoyOrphans`
  could not rescue it either, because the flag only acts when a transcript
  mapping survives. Now only a concordant *transcript* pair suppresses orphans;
  a decoy pair leaves the transcript orphan for the decoy-domination logic to
  adjudicate.
- **`--allowDecoyOrphans` now works as intended.** With the above fixed, the flag
  recovers the transcript orphan when the other mate maps to the genome decoy
  (the default still drops it). On SRR1039508 (full) this raises the
  `--allowDecoyOrphans` rate from 93.92% to 94.81%; the **default rate is
  unchanged** (byte-identical), and the recovered fragments match what C++ keeps
  as orphans. This is the selective-alignment mirror of the sketch-mode
  decoy-orphan rescue above.
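
The change to the orphan-fallback rule can be illustrated with a minimal Python sketch. It is not the project's code: the `Mapping` type, the two filter functions, and the target names are hypothetical, and the downstream decoy-domination step is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical per-fragment mapping record (illustration only)."""
    target: str
    is_decoy: bool
    is_pair: bool  # True: concordant pair; False: orphan (one mate only)


def old_orphan_filter(mappings):
    """Old rule: any concordant pair, decoy or not, discards all orphans."""
    if any(m.is_pair for m in mappings):
        return [m for m in mappings if m.is_pair]
    return list(mappings)


def new_orphan_filter(mappings):
    """New rule: only a concordant *transcript* pair discards orphans."""
    if any(m.is_pair and not m.is_decoy for m in mappings):
        return [m for m in mappings if m.is_pair]
    return list(mappings)


# A fragment that pairs on the decoy but only orphans onto a transcript.
fragment = [
    Mapping(target="decoy_chr1", is_decoy=True, is_pair=True),
    Mapping(target="tx_A", is_decoy=False, is_pair=False),
]

print([m.target for m in old_orphan_filter(fragment)])
print([m.target for m in new_orphan_filter(fragment)])
```

Under the old rule only the decoy pair survives, so nothing is left for the flag to act on; under the new rule the transcript orphan is still present when the decoy-domination logic runs.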

## Equivalence-class weights: log-space normalization (no lost mapped mass)

Per-fragment equivalence-class weights are now normalized in **log** space
(`exp(auxProb − auxDenom)`, as C++ salmon does) rather than linearly (`w/Σw`). The
linear form, guarded by `Σw > 0`, silently produced all-zero weights for a
fragment whose implied lengths all have ~0 fragment-length-distribution
probability (every `w·exp(logFragProb)` underflows to 0); the VBEM then dropped
that equivalence class's count, **losing mapped mass**. The log-space form is
mathematically identical for the normal case (per-class scaling is EM-invariant)
but stays well-defined under total underflow. On SRR1039508 (full) the
mapped-mass loss (sum of `quant.sf` `NumReads` vs `num_mapped`) drops from **190
fragments to 0.1**, matching C++.
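
The underflow behaviour can be reproduced with a small Python sketch. The function names are hypothetical, and computing the denominator with a max-shifted log-sum-exp is an assumption about how `auxDenom` is accumulated, not a statement about the actual implementation.

```python
import math


def normalize_linear(aux_probs):
    """Linear form w / sum(w), guarded by sum(w) > 0 (the old behaviour)."""
    w = [math.exp(lp) for lp in aux_probs]  # underflows to 0.0 for very negative lp
    total = sum(w)
    if total > 0:
        return [x / total for x in w]
    return [0.0] * len(w)


def normalize_log(aux_probs):
    """Log-space form exp(auxProb - auxDenom).

    Assumes at least one finite log-probability. The denominator is built
    with a max-shifted log-sum-exp (an assumption made for this sketch).
    """
    m = max(aux_probs)
    aux_denom = m + math.log(sum(math.exp(lp - m) for lp in aux_probs))
    return [math.exp(lp - aux_denom) for lp in aux_probs]


# Normal case: both forms agree.
normal = [-1.0, -2.0]
print(normalize_linear(normal), normalize_log(normal))

# Total underflow: every exp() is 0.0 in double precision.
tiny = [-800.0, -801.0]
print(normalize_linear(tiny))  # [0.0, 0.0]: the class's mass is lost
print(normalize_log(tiny))     # still sums to 1
```

Only the differences between log-probabilities enter the log-space result, which is why it remains well-defined when every individual probability is too small to represent.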

## `num_dovetail_fragments` is now reported

The Rust implementation always dropped dovetailed concordant pairs under the
default no-dovetail policy (matching C++) but reported
`num_dovetail_fragments = 0`, because the counter inspected only surviving pairs,
which are never dovetailed after filtering. The counter is now wired to the
pairing stage and reports fragments whose only concordant pairing was a dovetail.
Diagnostic only: no change to mapping or quantification.
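
As a rough illustration of counting at the pairing stage rather than after filtering, the Python sketch below counts fragments whose concordant pairings are all dovetails. The `Pairing` type and the dovetail test (reverse-strand mate starting upstream of the forward-strand mate) are assumptions made for this example; the real criterion may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pairing:
    """Hypothetical concordant pairing: leftmost start of each mate."""
    fw_start: int  # forward-strand mate
    rc_start: int  # reverse-complement mate


def is_dovetail(p):
    # Assumed definition: the reverse-strand mate begins upstream of the
    # forward-strand mate, so the mates extend past each other.
    return p.rc_start < p.fw_start


def count_dovetail_fragments(fragments):
    """Count fragments whose only concordant pairings are dovetails.

    `fragments` is a list with one list of Pairing per fragment, taken
    *before* dovetails are filtered out.
    """
    count = 0
    for pairings in fragments:
        if pairings and all(is_dovetail(p) for p in pairings):
            count += 1
    return count


fragments = [
    [Pairing(fw_start=100, rc_start=250)],            # ordinary pair
    [Pairing(fw_start=300, rc_start=280)],            # dovetail only
    [Pairing(300, 280), Pairing(500, 600)],           # dovetail plus a normal pair
    [],                                               # no concordant pairing
]
print(count_dovetail_fragments(fragments))
```

Counting after filtering would see no dovetails at all, which is consistent with the old counter always reading zero.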

## Run-to-run reproducibility

Quantification is **byte-identical run-to-run when single-threaded** (`-p 1`); the
duplicate-transcript symmetry fix (per-fragment FLD snapshot + uniform init) means
exact-duplicate groups converge deterministically. Multi-threaded runs have a
small residual wobble (~0.26% of assigned reads on SRR1039508, nonzero-transcript
set unchanged) from the **stochastic FLD training** under nondeterministic
fragment→thread scheduling, the same mechanism present in C++ salmon. Measured
head-to-head (`-p 16`, two runs each), Rust is in fact **more reproducible than
C++**: about half the average and total per-transcript variation, and far fewer
transcripts shifting by >2% (Rust 2.2% of expressed transcripts vs C++ 5.1%).
Full parallel determinism (per-fragment-seeded FLD acceptance + order-independent
accumulation) is tracked as a follow-up; it is an improvement *over* C++, not a
correctness gap.
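
The idea behind the follow-up can be sketched in Python. This is an illustration of the general technique only, not the planned implementation: the seed derivation, the function names, and the histogram representation of the FLD are all hypothetical.

```python
import random
from collections import Counter


def accept(fragment_id, accept_prob, base_seed=0):
    """Acceptance decision that depends only on the fragment, not on
    which thread sees it or when (hypothetical seed derivation)."""
    rng = random.Random(base_seed * 1_000_003 + fragment_id)
    return rng.random() < accept_prob


def train_fld(fragments, accept_prob, base_seed=0):
    """Accumulate an integer fragment-length histogram.

    Integer addition is order-independent, so any processing order
    yields the same histogram.
    """
    hist = Counter()
    for fragment_id, fragment_len in fragments:
        if accept(fragment_id, accept_prob, base_seed):
            hist[fragment_len] += 1
    return hist


# (fragment_id, observed fragment length) pairs; values are made up.
fragments = [(i, 150 + (i % 7)) for i in range(1000)]

# Simulate a different scheduling order.
shuffled = fragments[:]
random.Random(1).shuffle(shuffled)

print(train_fld(fragments, 0.5) == train_fld(shuffled, 0.5))  # True
```

With a shared random stream, by contrast, the acceptance decision for a given fragment depends on how many draws other fragments consumed first, which is one way scheduling order can leak into the trained distribution.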

## Related C++ fixes (salmon 1.12.1)

These were also applied to the final C++ line so the two implementations agree: