Commit 2bccea8
fix(quant): normalize eq-class weights in log space (conserve mapped mass)
The per-fragment eq-class weights were normalized in linear space (w/Σw, guarded
by `wsum > 0`). When every implied length of a fragment has ~0 FLD probability
(logFragProb at the no-mass sentinel), each linear weight `w*exp(logFragProb)`
underflows to exactly 0, so Σw == 0. The `wsum > 0` guard then leaves the weights
all zero, the eq-class denom is 0, and the VBEM silently drops that class's
count, losing mapped mass. (The EM's degenerate-class branch is a no-op; C++
salmon's is too, so C++ relies on never producing a zero denom.)
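A minimal Python sketch of the failure described above, not the project's code.
`linear_weights` and the sentinel value -1000.0 are hypothetical stand-ins; the
actual sentinel is not stated in this message.

```python
import math

# Hypothetical stand-in for the "no-mass" sentinel (assumed finite and very negative).
NO_MASS_LOG_PROB = -1000.0

def linear_weights(scores, log_frag_probs):
    """Linear-space normalization w/sum(w), guarded by wsum > 0 (the old path)."""
    w = [s * math.exp(lp) for s, lp in zip(scores, log_frag_probs)]
    wsum = sum(w)
    if wsum > 0:
        w = [x / wsum for x in w]
    # If every exp() underflowed, wsum == 0 and the weights stay all zero.
    return w

# Both mappings sit at the sentinel: exp(-1000.0) underflows to 0.0.
weights = linear_weights([0.7, 0.3], [NO_MASS_LOG_PROB, NO_MASS_LOG_PROB])
```

With all-zero weights the class denominator is 0, which is the case the guard
does not rescue.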
Adopt C++'s normalization: compute each mapping's log weight (ln(score) +
logFragProb) and subtract the per-fragment log-sum-exp (C++ `exp(auxProb -
auxDenom)`). This is mathematically identical to w/Σw for the non-degenerate case
(per-class scaling is EM-invariant) but stays well-defined under total underflow,
yielding relative weights instead of all-zero — so no class is dropped.
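The log-space normalization can be sketched in Python as follows. This is an
illustration under assumptions, not the committed code: `log_space_weights` is a
hypothetical name, scores are assumed strictly positive, and the sentinel is
assumed to be a finite, very negative number (an infinite sentinel would need
separate handling).

```python
import math

def log_space_weights(scores, log_frag_probs):
    """Normalize per-fragment weights via exp(log_w - logsumexp(log_w))."""
    # Per-mapping log weight: ln(score) + logFragProb.
    log_w = [math.log(s) + lp for s, lp in zip(scores, log_frag_probs)]
    # Log-sum-exp, shifted by the max so the largest term is exp(0) = 1.
    m = max(log_w)
    log_denom = m + math.log(sum(math.exp(x - m) for x in log_w))
    # Counterpart of the C++ form exp(auxProb - auxDenom).
    return [math.exp(x - log_denom) for x in log_w]

# Same degenerate input that zeroes out the linear path (sentinel assumed -1000.0).
degenerate = log_space_weights([0.7, 0.3], [-1000.0, -1000.0])
# Ordinary input, where the result matches w/sum(w) up to rounding.
ordinary = log_space_weights([0.7, 0.3], [-1.0, -2.0])
```

Because the shared sentinel term cancels in the subtraction, the degenerate case
keeps the relative weights implied by the scores and the weights still sum to 1.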
On SRR1039508 (full) the mapped-mass loss drops from 190.1 fragments to 0.1
(matching C++). The change vs the prior linear path is within run-to-run wobble
(log-Pearson 0.99956, above the 0.99951 run-to-run baseline) and leaves the
nonzero transcript count unchanged. Reverts the earlier special-case underflow
guard in favor of this general normalization.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B7JMur5DmDpECddErpi2JS1
parent 3e2f559 commit 2bccea8
1 file changed
Lines changed: 37 additions & 14 deletions