phosphoRS: faithful dynamic per-window peak-depth optimization (fidelity, #40)#42
Conversation
…40) Implements the phosphoRS peak-depth optimization (Taus et al. 2011, pseudocode sections 9-12) that the implementation was missing: per 100 m/z window, choose the peak depth (1..8) that maximizes the rank1-rank2 separation between the best and second-best isoform (rank3/rank4 tie-breakers; best-absolute-score when no site-determining ions are present; smaller depth on ties). This replaces _reduce_by_delta_selection, which was dead code AND buggy: its depth-selection ratio (pj/pj1, maximized) was inverted (it minimized separation) and it used the experimental peak count as the binomial n. Since commit d64e2dd ("Use filtered spectrum directly") the active path scored against the unoptimized filtered spectrum. New, separately-tested helpers: - _window_has_site_determining_ions (section 10) - _isoform_peptide_scores / _choose_window_depth (sections 9 & 11) - _reduce_by_peak_depth_optimization (sections 8-12 orchestrator) Gated by ENABLE_PEAK_DEPTH_OPTIMIZATION (default True). Effect on the example data (q<=0.01, 5% global FLR): PhosphoRS yield 151 -> 163, decoy-localization rate 4.3% -> 6.2%; phosphoRS ~3x slower (per-isoform theoretical-spectrum generation). The comparison conclusion is unchanged -- PhosphoRS still recovers far fewer sites than AScore (163 vs 441), confirming its lower reliability on this data is genuine, not an artifact of the missing optimization. +4 unit tests (depth selection, site-determining detection, tie-break, empties). Full suite: 178 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Not up to standards ⛔🔴 Issues
|
| Category | Results |
|---|---|
| Documentation | 21 minor |
| Security | 7 high |
| Complexity | 2 medium |
🟢 Metrics -56 complexity · -11 duplication
Metric Results Complexity -56 Duplication -11
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
…optimizer Deletes the now-unused window-reduction code paths (none had any call site): - _reduce_by_delta_selection (buggy: inverted depth ratio, experimental peak count used as the binomial n) and _reduce_spectrum_by_windows; - their exclusive helpers _site_determining_ions, _window_intensity_thresholds, _count_peaks_above_threshold, _get_intensity_thresholds; - the now-orphaned MIN_DEPTH constant. The faithful _reduce_by_peak_depth_optimization replaces them; _get_window_indexes is kept (used by it). ~371 lines removed; behavior unchanged. Full suite: 178 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lity w (#40) p = N*d/w (phosphoRS, Taus et al. 2011 section 13) where w is the FULL m/z range of the MS/MS spectrum. The code used the m/z span of the *extracted* peak list instead, which is narrower -> p too large -> scores slightly over-conservative. Use the original spectrum's m/z range (N still = extracted peak count). Before/after on the full dataset (data/1.mzML, 3697 PSMs): 81/1989 per-PSM PhosphoRS_pep_score values change, but the 3-way decoy-AA FLR is unchanged (PhosphoRS 163 sites @ 5% FLR, decoy rate 6.2%; LucXor 574 / AScore 441) -- the deviation was real but immaterial to the comparison here. Kept for fidelity. Full suite: 178 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
c01eb28
into
fix/decoy-flr-scoring-bugs
Summary
Faithful reimplementation of phosphoRS's dynamic per-window peak-depth optimization (Taus et al. 2011, pseudocode §9–§12) — the step §22 calls "the most important conceptual difference from a simple site-localization score." It was missing from the running code.
Fidelity PR (separate from the FLR work in #41 — based on that branch; retarget to
mainafter #41 merges).Background
The active scoring path scored isoforms against the plain
filter_spectrum_for_phosphorsoutput. The intended optimizer,_reduce_by_delta_selection, was dead code (disabled byd64e2dd "Use filtered spectrum directly") and buggy:pj/pj1(maximized) was inverted — it minimized, rather than maximized, isoform separation;n(should be the theoretical-ion count).So phosphoRS was not doing the discrimination-optimized peak selection that defines it.
What this adds
Per 100 m/z window, for each depth 1..8, score every isoform against the top-
depthpeaks (n= theoretical ions in window,k= matched,p = N·d/w) and pick the depth that maximizes rank1−rank2 (then rank1−rank3, rank1−rank4, then best absolute score; smaller depth on ties). When the window has no site-determining ions, maximize the best absolute score instead.New, separately-tested helpers:
_window_has_site_determining_ions(§10)_isoform_peptide_scores,_choose_window_depth(§9 & §11)_reduce_by_peak_depth_optimization(§8–§12 orchestrator)Gated by
ENABLE_PEAK_DEPTH_OPTIMIZATION(defaultTrue). It reuses the consume-once_count_matched_ionsmatcher, so the binomialn/kare consistent with the rest of the scoring.Effect (example data, q ≤ 0.01, 5% global FLR)
The discrimination-maximizing selection is symmetric (it sharpens whichever isoform genuinely matches best), so it both raises the yield (genuine sites rank above decoys with more resolution) and raises the measured decoy rate (faithful phosphoRS localizes to the decoy more often on these ambiguous spectra). The comparison conclusion is unchanged: PhosphoRS still recovers far fewer sites than AScore (163 vs 441) — its lower reliability here is genuine, not an artifact of the missing optimization. Cost: ~3× slower (per-isoform theoretical-spectrum generation; could be shared with the scoring loop in a follow-up).
Notes for the reviewer
ENABLE_PEAK_DEPTH_OPTIMIZATION = Falseto retain the previous behavior._reduce_by_delta_selection,_reduce_spectrum_by_windows) are left in place; a follow-up can delete them.Tests
178 pass (full suite) with the optimizer enabled. +4 unit tests: depth selection maximizes separation, site-determining-ion detection, no-SDI maximizes best score, tie/empty handling.
🤖 Generated with Claude Code