fix(phosphors): charge-validate theoretical ions + faithful floorDouble#45
Conversation
Two onsite PhosphoRS bugs that diverged from the compomics reference: - D9 (fragment-charge ladder): getSpectrum(..., 1, precursor_charge) emitted fragments AT the precursor charge and above the ion number (e.g. y1 at 2+) - physically impossible ions that inflated the binomial trial count n by ~35% (verified 99 -> 64 vs the live compomics JVM on PEPS(Phospho)TIDE @3+). New _theo_mz_charge_valid() applies compomics chargeValidated (fragment charge in 1..max(1, precursor-1) AND charge <= ion number) on both live theoretical-ion paths (final scoring and _isoform_theo_mz for depth selection). The depth-reduction generator now sets add_metainfo=true, so depth selection and final scoring share one charge-validated, loss-filtered ion set (also closes the D10 depth-vs-final inconsistency). - D13c (_floor_double): claimed to "Mimic Util.floorDouble" but did a binary floor (math.floor(x*10**n)/10**n), dropping a digit on values like 0.29->0.28 and 0.0006->0.0005 and perturbing the random-match probability p. Replaced with a decimal-string floor (Decimal(repr(x)).quantize(..., ROUND_FLOOR)); now matches Java Util.floorDouble exactly. It is live: getp_style feeds both depth selection and final scoring. Adds PHOSPHORS_PARITY_REVIEW.md documenting the full parity analysis (paper + live-JVM/Java-vs-Python side-by-side tests) and the bug classification. All 178 tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
Warning Review limit reached
More reviews will be available in 4 minutes and 37 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughphosphors.py refactors numeric flooring to use ChangesCompOmics Parity Alignment
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Not up to standards ⛔🔴 Issues
|
| Category | Results |
|---|---|
| Documentation | 4 minor |
| Complexity | 1 medium |
🟢 Metrics -6 complexity · -2 duplication
Metric Results Complexity -6 Duplication -2
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
…iew doc - Remove _expected_fragment_mzs: dead since the D9 fix (no callers; its charge policy is now implemented by _theo_mz_charge_valid). MAX_ION_CHARGE is retained (still the default for the public max_ion_charge parameter). - Remove PHOSPHORS_PARITY_REVIEW.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Dead since the D9 fix (no callers; its charge policy is now implemented by _theo_mz_charge_valid). MAX_ION_CHARGE is retained (still the default for the public max_ion_charge parameter). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Algorithm Comparison Test ResultsClick to expand test results |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@onsite/phosphors/phosphors.py`:
- Around line 772-775: The code in _theo_mz_charge_valid currently swallows any
exception from spec.getStringDataArrays() and returns all mzs, bypassing charge
and neutral-loss gating; instead, detect if spec.getStringDataArrays() is
missing or if its first StringDataArray length does not match len(mzs) and fail
fast by raising a clear exception (e.g., ValueError) with a descriptive message;
replace the broad except Exception block that returns [float(m) for m in mzs]
with explicit validation of spec.getStringDataArrays()[0] and a raised error so
callers cannot silently skip the per-ion gating logic.
In `@PHOSPHORS_PARITY_REVIEW.md`:
- Around line 150-151: Update the reproduction commands to avoid hardcoded
machine-specific paths by replacing literal occurrences of /tmp/parity and
/home/sachsenb/Development/onsite with path-portable references (e.g., use
$TMPDIR or ${TMPDIR:-/tmp} for temporary artifact dirs and $HOME or relative
project paths for repo roots); ensure every command and example that mentions
/tmp/parity or /home/sachsenb/Development/onsite (and the repeated blocks around
the later section) uses these variables or a note to set an environment variable
(e.g., PARITY_DIR) so other contributors can run the steps without manual edits.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 80be3734-a565-4b81-ad7a-88d845105aad
📒 Files selected for processing (2)
PHOSPHORS_PARITY_REVIEW.mdonsite/phosphors/phosphors.py
Full-data before/after impact —
|
| metric | count |
|---|---|
| flips | 194 / 1989 (9.75 %) |
| → changed which residue is localized | 194 |
| → gained a decoy(A) win (target→A) | 8 |
| → lost a decoy(A) win (A→target) | 50 |
Net −42 decoy wins: removing the physically-impossible high-charge ions (and the floor fix) stops noise that was spuriously supporting Alanine-decoy isoforms.
Decoy-AA FLR signal (best isomer puts a phospho on A)
| metric | before | after | Δ |
|---|---|---|---|
| decoy-win PSMs | 227 | 185 | −42 (−18.5 %) |
| decoy-win PSMs (conf ≥95 %) | 180 | 155 | −25 (−13.9 %) |
decoy placements D |
233 | 191 | −42 |
target placements T |
1925 | 1967 | +42 |
| global decoy-AA FLR (Eq.2) | 44.20 % | 35.46 % | −8.74 pp |
FLR = 2·(T_c/X_c)·(D/T), with T_c(STY)=4245, X_c(A)=2325. This is the unthresholded global estimator (no score filter), so the absolute value is high — the shift is the signal; the fix lowers FLR by ~20 % relative.
Confidence (max PhosphoRS site probability)
| metric | before | after |
|---|---|---|
| confident calls (≥95 %) | 1613 | 1621 (+8) |
| median max-site-prob | 99.988 % | 99.997 % |
Among confident calls the decoy fraction drops 11.2 % → 9.6 %.
Takeaway
The fix changes ~10 % of localizations and reduces spurious Alanine-decoy wins by ~18 % (decoy-AA FLR 44.2 % → 35.5 %) while slightly increasing the number and confidence of target calls — the expected, beneficial effect of restoring compomics' chargeValidated fragment set.
Note: the stored PhosphoRS score is the raw binomial big_p, which underflows to ~0 for confident calls, so max-site-probability is used as the confidence axis instead.
🤖 Generated with Claude Code
The `-HPO3`/`-PO3H` name filter never matched pyOpenMS's loss annotation (`-H3O4P1`), so it dropped nothing. Replace it with the compomics PhosphoRS.java rule: drop a neutral loss only when its mass equals the modification mass (HPO3, 79.966 Da) -- such a fragment is mass-identical to the unmodified ion, hence not site-determining. H2O/H3PO4 losses are kept, so the ion set is unchanged (64 ions for PEPS(Phospho)TIDE @3+) but now robust to pyOpenMS's real `-HO3P1` spelling. _floor_double: coerce numpy scalars to Python float -- repr(np.float64(x)) is 'np.float64(x)' on numpy >=2.0, which Decimal() cannot parse. Memoize the per-ion charge/loss gate by annotation string (a pure function of the annotation): ~8x faster than the prior per-ion regex parse, making _theo_mz_charge_valid ~2.6x faster end-to-end. All 178 tests pass. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
onsite/phosphors/phosphors.py (2)
67-70:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winCheck
math.isfinite()before the integer-floor fast path.
_floor_double(float("inf"), 0)and_floor_double(float("nan"), 0)hitmath.floor()first and raise, so the new non-finite passthrough never applies forn_decimals <= 0.Suggested fix
- if n_decimals <= 0: - return float(math.floor(value)) if not math.isfinite(value): return value + if n_decimals <= 0: + return float(math.floor(value))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@onsite/phosphors/phosphors.py` around lines 67 - 70, The early return order is wrong in _floor_double: check math.isfinite(value) before taking the integer-floor fast path so non-finite values (inf, nan) are returned untouched; change the branch order in the function (or around the snippet) to first do "if not math.isfinite(value): return value" and only then handle "if n_decimals <= 0: return float(math.floor(value))".
762-765:⚠️ Potential issue | 🟠 MajorDon’t treat spectrum-generation failures as “no ions”
onsite/phosphors/phosphors.py’s_theo_mz_charge_valid()swallows any exception fromspec_gen.getSpectrum(...)and returns[]; the scoring loop then doesif not theo_mz: continue, so the affected isoform is silently omitted fromisomer_scores, changing the subsequent probability normalization rather than failing the PSM.Suggested fix
- try: - spec_gen.getSpectrum(spec, seq, 1, max_z) - except Exception: - return [] + spec_gen.getSpectrum(spec, seq, 1, max_z)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@onsite/phosphors/phosphors.py` around lines 762 - 765, The current except in _theo_mz_charge_valid around spec_gen.getSpectrum(spec, seq, 1, max_z) swallows all exceptions and returns [], causing downstream code (if not theo_mz: continue) to silently drop isoforms; instead, catch the exception, log the error with context (including spec, seq, max_z) and re-raise the exception so the PSM fails (or return an explicit failure sentinel that the caller checks), i.e., update the except block in _theo_mz_charge_valid to not return an empty list but either re-raise the original exception after logging or return a clearly handled sentinel and update the scoring loop that builds isomer_scores to treat that sentinel as a fatal error rather than "no ions".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@onsite/phosphors/phosphors.py`:
- Around line 747-760: The charge cap passed as max_ion_charge is not applied in
the charge-validated generator: modify _theo_mz_charge_valid to accept a
max_ion_charge parameter and compute max_z as min(max(1, int(precursor_charge) -
1), max_ion_charge) (or enforce the cap equivalently) so fragment generation
never exceeds the caller cap; propagate this new parameter from
calculate_phospho_localization_compomics_style through _isoform_theo_mz and into
any final scoring path that calls _theo_mz_charge_valid so n_expected and
binomial scoring use the same capped set of theoretical ions (alternatively
remove/deprecate max_ion_charge and update callers to reflect that change).
---
Outside diff comments:
In `@onsite/phosphors/phosphors.py`:
- Around line 67-70: The early return order is wrong in _floor_double: check
math.isfinite(value) before taking the integer-floor fast path so non-finite
values (inf, nan) are returned untouched; change the branch order in the
function (or around the snippet) to first do "if not math.isfinite(value):
return value" and only then handle "if n_decimals <= 0: return
float(math.floor(value))".
- Around line 762-765: The current except in _theo_mz_charge_valid around
spec_gen.getSpectrum(spec, seq, 1, max_z) swallows all exceptions and returns
[], causing downstream code (if not theo_mz: continue) to silently drop
isoforms; instead, catch the exception, log the error with context (including
spec, seq, max_z) and re-raise the exception so the PSM fails (or return an
explicit failure sentinel that the caller checks), i.e., update the except block
in _theo_mz_charge_valid to not return an empty list but either re-raise the
original exception after logging or return a clearly handled sentinel and update
the scoring loop that builds isomer_scores to treat that sentinel as a fatal
error rather than "no ions".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a133b7fb-e68c-4910-acd9-f3e6da997d56
📒 Files selected for processing (1)
onsite/phosphors/phosphors.py
| def _theo_mz_charge_valid(spec_gen, seq, precursor_charge) -> list: | ||
| """Charge-validated b/y theoretical fragment m/z for one (modified) peptide. | ||
|
|
||
| Replicates compomics PhosphoRS chargeValidated for PEPTIDE_FRAGMENT_ION: | ||
| fragment charge in 1 .. max(1, precursor_charge - 1) (charge < precursor) | ||
| and charge <= ion number (a y1 cannot be 2+) | ||
| and drops any neutral loss whose mass equals the phospho modification mass | ||
| (HPO3, 79.966 Da), mirroring PhosphoRS.java -- such a fragment is mass- | ||
| identical to the unmodified ion, so it cannot localize the site (H3PO4 and | ||
| H2O losses are kept). The charge upper bound is enforced at generation; the | ||
| charge<=ion-number gate and the loss filter are read from the ion | ||
| annotations, so ``spec_gen`` MUST have ``add_metainfo='true'``. Returns m/z | ||
| in generator order (caller sorts if needed).""" | ||
| max_z = max(1, int(precursor_charge) - 1) |
There was a problem hiding this comment.
max_ion_charge is no longer honored by the validated path.
calculate_phospho_localization_compomics_style() still exposes max_ion_charge, but _theo_mz_charge_valid() now always generates through precursor_charge - 1. Callers that cap fragment charge below the precursor will get extra theoretical ions, which changes n_expected and the binomial score. Please thread that cap through both _isoform_theo_mz() and the final scoring path, or remove/deprecate the parameter explicitly.
Suggested fix
-def _theo_mz_charge_valid(spec_gen, seq, precursor_charge) -> list:
+def _theo_mz_charge_valid(
+ spec_gen, seq, precursor_charge, max_ion_charge=None
+) -> list:
@@
- max_z = max(1, int(precursor_charge) - 1)
+ max_z = max(1, int(precursor_charge) - 1)
+ if max_ion_charge is not None:
+ max_z = min(max_z, int(max_ion_charge))-def _isoform_theo_mz(spec_gen, seq_profile, precursor_charge):
- return sorted(_theo_mz_charge_valid(spec_gen, seq_profile, precursor_charge))
+def _isoform_theo_mz(spec_gen, seq_profile, precursor_charge, max_ion_charge=None):
+ return sorted(
+ _theo_mz_charge_valid(
+ spec_gen, seq_profile, precursor_charge, max_ion_charge
+ )
+ )Also applies to: 783-787, 897-900, 1197-1198
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@onsite/phosphors/phosphors.py` around lines 747 - 760, The charge cap passed
as max_ion_charge is not applied in the charge-validated generator: modify
_theo_mz_charge_valid to accept a max_ion_charge parameter and compute max_z as
min(max(1, int(precursor_charge) - 1), max_ion_charge) (or enforce the cap
equivalently) so fragment generation never exceeds the caller cap; propagate
this new parameter from calculate_phospho_localization_compomics_style through
_isoform_theo_mz and into any final scoring path that calls
_theo_mz_charge_valid so n_expected and binomial scoring use the same capped set
of theoretical ions (alternatively remove/deprecate max_ion_charge and update
callers to reflect that change).
max_ion_charge (and its MAX_ION_CHARGE=2 backing constant) was never read. Fragment charge is governed by compomics chargeValidated (charge < precursor, i.e. max_z = precursor_charge - 1), not a fixed cap -- applying the cap would drop legitimate charge-3+ fragments for >=4+ precursors and diverge from the reference, so the parameter is removed rather than wired in. No caller passes it and the CLIs do not expose it; algorithm-comparison results are unchanged (PhosphoRS 1066/1104/1118, LucXor/AScore identical). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Like max_ion_charge, add_ion_types (and its ADD_ION_TYPES constant) was never read -- the theoretical-spectrum generators hardcode b/y ions. No caller passes it and the CLIs do not expose it; algorithm-comparison results are unchanged (PhosphoRS 1066/1104/1118, LucXor/AScore identical). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Summary
Fixes two real
onsitePhosphoRS bugs found while reviewing parity against thecompomics-utilities reference.
Both were verified side-by-side against the live compomics JVM (
utilities-5.1.17.jar+commons-math-2.2).D9 — fragment-charge over-generation (the impactful one)
getSpectrum(..., 1, precursor_charge)generated b/y ions at every charge1..precursor, including:charge < precursor), andy1at 2+) — physically impossible.This inflated the binomial trial count
nby ~35 % (e.g.PEPS(Phospho)TIDE@3+ with losses: 99 → 64 theoretical ions, now identical to the reference).Fix: new
_theo_mz_charge_valid()applies compomicschargeValidatedfor peptide-fragment ions — fragment charge in1..max(1, precursor-1)andcharge <= ion_number— plus the phospho neutral-loss name filter. Both live theoretical-ion paths (final scoring and_isoform_theo_mzfor depth selection) route through it.As a side effect this also closes the depth-vs-final inconsistency (D10): the depth-reduction generator now sets
add_metainfo='true', so depth selection and final scoring use the same charge-validated, loss-filtered ion set.D13c —
_floor_doublewas a binary floor, not a decimal floorThe helper claimed to "Mimic
Util.floorDouble" but usedmath.floor(x*10**n)/10**n, which drops a digit whenx*10**nlands a hair below an integer in IEEE arithmetic (0.29 → 0.28,0.0006 → 0.0005), perturbing the random-match probabilityp. It is live —getp_stylefeeds both depth selection and final scoring.Fix: decimal-string floor (
Decimal(repr(x)).quantize(..., ROUND_FLOOR)), matching JavaUtil.floorDoubleexactly (getp_style(3, 100, 0.02) = 0.0006).Not changed (deliberately)
P(X>k)vs onsiteP(X≥k)): onsite is paper-correct ("at least k"); left as-is._expected_fragment_mzs(now superseded) — left for a separate cleanup.Testing
_floor_doublenow matches JavaUtil.floorDoubleacross the divergent inputs.PHOSPHORS_PARITY_REVIEW.md(added) documents the full parity analysis, the 13-divergence bug classification, and the reproducible Java-vs-Python side-by-side tests.🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Improvements