Skip to content

fix(phosphors): charge-validate theoretical ions + faithful floorDouble #45

Merged
ypriverol merged 6 commits into main from fix/phosphors-fragment-charge-validation
Jun 10, 2026


@timosachsenberg

@timosachsenberg timosachsenberg commented Jun 9, 2026

Contributor

Summary

Fixes two real onsite PhosphoRS bugs found while reviewing parity against the
compomics-utilities reference.
Both were verified side-by-side against the live compomics JVM (utilities-5.1.17.jar + commons-math-2.2).

D9 — fragment-charge over-generation (the impactful one)

getSpectrum(..., 1, precursor_charge) generated b/y ions at every charge 1..precursor, including:

  • fragments at the precursor charge (compomics requires charge < precursor), and
  • charges above the ion number (e.g. a y1 at 2+) — physically impossible.

This inflated the binomial trial count n: for PEPS(Phospho)TIDE @3+ with losses, charge validation cuts it from 99 to 64 theoretical ions (~35 % fewer), now identical to the reference.

Fix: new _theo_mz_charge_valid() applies compomics chargeValidated for peptide-fragment ions — fragment charge in 1..max(1, precursor-1) and charge <= ion_number — plus the phospho neutral-loss name filter. Both live theoretical-ion paths (final scoring and _isoform_theo_mz for depth selection) route through it.

As a side effect this also closes the depth-vs-final inconsistency (D10): the depth-reduction generator now sets add_metainfo='true', so depth selection and final scoring use the same charge-validated, loss-filtered ion set.
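The charge rule above can be illustrated with a small, self-contained sketch. The helpers `max_fragment_charge`, `is_charge_valid` and `enumerate_by_ions` are hypothetical names, not onsite or compomics code; the sketch assumes plain b1..b(n-1) / y1..y(n-1) ladders without neutral losses, so its counts differ from the 99 → 64 figures above, which include losses and come from pyOpenMS's generator.

```python
def max_fragment_charge(precursor_charge: int) -> int:
    """Highest allowed fragment charge: strictly below the precursor, but at least 1."""
    return max(1, int(precursor_charge) - 1)


def is_charge_valid(ion_number: int, charge: int, precursor_charge: int) -> bool:
    """Charge gate: 1 <= charge <= max(1, precursor - 1) and charge <= ion number."""
    return 1 <= charge <= max_fragment_charge(precursor_charge) and charge <= ion_number


def enumerate_by_ions(peptide_length: int, precursor_charge: int, validated: bool):
    """List (ion_type, ion_number, charge) tuples for the b/y ions of a peptide.

    validated=False mimics the old ladder (every charge 1..precursor_charge);
    validated=True keeps only charge-valid ions.
    """
    ions = []
    for ion_type in ("b", "y"):
        for ion_number in range(1, peptide_length):
            for charge in range(1, int(precursor_charge) + 1):
                if validated and not is_charge_valid(ion_number, charge, precursor_charge):
                    continue
                ions.append((ion_type, ion_number, charge))
    return ions


# An 8-residue peptide (same length as PEPSTIDE) at precursor 3+, losses ignored.
old = enumerate_by_ions(8, 3, validated=False)
new = enumerate_by_ions(8, 3, validated=True)
print(len(old), len(new))                       # 42 26
print(("y", 1, 2) in old, ("y", 1, 2) in new)   # True False
```

Filtering the enumerated ladder, rather than only lowering the generator's top charge, is what enforces the second condition (charge <= ion number); a single max-charge argument cannot express it.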

D13c — _floor_double was a binary floor, not a decimal floor

The helper claimed to "Mimic Util.floorDouble" but used math.floor(x*10**n)/10**n, which drops a digit when x*10**n lands a hair below an integer in IEEE arithmetic (0.29 → 0.28, 0.0006 → 0.0005), perturbing the random-match probability p. The helper is live: getp_style feeds both depth selection and final scoring.

Fix: decimal-string floor (Decimal(repr(x)).quantize(..., ROUND_FLOOR)), matching Java Util.floorDouble exactly (getp_style(3, 100, 0.02) = 0.0006).
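The two flooring strategies can be compared with the standard library alone. This is a sketch with hypothetical names (`floor_binary`, `floor_decimal`), not the onsite `_floor_double` itself; it folds in the numpy-scalar coercion and the finite-check-first ordering discussed later in this thread, and it assumes moderate magnitudes such as probabilities (with Decimal's default 28-digit context, quantizing a very large value raises InvalidOperation).

```python
import math
from decimal import Decimal, ROUND_FLOOR


def floor_binary(value: float, n_decimals: int) -> float:
    """Old behaviour: scale, floor, unscale. IEEE rounding of value * 10**n can drop a digit."""
    scale = 10 ** n_decimals
    return math.floor(value * scale) / scale


def floor_decimal(value, n_decimals: int) -> float:
    """Decimal-string floor: floor the shortest repr of value to n_decimals places."""
    value = float(value)  # numpy scalars: repr(np.float64(x)) is 'np.float64(x)' on numpy >= 2.0
    if not math.isfinite(value):  # checked first so inf/nan never reach math.floor
        return value
    if n_decimals <= 0:
        return float(math.floor(value))
    quantum = Decimal(1).scaleb(-n_decimals)  # Decimal('1E-n')
    return float(Decimal(repr(value)).quantize(quantum, rounding=ROUND_FLOOR))


print(floor_binary(0.29, 2), floor_decimal(0.29, 2))      # 0.28 0.29
print(floor_binary(0.0006, 4), floor_decimal(0.0006, 4))  # 0.0005 0.0006
```

Going through repr() floors the shortest decimal string that round-trips to the float, so 0.29 is floored as the literal 0.29 rather than as 28.999999999999996 / 100.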

Not changed (deliberately)

  • The binomial tail convention (compomics P(X>k) vs onsite P(X≥k)): onsite is paper-correct ("at least k"); left as-is.
  • The depth-selection criterion: onsite already maximizes isoform separation (paper-correct); the reference's ratio rule is inverted.
  • Dead _expected_fragment_mzs (now superseded) — left for a separate cleanup.

Testing

  • All 178 tests pass (incl. the data-dependent and decoy-FLR suites).
  • D9 fix reproduces the reference theoretical-ion count exactly against the live compomics JVM.
  • _floor_double now matches Java Util.floorDouble across the divergent inputs.

PHOSPHORS_PARITY_REVIEW.md (added) documents the full parity analysis, the 13-divergence bug classification, and the reproducible Java-vs-Python side-by-side tests.

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes

  • Fixed numeric precision calculations to align with reference implementation standards for consistent results

Improvements

  • Strengthened theoretical fragment ion validation with enhanced charge constraints
  • Refined neutral-loss filtering mechanisms for improved accuracy
  • Optimized spectrum generation efficiency during analysis

Two onsite PhosphoRS bugs that diverged from the compomics reference:

- D9 (fragment-charge ladder): getSpectrum(..., 1, precursor_charge) emitted
  fragments AT the precursor charge and above the ion number (e.g. y1 at 2+) -
  physically impossible ions that inflated the binomial trial count n (removing
  them cuts n by ~35%: 99 -> 64, verified vs the live compomics JVM on
  PEPS(Phospho)TIDE @3+). New
  _theo_mz_charge_valid() applies compomics chargeValidated (fragment charge in
  1..max(1, precursor-1) AND charge <= ion number) on both live theoretical-ion
  paths (final scoring and _isoform_theo_mz for depth selection). The
  depth-reduction generator now sets add_metainfo=true, so depth selection and
  final scoring share one charge-validated, loss-filtered ion set (also closes
  the D10 depth-vs-final inconsistency).

- D13c (_floor_double): claimed to "Mimic Util.floorDouble" but did a binary
  floor (math.floor(x*10**n)/10**n), dropping a digit on values like 0.29->0.28
  and 0.0006->0.0005 and perturbing the random-match probability p. Replaced
  with a decimal-string floor (Decimal(repr(x)).quantize(..., ROUND_FLOOR));
  now matches Java Util.floorDouble exactly. It is live: getp_style feeds both
  depth selection and final scoring.

Adds PHOSPHORS_PARITY_REVIEW.md documenting the full parity analysis (paper +
live-JVM/Java-vs-Python side-by-side tests) and the bug classification.

All 178 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Contributor

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 968b1c3b-fdd9-4139-8eae-541ce00f630e

📥 Commits

Reviewing files that changed from the base of the PR and between 3000c27 and b137115.

📒 Files selected for processing (1)
  • onsite/phosphors/phosphors.py
📝 Walkthrough

Walkthrough

phosphors.py refactors numeric flooring to use Decimal.quantize(ROUND_FLOOR) matching CompOmics Java behavior. It introduces _theo_mz_charge_valid, a per-ion gating function that parses pyOpenMS annotations, enforces charge ≤ ion number, and filters phospho neutral losses by parsed mass. The gating is integrated into _isoform_theo_mz and the scoring loop, replacing the prior MSSpectrum-based approach.

Changes

CompOmics Parity Alignment

Layer | File(s) | Summary
Decimal Flooring for Numeric Precision Alignment | onsite/phosphors/phosphors.py | re module and EmpiricalFormula import added. _floor_double refactored from binary/float scaling to Decimal(...).quantize(..., ROUND_FLOOR) with repr(value) to match Java rounding exactly; handles n_decimals <= 0 and non-finite inputs.
Per-Ion Charge and Loss Gating | onsite/phosphors/phosphors.py | New _theo_mz_charge_valid helper and supporting cache/regex functions parse ion annotations, decode neutral-loss formulas, and apply charge validation (charge ≤ ion number) and phospho-loss filtering (drop ions matching phospho mass within tolerance). _isoform_theo_mz updated to return sorted gated m/z. Obsolete _expected_fragment_mzs helper removed.
Integration in Peak-Depth and Scoring Loop | onsite/phosphors/phosphors.py | Peak-depth optimizer changes add_metainfo to "true" for annotation access. Scoring loop replaces per-isoform MSSpectrum generation and manual loss filtering with direct _theo_mz_charge_valid calls; skips isoforms yielding no fragments.

🎯 3 (Moderate) | ⏱️ ~25 minutes


🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately summarizes the two main changes: charge-validation of theoretical ions and fixing the floorDouble function to match Java behavior. It is concise, specific, and clearly conveys the primary fixes.
Docstring Coverage | ✅ Passed | Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


@codacy-production

codacy-production Bot commented Jun 9, 2026


Not up to standards ⛔

🔴 Issues 1 medium · 4 minor

Alerts:
⚠ 5 issues (≤ 0 issues of at least minor severity)

Results:
5 new issues

Category | Results
Documentation | 4 minor
Complexity | 1 medium


🟢 Metrics -6 complexity · -2 duplication

Metric | Results
Complexity | -6
Duplication | -2



timosachsenberg and others added 2 commits June 9, 2026 19:18
…iew doc

- Remove _expected_fragment_mzs: dead since the D9 fix (no callers; its charge
  policy is now implemented by _theo_mz_charge_valid). MAX_ION_CHARGE is retained
  (still the default for the public max_ion_charge parameter).
- Remove PHOSPHORS_PARITY_REVIEW.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Dead since the D9 fix (no callers; its charge policy is now implemented by
_theo_mz_charge_valid). MAX_ION_CHARGE is retained (still the default for the
public max_ion_charge parameter).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 9, 2026


Algorithm Comparison Test Results

============================= test session starts ==============================
platform linux -- Python 3.11.15, pytest-9.0.3, pluggy-1.6.0 -- /opt/hostedtoolcache/Python/3.11.15/x64/bin/python
cachedir: .pytest_cache
rootdir: /home/runner/work/onsite/onsite
configfile: pyproject.toml
plugins: cov-7.1.0
collecting ... collected 3 items

tests/test_algorithm_comparison.py::TestAlgorithmComparison::test_lucxor_comparison 
================================================================================
LucXor Comparison Results (q-value < 0.01)
================================================================================

STRICT (Local FLR < 0.01):
  New results: 848
  Reference results: 848
  Overlap: 848 (100.0%)
  Recall: 100.0% (new found 848/848 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

MODERATE (Local FLR < 0.05):
  New results: 1064
  Reference results: 1064
  Overlap: 1064 (100.0%)
  Recall: 100.0% (new found 1064/1064 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

LENIENT (Local FLR < 0.1):
  New results: 1081
  Reference results: 1081
  Overlap: 1081 (100.0%)
  Recall: 100.0% (new found 1081/1081 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x
PASSED
tests/test_algorithm_comparison.py::TestAlgorithmComparison::test_ascore_comparison 
================================================================================
AScore Comparison Results (q-value < 0.01)
================================================================================

STRICT (AScore >= 20):
  New results: 919
  Reference results: 919
  Overlap: 919 (100.0%)
  Recall: 100.0% (new found 919/919 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

MODERATE (AScore >= 15):
  New results: 1023
  Reference results: 1023
  Overlap: 1023 (100.0%)
  Recall: 100.0% (new found 1023/1023 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

LENIENT (AScore >= 3):
  New results: 1076
  Reference results: 1076
  Overlap: 1076 (100.0%)
  Recall: 100.0% (new found 1076/1076 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x
PASSED
tests/test_algorithm_comparison.py::TestAlgorithmComparison::test_phosphors_comparison 
================================================================================
PhosphoRS Comparison Results (q-value < 0.01)
================================================================================

STRICT (Site probability > 99%):
  New results: 1066
  Reference results: 983
  Overlap: 946 (96.2%)
  Recall: 96.2% (new found 946/983 reference sites)
  Gain rate: 11.3% (120 new-only sites)
  Lost sites: 37
  Count ratio: 1.08x

MODERATE (Site probability > 90%):
  New results: 1104
  Reference results: 1084
  Overlap: 1035 (95.5%)
  Recall: 95.5% (new found 1035/1084 reference sites)
  Gain rate: 6.2% (69 new-only sites)
  Lost sites: 49
  Count ratio: 1.02x

LENIENT (Site probability > 75%):
  New results: 1118
  Reference results: 1102
  Overlap: 1049 (95.2%)
  Recall: 95.2% (new found 1049/1102 reference sites)
  Gain rate: 6.2% (69 new-only sites)
  Lost sites: 53
  Count ratio: 1.01x
PASSED

============================== 3 passed in 43.77s ==============================
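The log does not spell out how its comparison metrics are defined. The sketch below is a reading inferred from the printed numbers (for example 946/983 = 96.2 % recall and 120/1066 = 11.3 % gain rate in the PhosphoRS STRICT row), not code taken from tests/test_algorithm_comparison.py; `comparison_metrics` is a hypothetical name and sites are modelled as hashable keys.

```python
def comparison_metrics(new_sites: set, reference_sites: set) -> dict:
    """Overlap-style metrics as they appear to be reported in the comparison log."""
    overlap = new_sites & reference_sites
    new_only = new_sites - reference_sites
    lost = reference_sites - new_sites
    return {
        "overlap": len(overlap),
        "recall": len(overlap) / len(reference_sites),  # share of reference sites recovered
        "gain_rate": len(new_only) / len(new_sites),    # share of new results absent from the reference
        "lost": len(lost),
        "count_ratio": len(new_sites) / len(reference_sites),
    }


# Synthetic sets shaped like the PhosphoRS STRICT row: 1066 new, 983 reference, 946 shared.
reference = set(range(983))
new = set(range(37, 37 + 1066))
m = comparison_metrics(new, reference)
print(m["overlap"], m["lost"], round(m["recall"] * 100, 1),
      round(m["gain_rate"] * 100, 1), round(m["count_ratio"], 2))
# 946 37 96.2 11.3 1.08
```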

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@onsite/phosphors/phosphors.py`:
- Around line 772-775: The code in _theo_mz_charge_valid currently swallows any
exception from spec.getStringDataArrays() and returns all mzs, bypassing charge
and neutral-loss gating; instead, detect if spec.getStringDataArrays() is
missing or if its first StringDataArray length does not match len(mzs) and fail
fast by raising a clear exception (e.g., ValueError) with a descriptive message;
replace the broad except Exception block that returns [float(m) for m in mzs]
with explicit validation of spec.getStringDataArrays()[0] and a raised error so
callers cannot silently skip the per-ion gating logic.

In `@PHOSPHORS_PARITY_REVIEW.md`:
- Around line 150-151: Update the reproduction commands to avoid hardcoded
machine-specific paths by replacing literal occurrences of /tmp/parity and
/home/sachsenb/Development/onsite with path-portable references (e.g., use
$TMPDIR or ${TMPDIR:-/tmp} for temporary artifact dirs and $HOME or relative
project paths for repo roots); ensure every command and example that mentions
/tmp/parity or /home/sachsenb/Development/onsite (and the repeated blocks around
the later section) uses these variables or a note to set an environment variable
(e.g., PARITY_DIR) so other contributors can run the steps without manual edits.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 80be3734-a565-4b81-ad7a-88d845105aad

📥 Commits

Reviewing files that changed from the base of the PR and between 486d0bb and 1e08e7d.

📒 Files selected for processing (2)
  • PHOSPHORS_PARITY_REVIEW.md
  • onsite/phosphors/phosphors.py

@timosachsenberg

Contributor Author

Full-data before/after impact — data/1.mzML

Ran the PhosphoRS CLI on the full dataset, before (main) vs after (this branch), identical inputs and flags, comparing per-PSM localization. Output is deterministic (threaded == serial), so the diff isolates the fix.

  • Input: data/1.mzML (160 MB) + data/1_consensus_fdr_filter_pep.idXML (3697 hits)
  • Flags: --add-decoys --threads 16 (decoy-AA = Alanine, the project's FLR basis)
  • Scored PSMs (≥2 candidate sites, localizable): 1989 (the rest are trivial/non-phospho/unscored on both sides)
  • Runtime: ~14 s each

Localization call-flips (best-isomer phospho placement changed)

metric | count
flips | 194 / 1989 (9.75 %)
→ changed which residue is localized | 194
→ gained a decoy(A) win (target→A) | 8
→ lost a decoy(A) win (A→target) | 50

Net −42 decoy wins (8 gained, 50 lost): removing the physically impossible high-charge ions (together with the floor fix) removes noise that was spuriously supporting Alanine-decoy isoforms.

Decoy-AA FLR signal (best isomer puts a phospho on A)

metric | before | after | Δ
decoy-win PSMs | 227 | 185 | −42 (−18.5 %)
decoy-win PSMs (conf ≥95 %) | 180 | 155 | −25 (−13.9 %)
decoy placements D | 233 | 191 | −42
target placements T | 1925 | 1967 | +42
global decoy-AA FLR (Eq. 2) | 44.20 % | 35.46 % | −8.74 pp

FLR = 2·(T_c/X_c)·(D/T), with T_c(STY) = 4245 and X_c(A) = 2325. This is the unthresholded global estimator (no score filter), so the absolute value is high. The shift is the signal: the fix lowers the FLR by ~20 % relative (44.20 % → 35.46 %).
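The quoted FLR values can be re-derived from the placement counts with a few lines of arithmetic. `decoy_aa_flr` is a hypothetical helper; T_c and X_c are the constants stated above, and D and T are the decoy and target placement counts from the table.

```python
T_C = 4245  # T_c(STY), as quoted above
X_C = 2325  # X_c(A), as quoted above


def decoy_aa_flr(decoy_placements: int, target_placements: int,
                 t_c: int = T_C, x_c: int = X_C) -> float:
    """Global decoy-AA FLR = 2 * (T_c / X_c) * (D / T), with no score threshold."""
    return 2.0 * (t_c / x_c) * (decoy_placements / target_placements)


before = decoy_aa_flr(233, 1925)
after = decoy_aa_flr(191, 1967)
print(f"{before:.2%} -> {after:.2%}, relative drop {(before - after) / before:.1%}")
# 44.20% -> 35.46%, relative drop 19.8%
```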

Confidence (max PhosphoRS site probability)

metric | before | after
confident calls (≥95 %) | 1613 | 1621 (+8)
median max-site-prob | 99.988 % | 99.997 %

Among confident calls the decoy fraction drops 11.2 % → 9.6 %.

Takeaway

The fix changes ~10 % of localizations and reduces spurious Alanine-decoy wins by ~18 % (decoy-AA FLR 44.2 % → 35.5 %) while slightly increasing the number and confidence of target calls — the expected, beneficial effect of restoring compomics' chargeValidated fragment set.

Note: the stored PhosphoRS score is the raw binomial big_p, which underflows to ~0 for confident calls, so max-site-probability is used as the confidence axis instead.

🤖 Generated with Claude Code

The `-HPO3`/`-PO3H` name filter never matched pyOpenMS's loss annotation
(`-H3O4P1`), so it dropped nothing. Replace it with the compomics
PhosphoRS.java rule: drop a neutral loss only when its mass equals the
modification mass (HPO3, 79.966 Da) -- such a fragment is mass-identical
to the unmodified ion, hence not site-determining. H2O/H3PO4 losses are
kept, so the ion set is unchanged (64 ions for PEPS(Phospho)TIDE @3+) but
now robust to pyOpenMS's real `-HO3P1` spelling.

_floor_double: coerce numpy scalars to Python float -- repr(np.float64(x))
is 'np.float64(x)' on numpy >=2.0, which Decimal() cannot parse.

Memoize the per-ion charge/loss gate by annotation string (a pure function
of the annotation): ~8x faster than the prior per-ion regex parse, making
_theo_mz_charge_valid ~2.6x faster end-to-end. All 178 tests pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
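A standard-library sketch of the mass-based loss rule and the per-annotation memoisation described in this commit. The names (`loss_mass`, `drop_loss`), the 1e-3 Da tolerance and the formula-parsing regex are assumptions for illustration; the real code reads losses from pyOpenMS ion annotations, whose full format is not shown here.

```python
import re
from functools import lru_cache

# Monoisotopic element masses (Da); only the elements needed for these losses.
MONO = {"H": 1.00782503207, "O": 15.99491461956, "P": 30.97376163}
HPO3_MASS = MONO["H"] + MONO["P"] + 3 * MONO["O"]  # ~79.966 Da, the phospho modification mass

_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


@lru_cache(maxsize=None)
def loss_mass(formula: str) -> float:
    """Mass of a loss formula such as '-HO3P1' or 'H2O1' (leading '-' ignored).

    Raises KeyError for elements missing from MONO.
    """
    mass = 0.0
    for element, count in _ELEMENT.findall(formula.lstrip("-")):
        mass += MONO[element] * (int(count) if count else 1)
    return mass


@lru_cache(maxsize=None)
def drop_loss(formula: str, tol: float = 1e-3) -> bool:
    """True if the loss mass equals the modification mass (ion is not site-determining)."""
    return abs(loss_mass(formula) - HPO3_MASS) <= tol


print(round(HPO3_MASS, 3))                                              # 79.966
print(drop_loss("-HO3P1"), drop_loss("-H3O4P1"), drop_loss("-H2O1"))    # True False False
```

Because the decision depends only on the annotation string, caching it means each distinct loss spelling is parsed once per process.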

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
onsite/phosphors/phosphors.py (2)

67-70: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Check math.isfinite() before the integer-floor fast path.

_floor_double(float("inf"), 0) and _floor_double(float("nan"), 0) hit math.floor() first and raise, so the new non-finite passthrough never applies for n_decimals <= 0.

Suggested fix
-    if n_decimals <= 0:
-        return float(math.floor(value))
     if not math.isfinite(value):
         return value
+    if n_decimals <= 0:
+        return float(math.floor(value))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@onsite/phosphors/phosphors.py` around lines 67 - 70, The early return order
is wrong in _floor_double: check math.isfinite(value) before taking the
integer-floor fast path so non-finite values (inf, nan) are returned untouched;
change the branch order in the function (or around the snippet) to first do "if
not math.isfinite(value): return value" and only then handle "if n_decimals <=
0: return float(math.floor(value))".

762-765: ⚠️ Potential issue | 🟠 Major

Don’t treat spectrum-generation failures as “no ions”

onsite/phosphors/phosphors.py’s _theo_mz_charge_valid() swallows any exception from spec_gen.getSpectrum(...) and returns []; the scoring loop then does if not theo_mz: continue, so the affected isoform is silently omitted from isomer_scores, changing the subsequent probability normalization rather than failing the PSM.

Suggested fix
-    try:
-        spec_gen.getSpectrum(spec, seq, 1, max_z)
-    except Exception:
-        return []
+    spec_gen.getSpectrum(spec, seq, 1, max_z)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@onsite/phosphors/phosphors.py` around lines 762 - 765, The current except in
_theo_mz_charge_valid around spec_gen.getSpectrum(spec, seq, 1, max_z) swallows
all exceptions and returns [], causing downstream code (if not theo_mz:
continue) to silently drop isoforms; instead, catch the exception, log the error
with context (including spec, seq, max_z) and re-raise the exception so the PSM
fails (or return an explicit failure sentinel that the caller checks), i.e.,
update the except block in _theo_mz_charge_valid to not return an empty list but
either re-raise the original exception after logging or return a clearly handled
sentinel and update the scoring loop that builds isomer_scores to treat that
sentinel as a fatal error rather than "no ions".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@onsite/phosphors/phosphors.py`:
- Around line 747-760: The charge cap passed as max_ion_charge is not applied in
the charge-validated generator: modify _theo_mz_charge_valid to accept a
max_ion_charge parameter and compute max_z as min(max(1, int(precursor_charge) -
1), max_ion_charge) (or enforce the cap equivalently) so fragment generation
never exceeds the caller cap; propagate this new parameter from
calculate_phospho_localization_compomics_style through _isoform_theo_mz and into
any final scoring path that calls _theo_mz_charge_valid so n_expected and
binomial scoring use the same capped set of theoretical ions (alternatively
remove/deprecate max_ion_charge and update callers to reflect that change).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a133b7fb-e68c-4910-acd9-f3e6da997d56

📥 Commits

Reviewing files that changed from the base of the PR and between 1e08e7d and 3000c27.

📒 Files selected for processing (1)
  • onsite/phosphors/phosphors.py

Comment on lines +747 to +760
def _theo_mz_charge_valid(spec_gen, seq, precursor_charge) -> list:
    """Charge-validated b/y theoretical fragment m/z for one (modified) peptide.

    Replicates compomics PhosphoRS chargeValidated for PEPTIDE_FRAGMENT_ION:
    fragment charge in 1 .. max(1, precursor_charge - 1) (charge < precursor)
    and charge <= ion number (a y1 cannot be 2+)
    and drops any neutral loss whose mass equals the phospho modification mass
    (HPO3, 79.966 Da), mirroring PhosphoRS.java -- such a fragment is mass-
    identical to the unmodified ion, so it cannot localize the site (H3PO4 and
    H2O losses are kept). The charge upper bound is enforced at generation; the
    charge<=ion-number gate and the loss filter are read from the ion
    annotations, so ``spec_gen`` MUST have ``add_metainfo='true'``. Returns m/z
    in generator order (caller sorts if needed)."""
    max_z = max(1, int(precursor_charge) - 1)

Contributor


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

max_ion_charge is no longer honored by the validated path.

calculate_phospho_localization_compomics_style() still exposes max_ion_charge, but _theo_mz_charge_valid() now always generates through precursor_charge - 1. Callers that cap fragment charge below the precursor will get extra theoretical ions, which changes n_expected and the binomial score. Please thread that cap through both _isoform_theo_mz() and the final scoring path, or remove/deprecate the parameter explicitly.

Suggested fix
-def _theo_mz_charge_valid(spec_gen, seq, precursor_charge) -> list:
+def _theo_mz_charge_valid(
+    spec_gen, seq, precursor_charge, max_ion_charge=None
+) -> list:
@@
-    max_z = max(1, int(precursor_charge) - 1)
+    max_z = max(1, int(precursor_charge) - 1)
+    if max_ion_charge is not None:
+        max_z = min(max_z, int(max_ion_charge))
-def _isoform_theo_mz(spec_gen, seq_profile, precursor_charge):
-    return sorted(_theo_mz_charge_valid(spec_gen, seq_profile, precursor_charge))
+def _isoform_theo_mz(spec_gen, seq_profile, precursor_charge, max_ion_charge=None):
+    return sorted(
+        _theo_mz_charge_valid(
+            spec_gen, seq_profile, precursor_charge, max_ion_charge
+        )
+    )

Also applies to: 783-787, 897-900, 1197-1198


timosachsenberg and others added 2 commits June 9, 2026 21:56
max_ion_charge (and its MAX_ION_CHARGE=2 backing constant) was never read.
Fragment charge is governed by compomics chargeValidated (charge < precursor,
i.e. max_z = precursor_charge - 1), not a fixed cap -- applying the cap would
drop legitimate charge-3+ fragments for >=4+ precursors and diverge from the
reference, so the parameter is removed rather than wired in. No caller passes
it and the CLIs do not expose it; algorithm-comparison results are unchanged
(PhosphoRS 1066/1104/1118, LucXor/AScore identical).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Like max_ion_charge, add_ion_types (and its ADD_ION_TYPES constant) was
never read -- the theoretical-spectrum generators hardcode b/y ions. No
caller passes it and the CLIs do not expose it; algorithm-comparison
results are unchanged (PhosphoRS 1066/1104/1118, LucXor/AScore identical).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@ypriverol ypriverol merged commit 9b8c531 into main Jun 10, 2026
2 of 3 checks passed