fix(phosphors): charge-validate theoretical ions + faithful floorDouble by timosachsenberg · Pull Request #45 · bigbio/onsite

timosachsenberg · 2026-06-09T17:15:53Z

Summary

Fixes two real onsite PhosphoRS bugs found while reviewing parity against the
compomics-utilities reference.
Both were verified side-by-side against the live compomics JVM (utilities-5.1.17.jar + commons-math-2.2).

D9 — fragment-charge over-generation (the impactful one)

getSpectrum(..., 1, precursor_charge) generated b/y ions at every charge 1..precursor, including:

- fragments at the precursor charge (compomics requires charge < precursor), and
- charges above the ion number (e.g. a y1 at 2+), which are physically impossible.

This inflated the binomial trial count n: about 35 % of the generated ions were invalid (e.g. PEPS(Phospho)TIDE @3+ with losses: 99 → 64 theoretical ions, now identical to the reference).

Fix: new _theo_mz_charge_valid() applies compomics chargeValidated for peptide-fragment ions — fragment charge in 1..max(1, precursor-1) and charge <= ion_number — plus the phospho neutral-loss name filter. Both live theoretical-ion paths (final scoring and _isoform_theo_mz for depth selection) route through it.

As a side effect this also closes the depth-vs-final inconsistency (D10): the depth-reduction generator now sets add_metainfo='true', so depth selection and final scoring use the same charge-validated, loss-filtered ion set.
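The charge rule described above can be sketched in plain Python. This is a minimal illustration, not the code in `phosphors.py`: the names `charge_valid` and `count_ions` are hypothetical, the sketch enumerates ion numbers 1..n-1 for both series, and it omits neutral losses, so its counts differ from the 99 → 64 figure quoted above.

```python
def charge_valid(charge: int, ion_number: int, precursor_charge: int) -> bool:
    """Hypothetical predicate for the rule stated in the PR description."""
    # Fragment charge must stay below the precursor charge (but at least 1+ is allowed).
    max_z = max(1, precursor_charge - 1)
    # A fragment with k residues cannot carry more than k charges (no y1 at 2+).
    return 1 <= charge <= max_z and charge <= ion_number


def count_ions(n_residues: int, precursor_charge: int, validate: bool) -> int:
    """Count b/y ions without neutral losses for ion numbers 1..n_residues-1."""
    count = 0
    for _series in ("b", "y"):
        for ion_number in range(1, n_residues):
            # The unvalidated ladder runs over every charge 1..precursor.
            for charge in range(1, precursor_charge + 1):
                if not validate or charge_valid(charge, ion_number, precursor_charge):
                    count += 1
    return count


if __name__ == "__main__":
    # 8-residue peptide at 3+: full ladder vs charge-validated ladder.
    print(count_ions(8, 3, validate=False), count_ions(8, 3, validate=True))
```

For an 8-residue peptide at 3+ the sketch goes from 42 to 26 loss-free ions, which shows how strongly the binomial trial count n depends on the charge policy.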

D13c — `_floor_double` was a binary floor, not a decimal floor

The helper claimed to "Mimic Util.floorDouble" but used math.floor(x*10**n)/10**n, which drops a digit when x*10**n lands a hair below an integer in IEEE arithmetic (0.29 → 0.28, 0.0006 → 0.0005), perturbing the random-match probability p. It is live — getp_style feeds both depth selection and final scoring.

Fix: decimal-string floor (Decimal(repr(x)).quantize(..., ROUND_FLOOR)), matching Java Util.floorDouble exactly (getp_style(3, 100, 0.02) = 0.0006).
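A minimal sketch of the difference between the two floors, assuming only the formulas quoted above. The function names are hypothetical and the non-finite and `n_decimals <= 0` handling follows the review discussion in this PR rather than the merged code.

```python
import math
from decimal import Decimal, ROUND_FLOOR


def binary_floor(value: float, n_decimals: int) -> float:
    """The old approach: scale in binary floating point, then floor."""
    return math.floor(value * 10**n_decimals) / 10**n_decimals


def decimal_floor(value, n_decimals: int) -> float:
    """Floor on the shortest decimal representation of the float."""
    value = float(value)  # coerce so repr() yields a plain decimal literal
    if not math.isfinite(value):
        return value  # inf/nan pass through untouched
    if n_decimals <= 0:
        return float(math.floor(value))
    quantum = Decimal(1).scaleb(-n_decimals)  # e.g. Decimal('0.01') for n_decimals=2
    return float(Decimal(repr(value)).quantize(quantum, rounding=ROUND_FLOOR))


if __name__ == "__main__":
    print(binary_floor(0.29, 2), decimal_floor(0.29, 2))
```

The binary version loses a digit because `0.29 * 100` evaluates to `28.999999999999996`; the decimal version works on the string `'0.29'` and is unaffected.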

Not changed (deliberately)

- The binomial tail convention (compomics P(X>k) vs onsite P(X≥k)): onsite is paper-correct ("at least k"); left as-is.
- The depth-selection criterion: onsite already maximizes isoform separation (paper-correct); the reference's ratio rule is inverted.
- Dead _expected_fragment_mzs (now superseded): left for a separate cleanup.
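To make the tail-convention point concrete, here is a small sketch of the two binomial tails. It is a generic illustration with hypothetical function names, not the scoring code of either implementation; the two conventions differ by exactly the probability mass at k.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k matches in n trials with match probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def tail_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k): the "at least k" convention."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def tail_greater(k: int, n: int, p: float) -> float:
    """P(X > k): the strict convention."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1, n + 1))


if __name__ == "__main__":
    n, k, p = 64, 5, 0.06  # illustrative values only
    print(tail_at_least(k, n, p), tail_greater(k, n, p))
```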

Testing

- All 178 tests pass (incl. the data-dependent and decoy-FLR suites).
- D9 fix reproduces the reference theoretical-ion count exactly against the live compomics JVM.
- _floor_double now matches Java Util.floorDouble across the divergent inputs.

PHOSPHORS_PARITY_REVIEW.md (added) documents the full parity analysis, the 13-divergence bug classification, and the reproducible Java-vs-Python side-by-side tests.

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes

Fixed numeric precision calculations to align with reference implementation standards for consistent results

Improvements

Strengthened theoretical fragment ion validation with enhanced charge constraints
Refined neutral-loss filtering mechanisms for improved accuracy
Optimized spectrum generation efficiency during analysis

Two onsite PhosphoRS bugs that diverged from the compomics reference:

- D9 (fragment-charge ladder): getSpectrum(..., 1, precursor_charge) emitted fragments AT the precursor charge and above the ion number (e.g. y1 at 2+) - physically impossible ions that made up ~35% of the binomial trial count n (verified 99 -> 64 vs the live compomics JVM on PEPS(Phospho)TIDE @3+). New _theo_mz_charge_valid() applies compomics chargeValidated (fragment charge in 1..max(1, precursor-1) AND charge <= ion number) on both live theoretical-ion paths (final scoring and _isoform_theo_mz for depth selection). The depth-reduction generator now sets add_metainfo=true, so depth selection and final scoring share one charge-validated, loss-filtered ion set (also closes the D10 depth-vs-final inconsistency).
- D13c (_floor_double): claimed to "Mimic Util.floorDouble" but did a binary floor (math.floor(x*10**n)/10**n), dropping a digit on values like 0.29->0.28 and 0.0006->0.0005 and perturbing the random-match probability p. Replaced with a decimal-string floor (Decimal(repr(x)).quantize(..., ROUND_FLOOR)); now matches Java Util.floorDouble exactly. It is live: getp_style feeds both depth selection and final scoring.

Adds PHOSPHORS_PARITY_REVIEW.md documenting the full parity analysis (paper + live-JVM/Java-vs-Python side-by-side tests) and the bug classification.

All 178 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-09T17:16:09Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 968b1c3b-fdd9-4139-8eae-541ce00f630e

📥 Commits

Reviewing files that changed from the base of the PR and between 3000c27 and b137115.

📒 Files selected for processing (1)

onsite/phosphors/phosphors.py

📝 Walkthrough

Walkthrough

phosphors.py refactors numeric flooring to use Decimal.quantize(ROUND_FLOOR) matching CompOmics Java behavior. It introduces _theo_mz_charge_valid, a per-ion gating function that parses pyOpenMS annotations, enforces charge ≤ ion number, and filters phospho neutral losses by parsed mass. The gating is integrated into _isoform_theo_mz and the scoring loop, replacing the prior MSSpectrum-based approach.

Changes

CompOmics Parity Alignment

| Layer / File(s) | Summary |
| --- | --- |
| Decimal Flooring for Numeric Precision Alignment `onsite/phosphors/phosphors.py` | `re` module and `EmpiricalFormula` import added. `_floor_double` refactored from binary/float scaling to `Decimal(...).quantize(..., ROUND_FLOOR)` with `repr(value)` to match Java rounding exactly; handles `n_decimals <= 0` and non-finite inputs. |
| Per-Ion Charge and Loss Gating `onsite/phosphors/phosphors.py` | New `_theo_mz_charge_valid` helper and supporting cache/regex functions parse ion annotations, decode neutral-loss formulas, and apply charge validation (charge ≤ ion number) and phospho-loss filtering (drop ions matching phospho mass within tolerance). `_isoform_theo_mz` updated to return sorted gated m/z. Obsolete `_expected_fragment_mzs` helper removed. |
| Integration in Peak-Depth and Scoring Loop `onsite/phosphors/phosphors.py` | Peak-depth optimizer changes `add_metainfo` to `"true"` for annotation access. Scoring loop replaces per-isoform `MSSpectrum` generation and manual loss filtering with direct `_theo_mz_charge_valid` calls; skips isoforms yielding no fragments. |

🎯 3 (Moderate) | ⏱️ ~25 minutes

🐰 Decimal floors with grace,
Ion charges face their place,
Phospho losses erased,
CompOmics parity embraced,
Hopping toward precision's face! 🥕

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the two main changes: charge-validation of theoretical ions and fixing the floorDouble function to match Java behavior. It is concise, specific, and clearly conveys the primary fixes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


codacy-production · 2026-06-09T17:16:52Z

Not up to standards ⛔

🔴 Issues 1 medium · 4 minor

Alerts:
⚠ 5 issues (≤ 0 issues of at least minor severity)

Results:
5 new issues

Category Results

Documentation 4 minor

Complexity 1 medium

View in Codacy

🟢 Metrics -6 complexity · -2 duplication

Metric Results

Complexity -6

Duplication -2

View in Codacy


…iew doc

- Remove _expected_fragment_mzs: dead since the D9 fix (no callers; its charge policy is now implemented by _theo_mz_charge_valid). MAX_ION_CHARGE is retained (still the default for the public max_ion_charge parameter).
- Remove PHOSPHORS_PARITY_REVIEW.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Dead since the D9 fix (no callers; its charge policy is now implemented by _theo_mz_charge_valid). MAX_ION_CHARGE is retained (still the default for the public max_ion_charge parameter). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-09T17:25:31Z

Algorithm Comparison Test Results

============================= test session starts ==============================
platform linux -- Python 3.11.15, pytest-9.0.3, pluggy-1.6.0 -- /opt/hostedtoolcache/Python/3.11.15/x64/bin/python
cachedir: .pytest_cache
rootdir: /home/runner/work/onsite/onsite
configfile: pyproject.toml
plugins: cov-7.1.0
collecting ... collected 3 items

tests/test_algorithm_comparison.py::TestAlgorithmComparison::test_lucxor_comparison 
================================================================================
LucXor Comparison Results (q-value < 0.01)
================================================================================

STRICT (Local FLR < 0.01):
  New results: 848
  Reference results: 848
  Overlap: 848 (100.0%)
  Recall: 100.0% (new found 848/848 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

MODERATE (Local FLR < 0.05):
  New results: 1064
  Reference results: 1064
  Overlap: 1064 (100.0%)
  Recall: 100.0% (new found 1064/1064 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

LENIENT (Local FLR < 0.1):
  New results: 1081
  Reference results: 1081
  Overlap: 1081 (100.0%)
  Recall: 100.0% (new found 1081/1081 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x
PASSED
tests/test_algorithm_comparison.py::TestAlgorithmComparison::test_ascore_comparison 
================================================================================
AScore Comparison Results (q-value < 0.01)
================================================================================

STRICT (AScore >= 20):
  New results: 919
  Reference results: 919
  Overlap: 919 (100.0%)
  Recall: 100.0% (new found 919/919 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

MODERATE (AScore >= 15):
  New results: 1023
  Reference results: 1023
  Overlap: 1023 (100.0%)
  Recall: 100.0% (new found 1023/1023 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

LENIENT (AScore >= 3):
  New results: 1076
  Reference results: 1076
  Overlap: 1076 (100.0%)
  Recall: 100.0% (new found 1076/1076 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x
PASSED
tests/test_algorithm_comparison.py::TestAlgorithmComparison::test_phosphors_comparison 
================================================================================
PhosphoRS Comparison Results (q-value < 0.01)
================================================================================

STRICT (Site probability > 99%):
  New results: 1066
  Reference results: 983
  Overlap: 946 (96.2%)
  Recall: 96.2% (new found 946/983 reference sites)
  Gain rate: 11.3% (120 new-only sites)
  Lost sites: 37
  Count ratio: 1.08x

MODERATE (Site probability > 90%):
  New results: 1104
  Reference results: 1084
  Overlap: 1035 (95.5%)
  Recall: 95.5% (new found 1035/1084 reference sites)
  Gain rate: 6.2% (69 new-only sites)
  Lost sites: 49
  Count ratio: 1.02x

LENIENT (Site probability > 75%):
  New results: 1118
  Reference results: 1102
  Overlap: 1049 (95.2%)
  Recall: 95.2% (new found 1049/1102 reference sites)
  Gain rate: 6.2% (69 new-only sites)
  Lost sites: 53
  Count ratio: 1.01x
PASSED

============================== 3 passed in 43.77s ==============================
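The figures in the report above are consistent with simple set arithmetic on site counts. The sketch below is an assumption inferred from the printed numbers, not taken from `tests/test_algorithm_comparison.py`; the function name and dictionary keys are hypothetical.

```python
def comparison_metrics(n_new: int, n_ref: int, n_overlap: int) -> dict:
    """Derive the reported comparison figures from three site counts."""
    new_only = n_new - n_overlap  # sites found only by the new run
    lost = n_ref - n_overlap      # reference sites the new run misses
    return {
        "recall_pct": 100.0 * n_overlap / n_ref,  # share of reference sites recovered
        "gain_pct": 100.0 * new_only / n_new,     # share of new results absent from the reference
        "new_only": new_only,
        "lost": lost,
        "count_ratio": n_new / n_ref,
    }


if __name__ == "__main__":
    # PhosphoRS STRICT tier from the log above: 1066 new, 983 reference, 946 overlap.
    print(comparison_metrics(1066, 983, 946))
```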

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@onsite/phosphors/phosphors.py`:
- Around line 772-775: The code in _theo_mz_charge_valid currently swallows any
exception from spec.getStringDataArrays() and returns all mzs, bypassing charge
and neutral-loss gating; instead, detect if spec.getStringDataArrays() is
missing or if its first StringDataArray length does not match len(mzs) and fail
fast by raising a clear exception (e.g., ValueError) with a descriptive message;
replace the broad except Exception block that returns [float(m) for m in mzs]
with explicit validation of spec.getStringDataArrays()[0] and a raised error so
callers cannot silently skip the per-ion gating logic.

In `@PHOSPHORS_PARITY_REVIEW.md`:
- Around line 150-151: Update the reproduction commands to avoid hardcoded
machine-specific paths by replacing literal occurrences of /tmp/parity and
/home/sachsenb/Development/onsite with path-portable references (e.g., use
$TMPDIR or ${TMPDIR:-/tmp} for temporary artifact dirs and $HOME or relative
project paths for repo roots); ensure every command and example that mentions
/tmp/parity or /home/sachsenb/Development/onsite (and the repeated blocks around
the later section) uses these variables or a note to set an environment variable
(e.g., PARITY_DIR) so other contributors can run the steps without manual edits.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 80be3734-a565-4b81-ad7a-88d845105aad

📥 Commits

Reviewing files that changed from the base of the PR and between 486d0bb and 1e08e7d.

📒 Files selected for processing (2)

PHOSPHORS_PARITY_REVIEW.md
onsite/phosphors/phosphors.py

timosachsenberg · 2026-06-09T17:26:35Z

Full-data before/after impact — `data/1.mzML`

Ran the PhosphoRS CLI on the full dataset, before (main) vs after (this branch), identical inputs and flags, comparing per-PSM localization. Output is deterministic (threaded == serial), so the diff isolates the fix.

- Input: data/1.mzML (160 MB) + data/1_consensus_fdr_filter_pep.idXML (3697 hits)
- Flags: --add-decoys --threads 16 (decoy-AA = Alanine, the project's FLR basis)
- Scored PSMs (≥2 candidate sites, localizable): 1989 (the rest are trivial/non-phospho/unscored on both sides)
- Runtime: ~14 s each

Localization call-flips (best-isomer phospho placement changed)

| metric | count |
| --- | --- |
| flips | 194 / 1989 (9.75 %) |
| → changed which residue is localized | 194 |
| → gained a decoy(A) win (target→A) | 8 |
| → lost a decoy(A) win (A→target) | 50 |

Net −42 decoy wins: removing the physically-impossible high-charge ions (and the floor fix) stops noise that was spuriously supporting Alanine-decoy isoforms.

Decoy-AA FLR signal (best isomer puts a phospho on A)

| metric | before | after | Δ |
| --- | --- | --- | --- |
| decoy-win PSMs | 227 | 185 | −42 (−18.5 %) |
| decoy-win PSMs (conf ≥95 %) | 180 | 155 | −25 (−13.9 %) |
| decoy placements `D` | 233 | 191 | −42 |
| target placements `T` | 1925 | 1967 | +42 |
| global decoy-AA FLR (Eq.2) | 44.20 % | 35.46 % | −8.74 pp |

FLR = 2·(T_c/X_c)·(D/T), with T_c(STY)=4245, X_c(A)=2325. This is the unthresholded global estimator (no score filter), so the absolute value is high — the shift is the signal; the fix lowers FLR by ~20 % relative.
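The estimator can be checked directly against the table. This sketch just evaluates the formula as written above with the quoted counts; the function and constant names are hypothetical.

```python
T_C_STY = 4245  # T_c(STY), as quoted above
X_C_A = 2325    # X_c(A), as quoted above


def decoy_flr(decoy_placements: int, target_placements: int,
              t_c: int = T_C_STY, x_c: int = X_C_A) -> float:
    """FLR = 2 * (T_c / X_c) * (D / T), returned as a fraction."""
    return 2.0 * (t_c / x_c) * (decoy_placements / target_placements)


if __name__ == "__main__":
    before = decoy_flr(233, 1925)
    after = decoy_flr(191, 1967)
    print(f"before={before:.2%} after={after:.2%} "
          f"relative drop={(before - after) / before:.1%}")
```

Evaluating it with D/T = 233/1925 and 191/1967 gives the 44.20 % and 35.46 % shown in the table, a relative reduction of roughly 20 %.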

Confidence (max PhosphoRS site probability)

| metric | before | after |
| --- | --- | --- |
| confident calls (≥95 %) | 1613 | 1621 (+8) |
| median max-site-prob | 99.988 % | 99.997 % |

Among confident calls the decoy fraction drops 11.2 % → 9.6 %.

Takeaway

The fix changes ~10 % of localizations and reduces spurious Alanine-decoy wins by ~18 % (decoy-AA FLR 44.2 % → 35.5 %) while slightly increasing the number and confidence of target calls — the expected, beneficial effect of restoring compomics' chargeValidated fragment set.

Note: the stored PhosphoRS score is the raw binomial big_p, which underflows to ~0 for confident calls, so max-site-probability is used as the confidence axis instead.

🤖 Generated with Claude Code

The `-HPO3`/`-PO3H` name filter never matched pyOpenMS's loss annotation (`-H3O4P1`), so it dropped nothing. Replace it with the compomics PhosphoRS.java rule: drop a neutral loss only when its mass equals the modification mass (HPO3, 79.966 Da) -- such a fragment is mass-identical to the unmodified ion, hence not site-determining. H2O/H3PO4 losses are kept, so the ion set is unchanged (64 ions for PEPS(Phospho)TIDE @3+) but now robust to pyOpenMS's real `-HO3P1` spelling.

_floor_double: coerce numpy scalars to Python float -- repr(np.float64(x)) is 'np.float64(x)' on numpy >=2.0, which Decimal() cannot parse.

Memoize the per-ion charge/loss gate by annotation string (a pure function of the annotation): ~8x faster than the prior per-ion regex parse, making _theo_mz_charge_valid ~2.6x faster end-to-end.

All 178 tests pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
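A mass-based loss gate of the kind described in this commit message could look roughly like the following. It is a sketch under stated assumptions: it uses a hand-written monoisotopic mass table instead of pyOpenMS, the tolerance value is hypothetical, and `loss_mass`/`keep_loss` are illustrative names. Memoization with `functools.lru_cache` stands in for the per-annotation cache mentioned above.

```python
import re
from functools import lru_cache

# Monoisotopic masses for the elements that occur in these losses.
MONO_MASS = {"H": 1.00782503207, "O": 15.99491461956, "P": 30.97376163}
PHOSPHO_MOD_MASS = 79.966331  # HPO3, the phospho modification mass
TOLERANCE_DA = 1e-3           # hypothetical matching tolerance

_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


@lru_cache(maxsize=None)
def loss_mass(formula: str) -> float:
    """Mass of a loss formula such as 'H3O4P1' or 'HO3P1'."""
    total = 0.0
    for element, count in _ELEMENT.findall(formula):
        total += MONO_MASS[element] * (int(count) if count else 1)
    return total


@lru_cache(maxsize=None)
def keep_loss(formula: str) -> bool:
    """Keep a loss unless its mass equals the modification mass."""
    return abs(loss_mass(formula) - PHOSPHO_MOD_MASS) > TOLERANCE_DA


if __name__ == "__main__":
    for formula in ("H2O1", "H3O4P1", "HO3P1"):
        print(formula, round(loss_mass(formula), 4), keep_loss(formula))
```

Comparing masses rather than names makes the gate independent of how the element counts are spelled, so `HO3P1` and `H1O3P1` are treated identically.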

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

onsite/phosphors/phosphors.py (2)

67-70: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Check math.isfinite() before the integer-floor fast path.

_floor_double(float("inf"), 0) and _floor_double(float("nan"), 0) hit math.floor() first and raise, so the new non-finite passthrough never applies for n_decimals <= 0.

Suggested fix

-    if n_decimals <= 0:
-        return float(math.floor(value))
     if not math.isfinite(value):
         return value
+    if n_decimals <= 0:
+        return float(math.floor(value))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@onsite/phosphors/phosphors.py` around lines 67 - 70, The early return order
is wrong in _floor_double: check math.isfinite(value) before taking the
integer-floor fast path so non-finite values (inf, nan) are returned untouched;
change the branch order in the function (or around the snippet) to first do "if
not math.isfinite(value): return value" and only then handle "if n_decimals <=
0: return float(math.floor(value))".

762-765: ⚠️ Potential issue | 🟠 Major

Don’t treat spectrum-generation failures as “no ions”

onsite/phosphors/phosphors.py’s _theo_mz_charge_valid() swallows any exception from spec_gen.getSpectrum(...) and returns []; the scoring loop then does if not theo_mz: continue, so the affected isoform is silently omitted from isomer_scores, changing the subsequent probability normalization rather than failing the PSM.

Suggested fix

-    try:
-        spec_gen.getSpectrum(spec, seq, 1, max_z)
-    except Exception:
-        return []
+    spec_gen.getSpectrum(spec, seq, 1, max_z)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@onsite/phosphors/phosphors.py` around lines 762 - 765, The current except in
_theo_mz_charge_valid around spec_gen.getSpectrum(spec, seq, 1, max_z) swallows
all exceptions and returns [], causing downstream code (if not theo_mz:
continue) to silently drop isoforms; instead, catch the exception, log the error
with context (including spec, seq, max_z) and re-raise the exception so the PSM
fails (or return an explicit failure sentinel that the caller checks), i.e.,
update the except block in _theo_mz_charge_valid to not return an empty list but
either re-raise the original exception after logging or return a clearly handled
sentinel and update the scoring loop that builds isomer_scores to treat that
sentinel as a fatal error rather than "no ions".
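The effect described in this comment can be illustrated with a toy normalization. The sketch assumes site probabilities are obtained by dividing each isoform score by the sum over scored isoforms; that normalization, the function names, and the numbers are illustrative assumptions, not the onsite implementation.

```python
def normalize(scores: dict) -> dict:
    """Turn per-isoform scores into probabilities that sum to 1."""
    total = sum(scores.values())
    return {name: value / total for name, value in scores.items()}


def score_isoforms(theo_ions: dict, raw_scores: dict, skip_empty: bool) -> dict:
    """Normalize scores, either skipping or rejecting isoforms without ions."""
    kept = {}
    for name, ions in theo_ions.items():
        if not ions:
            if skip_empty:
                continue  # silently drops the isoform from the denominator
            raise ValueError(f"no theoretical ions generated for isoform {name!r}")
        kept[name] = raw_scores[name]
    return normalize(kept)


if __name__ == "__main__":
    theo = {"S4": [200.1], "T5": [], "S8": [310.2]}  # T5 generation "failed"
    raw = {"S4": 6.0, "T5": 3.0, "S8": 1.0}
    print(normalize(raw)["S4"])                             # all three isoforms scored
    print(score_isoforms(theo, raw, skip_empty=True)["S4"])  # T5 silently dropped
```

With all three isoforms the first candidate gets 0.6; with one silently skipped it rises to 6/7, which is the kind of shift the comment warns about.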

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@onsite/phosphors/phosphors.py`:
- Around line 747-760: The charge cap passed as max_ion_charge is not applied in
the charge-validated generator: modify _theo_mz_charge_valid to accept a
max_ion_charge parameter and compute max_z as min(max(1, int(precursor_charge) -
1), max_ion_charge) (or enforce the cap equivalently) so fragment generation
never exceeds the caller cap; propagate this new parameter from
calculate_phospho_localization_compomics_style through _isoform_theo_mz and into
any final scoring path that calls _theo_mz_charge_valid so n_expected and
binomial scoring use the same capped set of theoretical ions (alternatively
remove/deprecate max_ion_charge and update callers to reflect that change).

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a133b7fb-e68c-4910-acd9-f3e6da997d56

📥 Commits

Reviewing files that changed from the base of the PR and between 1e08e7d and 3000c27.

📒 Files selected for processing (1)

onsite/phosphors/phosphors.py

coderabbitai · 2026-06-09T19:20:49Z

+def _theo_mz_charge_valid(spec_gen, seq, precursor_charge) -> list:
+    """Charge-validated b/y theoretical fragment m/z for one (modified) peptide.
+
+    Replicates compomics PhosphoRS chargeValidated for PEPTIDE_FRAGMENT_ION:
+        fragment charge in 1 .. max(1, precursor_charge - 1)   (charge < precursor)
+        and charge <= ion number                               (a y1 cannot be 2+)
+    and drops any neutral loss whose mass equals the phospho modification mass
+    (HPO3, 79.966 Da), mirroring PhosphoRS.java -- such a fragment is mass-
+    identical to the unmodified ion, so it cannot localize the site (H3PO4 and
+    H2O losses are kept). The charge upper bound is enforced at generation; the
+    charge<=ion-number gate and the loss filter are read from the ion
+    annotations, so ``spec_gen`` MUST have ``add_metainfo='true'``. Returns m/z
+    in generator order (caller sorts if needed)."""
+    max_z = max(1, int(precursor_charge) - 1)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

max_ion_charge is no longer honored by the validated path.

calculate_phospho_localization_compomics_style() still exposes max_ion_charge, but _theo_mz_charge_valid() now always generates through precursor_charge - 1. Callers that cap fragment charge below the precursor will get extra theoretical ions, which changes n_expected and the binomial score. Please thread that cap through both _isoform_theo_mz() and the final scoring path, or remove/deprecate the parameter explicitly.

Suggested fix

-def _theo_mz_charge_valid(spec_gen, seq, precursor_charge) -> list:
+def _theo_mz_charge_valid(
+    spec_gen, seq, precursor_charge, max_ion_charge=None
+) -> list:
@@
-    max_z = max(1, int(precursor_charge) - 1)
+    max_z = max(1, int(precursor_charge) - 1)
+    if max_ion_charge is not None:
+        max_z = min(max_z, int(max_ion_charge))

-def _isoform_theo_mz(spec_gen, seq_profile, precursor_charge):
-    return sorted(_theo_mz_charge_valid(spec_gen, seq_profile, precursor_charge))
+def _isoform_theo_mz(spec_gen, seq_profile, precursor_charge, max_ion_charge=None):
+    return sorted(
+        _theo_mz_charge_valid(
+            spec_gen, seq_profile, precursor_charge, max_ion_charge
+        )
+    )

Also applies to: 783-787, 897-900, 1197-1198

max_ion_charge (and its MAX_ION_CHARGE=2 backing constant) was never read. Fragment charge is governed by compomics chargeValidated (charge < precursor, i.e. max_z = precursor_charge - 1), not a fixed cap -- applying the cap would drop legitimate charge-3+ fragments for >=4+ precursors and diverge from the reference, so the parameter is removed rather than wired in. No caller passes it and the CLIs do not expose it; algorithm-comparison results are unchanged (PhosphoRS 1066/1104/1118, LucXor/AScore identical).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Like max_ion_charge, add_ion_types (and its ADD_ION_TYPES constant) was never read -- the theoretical-spectrum generators hardcode b/y ions. No caller passes it and the CLIs do not expose it; algorithm-comparison results are unchanged (PhosphoRS 1066/1104/1118, LucXor/AScore identical).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

timosachsenberg and others added 2 commits June 9, 2026 19:18

coderabbitai Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread onsite/phosphors/phosphors.py

Comment thread PHOSPHORS_PARITY_REVIEW.md Outdated

timosachsenberg requested review from weizhongchun and ypriverol June 9, 2026 17:28

coderabbitai Bot reviewed Jun 9, 2026

View reviewed changes

timosachsenberg and others added 2 commits June 9, 2026 21:56

ypriverol approved these changes Jun 10, 2026

View reviewed changes

ypriverol merged commit 9b8c531 into main Jun 10, 2026
2 of 3 checks passed

coderabbitai Bot mentioned this pull request Jun 20, 2026

Format-agnostic idXML/mzIdentML I/O + localizer performance optimizations #51

Closed

ypriverol merged 6 commits into main from fix/phosphors-fragment-charge-validation
