Commit 486d0bb

Merge pull request #41 from bigbio/fix/decoy-flr-scoring-bugs
Unified decoy-AA global FLR across AScore / PhosphoRS / LucXor + scoring fixes (#40)
2 parents: eb7f3be + b47bce6

18 files changed

Lines changed: 23850 additions & 18392 deletions

README.md

Lines changed: 45 additions & 10 deletions
@@ -209,6 +209,7 @@ python -m onsite.lucxor.cli -in spectra.mzML -id identifications.idXML -out resu
 | `--scoring-threshold` | 0.0 | Minimum LucXor score to report |
 | `--min-num-psms-model` | 50 | Minimum number of high-scoring PSMs required for modeling |
 | `--threads` | 1 | Number of threads for parallel processing |
+| `--seed` | 42 | RNG seed for reproducible decoy permutations / model subsampling (deterministic for the default single-threaded run) |
 | `--rt-tolerance` | 0.01 | RT tolerance used when matching spectra by retention time |
 | `--disable-split-by-charge` | False | Disable splitting PSMs by charge state for model training |
 | `--compute-all-scores` | False | Run all three algorithms and merge results |
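The new `--seed` row makes decoy permutations reproducible. The sketch below shows what that guarantee means in practice. It assumes decoy placement is drawn from a dedicated generator seeded with the `--seed` value. `permute_candidate_sites` is a hypothetical helper, not OnSite's API.

```python
import random


def permute_candidate_sites(peptide: str, n_sites: int, seed: int = 42) -> list[int]:
    """Hypothetical helper: draw n_sites random 0-based decoy positions in `peptide`.

    A fresh random.Random(seed) is built on every call, so the same seed
    always yields the same positions, regardless of the global RNG state.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(peptide)), n_sites))


a = permute_candidate_sites("PEPTSIDEK", 2, seed=42)
b = permute_candidate_sites("PEPTSIDEK", 2, seed=42)
assert a == b  # same seed, same decoy placement
```

When several threads consume one shared generator, the order of draws can depend on scheduling. That is a plausible reason the table limits the determinism guarantee to the single-threaded default.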
@@ -222,28 +223,62 @@ The AScore algorithm provides phosphorylation site localization by analyzing MS/
 
 **Output Metrics:**
 
-- `AScore_pep_score`: Overall peptide score
-- `AScore_1, AScore_2, ...`: Individual site scores
-- `ProForma`: Standardized sequence notation with confidence scores
+- Hit score: the best per-site AScore (**higher = more confident**).
+- `AScore_pep_score`: overall peptide-level AScore.
+- `AScore_site_scores`: `{position: AScore}` dict, one entry per candidate site (0-based positions).
+- `AScore_1, AScore_2, ...`: per-rank individual site scores.
+- `ProForma`: standardized sequence notation with confidence scores.
+- *Typical threshold:* **AScore ≥ 13** (P ≈ 0.05, i.e. ~95% site-level confidence; Beausoleil et al. 2006).
 
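AScore is defined as −10·log10(P), where P is the binomial probability of matching the observed site-determining ions by chance (Beausoleil et al. 2006). Any AScore cutoff can therefore be read back as an error probability. This is a minimal sketch with hypothetical helper names. It assumes the `AScore_site_scores` UserParam has already been parsed from the idXML into a `{0-based position: AScore}` dict.

```python
def ascore_to_p(ascore: float) -> float:
    """Invert AScore = -10*log10(P) to recover the chance probability P."""
    return 10 ** (-ascore / 10)


def confident_sites(site_scores: dict[int, float], cutoff: float = 13.0) -> dict[int, float]:
    """Keep candidate sites whose AScore meets the cutoff (higher = more confident).

    `site_scores` mimics `AScore_site_scores`: {0-based position: AScore}.
    """
    return {pos: s for pos, s in site_scores.items() if s >= cutoff}


print(round(ascore_to_p(13), 3))           # 0.05
print(round(ascore_to_p(20), 3))           # 0.01
print(confident_sites({3: 21.4, 5: 4.2}))  # {3: 21.4}
```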
 ### PhosphoRS Algorithm
 
 The PhosphoRS algorithm implements a comprehensive approach using isomer generation, theoretical spectrum matching, and probability scoring for confident phosphorylation site assignment.
 
 **Output Metrics:**
-- Site-specific probability scores (0-100%)
-- Isomer details with sequence and score
-- Detailed confidence metrics
+- `PhosphoRS_site_probs`: `{position: probability}` on a **0–100% scale** (**higher = more confident**), the classic phosphoRS site probability.
+- `PhosphoRS_site_delta`: `{position: Δ}`, the `−10·log10 P` gap between the best and the best alternative isoform (rank 1 − rank 2). Unlike the probability, it does not saturate at 100%, so it is the score used to rank sites for the global FLR.
+- `PhosphoRS_pep_score`: peptide-level binomial probability *P* (**lower = more confident**).
+- `regular_phospho_count` / `phospho_decoy_count`: number of phospho / decoy sites placed.
+- *Typical threshold:* **site probability ≥ 75%** (or 90% / 99% for stricter sets).
 
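The claim that `PhosphoRS_site_delta` does not saturate while the site probability does can be checked on a toy case. The case is one phosphosite with two candidate isoforms. The sketch assumes the classic phosphoRS normalisation, where each isoform is weighted by 1/P. The helper names and the isoform labels `S3`/`T4` are hypothetical.

```python
import math


def isoform_scores(p_values: dict[str, float]) -> dict[str, float]:
    """Map each isoform's binomial P to a -10*log10(P) score (higher = better)."""
    return {iso: -10 * math.log10(p) for iso, p in p_values.items()}


def site_delta(p_values: dict[str, float]) -> float:
    """Gap between the best and second-best isoform scores (rank 1 - rank 2).

    Needs at least two isoforms.
    """
    best, second = sorted(isoform_scores(p_values).values(), reverse=True)[:2]
    return best - second


def isoform_probs(p_values: dict[str, float]) -> dict[str, float]:
    """Assumed phosphoRS-style normalisation: weight each isoform by 1/P, in percent."""
    total = sum(1 / p for p in p_values.values())
    return {iso: 100 * (1 / p) / total for iso, p in p_values.items()}


run1 = {"S3": 1e-6, "T4": 1e-3}
run2 = {"S3": 1e-9, "T4": 1e-3}
print(round(site_delta(run1), 1), round(isoform_probs(run1)["S3"], 1))  # 30.0 99.9
print(round(site_delta(run2), 1), round(isoform_probs(run2)["S3"], 1))  # 60.0 100.0
```

Both runs look almost identical on the probability scale, while the delta doubles. This difference is what lets the delta keep ordering sites near the top of the ranking.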
 ### LucXor (LuciPHOr2) Algorithm
 
 LucXor implements the complete LuciPHOr2 algorithm with two-stage processing for accurate PTM localization with false localization rate (FLR) estimation.
 
 **Output Metrics:**
-- `Luciphor_delta_score`: Main localization score
-- `Luciphor_pep_score`: Peptide identification score
-- `Luciphor_global_flr`: Global false localization rate
-- `Luciphor_local_flr`: Local false localization rate
+- `Luciphor_delta_score`: main localization score (the hit score type; **higher = more confident**).
+- `Luciphor_pep_score`: per-PSM delta score.
+- `Luciphor_global_flr` / `Luciphor_local_flr`: LucXor's **native false-localization-rate** estimates per PSM (**lower = more confident**). LucXor is the only one of the three tools that emits an FLR directly.
+- `Luciphor_site_scores`: `{position: Δ}` per-site confidence derived from the permutation scores.
+- *Typical threshold:* **local FLR ≤ 0.05** (or global FLR ≤ 0.01).
+
+## Interpreting the output: PSM-FDR vs localization FLR
+
+These tools assume your input idXML is **already filtered at the PSM level** (e.g. 1% PSM-FDR). That FDR answers *"is the peptide identification correct?"* and is left untouched. Localization adds a **second, orthogonal** error axis, *"is the PTM on the right residue?"*, measured by the **false localization rate (FLR)**. A confident identification can still carry an ambiguous site, so the two rates are distinct and you typically control both.
+
+Running any tool on a 1%-PSM-FDR idXML re-localizes each hit to its best-scoring site and writes these scores:
+
+| tool | primary score | per-site confidence | typical cutoff | native FLR? |
+|---|---|---|---|---|
+| **AScore** | hit score = best AScore (higher = better) | `AScore_site_scores` | AScore ≥ 13 | no |
+| **PhosphoRS** | `PhosphoRS_site_probs`, 0–100% (higher = better) | `PhosphoRS_site_probs` / `PhosphoRS_site_delta` | prob ≥ 75% | no |
+| **LucXor** | `Luciphor_delta_score` (higher = better) | `Luciphor_site_scores` | `Luciphor_local_flr` ≤ 0.05 | **yes** |
+
+Positions in the per-site dicts are **0-based** indices into the unmodified peptide.
+
+**Quickest single-tool answer:** run LucXor and keep PSMs with `Luciphor_local_flr ≤ 0.05`. It is the only tool that reports an FLR out of the box.
+
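The table above mixes score directions, and the per-site dicts use 0-based positions. Both are easy to get wrong when filtering downstream. The sketch below encodes the typical cutoffs with their directions and converts a 0-based position into a conventional 1-based residue label. It assumes the values have already been read from the idXML; that step is not shown. `CUTOFFS`, `passes` and `site_label` are hypothetical names.

```python
from typing import Callable

# Typical cutoffs from the table. LucXor's local FLR is lower-is-better;
# the AScore and PhosphoRS values are higher-is-better.
CUTOFFS: dict[str, Callable[[float], bool]] = {
    "AScore": lambda score: score >= 13.0,
    "PhosphoRS": lambda prob: prob >= 75.0,      # percent scale, 0-100
    "LucXor": lambda local_flr: local_flr <= 0.05,
}


def passes(tool: str, value: float) -> bool:
    """Apply the tool-specific cutoff in the right direction."""
    return CUTOFFS[tool](value)


def site_label(peptide: str, pos: int) -> str:
    """Turn a 0-based position in the unmodified peptide into e.g. 'S5' (1-based)."""
    return f"{peptide[pos]}{pos + 1}"


peptide = "PEPTSIDEK"
print(site_label(peptide, 4))                              # S5
print(passes("PhosphoRS", 82.0), passes("LucXor", 0.12))  # True False
```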
+### Unified decoy-amino-acid global FLR (compare tools, or get an FLR for AScore/PhosphoRS)
+
+AScore and PhosphoRS report a per-site *confidence* but no global FLR. To put all three on one comparable FLR scale, run each tool with **`--add-decoys`** (adds alanine as a `PhosphoDecoy`; A cannot be phosphorylated, so a localization onto A is a known false one), then:
+
+```bash
+python -m onsite.decoy_flr \
+  --ascore a.idXML --phosphors p.idXML --lucxor l.idXML \
+  --q-value-threshold 0.01 --flr-threshold 0.05
+```
+
+This reads the `target_decoy` and `q-value` UserParams from your FDR-filtered idXML, re-applies the q-value cutoff, intersects the PSM set across the supplied tools, and reports the sites recovered at your global FLR threshold (decoy-amino-acid method of Ramsbottom et al. 2022; **5%** is the recommended cutoff). `--add-decoys` is only needed for this FLR estimation; for plain localization you can omit it. Any subset of `--ascore` / `--phosphors` / `--lucxor` may be passed.
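The steps described for `onsite.decoy_flr` can be sketched as follows:

1. Re-apply the q-value cutoff per tool.
2. Intersect the surviving PSMs.
3. Walk down the score-ranked sites, counting alanine (decoy) localizations.

This is a simplified illustration, not the module's internals. The `Site` record and the toy `decoy_ratio` are hypothetical. The FLR formula is an assumed form of the decoy-amino-acid estimate: decoy hits are scaled by a residue-frequency ratio and divided by target sites. Check the exact estimator against Ramsbottom et al. 2022. Dropping PSM-level decoys via `target_decoy` is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    psm_id: str   # hypothetical PSM key (e.g. a spectrum reference)
    residue: str  # amino acid the PTM was localized to ("A" = decoy)
    score: float  # higher = more confident (e.g. AScore, PhosphoRS delta)


def shared_psms(q_values_by_tool: dict[str, dict[str, float]], q_max: float = 0.01) -> set[str]:
    """Re-apply the q-value cutoff per tool, then intersect the surviving PSM ids."""
    kept = [{psm for psm, q in qs.items() if q <= q_max} for qs in q_values_by_tool.values()]
    return set.intersection(*kept) if kept else set()


def decoy_flr_cutoff(sites: list[Site], decoy_ratio: float, flr_max: float = 0.05) -> int:
    """Return how many top-ranked target sites are accepted at the FLR threshold.

    Assumed estimate: FLR(k) = decoy_ratio * (#A sites in top k) / (#target sites in top k),
    where decoy_ratio would scale alanine hits to the number of S/T/Y candidates.
    The deepest prefix whose FLR stays within flr_max is accepted.
    """
    ranked = sorted(sites, key=lambda s: s.score, reverse=True)
    decoys = targets = accepted = 0
    for site in ranked:
        if site.residue == "A":
            decoys += 1
        else:
            targets += 1
        if targets and decoy_ratio * decoys / targets <= flr_max:
            accepted = targets
    return accepted


q = {
    "ascore": {"s1": 0.001, "s2": 0.004, "s3": 0.02, "s4": 0.005, "s5": 0.003},
    "lucxor": {"s1": 0.002, "s2": 0.009, "s4": 0.001, "s5": 0.008},
}
shared = shared_psms(q)
print(sorted(shared))  # ['s1', 's2', 's4', 's5']

sites = [Site("s1", "S", 40.0), Site("s2", "T", 30.0), Site("s3", "S", 35.0),
         Site("s4", "A", 25.0), Site("s5", "S", 20.0)]
kept = [s for s in sites if s.psm_id in shared]
print(decoy_flr_cutoff(kept, decoy_ratio=1.0))  # 2
```

In this toy run, the alanine hit at rank 3 pushes the estimated FLR above 5%. Only the two sites ranked above it are accepted.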
 
 ## Example Results
 