Speed up annotation and theoretical m/z calculation for MS²PIP by RalfG · Pull Request #14 · CompOmics/ms2rescore-rs

RalfG · 2026-04-15T15:16:48Z

Added

Added ms2pip_extract_targets, a Rust-backed API for extracting per-ion observed intensity arrays from AnnotatedMS2Spectrum objects.
Added shared utilities for cached theoretical fragment generation and nearest-peak m/z matching.
Registered the new ms2pip targets module/function in the Python extension API.
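
The nearest-peak m/z matching mentioned above can be pictured with a small stand-alone sketch. This is not the crate's Rust implementation. The function name `nearest_peak_index`, the absolute tolerance in Da, and the assumption that observed m/z values are sorted ascending are all hypothetical choices made for illustration.

```python
from bisect import bisect_left


def nearest_peak_index(sorted_mz, target_mz, tolerance):
    """Return the index of the observed peak closest to target_mz.

    Returns None when no peak lies within `tolerance` (absolute, in Da).
    Assumes `sorted_mz` is sorted in ascending order (illustrative assumption).
    """
    if not sorted_mz:
        return None
    # Binary search for the insertion point of the theoretical m/z.
    pos = bisect_left(sorted_mz, target_mz)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(sorted_mz)]
    best = min(candidates, key=lambda i: abs(sorted_mz[i] - target_mz))
    if abs(sorted_mz[best] - target_mz) <= tolerance:
        return best
    return None


# Toy observed spectrum (made-up values, for illustration only).
observed = [175.119, 276.155, 389.251, 502.335]
hit = nearest_peak_index(observed, 276.160, 0.02)
miss = nearest_peak_index(observed, 300.0, 0.02)
```

A binary search keeps each lookup logarithmic in the number of peaks, which matters when every theoretical fragment of every spectrum has to be matched.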

Changed

Bumped the crate version to 0.5.0-alpha.3.
Updated annotate_ms2_spectra to derive sequence length internally from the ProForma input and removed the need to pass seq_lens.
Parallelized fragment-cache construction in annotate_ms2_spectra so cache building and annotation both run outside the GIL.
Reworked annotate_ms2_spectra to build peak annotations directly from cached fragment metadata instead of materializing intermediate rustyms spectrum types.
Updated ms2pip_compute_theoretical_mz to return NumPy float32 arrays directly instead of Rust Vec<f64> output that must be wrapped on the Python side.
Simplified theoretical m/z generation by pre-parsing requested ion types and filling indexed f32 output buffers directly.
Refreshed uv.lock to match the current Python support matrix and dependency resolution.

Fixed

Fixed target extraction to use explicit seq_lens instead of deriving sequence length from matched annotations.
Fixed target extraction bounds handling by filtering out-of-range annotation positions during GIL-side data extraction.
Improved target extraction lookup efficiency by using a HashSet for ion-type membership checks.
Eliminated redundant ProForma parsing in the annotation pipeline by deriving seq_len from cached entries.
Corrected the theoretical m/z docstring to reflect that NumPy arrays are returned.
Reduced native overhead in the correlate workflow by avoiding unnecessary rustyms spectrum materialization and intermediate allocations.
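
The target-extraction rules in this PR are: explicit sequence length, set-based ion-type lookup, out-of-range positions dropped, unmatched positions filled with log2(0.001), and the maximum kept when several peaks match. They can be sketched in plain Python as below. This is a hedged illustration, not the Rust code. The tuple layout of `annotations`, the 1-based positions, the `seq_len - 1` slots per ion type, and the assumption that intensities are already log2-scaled and never below the fill value are all assumptions.

```python
import math

# Fill value for unmatched positions, as named in the PR description.
UNMATCHED = math.log2(0.001)


def extract_targets(annotations, seq_len, ion_types):
    """Build one intensity list per requested ion type.

    annotations: iterable of (ion_type, position, intensity) tuples
                 (hypothetical layout; positions assumed 1-based).
    seq_len:     explicit peptide length, not derived from the annotations.
    ion_types:   requested ion types, e.g. ["b", "y"].
    """
    wanted = set(ion_types)      # set membership check, like the HashSet in the PR
    n_positions = seq_len - 1    # assumed: one slot per fragmentation site
    targets = {ion: [UNMATCHED] * n_positions for ion in ion_types}
    for ion, position, intensity in annotations:
        if ion not in wanted:
            continue
        if not 1 <= position <= n_positions:
            continue             # drop out-of-range annotation positions
        idx = position - 1
        # Keep the most intense peak when several match the same fragment.
        targets[ion][idx] = max(targets[ion][idx], intensity)
    return targets


# Toy annotations (made-up log2 intensities, for illustration only).
annotations = [
    ("b", 1, -3.0),
    ("b", 1, -2.0),   # second match for b1: the larger value wins
    ("y", 2, -1.5),
    ("y", 9, -1.0),   # out of range for a 4-residue peptide: ignored
    ("a", 1, -0.5),   # ion type not requested: ignored
]
targets = extract_targets(annotations, seq_len=4, ion_types=["b", "y"])
```

Passing `seq_len` explicitly means a peptide whose terminal fragments were never matched still gets arrays of the correct length, which is the failure mode the max-position derivation was prone to.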

Removed

Removed the seq_lens parameter from annotate_ms2_spectra.

- Fix targets: use explicit seq_lens parameter instead of fragile max-position derivation; use HashSet for ion_types lookup; filter out-of-bounds in GIL phase
- Fix annotation: eliminate double proforma parsing by deriving seq_len from cache entry in parallel block; restructure CacheEntry as named struct
- Fix theoretical_mz: update docstring to reflect numpy return type

annotate_ms2_spectra now skips rustyms spectrum materialization and builds peak annotations directly from cached fragment metadata, reducing native overhead in the correlate workflow.

ms2pip_compute_theoretical_mz now pre-parses requested ion types, fills f32 output buffers directly, and reuses shared theoretical-fragment generation helpers.

Also adds shared fragment/mz-search utilities and tests in utils.rs, and includes the current uv.lock regeneration.

RalfG added 8 commits April 15, 2026 15:24

Parallelize fragment cache building in annotate_ms2_spectra

019d037

Move theoretical fragment generation into py.detach() and use rayon par_iter over unique peptide+charge keys, so both cache building and spectrum annotation run outside the GIL in parallel.
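
The idea of generating fragments once per unique peptide+charge key can be sketched as follows. Python threads stand in for rayon's `par_iter` here and do not give true CPU parallelism under the GIL, which is the reason the crate does this work in Rust. `build_fragment_cache` and `toy_fragments` are hypothetical names, and the toy generator returns placeholder values rather than real fragment m/z values.

```python
from concurrent.futures import ThreadPoolExecutor


def build_fragment_cache(psms, generate_fragments, max_workers=4):
    """Compute fragments once per unique (peptide, charge) key.

    psms:               list of (peptide, charge) tuples, possibly with repeats.
    generate_fragments: callable(peptide, charge) -> fragment data (hypothetical).
    """
    # Deduplicate while keeping first-seen order.
    unique_keys = list(dict.fromkeys(psms))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so keys and results line up.
        results = pool.map(lambda key: generate_fragments(*key), unique_keys)
        return dict(zip(unique_keys, results))


calls = []


def toy_fragments(peptide, charge):
    """Placeholder generator that records how often it is called."""
    calls.append((peptide, charge))
    return [len(peptide) * charge]  # not real m/z values


psms = [("PEPTIDE", 2), ("PEPTIDE", 2), ("ACDEK", 3), ("PEPTIDE", 3)]
cache = build_fragment_cache(psms, toy_fragments)
```

Every spectrum that shares a peptide and charge can then look up its fragments in `cache` instead of regenerating them.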

Return numpy float32 arrays from ms2pip_compute_theoretical_mz

7d78f72

Convert the f64 results to f32 numpy arrays before returning to Python, eliminating the need for np.array() wrapping on the caller side.

Remove seq_lens parameter from annotate_ms2_spectra

836ce45

Derive sequence length internally by parsing the proforma string, which the function already does later for fragment generation.

Add ms2pip_extract_targets function

8406734

Extract per-ion-type observed intensity arrays from annotated spectra. Derives sequence length from annotation positions, fills unmatched positions with log2(0.001), takes max when multiple peaks match.

Version bump

e1127a4

Update Python tests for annotate_ms2_spectra API change

4829ea4

RalfG merged commit 95d01d2 into release/0.5 Apr 15, 2026
4 checks passed

RalfG deleted the feat/ms2pip-improvements branch April 15, 2026 16:00

RalfG merged 8 commits into release/0.5 from feat/ms2pip-improvements
