
Speed up annotation and theoretical m/z calculation for MS²PIP #14

Merged
RalfG merged 8 commits into release/0.5 from feat/ms2pip-improvements on Apr 15, 2026



RalfG commented Apr 15, 2026

Added

  • Added ms2pip_extract_targets, a Rust-backed API for extracting per-ion observed intensity arrays from AnnotatedMS2Spectrum objects.
  • Added shared utilities for cached theoretical fragment generation and nearest-peak m/z matching.
  • Registered the new ms2pip targets module/function in the Python extension API.
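A minimal, dependency-free sketch of nearest-peak m/z matching follows. It assumes the observed m/z values are sorted in ascending order and that the tolerance is absolute (in Da). `nearest_peak` is a hypothetical name, not the crate's actual helper; the sketch only illustrates the binary-search idea of comparing the two neighbours of the insertion point.

```rust
/// Return the index of the value in `sorted_mzs` closest to `target`,
/// provided it lies within `tolerance` (absolute, same unit as the m/z values).
/// `sorted_mzs` must be sorted in ascending order.
fn nearest_peak(sorted_mzs: &[f64], target: f64, tolerance: f64) -> Option<usize> {
    // First index whose m/z is >= target (binary search).
    let idx = sorted_mzs.partition_point(|&mz| mz < target);
    // The closest value is either just before the insertion point or at it.
    [idx.checked_sub(1), Some(idx)]
        .iter()
        .flatten()
        .copied()
        .filter(|&i| i < sorted_mzs.len())
        .map(|i| (i, (sorted_mzs[i] - target).abs()))
        .filter(|&(_, delta)| delta <= tolerance)
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

Each lookup costs O(log n) instead of a linear scan over the spectrum.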

Changed

  • Bumped the crate version to 0.5.0-alpha.3.
  • Updated annotate_ms2_spectra to derive sequence length internally from the ProForma input and removed the need to pass seq_lens.
  • Parallelized fragment-cache construction in annotate_ms2_spectra so cache building and annotation both run outside the GIL.
  • Reworked annotate_ms2_spectra to build peak annotations directly from cached fragment metadata instead of materializing intermediate rustyms spectrum types.
  • Updated ms2pip_compute_theoretical_mz to return NumPy float32 arrays directly instead of a Rust Vec<f64> that had to be wrapped on the Python side.
  • Simplified theoretical m/z generation by pre-parsing requested ion types and filling indexed f32 output buffers directly.
  • Refreshed uv.lock to match the current Python support matrix and dependency resolution.
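The following self-contained sketch illustrates the idea of pre-parsing ion types and filling indexed f32 buffers. It computes singly charged b and y ion m/z values for an unmodified sequence. All names here (`IonType`, `parse_ion_types`, `residue_mass`, `theoretical_mz`) are hypothetical. The actual code uses rustyms for fragment generation and hands the buffers to NumPy; neither step is reproduced here.

```rust
// Monoisotopic masses in Da.
const PROTON: f64 = 1.007_276_466_8;
const WATER: f64 = 18.010_564_684;

/// Ion types are parsed once up front instead of being compared as strings per fragment.
#[derive(Clone, Copy, Debug, PartialEq)]
enum IonType {
    B,
    Y,
}

fn parse_ion_types(names: &[&str]) -> Result<Vec<IonType>, String> {
    names
        .iter()
        .map(|name| match *name {
            "b" => Ok(IonType::B),
            "y" => Ok(IonType::Y),
            other => Err(format!("unsupported ion type: {other}")),
        })
        .collect()
}

/// Monoisotopic residue masses of the 20 standard (unmodified) amino acids.
fn residue_mass(aa: char) -> Option<f64> {
    Some(match aa {
        'G' => 57.021_46,
        'A' => 71.037_11,
        'S' => 87.032_03,
        'P' => 97.052_76,
        'V' => 99.068_41,
        'T' => 101.047_68,
        'C' => 103.009_19,
        'L' | 'I' => 113.084_06,
        'N' => 114.042_93,
        'D' => 115.026_94,
        'Q' => 128.058_58,
        'K' => 128.094_96,
        'E' => 129.042_59,
        'M' => 131.040_49,
        'H' => 137.058_91,
        'F' => 147.068_41,
        'R' => 156.101_11,
        'Y' => 163.063_33,
        'W' => 186.079_31,
        _ => return None,
    })
}

/// One preallocated f32 buffer per requested ion type; index k holds ion number k + 1.
fn theoretical_mz(sequence: &str, ion_types: &[IonType]) -> Result<Vec<Vec<f32>>, String> {
    let masses: Vec<f64> = sequence
        .chars()
        .map(|aa| residue_mass(aa).ok_or_else(|| format!("unknown residue: {aa}")))
        .collect::<Result<_, _>>()?;
    let n = masses.len().saturating_sub(1); // number of backbone cleavage sites
    let total: f64 = masses.iter().sum();
    let mut out = vec![vec![0.0f32; n]; ion_types.len()];
    let mut prefix = 0.0;
    for (i, &m) in masses.iter().take(n).enumerate() {
        prefix += m; // residues 0..=i form b_{i+1}
        let suffix = total - prefix; // the remaining residues form y_{n-i}
        for (buf, ion) in out.iter_mut().zip(ion_types) {
            match ion {
                IonType::B => buf[i] = (prefix + PROTON) as f32,
                IonType::Y => buf[n - 1 - i] = (suffix + WATER + PROTON) as f32,
            }
        }
    }
    Ok(out)
}
```

Parsing happens once per call rather than once per fragment, and each buffer is allocated at its final size, so the inner loop only writes into preallocated slots. Casting to f32 on write matches the float32 arrays the PR now returns.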

Fixed

  • Fixed target extraction to use explicit seq_lens instead of deriving sequence length from matched annotations.
  • Fixed target extraction bounds handling by filtering out-of-range annotation positions during GIL-side data extraction.
  • Improved target extraction lookup efficiency by using a HashSet for ion-type membership checks.
  • Eliminated redundant ProForma parsing in the annotation pipeline by deriving seq_len from cached entries.
  • Corrected the theoretical m/z docstring to reflect that NumPy arrays are returned.
  • Reduced native overhead in the correlate workflow by avoiding unnecessary rustyms spectrum materialization and intermediate allocations.
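The fixed target-extraction behaviour can be sketched as follows: explicit `seq_len`, a HashSet for ion-type membership, out-of-range positions dropped in a first copying phase, the maximum kept when several peaks match, and log2(0.001) for unmatched positions. This is a sketch under stated assumptions. `Annotation` and `extract_targets` are hypothetical, positions are taken to be 0-based indices into `seq_len - 1` slots, and intensities are assumed to already be on the log2 scale used for targets.

```rust
use std::collections::HashSet;

/// Hypothetical flattened annotation: ion type, 0-based fragment position,
/// and an intensity assumed to already be log2-transformed.
struct Annotation {
    ion_type: String,
    position: usize,
    log2_intensity: f32,
}

/// One target row per requested ion type, each with `seq_len - 1` entries.
fn extract_targets(annotations: &[Annotation], seq_len: usize, ion_types: &[&str]) -> Vec<Vec<f32>> {
    let n_positions = seq_len.saturating_sub(1);
    let wanted: HashSet<&str> = ion_types.iter().copied().collect();

    // Phase 1 (the GIL-side copy in the PR): keep requested, in-range annotations only.
    let matched: Vec<(&str, usize, f32)> = annotations
        .iter()
        .filter(|a| wanted.contains(a.ion_type.as_str()) && a.position < n_positions)
        .map(|a| (a.ion_type.as_str(), a.position, a.log2_intensity))
        .collect();

    // Phase 2: keep the maximum value per slot when several peaks match.
    let mut best: Vec<Vec<Option<f32>>> = vec![vec![None; n_positions]; ion_types.len()];
    for (ion, pos, value) in matched {
        let row = ion_types.iter().position(|t| *t == ion).expect("filtered in phase 1");
        let slot = &mut best[row][pos];
        *slot = Some(slot.map_or(value, |v| v.max(value)));
    }

    // Positions without any matched peak get log2(0.001).
    let fill = 0.001f32.log2();
    best.into_iter()
        .map(|row| row.into_iter().map(|v| v.unwrap_or(fill)).collect())
        .collect()
}
```

Using `Option` per slot keeps "max over matched peaks" separate from the fill value, so a weak matched peak is never silently replaced by the fill constant.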

Removed

  • Removed the seq_lens parameter from annotate_ms2_spectra.

RalfG added 8 commits on April 15, 2026 at 15:24
Move theoretical fragment generation into py.detach() and use
rayon par_iter over unique peptide+charge keys, so both cache
building and spectrum annotation run outside the GIL in parallel.
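The caching strategy in this commit can be sketched with the standard library alone. This version swaps rayon's `par_iter` for `std::thread::scope` to stay dependency-free. `Key`, `CacheEntry`, `build_entry` and `build_cache` are hypothetical stand-ins for the real rustyms-based fragment generation. The point illustrated is that each unique (peptide, charge) key is processed exactly once, in parallel.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// (peptide sequence, precursor charge)
type Key = (String, u8);

/// Hypothetical cache entry; the real one holds fragment metadata.
#[derive(Debug, Clone, PartialEq)]
struct CacheEntry {
    seq_len: usize,
    charge: u8,
}

/// Stand-in for expensive fragment generation; `calls` counts invocations.
/// Assumes a plain (unmodified) sequence for the length count.
fn build_entry(key: &Key, calls: &AtomicUsize) -> CacheEntry {
    calls.fetch_add(1, Ordering::Relaxed);
    let seq_len = key.0.chars().filter(|c| c.is_ascii_uppercase()).count();
    CacheEntry { seq_len, charge: key.1 }
}

fn build_cache(keys: &[Key], n_threads: usize, calls: &AtomicUsize) -> HashMap<Key, CacheEntry> {
    // Deduplicate so each (peptide, charge) pair is fragmented only once.
    let unique: Vec<&Key> = keys.iter().collect::<HashSet<_>>().into_iter().collect();
    let n = n_threads.max(1);
    let chunk_size = ((unique.len() + n - 1) / n).max(1);
    thread::scope(|s| {
        // One scoped worker per chunk of unique keys.
        let handles: Vec<_> = unique
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|&k| (k.clone(), build_entry(k, calls)))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Spectra that share a peptide and charge then look up the same entry instead of regenerating fragments. In the PR the equivalent work runs inside py.detach(), so the GIL is released while the workers are busy.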

Convert the f64 results to f32 numpy arrays before returning to
Python, eliminating the need for np.array() wrapping on the caller side.

Derive sequence length internally by parsing the proforma string,
which the function already does later for fragment generation.
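As a rough illustration of deriving sequence length from a ProForma string, the simplified sketch below counts uppercase residue letters outside `[...]`, `{...}` and `<...>` and stops at a `/` charge suffix. `proforma_seq_len` is hypothetical; the PR relies on rustyms' ProForma parser, and this sketch does not cover every ProForma 2.0 construct (for example chimeric `+` peptidoforms).

```rust
/// Count residues in a ProForma string, assuming residues are uppercase letters.
/// Anything inside [...], {...} or <...> (modifications, labile or global
/// modifications) is skipped, and parsing stops at a top-level '/' charge suffix.
fn proforma_seq_len(proforma: &str) -> usize {
    let mut depth = 0usize;
    let mut count = 0;
    for c in proforma.chars() {
        match c {
            '[' | '{' | '<' => depth += 1,
            ']' | '}' | '>' => depth = depth.saturating_sub(1),
            '/' if depth == 0 => break, // charge state, e.g. "PEPTIDE/2"
            c if depth == 0 && c.is_ascii_uppercase() => count += 1,
            _ => {}
        }
    }
    count
}
```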

Extract per-ion-type observed intensity arrays from annotated spectra.
Derives sequence length from annotation positions, fills unmatched
positions with log2(0.001), takes max when multiple peaks match.

- Fix targets: use explicit seq_lens parameter instead of fragile max-position
  derivation; use HashSet for ion_types lookup; filter out-of-bounds in GIL phase
- Fix annotation: eliminate double proforma parsing by deriving seq_len from
  cache entry in parallel block; restructure CacheEntry as named struct
- Fix theoretical_mz: update docstring to reflect numpy return type

annotate_ms2_spectra now skips rustyms spectrum materialization and builds peak annotations directly from cached fragment metadata, reducing native overhead in the correlate workflow. ms2pip_compute_theoretical_mz now pre-parses requested ion types, fills f32 output buffers directly, and reuses shared theoretical-fragment generation helpers.

Also adds shared fragment and m/z-search utilities with tests in utils.rs, and includes the regenerated uv.lock.
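Building annotations directly from cached fragment metadata can be sketched as a windowed search over fragments pre-sorted by m/z. Every fragment within the tolerance of a peak becomes an annotation, and no intermediate spectrum type is constructed. `Fragment`, `PeakAnnotation` and `annotate` are hypothetical. The absolute tolerance in Da and the "all fragments within tolerance" rule are assumptions, not necessarily what the crate does.

```rust
/// Hypothetical cached fragment metadata; the slice must be sorted by `mz`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Fragment {
    mz: f64,
    ion_type: char,
    position: usize,
}

#[derive(Debug, PartialEq)]
struct PeakAnnotation {
    peak_index: usize,
    ion_type: char,
    position: usize,
}

/// Annotate each observed peak with every cached fragment within `tolerance`.
fn annotate(peak_mzs: &[f64], fragments: &[Fragment], tolerance: f64) -> Vec<PeakAnnotation> {
    let mut annotations = Vec::new();
    for (peak_index, &mz) in peak_mzs.iter().enumerate() {
        // First fragment that could lie inside the tolerance window (binary search).
        let start = fragments.partition_point(|f| f.mz < mz - tolerance);
        for f in fragments[start..].iter().take_while(|f| f.mz <= mz + tolerance) {
            annotations.push(PeakAnnotation {
                peak_index,
                ion_type: f.ion_type,
                position: f.position,
            });
        }
    }
    annotations
}
```

Because the fragments come straight from the cache, the only per-spectrum allocation is the output vector.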
RalfG merged commit 95d01d2 into release/0.5 on Apr 15, 2026
4 checks passed
RalfG deleted the feat/ms2pip-improvements branch on April 15, 2026 at 16:00
