Speed up annotation and theoretical m/z calculation for MS²PIP #14
Merged
Move theoretical fragment generation into py.detach() and use rayon par_iter over unique peptide+charge keys, so both cache building and spectrum annotation run outside the GIL in parallel.
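The deduplicate-then-parallelize pattern described above can be sketched roughly as follows. This is not the code from the PR: to stay self-contained it uses `std::thread::scope` in place of rayon's `par_iter`, it leaves out the PyO3 side (`py.detach()`) entirely, and `build_cache`, `Key`, and the `generate` callback are hypothetical names standing in for the real fragment generation.

```rust
use std::collections::{HashMap, HashSet};
use std::thread;

/// Cache key: peptide string plus precursor charge (hypothetical alias).
type Key = (String, u8);

/// Run `generate` once per unique key, spreading the unique keys over
/// scoped worker threads, and collect the results into a cache.
fn build_cache<T, F>(inputs: &[Key], n_threads: usize, generate: F) -> HashMap<Key, T>
where
    T: Send,
    F: Fn(&Key) -> T + Sync,
{
    // Deduplicate first so repeated peptide+charge pairs are computed once.
    let unique: Vec<&Key> = inputs.iter().collect::<HashSet<_>>().into_iter().collect();

    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let n = n_threads.max(1);
    let chunk_size = ((unique.len() + n - 1) / n).max(1);
    let generate = &generate;

    thread::scope(|scope| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = unique
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|&key| (key.clone(), generate(key)))
                        .collect::<Vec<(Key, T)>>()
                })
            })
            .collect();

        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect::<HashMap<Key, T>>()
    })
}
```

In a PyO3 extension, a function like this would be called inside the GIL-released closure so that other Python threads can run while the cache is built.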
Convert the f64 results to f32 numpy arrays before returning to Python, eliminating the need for np.array() wrapping on the caller side.
Derive sequence length internally by parsing the proforma string, which the function already does later for fragment generation.
Extract per-ion-type observed intensity arrays from annotated spectra. Derives sequence length from annotation positions, fills unmatched positions with log2(0.001), takes max when multiple peaks match.
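A minimal sketch of the fill-and-max logic described above, under several assumptions: the `Match` struct and `extract_targets` signature are hypothetical, intensities are taken to be already in the target space, the sequence length is passed in explicitly rather than derived, and each ion series is assumed to have `seq_len - 1` positions. Out-of-range positions and unrequested ion types are simply skipped.

```rust
use std::collections::HashMap;

/// One matched peak (hypothetical type): ion series label, 1-based fragment
/// position, and an intensity assumed to be already in the target space.
struct Match {
    ion: char,
    position: usize,
    intensity: f32,
}

/// Build one intensity array per requested ion type.
fn extract_targets(
    matches: &[Match],
    ion_types: &[char],
    seq_len: usize,
) -> HashMap<char, Vec<f32>> {
    // Value used for positions without any matched peak.
    let fill = 0.001_f32.log2();
    // Assumption: a peptide of length n has n - 1 fragment positions per series.
    let n_positions = seq_len.saturating_sub(1);

    // Track the best intensity per position; `None` means "no match yet".
    let mut best: HashMap<char, Vec<Option<f32>>> = ion_types
        .iter()
        .map(|&ion| (ion, vec![None; n_positions]))
        .collect();

    for m in matches {
        // Skip positions outside the valid 1..=n_positions range.
        if m.position == 0 || m.position > n_positions {
            continue;
        }
        // Ion types that were not requested have no entry and are ignored.
        if let Some(series) = best.get_mut(&m.ion) {
            let slot = &mut series[m.position - 1];
            // Keep the maximum when several peaks match the same fragment.
            *slot = Some(slot.map_or(m.intensity, |prev| prev.max(m.intensity)));
        }
    }

    // Replace unmatched positions with the fill value.
    best.into_iter()
        .map(|(ion, series)| {
            let filled = series.into_iter().map(|v| v.unwrap_or(fill)).collect();
            (ion, filled)
        })
        .collect()
}
```

Tracking matches as `Option<f32>` rather than starting from the fill value keeps a genuine match from being masked if its intensity happens to be below `log2(0.001)`.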
- Fix targets: use explicit seq_lens parameter instead of fragile max-position derivation; use HashSet for ion_types lookup; filter out-of-bounds in GIL phase
- Fix annotation: eliminate double proforma parsing by deriving seq_len from cache entry in parallel block; restructure CacheEntry as named struct
- Fix theoretical_mz: update docstring to reflect numpy return type
annotate_ms2_spectra now skips rustyms spectrum materialization and builds peak annotations directly from cached fragment metadata, reducing native overhead in the correlate workflow. ms2pip_compute_theoretical_mz now pre-parses requested ion types, fills f32 output buffers directly, and reuses shared theoretical-fragment generation helpers. Also adds shared fragment/mz-search utilities and tests in utils.rs, and includes the current uv.lock regeneration.
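The m/z-search utilities in utils.rs are not shown here, so the following is only an illustration of what such a helper might look like: a binary search over a sorted peak list that returns the closest peak within a ppm tolerance. The function name, signature, and the choice of "closest peak" as the matching rule are all assumptions.

```rust
/// Return the index of the peak closest to `target` within `tol_ppm`,
/// or `None` if no peak is close enough. `mzs` must be sorted ascending.
/// (Hypothetical helper; not the function from utils.rs.)
fn find_closest_within_ppm(mzs: &[f64], target: f64, tol_ppm: f64) -> Option<usize> {
    // Convert the relative tolerance into an absolute m/z window.
    let tol = target * tol_ppm * 1e-6;

    // Index of the first peak with m/z >= target (binary search).
    let right = mzs.partition_point(|&mz| mz < target);

    // Only the neighbours on either side of the split can be closest.
    let left_neighbour = right.checked_sub(1);
    let right_neighbour = if right < mzs.len() { Some(right) } else { None };
    let candidates = [left_neighbour, right_neighbour];

    candidates
        .iter()
        .flatten()
        .map(|&i| (i, (mzs[i] - target).abs()))
        .filter(|&(_, delta)| delta <= tol)
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

Because the peak list is sorted, each lookup costs O(log n) instead of a linear scan, which matters when every theoretical fragment of every spectrum is searched.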
Added
- `ms2pip_extract_targets`, a Rust-backed API for extracting per-ion observed intensity arrays from `AnnotatedMS2Spectrum` objects.

Changed
- Version is now `0.5.0-alpha.3`.
- `annotate_ms2_spectra` now derives sequence length internally from the ProForma input, removing the need to pass `seq_lens`.
- `annotate_ms2_spectra` now runs both cache building and annotation outside the GIL.
- `annotate_ms2_spectra` now builds peak annotations directly from cached fragment metadata instead of materializing intermediate `rustyms` spectrum types.
- `ms2pip_compute_theoretical_mz` now returns NumPy `float32` arrays directly instead of Rust `Vec<f64>` output that must be wrapped on the Python side.
- `ms2pip_compute_theoretical_mz` now pre-parses requested ion types and fills `f32` output buffers directly.
- `uv.lock` regenerated to match the current Python support matrix and dependency resolution.

Fixed
- Target extraction uses explicit `seq_lens` instead of deriving sequence length from matched annotations.
- Target extraction uses a `HashSet` for ion-type membership checks.
- Annotation no longer parses the ProForma string twice; it derives `seq_len` from cached entries.
- Annotation avoids `rustyms` spectrum materialization and intermediate allocations.

Removed
- `seq_lens` parameter from `annotate_ms2_spectra`.