Separate annotation from scoring and restructure codebase #10
Merged
Group files by responsibility: types/ for Python-facing data structures, io/ for file parsing and format dispatch, scoring/ for feature computation, ms2pip/ for future ms2pip-specific functionality. No logic changes.
Peak-centric annotation types that mirror rustyms output using only plain Rust/Python types. AnnotatedMS2Spectrum carries the original spectrum data alongside per-peak fragment annotations.
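To make the peak-centric layout concrete, here is a minimal Python sketch of how such types could look. The field names (`ion_series`, `ion_index`, `charge`, `theoretical_mz`, `mz`, `intensity`, `annotations`) and the helper `matched_peak_indices` are assumptions for illustration only; the actual types exposed by this PR may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FragmentAnnotation:
    # Hypothetical fields; the real type may use different names.
    ion_series: str        # one of "a", "b", "c", "x", "y", "z"
    ion_index: int         # position along the peptide backbone
    charge: int
    theoretical_mz: float


@dataclass
class AnnotatedMS2Spectrum:
    # The original spectrum data is kept alongside the annotations.
    mz: List[float]
    intensity: List[float]
    # One inner list per peak; an empty inner list means "no annotation".
    annotations: List[List[FragmentAnnotation]] = field(default_factory=list)

    def matched_peak_indices(self) -> List[int]:
        """Return the indices of peaks that carry at least one annotation."""
        return [i for i, anns in enumerate(self.annotations) if anns]
```

Keeping one annotation list per peak means a downstream scorer can iterate over peaks once and never needs to re-match fragments against the raw spectrum.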
Extract annotation logic into standalone pyfunction that produces AnnotatedMS2Spectrum output. Supports all 6 ion series (a/b/c/x/y/z) and exposes tolerance_value + tolerance_mode parameters.
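As an illustration of what the two tolerance parameters mean, the sketch below checks whether an observed peak falls within a ppm or Da window around a theoretical fragment m/z. The function name `within_tolerance` and the accepted mode strings are assumptions, not the PR's actual implementation.

```python
def within_tolerance(observed_mz, theoretical_mz, tolerance_value, tolerance_mode):
    """Return True if the observed m/z matches within the given tolerance.

    Assumed mode strings: "ppm" (relative) or "da" (absolute), case-insensitive.
    """
    delta = abs(observed_mz - theoretical_mz)
    mode = tolerance_mode.lower()
    if mode == "ppm":
        # The ppm window scales with the theoretical m/z.
        return delta <= theoretical_mz * tolerance_value * 1e-6
    if mode == "da":
        # The Da window is a fixed absolute width.
        return delta <= tolerance_value
    raise ValueError(f"unknown tolerance_mode: {tolerance_mode!r}")
```

At m/z 500, a 20 ppm tolerance corresponds to a window of 0.01 Da, so the same numeric `tolerance_value` behaves very differently between the two modes.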
…ctra Replace monolithic function with score_ms2_spectra() that consumes AnnotatedMS2Spectrum. Fixed feature set for all 6 ion series with NaN for inactive series. Extract shared math helpers to utils.rs.
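The idea of a fixed feature set with NaN for inactive series can be sketched as follows. The feature name pattern `<series>_matched_intensity` and the function `series_intensity_features` are hypothetical and chosen only for illustration; the real feature names are not given in this PR text.

```python
import math

ION_SERIES = ("a", "b", "c", "x", "y", "z")


def series_intensity_features(matched_intensity, active_ion_series):
    """Build one feature per ion series, always returning all six keys.

    matched_intensity: dict mapping series letter -> summed matched intensity.
    active_ion_series: series the fragmentation model is expected to produce.
    """
    features = {}
    for series in ION_SERIES:
        name = f"{series}_matched_intensity"  # hypothetical feature name
        if series in active_ion_series:
            # Active but unmatched series get 0.0, not NaN.
            features[name] = float(matched_intensity.get(series, 0.0))
        else:
            # Inactive series are marked as missing.
            features[name] = math.nan
    return features
```

A fixed key set keeps the output shape identical across fragmentation methods, while NaN lets downstream code distinguish "series not expected" from "series expected but nothing matched".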
Add 16 Python tests for annotate_ms2_spectra and score_ms2_spectra, plus Rust unit tests in utils.rs. Fix empty spectrum panic, type mismatches, clippy warnings, and PyO3 deprecation warnings.
…iciency
- Remove duplicate f32/f64 arrays in OwnedSpec, convert inline
- Cache parsed peptides to avoid double-parsing
- Extract empty_annotated helper for repeated return blocks
- Replace per-series HashMaps with fixed [_; 6] arrays
- Pre-compute feature name strings outside parallel loop
- Keep intensity accumulation in f32, cast to f64 at output boundary
- Use byte parsing in parse_ion_series_and_index to avoid heap allocs
- Fix stale comment in spectrum_prediction.rs, powf -> exp2
score_ms2_spectra now takes explicit active_ion_series so the caller specifies which series the fragmentation model produces, rather than inferring from matched annotations.
Decouples spectrum annotation from feature computation so annotations can be performed once and reused by multiple feature generators. Also restructures the codebase into logical modules and adds support for all 6 primary ion series.
Added
- `FragmentAnnotation` and `AnnotatedMS2Spectrum` types: peak-centric annotation representation using only plain Python types (no dependency types exposed)
- `annotate_ms2_spectra()` function: standalone annotation step with configurable `tolerance_value` and `tolerance_mode` (ppm or Da)
- `score_ms2_spectra()` function: computes scoring features from annotated spectra with explicit `active_ion_series` parameter
- `ms2pip/` module: placeholder for future ms2pip-specific functionality

Changed
- Source files grouped into `types/`, `io/`, `scoring/`, `ms2pip/` modules
- `lib.rs` now only contains module declarations and pymodule registration
- `matched_ions_pct` denominator now uses all active series instead of hardcoded 2

Removed
- `ms2_features_from_ms2spectra()`: replaced by the two-step `annotate_ms2_spectra()` → `score_ms2_spectra()` flow
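The `matched_ions_pct` denominator change can be illustrated with a small sketch. It assumes that each active series contributes `peptide_length - 1` theoretical fragments and that the value is expressed as a percentage; neither the exact formula nor the function signature is stated in this PR, so treat both as assumptions.

```python
def matched_ions_pct(n_matched_ions, peptide_length, active_ion_series):
    """Percentage of theoretical fragment ions that were matched.

    Assumption: every active series yields (peptide_length - 1) fragments,
    so the denominator scales with the number of active series instead of
    a hardcoded factor of 2.
    """
    n_theoretical = len(active_ion_series) * (peptide_length - 1)
    if n_theoretical <= 0:
        # No active series or a peptide too short to fragment.
        return 0.0
    return 100.0 * n_matched_ions / n_theoretical
```

Under these assumptions, 9 matched ions for a 10-residue peptide give 50% with two active series (`b`, `y`) but 25% with four (`b`, `y`, `c`, `z`), which is why a hardcoded factor of 2 would overstate coverage for models that produce more than two series.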