Feat spectral angle #185
BioGeek
left a comment
Mostly about handling backwards compatibility issues gracefully. The rest looks good.
    "complementary_ion_count",
    "max_ion_gap",
    "b_y_intensity_ratio",
    "spectral_angle",
When users load a calibrator trained before this change (including the default Hugging Face model unless it is republished in lockstep), its pickled FragmentMatchFeatures instance now reports one extra column while the stored scaler/classifier were fitted without it. ProbabilityCalibrator.predict() then builds a wider feature matrix and StandardScaler.transform raises a feature-count error. Please raise a warning that older calibrators are no longer supported.
See test_fragment_match_columns_preserve_saved_calibrator_width
Author reply:
I've adapted this test to be more generic so that it catches any feature matrix dimension shift on load and predict, and added a clear error message for when this happens.
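The failure mode in this thread can be sketched without the library. The guard below is a minimal, hypothetical illustration, not winnow's actual code: `check_feature_width` and its parameters are assumed names. In scikit-learn, a fitted estimator exposes the width it was trained on as `n_features_in_`, which is one plausible source for `expected_width`.

```python
from typing import Sequence


def check_feature_width(expected_width: int, columns: Sequence[str]) -> None:
    """Raise a clear error if the feature matrix width changed since fitting.

    Hypothetical helper: `expected_width` is the number of features the saved
    scaler/classifier was fitted on; `columns` are the columns the currently
    installed feature classes would produce.
    """
    actual_width = len(columns)
    if actual_width != expected_width:
        raise ValueError(
            f"Saved calibrator was fitted on {expected_width} features, but the "
            f"current feature set produces {actual_width}. Calibrators trained "
            "before a feature was added or removed must be retrained."
        )


# Columns as they might look after this change (illustrative subset only).
new_columns = [
    "complementary_ion_count",
    "max_ion_gap",
    "b_y_intensity_ratio",
    "spectral_angle",
]

check_feature_width(4, new_columns)  # widths agree, so no error is raised

try:
    check_feature_width(3, new_columns)  # calibrator saved before the new column
    message = ""
except ValueError as err:
    message = str(err)
```

Failing at load time with a message like this points the user at retraining, rather than leaving them with a feature-count error from deep inside the scaler.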
      source_annotations: Union[List[bytes], List[str]],
      mz_tolerance: float = 0.02,
    - ) -> Tuple[float, float, List[str], List[float], List[float]]:
    + ) -> Tuple[float, float, List[str], List[float], List[float], List[float]]:
For callers that import this helper directly (it is exported via winnow.calibration.features and re-exported through the backwards-compatible calibration_features module), adding a sixth tuple item breaks existing five-value unpacking even though the first five values still mean the same thing. Please raise a warning that older calibrators are no longer supported.
See test_find_matching_ions_preserves_public_return_arity (needs to be adapted to catch the warning).
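To make the arity concern concrete, here is a small sketch using stand-in functions with placeholder data. `legacy_style_result` and `new_style_result` are hypothetical and only mimic the two tuple shapes from the diff. It shows why strict five-value unpacking breaks, and how a caller could use star-unpacking to tolerate both shapes.

```python
from typing import List, Tuple


def legacy_style_result() -> Tuple[float, float, List[str], List[float], List[float]]:
    """Stand-in for the old five-item return value (placeholder data)."""
    return 0.5, 0.25, ["b2"], [300.1], [1200.0]


def new_style_result() -> (
    Tuple[float, float, List[str], List[float], List[float], List[float]]
):
    """Stand-in for the new six-item return value (placeholder data)."""
    return 0.5, 0.25, ["b2"], [300.1], [1200.0], [1200.0, 0.0]


# Strict five-value unpacking fails as soon as a sixth item is returned.
try:
    a, b, c, d, e = new_style_result()
    unpack_failed = False
except ValueError:
    unpack_failed = True

# Star-unpacking accepts both shapes; any trailing items land in `rest`.
v1, v2, v3, v4, v5, *rest_new = new_style_result()
w1, w2, w3, w4, w5, *rest_old = legacy_style_result()
```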
    source_column: str,
    source_mz_column: str,
    source_annotation_column: str,
    source_intensity_column: str,
Direct callers of this re-exported helper that still use the previous API (source_column=..., no intensity column, and seven returned iterables) now fail with TypeError or unpacking errors. Please raise a warning that the old format is no longer supported.
See test_compute_ion_identifications_preserves_public_api_without_spectral_angle (adapt it to catch the warning).
Author reply:
I've addressed this with hard failures instead of warnings, because the old format is not supported.
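One way such a hard failure could look is sketched below. It is an assumption-laden illustration rather than the code in this PR: `compute_ion_identifications_sketch` is a hypothetical stand-in that only validates the call shape, and the column names passed to it are placeholders.

```python
from typing import Dict, Optional


def compute_ion_identifications_sketch(
    source_column: str,
    source_mz_column: str,
    source_annotation_column: str,
    source_intensity_column: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical stand-in, not the winnow function: checks the call shape only."""
    if source_intensity_column is None:
        # Hard failure with an actionable message instead of a warning.
        raise TypeError(
            "source_intensity_column is required; calls without an intensity "
            "column use the old format, which is no longer supported."
        )
    return {
        "source": source_column,
        "mz": source_mz_column,
        "annotation": source_annotation_column,
        "intensity": source_intensity_column,
    }


# Old-style call (no intensity column) fails immediately and explains why.
try:
    compute_ion_identifications_sketch("prediction", "mz_array", "annotations")
    legacy_error = ""
except TypeError as err:
    legacy_error = str(err)

# New-style call passes validation.
ok = compute_ion_identifications_sketch(
    "prediction", "mz_array", "annotations", "intensity_array"
)
```

The optional default exists only so the sketch can attach its own message; a plain required parameter would also raise TypeError, just with Python's generic wording.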
        theoretical, with 0.0 for unmatched ions).

    Returns:
        Spectral angle in radians. 0 indicates perfect correlation, 1 indicates perfect anti-correlation.
Documentation/naming semantics issue:
The implementation:
    return 1 - (2 * np.arccos(dot_product) / np.pi)
computes a normalized spectral angle similarity score:
- perfect match: 1.0
- orthogonal vectors: 0.0
- opposite vectors: -1.0 (though fragment intensities should normally be non-negative, so this is not expected in practice)
This function does not return radians, and it does not return a distance where 0 is perfect. So I would change "spectral angle" everywhere to "normalized spectral angle similarity score" and change this line to:
    - Spectral angle in radians. 0 indicates perfect correlation, 1 indicates perfect anti-correlation.
    + Normalized spectral angle similarity. 1 indicates perfect agreement and 0 indicates orthogonal non-negative intensity vectors.
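The three cases listed in this comment follow directly from the formula. Written out, with d denoting the dot product of the two L2-normalised intensity vectors (the symbols s and d are introduced here only for illustration):

```latex
s(d) = 1 - \frac{2\arccos(d)}{\pi}, \qquad
s(1) = 1 - \frac{2 \cdot 0}{\pi} = 1, \quad
s(0) = 1 - \frac{2 \cdot \pi/2}{\pi} = 0, \quad
s(-1) = 1 - \frac{2 \cdot \pi}{\pi} = -1.
```

Because intensities are non-negative, d lies in [0, 1] in practice, so s lies in [0, 1] as well.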
Author reply:
Thanks for this, should be clarified now. I chose to keep the column name as spectral_angle to be consistent with my other column naming conventions which do not explicitly mention normalisation in the name but describe it in the documentation.
chore: fix american spelling
Add spectral angle feature
Summary
Adds a spectral_angle feature to FragmentMatchFeatures (and chimeric_spectral_angle to ChimericFeatures). The spectral angle measures the cosine similarity between the theoretical and observed fragmentation spectra, projected into the range [0, 1]. A value of 1 indicates a perfect match. This is a widely used metric in proteomics for assessing PSM quality and provides the calibrator with a complementary signal to the existing ion match rate and intensity features.
How it works
- find_matching_ions now builds an aligned M0 intensity vector: a list with one entry per theoretical ion, containing the observed M0 peak intensity if matched or 0.0 if unmatched.
- The compute_spectral_angle function takes the theoretical intensity vector and the aligned observed intensity vector, L2-normalises both, computes their dot product, and converts the resulting angle to the [0, 1] scale via 1 - (2 * arccos(dot) / π).
- The spectral angle is computed for each spectrum in compute_ion_identifications and surfaced as a column in both FragmentMatchFeatures and ChimericFeatures.
Changes
- winnow/calibration/features/utils.py:
  - find_matching_ions now returns a sixth value: aligned M0 intensities for spectral angle computation. The function also tracks unmatched theoretical ions (appending 0.0) to ensure alignment.
  - compute_ion_identifications now accepts a source_intensity_column parameter and computes the spectral angle for each spectrum.
  - compute_spectral_angle function implementing the normalised spectral contrast angle.
- winnow/calibration/features/fragment_match.py: adds spectral_angle to columns() and passes source_intensity_column to compute_ion_identifications.
- winnow/calibration/features/chimeric.py: adds chimeric_spectral_angle column; updates compute_ion_identifications call signature to include source_mz_column and source_intensity_column.
- docs/api/features/fragment_match.md, docs/api/features/chimeric.md: document the new feature.
- tests/calibration/features/test_utils.py: comprehensive tests for compute_spectral_angle covering perfect match, orthogonal spectra, partial match, single-ion, empty input, and integration with the full pipeline.
- tests/calibration/features/test_fragment_match.py, test_chimeric.py: assert new columns are present.
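For readers who want to see the computation end to end, the following is a minimal standard-library sketch of the steps described above. It is not the PR's implementation (which uses NumPy): the names `align_observed_intensities` and `spectral_angle_score`, the ion labels, and the intensity values are all illustrative assumptions. Returning 0.0 for an empty or all-zero vector is also an assumption.

```python
import math
from typing import Dict, List, Sequence


def align_observed_intensities(
    theoretical_ions: Sequence[str], matched: Dict[str, float]
) -> List[float]:
    """One entry per theoretical ion: matched intensity, or 0.0 if unmatched."""
    return [matched.get(ion, 0.0) for ion in theoretical_ions]


def spectral_angle_score(
    theoretical: Sequence[float], observed: Sequence[float]
) -> float:
    """Normalised spectral angle similarity: 1 - 2 * arccos(dot) / pi."""
    if len(theoretical) != len(observed):
        raise ValueError("intensity vectors must be aligned (same length)")
    norm_t = math.sqrt(sum(x * x for x in theoretical))
    norm_o = math.sqrt(sum(x * x for x in observed))
    if norm_t == 0.0 or norm_o == 0.0:
        # Assumption: an empty or all-zero vector carries no signal, so score 0.0.
        return 0.0
    # Dot product of the two L2-normalised vectors (cosine similarity).
    dot = sum(t * o for t, o in zip(theoretical, observed)) / (norm_t * norm_o)
    # Clamp to guard against floating-point drift just outside [-1, 1].
    dot = max(-1.0, min(1.0, dot))
    return 1.0 - 2.0 * math.acos(dot) / math.pi


# Illustrative data: four theoretical ions, two of which were matched.
theoretical_ions = ["b1", "b2", "y1", "y2"]
theoretical = [0.2, 0.5, 0.9, 0.4]
matched = {"b2": 1000.0, "y1": 1800.0}

observed = align_observed_intensities(theoretical_ions, matched)
score = spectral_angle_score(theoretical, observed)
```

Because both vectors are L2-normalised before the dot product, the absolute scale of the observed intensities does not affect the score. Only the relative pattern across ions matters.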