
Feat spectral angle #185

Merged: JemmaLDaniel merged 13 commits into main from feat-spectral-angle on Jun 27, 2026

JemmaLDaniel (Collaborator)

Add spectral angle feature

Summary

Adds a spectral_angle feature to FragmentMatchFeatures (and chimeric_spectral_angle to ChimericFeatures). The feature is a normalised spectral angle. It takes the angle between the theoretical and observed fragment intensity vectors (the arccosine of their cosine similarity) and maps it onto [0, 1] for non-negative intensities, where 1 indicates a perfect match. Spectral angle is widely used in proteomics to assess PSM quality, and it gives the calibrator a signal that complements the existing ion match rate and intensity features.

How it works

  1. During ion matching, find_matching_ions now builds an aligned M0 intensity vector — a list with one entry per theoretical ion, containing the observed M0 peak intensity if matched or 0.0 if unmatched.
  2. A new compute_spectral_angle function takes the theoretical intensity vector and the aligned observed intensity vector, L2-normalises both, computes their dot product, and converts the resulting angle to the [0, 1] scale via 1 - (2 * arccos(dot) / π).
  3. The spectral angle is computed per-spectrum in compute_ion_identifications and surfaced as a column in both FragmentMatchFeatures and ChimericFeatures.
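Step 2 can be sketched in dependency-free Python as below. This is a minimal illustration under stated assumptions, not the code in compute_spectral_angle (the review thread shows that function uses NumPy). The name normalised_spectral_angle is hypothetical, clamping the dot product before arccos is an added safeguard, and returning 0.0 for empty or all-zero input is an assumed convention that the real function may not share.

```python
import math
from typing import Sequence


def normalised_spectral_angle(
    theoretical: Sequence[float], observed: Sequence[float]
) -> float:
    """Hypothetical sketch of a normalised spectral angle score.

    `observed` must be aligned with `theoretical`: entry i holds the observed
    intensity for theoretical ion i, or 0.0 if that ion was unmatched.
    Returns 1.0 for identical relative intensities and 0.0 for orthogonal
    non-negative vectors.
    """
    if len(theoretical) != len(observed):
        raise ValueError("theoretical and observed vectors must be aligned")

    theo_norm = math.sqrt(sum(x * x for x in theoretical))
    obs_norm = math.sqrt(sum(y * y for y in observed))
    # Assumption: an empty or all-zero vector has no defined angle, so score it 0.0.
    if theo_norm == 0.0 or obs_norm == 0.0:
        return 0.0

    # Dot product of the two L2-normalised vectors, i.e. their cosine similarity.
    dot = sum((x / theo_norm) * (y / obs_norm) for x, y in zip(theoretical, observed))
    # Clamp so floating-point rounding cannot push the value outside acos's domain.
    dot = max(-1.0, min(1.0, dot))
    # Map the angle in [0, pi] onto [1, -1]; [0, pi/2] maps onto [1, 0].
    return 1.0 - 2.0 * math.acos(dot) / math.pi
```

Because L2 normalisation removes overall scale, observed intensities that are a constant multiple of the theoretical ones, such as [1.0, 0.5, 0.2] against [2.0, 1.0, 0.4], score approximately 1.0, while [1.0, 0.0] against [0.0, 1.0] scores 0.0.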

Changes

  • winnow/calibration/features/utils.py:
    • find_matching_ions now returns a sixth value: aligned M0 intensities for spectral angle computation. The function also tracks unmatched theoretical ions (appending 0.0) to ensure alignment.
    • compute_ion_identifications now accepts a source_intensity_column parameter and computes the spectral angle for each spectrum.
    • Adds compute_spectral_angle function implementing the normalised spectral contrast angle.
  • winnow/calibration/features/fragment_match.py — adds spectral_angle to columns() and passes source_intensity_column to compute_ion_identifications.
  • winnow/calibration/features/chimeric.py — adds chimeric_spectral_angle column; updates compute_ion_identifications call signature to include source_mz_column and source_intensity_column.
  • docs/api/features/fragment_match.md, docs/api/features/chimeric.md — document the new feature.
  • tests/calibration/features/test_utils.py — comprehensive tests for compute_spectral_angle covering perfect match, orthogonal spectra, partial match, single-ion, empty input, and integration with the full pipeline.
  • tests/calibration/features/test_fragment_match.py, test_chimeric.py — assert new columns are present.
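The aligned M0 vector from step 1 (the new sixth return value of find_matching_ions) can be illustrated with a hypothetical helper, align_observed_intensities. It assumes observed m/z values sorted in ascending order, an absolute m/z window whose 0.02 default mirrors the mz_tolerance default in the find_matching_ions signature, and a closest-peak rule. This PR does not describe how the real matcher chooses among candidate peaks or restricts matching to M0, so treat the details as a sketch.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def align_observed_intensities(
    theoretical_mz: Sequence[float],
    observed_mz: Sequence[float],
    observed_intensity: Sequence[float],
    mz_tolerance: float = 0.02,
) -> List[float]:
    """Hypothetical sketch: return one observed intensity per theoretical ion.

    `observed_mz` must be sorted ascending. For each theoretical m/z, the
    closest observed peak within `mz_tolerance` is used; unmatched ions get
    0.0, so the result always has len(theoretical_mz) entries.
    """
    aligned: List[float] = []
    for mz in theoretical_mz:
        i = bisect_left(observed_mz, mz)
        best_idx: Optional[int] = None
        best_err = mz_tolerance
        # Only the peaks either side of the insertion point can be the closest.
        for j in (i - 1, i):
            if 0 <= j < len(observed_mz):
                err = abs(observed_mz[j] - mz)
                if err <= best_err:
                    best_idx, best_err = j, err
        aligned.append(observed_intensity[best_idx] if best_idx is not None else 0.0)
    return aligned
```

The property the PR relies on is that the output always has one entry per theoretical ion, so it can be paired index by index with the theoretical intensity vector.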

JemmaLDaniel self-assigned this on Apr 10, 2026
JemmaLDaniel added the enhancement (New feature or request) label on Apr 10, 2026

github-actions bot commented on Apr 10, 2026
Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---:|---:|---:|---|
| __init__.py | 0 | 0 | 100% | |
| data_types.py | 4 | 0 | 100% | |
| calibration/__init__.py | 0 | 0 | 100% | |
| calibration/calibration_features.py | 9 | 0 | 100% | |
| calibration/calibrator.py | 102 | 11 | 89% | 69–70, 72, 107, 134–135, 137, 163, 168, 195–196 |
| calibration/diagnostics.py | 168 | 50 | 70% | 70, 96, 101, 111, 115, 137, 146, 203–218, 261–262, 266, 307, 309–324, 335–341 |
| calibration/features/__init__.py | 10 | 0 | 100% | |
| calibration/features/base.py | 8 | 0 | 100% | |
| calibration/features/beam.py | 47 | 0 | 100% | |
| calibration/features/chimeric.py | 81 | 1 | 98% | 212 |
| calibration/features/constants.py | 4 | 0 | 100% | |
| calibration/features/fragment_match.py | 77 | 1 | 98% | 202 |
| calibration/features/mass_error.py | 67 | 2 | 97% | 16, 20 |
| calibration/features/retention_time.py | 135 | 9 | 93% | 183, 190, 206, 257–259, 269, 272–273 |
| calibration/features/sequence.py | 19 | 0 | 100% | |
| calibration/features/token_score.py | 37 | 1 | 97% | 82 |
| calibration/features/utils.py | 216 | 3 | 98% | 89, 361, 580 |
| compat/__init__.py | 0 | 0 | 100% | |
| compat/instanovo.py | 10 | 6 | 40% | 12, 14–15, 17, 24–25 |
| datasets/__init__.py | 0 | 0 | 100% | |
| datasets/calibration_dataset.py | 109 | 17 | 84% | 155, 169, 171, 173, 183, 196, 249, 251–252, 258–261, 263–266 |
| datasets/interfaces.py | 3 | 0 | 100% | |
| datasets/psm_dataset.py | 25 | 0 | 100% | |
| datasets/data_loaders/__init__.py | 5 | 0 | 100% | |
| datasets/data_loaders/instanovo.py | 119 | 19 | 84% | 90, 93, 119, 142, 168–169, 172–174, 176–177, 179, 182–183, 185, 343–345, 356 |
| datasets/data_loaders/mztab.py | 215 | 55 | 74% | 103, 106, 157, 161, 210–211, 223, 236–240, 287, 290, 302–303, 315–317, 319–320, 322, 324, 330, 334–336, 338–339, 343–346, 350, 514–515, 518, 521, 528, 542–546, 550–555, 561, 570–571, 573, 599 |
| datasets/data_loaders/pointnovo.py | 7 | 0 | 100% | |
| datasets/data_loaders/utils.py | 59 | 1 | 98% | 11 |
| datasets/data_loaders/winnow.py | 39 | 4 | 89% | 54–55, 91–92 |
| fdr/__init__.py | 0 | 0 | 100% | |
| fdr/base.py | 58 | 15 | 74% | 81, 85–86, 91, 98–99, 105, 126, 129–130, 135, 137–138, 144, 186 |
| fdr/database_grounded.py | 28 | 1 | 96% | 52 |
| fdr/nonparametric.py | 25 | 4 | 84% | 62, 68–69, 72 |
| scripts/__init__.py | 0 | 0 | 100% | |
| scripts/main.py | 256 | 256 | 0% | 8, 10–13, 16–20, 23–24, 26–28, 32, 39, 44, 47, 53, 55–56, 59, 68, 76, 79, 86, 88–90, 92, 94–99, 102, 104–105, 110, 125, 128, 135–141, 144–145, 148, 161–163, 166, 169, 174, 176–178, 180, 182–183, 186–187, 190, 192–193, 195, 197, 199–200, 202, 205–206, 209–210, 213–214, 217–219, 221–224, 227–229, 231, 234, 248–250, 252, 254, 259, 261–263, 265–266, 268, 270–271, 273–275, 277, 279, 281–282, 286–289, 291–292, 294–295, 297–298, 300, 303, 317–319, 322, 325, 330, 332–334, 336–338, 340–341, 344–345, 348, 350–351, 353, 355, 357–358, 360, 363–364, 370–372, 374–377, 380–381, 384–385, 388–389, 392–393, 401–403, 407, 410, 414, 417, 423–425, 427–428, 435–436, 438, 440, 445, 447–449, 451–452, 455, 457–458, 460–463, 465–466, 468–469, 471–473, 479–480, 484–485, 488, 495, 500–501, 506–508, 511, 516, 526, 533, 535, 539, 541–542, 546–547, 550, 573, 586–587, 590, 612, 624–625, 628, 653, 666–667, 670, 685, 697–698, 701, 716, 728–729, 732, 744, 756–757, 760, 775, 787–788, 791, 800, 812–813 |
| utils/__init__.py | 4 | 0 | 100% | |
| utils/config_formatter.py | 53 | 40 | 24% | 29, 37–38, 40–42, 44, 55, 58–60, 62–63, 66–69, 72–74, 77–78, 80, 91, 102, 113, 127–128, 130–132, 145–147, 150, 153–154, 157–158, 160 |
| utils/config_path.py | 76 | 5 | 93% | 24–26, 117–118 |
| utils/peptide.py | 16 | 0 | 100% | |
| TOTAL | 2091 | 501 | 76% | |

| Tests | Skipped | Failures | Errors | Time |
|---:|---:|---:|---:|---:|
| 420 | 0 💤 | 0 ❌ | 0 🔥 | 37.671s ⏱️ |

JemmaLDaniel force-pushed the feat-b-y-intensity-ratio branch from c96c4e8 to 73bd392 on April 10, 2026 14:37
JemmaLDaniel force-pushed the feat-b-y-intensity-ratio branch from 73bd392 to eb04976 on April 10, 2026 16:41
JemmaLDaniel linked an issue on Apr 23, 2026 that may be closed by this pull request
JemmaLDaniel force-pushed the feat-spectral-angle branch from f711c8e to b198956 on June 22, 2026 14:16
JemmaLDaniel force-pushed the feat-spectral-angle branch from cfd38fe to 17231f4 on June 22, 2026 14:29
JemmaLDaniel requested a review from BioGeek on June 22, 2026 14:32

BioGeek (Contributor) left a comment

Mostly about handling backwards-compatibility issues gracefully. The rest looks good.

"complementary_ion_count",
"max_ion_gap",
"b_y_intensity_ratio",
"spectral_angle",

Contributor


When users load a calibrator trained before this change (including the default Hugging Face model unless it is republished in lockstep), its pickled FragmentMatchFeatures instance now reports one extra column while the stored scaler/classifier were fitted without it. ProbabilityCalibrator.predict() then builds a wider feature matrix and StandardScaler.transform raises a feature-count error. Please raise a warning that older calibrators are no longer supported.

See test_fragment_match_columns_preserve_saved_calibrator_width

JemmaLDaniel (Collaborator, Author), Jun 27, 2026

I've adapted this test to be more generic, so it catches any feature-matrix dimension shift on load and predict, and added a clear error message for when this happens.
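A load-time width check of the kind described might look like the following sketch. check_feature_width and FeatureWidthMismatchError are hypothetical names. A fitted scikit-learn StandardScaler exposes n_features_in_, which is one natural source for expected_width, but this thread does not show how the merged code obtains the saved width.

```python
from typing import Sequence


class FeatureWidthMismatchError(ValueError):
    """Hypothetical error for a saved calibrator whose feature width no longer matches."""


def check_feature_width(expected_width: int, columns: Sequence[str]) -> None:
    """Raise if the current feature columns differ in count from the fitted model.

    `expected_width` is the number of features the stored scaler/classifier was
    fitted on; `columns` is the combined column list the current feature
    classes report (for example, now including "spectral_angle").
    """
    if len(columns) != expected_width:
        raise FeatureWidthMismatchError(
            f"Calibrator was fitted on {expected_width} features but the current "
            f"feature set produces {len(columns)}. Calibrators trained before this "
            "change are no longer supported; retrain or load an updated model."
        )
```

Subclassing ValueError keeps existing `except ValueError` handlers working while letting callers catch this specific case.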

Comment thread: winnow/calibration/features/utils.py (outdated)
source_annotations: Union[List[bytes], List[str]],
mz_tolerance: float = 0.02,
-) -> Tuple[float, float, List[str], List[float], List[float]]:
+) -> Tuple[float, float, List[str], List[float], List[float], List[float]]:

Contributor


For callers that import this helper directly (it is exported via winnow.calibration.features and re-exported through the backwards-compatible calibration_features module), adding a sixth tuple item breaks existing five-value unpacking even though the first five values still mean the same thing. Please raise a warning that older calibrators are no longer supported.

See test_find_matching_ions_preserves_public_return_arity (needs to be adapted to catch the warning).
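The breakage described here is standard Python unpacking behaviour. The sketch below uses hypothetical stand-in functions with dummy values to show that fixed-arity unpacking fails on a sixth item, while star-unpacking tolerates it. This is a caller-side pattern, not a description of the fix that landed.

```python
from typing import Any, Tuple


def five_value_result() -> Tuple[Any, ...]:
    # Hypothetical stand-in for the old five-value return shape (dummy values).
    return 0.5, 0.25, ["b2"], [1.0], [0.8]


def six_value_result() -> Tuple[Any, ...]:
    # Same first five values plus an aligned-intensity list as the sixth item.
    return 0.5, 0.25, ["b2"], [1.0], [0.8], [0.8, 0.0]


def strict_caller(result: Tuple[Any, ...]) -> Tuple[Any, ...]:
    # Fixed-arity unpacking, as existing callers of a five-value helper would write it.
    a, b, c, d, e = result
    return a, b, c, d, e


def tolerant_caller(result: Tuple[Any, ...]) -> Tuple[Any, ...]:
    # Star-unpacking absorbs any trailing items, so an extra return value is ignored.
    a, b, c, d, e, *_extra = result
    return a, b, c, d, e


try:
    strict_caller(six_value_result())
except ValueError as err:
    print(f"strict caller fails: {err}")

print(tolerant_caller(five_value_result()) == tolerant_caller(six_value_result()))
```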

Collaborator Author


This should now be addressed in #184

source_column: str,
source_mz_column: str,
source_annotation_column: str,
source_intensity_column: str,

Contributor


Direct callers of this re-exported helper that still use the previous API (source_column=..., no intensity column, and seven returned iterables) now fail with TypeError or unpacking errors. Please raise a warning that the old format is no longer supported.

See test_compute_ion_identifications_preserves_public_api_without_spectral_angle (adapt it to catch the warning).

Collaborator Author


I've addressed this with hard failures instead of warnings, because the old format is not supported.
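A hard failure with a targeted message could be sketched as below. Old-style keywords are detected first and rejected with a TypeError, instead of calling warnings.warn and carrying on. new_api_sketch and REMOVED_KEYWORDS are hypothetical. The review cites source_column=... as part of the old call, but the thread does not show exactly which parameters the merged signature dropped.

```python
from typing import Any, Dict, Optional

# Hypothetical set of keywords that only the old call signature used.
REMOVED_KEYWORDS = frozenset({"source_column"})


def new_api_sketch(
    source_mz_column: Optional[str] = None,
    source_intensity_column: Optional[str] = None,
    **kwargs: Any,
) -> Dict[str, str]:
    # Detect old-style keywords first, so legacy callers get a targeted message
    # rather than a generic "missing argument" error.
    legacy = sorted(REMOVED_KEYWORDS.intersection(kwargs))
    if legacy:
        raise TypeError(
            f"Keyword(s) {legacy} belong to the old call signature, which is no "
            "longer supported; update the call to the new parameters."
        )
    if kwargs:
        raise TypeError(f"Unexpected keyword arguments: {sorted(kwargs)}")
    if source_mz_column is None or source_intensity_column is None:
        raise TypeError("source_mz_column and source_intensity_column are required.")
    return {"mz": source_mz_column, "intensity": source_intensity_column}
```

Making the parameters optional at the Python level is deliberate: with required parameters, an old-style call would hit Python's generic missing-argument TypeError before the targeted check could run.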

Comment thread: winnow/calibration/features/utils.py (outdated)
theoretical, with 0.0 for unmatched ions).

Returns:
Spectral angle in radians. 0 indicates perfect correlation, 1 indicates perfect anti-correlation.

Contributor


Documentation/naming semantics issue:

The implementation:

return 1 - (2 * np.arccos(dot_product) / np.pi)

computes a normalized spectral angle similarity score:

  • perfect match: 1.0
  • orthogonal vectors: 0.0
  • opposite vectors: -1.0 (though fragment intensities should normally be non-negative, so this is not expected in practice)

This function is not returning radians, and it is not returning a distance where 0 is perfect. So I would change "spectral angle" everywhere to "normalized spectral angle similarity score" and change this line to:

Suggested change
- Spectral angle in radians. 0 indicates perfect correlation, 1 indicates perfect anti-correlation.
+ Normalized spectral angle similarity. 1 indicates perfect agreement and 0 indicates orthogonal non-negative intensity vectors.
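The three values listed above follow directly from the formula. With θ the angle between the L2-normalised theoretical and observed intensity vectors t̂ and ô, and SA the score returned by the implementation:

```latex
\begin{aligned}
\theta &= \arccos(\hat{t} \cdot \hat{o}) \in [0, \pi], \\
\mathrm{SA} &= 1 - \frac{2\theta}{\pi}, \\
\theta = 0 &\;\Rightarrow\; \mathrm{SA} = 1, \quad
\theta = \tfrac{\pi}{2} \;\Rightarrow\; \mathrm{SA} = 0, \quad
\theta = \pi \;\Rightarrow\; \mathrm{SA} = -1, \\
\hat{t} \cdot \hat{o} = \tfrac{1}{2} &\;\Rightarrow\; \theta = \tfrac{\pi}{3} \;\Rightarrow\; \mathrm{SA} = 1 - \tfrac{2}{3} = \tfrac{1}{3}.
\end{aligned}
```

Non-negative intensities give t̂ · ô ≥ 0, so θ ≤ π/2 and the score stays in [0, 1], matching the range given in the PR summary.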

Collaborator Author


Thanks for this, should be clarified now. I chose to keep the column name as spectral_angle to be consistent with my other column naming conventions which do not explicitly mention normalisation in the name but describe it in the documentation.

Base automatically changed from feat-b-y-intensity-ratio to main June 27, 2026 13:51
JemmaLDaniel force-pushed the feat-spectral-angle branch from 9b7bf59 to f5f80b4 on June 27, 2026 14:51
JemmaLDaniel requested a review from BioGeek on June 27, 2026 14:56

BioGeek (Contributor) left a comment

Looking good now!

JemmaLDaniel merged commit 9c2fe3d into main on Jun 27, 2026 (7 checks passed)
JemmaLDaniel deleted the feat-spectral-angle branch on June 27, 2026 15:30

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add spectral angle or correlation to Prosit Feature

2 participants