Skip to content

GetAllXics: seed peak bypasses claimed-peak guard, throws duplicate-key ArgumentException under wide tolerance #1074

@trishorts

Description

@trishorts

Summary

IndexingEngine<T>.GetAllXics can throw ArgumentException: An item with the same key has already been added. Key: MassSpectrometry.IndexedMass when run with a wide peak-finding tolerance. The seed peak of each XIC is added without checking the global claimed-peak set, so under a wide tolerance a new XIC can pick up a peak already owned by an earlier XIC and then crash when it tries to claim it.

Root cause

GetAllXics keeps a Dictionary<IIndexedPeak, ExtractedIonChromatogram> matchedPeaks as a global "this peak already belongs to an XIC" set. Inside GetXicByScanIndex:

  • Follow-on peaks are guarded against that set — an already-claimed peak is treated as a missed scan (IndexingEngine.cs, ~line 178).
  • The seed/initial peak is not guarded — it is selected by GetBestPeakFromBins and added directly (IndexingEngine.cs, ~lines 134–137).

GetBestPeakFromBins returns the peak closest in mass to the seed across all in-range bins. Under a narrow ppm tolerance there is usually only one candidate per scan, so the seed is effectively unique. Under a wide tolerance, multiple distinct peaks fall within range at the seed scan, and the "best" one can be a peak an earlier (higher-intensity) XIC already claimed. That peak gets added to the new XIC's peak list, and then:

foreach (var matchedPeak in newXIC.Peaks)
    matchedPeaks.Add(matchedPeak, newXIC);   // ~line 215 — throws: key already present

(IIndexedPeak uses reference equality, so the duplicate is the same already-claimed instance.)

Reproduction

Call MassIndexingEngine.GetAllXics with a tolerance wide enough that ≥2 deconvoluted envelopes fall within tolerance at the same scan, e.g.:

var engine = new MassIndexingEngine();
engine.IndexPeaks(scanArray, deconParams);
engine.GetAllXics(new AbsoluteTolerance(1.5), maxMissedScanAllowed: 1,
                  maxRTRange: double.MaxValue, numPeakThreshold: 1);

Observed on real top-down data: deterministically throws on 1 of 20 Jurkat top-down mzML fractions with a 1.5 Da absolute window (charge 1–60 Classic deconvolution). It does not reproduce with the narrow ppm tolerances GetAllXics is normally called with, which is why it's gone unnoticed.

Impact

A single colliding seed scan aborts the entire GetAllXics call (and any batch built on it). It's a latent crash for any caller that wants whole-run tracing with a wide mass/m-z window.

Suggested fix

Guard the seed peak the same way follow-on peaks are guarded: when selecting initialPeak in GetXicByScanIndex, skip peaks already present in matchedPeaks (and re-seed from the next unclaimed peak), so a claimed peak is never added to a second XIC. That fixes it at the source — preferable to defensively de-duping the matchedPeaks.Add loop, which would still let one peak appear in two XICs' peak lists.


Found while evaluating GetAllXics reuse for consensus mass tracing in #1069. Pre-existing in the XIC engine, independent of that PR.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions