Summary
IndexingEngine<T>.GetAllXics can throw ArgumentException: An item with the same key has already been added. Key: MassSpectrometry.IndexedMass when run with a wide peak-finding tolerance. The seed peak of each XIC is added without checking the global claimed-peak set, so under a wide tolerance a new XIC can pick up a peak already owned by an earlier XIC and then crash when it tries to claim it.
Root cause
GetAllXics keeps a Dictionary<IIndexedPeak, ExtractedIonChromatogram> matchedPeaks as a global "this peak already belongs to an XIC" set. Inside GetXicByScanIndex:
- Follow-on peaks are guarded against that set — an already-claimed peak is treated as a missed scan (
IndexingEngine.cs, ~line 178).
- The seed/initial peak is not guarded — it is selected by
GetBestPeakFromBins and added directly (IndexingEngine.cs, ~lines 134–137).
GetBestPeakFromBins returns the peak closest in mass to the seed across all in-range bins. Under a narrow ppm tolerance there is usually only one candidate per scan, so the seed is effectively unique. Under a wide tolerance, multiple distinct peaks fall within range at the seed scan, and the "best" one can be a peak an earlier (higher-intensity) XIC already claimed. That peak gets added to the new XIC's peak list, and then:
foreach (var matchedPeak in newXIC.Peaks)
matchedPeaks.Add(matchedPeak, newXIC); // ~line 215 — throws: key already present
(IIndexedPeak uses reference equality, so the duplicate is the same already-claimed instance.)
Reproduction
Call MassIndexingEngine.GetAllXics with a tolerance wide enough that ≥2 deconvoluted envelopes fall within tolerance at the same scan, e.g.:
var engine = new MassIndexingEngine();
engine.IndexPeaks(scanArray, deconParams);
engine.GetAllXics(new AbsoluteTolerance(1.5), maxMissedScanAllowed: 1,
maxRTRange: double.MaxValue, numPeakThreshold: 1);
Observed on real top-down data: deterministically throws on 1 of 20 Jurkat top-down mzML fractions with a 1.5 Da absolute window (charge 1–60 Classic deconvolution). It does not reproduce with the narrow ppm tolerances GetAllXics is normally called with, which is why it's gone unnoticed.
Impact
A single colliding seed scan aborts the entire GetAllXics call (and any batch built on it). It's a latent crash for any caller that wants whole-run tracing with a wide mass/m-z window.
Suggested fix
Guard the seed peak the same way follow-on peaks are guarded: when selecting initialPeak in GetXicByScanIndex, skip peaks already present in matchedPeaks (and re-seed from the next unclaimed peak), so a claimed peak is never added to a second XIC. That fixes it at the source — preferable to defensively de-duping the matchedPeaks.Add loop, which would still let one peak appear in two XICs' peak lists.
Found while evaluating GetAllXics reuse for consensus mass tracing in #1069. Pre-existing in the XIC engine, independent of that PR.
Summary
IndexingEngine<T>.GetAllXicscan throwArgumentException: An item with the same key has already been added. Key: MassSpectrometry.IndexedMasswhen run with a wide peak-finding tolerance. The seed peak of each XIC is added without checking the global claimed-peak set, so under a wide tolerance a new XIC can pick up a peak already owned by an earlier XIC and then crash when it tries to claim it.Root cause
GetAllXicskeeps aDictionary<IIndexedPeak, ExtractedIonChromatogram> matchedPeaksas a global "this peak already belongs to an XIC" set. InsideGetXicByScanIndex:IndexingEngine.cs, ~line 178).GetBestPeakFromBinsand added directly (IndexingEngine.cs, ~lines 134–137).GetBestPeakFromBinsreturns the peak closest in mass to the seed across all in-range bins. Under a narrow ppm tolerance there is usually only one candidate per scan, so the seed is effectively unique. Under a wide tolerance, multiple distinct peaks fall within range at the seed scan, and the "best" one can be a peak an earlier (higher-intensity) XIC already claimed. That peak gets added to the new XIC's peak list, and then:(
IIndexedPeakuses reference equality, so the duplicate is the same already-claimed instance.)Reproduction
Call
MassIndexingEngine.GetAllXicswith a tolerance wide enough that ≥2 deconvoluted envelopes fall within tolerance at the same scan, e.g.:Observed on real top-down data: deterministically throws on 1 of 20 Jurkat top-down mzML fractions with a 1.5 Da absolute window (charge 1–60 Classic deconvolution). It does not reproduce with the narrow ppm tolerances
GetAllXicsis normally called with, which is why it's gone unnoticed.Impact
A single colliding seed scan aborts the entire
GetAllXicscall (and any batch built on it). It's a latent crash for any caller that wants whole-run tracing with a wide mass/m-z window.Suggested fix
Guard the seed peak the same way follow-on peaks are guarded: when selecting
initialPeakinGetXicByScanIndex, skip peaks already present inmatchedPeaks(and re-seed from the next unclaimed peak), so a claimed peak is never added to a second XIC. That fixes it at the source — preferable to defensively de-duping thematchedPeaks.Addloop, which would still let one peak appear in two XICs' peak lists.Found while evaluating
GetAllXicsreuse for consensus mass tracing in #1069. Pre-existing in the XIC engine, independent of that PR.