v3.2.0: silent per-sample drop in QUANT subworkflow + OpenMS 3.5.0 FAIMS regression #458

@alisa2alithea

Description of the bug

Hi @jonasscheid,

We are very excited about the single-sample (no replicates) quantification in the new 3.2.0 release. While testing it with 200 samples from different instruments, we hit two reproducible issues. Both are silent: the pipeline reports success.

Failure A — silent per-sample drop in QUANT subworkflow

Symptom

For multi-sample runs (each row in the samplesheet a distinct (Sample, Condition) group with group_count=1, quantify=true, fdr_level=psm_level_fdrs), all upstream stages complete normally for every sample:

OPENMSTHIRDPARTY_COMETADAPTER, OPENMS_PEPTIDEINDEXER, RESCORE:MS2RESCORE, RESCORE:OPENMS_PSMFEATUREEXTRACTOR, RESCORE:OPENMS_PERCOLATORADAPTER, RESCORE:OPENMS_IDFILTER_Q_VALUE, QUANT:OPENMS_IDRIPPER — N/N tasks submitted and completed.

Then only a subset of samples (often 0–6 of 10) reaches QUANT:OPENMS_IDFILTER_QUANT. The remaining samples disappear silently from the channel: no error, no warning, no log line indicating the drop. The pipeline exits 0 because every submitted task succeeded.

In our runs across multiple Parameter_Configurations (highres-HCD, highres-CID, timsTOF, EThcD), about 75% of samples per multi-sample run are lost this way. The same input samples processed under v3.1.0 produced complete outputs.

Process tally for one representative 10-sample run

| Process | Submitted | Completed | Note |
|---|---|---|---|
| OPENMSTHIRDPARTY_COMETADAPTER | 10 | 10 | |
| OPENMS_PEPTIDEINDEXER | 10 | 10 | |
| OPENMS_IDMERGER | 10 | 10 | |
| RESCORE:MS2RESCORE | 10 | 10 | |
| RESCORE:OPENMS_PSMFEATUREEXTRACTOR | 10 | 10 | |
| RESCORE:OPENMS_PERCOLATORADAPTER | 10 | 10 | |
| RESCORE:OPENMS_IDFILTER_Q_VALUE | 10 | 10 | |
| QUANT:OPENMS_IDRIPPER | 10 | 10 | |
| QUANT:OPENMS_IDFILTER_QUANT | 2 | 2 | ← drop |
| QUANT:MAP_ALIGNMENT:OPENMS_MAPALIGNERIDENTIFICATION | 2 | 2 | |
| QUANT:PROCESS_FEATURE:OPENMS_FEATUREFINDERIDENTIFICATION | 2 | 2 | |
| OPENMS_TEXTEXPORTER | 2 | 2 | |
| SUMMARIZE_RESULTS | 2 | 2 | |
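
Because the exit code stays 0, the drop only shows up when per-process task counts are compared against the number of samples. A minimal sketch of that check, using the counts from the tally above; the `first_drop` helper is hypothetical, and it assumes one task per sample per process, which holds here because every group has group_count=1:

```python
from typing import List, Optional, Tuple

# Submitted-task counts per process, copied from the tally above, in pipeline order.
TALLY: List[Tuple[str, int]] = [
    ("OPENMSTHIRDPARTY_COMETADAPTER", 10),
    ("OPENMS_PEPTIDEINDEXER", 10),
    ("OPENMS_IDMERGER", 10),
    ("RESCORE:MS2RESCORE", 10),
    ("RESCORE:OPENMS_PSMFEATUREEXTRACTOR", 10),
    ("RESCORE:OPENMS_PERCOLATORADAPTER", 10),
    ("RESCORE:OPENMS_IDFILTER_Q_VALUE", 10),
    ("QUANT:OPENMS_IDRIPPER", 10),
    ("QUANT:OPENMS_IDFILTER_QUANT", 2),
    ("QUANT:MAP_ALIGNMENT:OPENMS_MAPALIGNERIDENTIFICATION", 2),
    ("QUANT:PROCESS_FEATURE:OPENMS_FEATUREFINDERIDENTIFICATION", 2),
    ("OPENMS_TEXTEXPORTER", 2),
    ("SUMMARIZE_RESULTS", 2),
]


def first_drop(tally: List[Tuple[str, int]], expected: int) -> Optional[Tuple[str, int]]:
    """Return the first (process, count) with fewer tasks than expected, or None."""
    for process, count in tally:
        if count < expected:
            return process, count
    return None


print(first_drop(TALLY, expected=10))  # ('QUANT:OPENMS_IDFILTER_QUANT', 2)
```

The same comparison would need adjusting for runs with replicates, where merged groups legitimately reduce the task count downstream.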

Observed correlation: timing of upstream completions

The per-sample outcome correlates with the relative arrival timing of OPENMS_IDFILTER_Q_VALUE and OPENMS_IDRIPPER for that sample. Across two independent runs, every sample with IDFILTER_Q_VALUE finishing ≥10 s before its IDRipper survived; every sample with arrivals within 100 ms or with IDRipper first was dropped.

Sample-level pattern from a 10-sample run:

| Δ = t(IDRipper) − t(IDFILTER_Q_VALUE) | Outcome |
|---|---|
| +80 s, +30 s | SURVIVED (2 samples) |
| ≤ 0.1 s or negative | DROPPED (8 samples) |

Same pattern in another 10-sample run: all samples with Δ ≥ 10 s survived (6 samples), all samples with Δ ≤ 0.1 s dropped (4 samples).
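
The observed pattern can be written down as a small rule, which may help when checking other runs against it. This is only a sketch: the function names are hypothetical, the thresholds are the ones observed in the two runs above, and values between 0.1 s and 10 s were not observed, so they are reported as such rather than guessed. It describes a correlation, not the cause.

```python
def delta_seconds(t_idfilter_q_value: float, t_idripper: float) -> float:
    """Delta = t(IDRipper) - t(IDFILTER_Q_VALUE), both as completion times in seconds."""
    return t_idripper - t_idfilter_q_value


def observed_outcome(delta_s: float) -> str:
    """Map a delta onto the outcome seen in the two reported runs."""
    if delta_s >= 10.0:
        return "survived"      # IDFILTER_Q_VALUE finished at least 10 s before IDRipper
    if delta_s <= 0.1:
        return "dropped"       # near-simultaneous completion, or IDRipper first
    return "not observed"      # no sample in either run fell in this range


# Deltas from the first table: +80 s and +30 s survived; <= 0.1 s or negative dropped.
for delta in (80.0, 30.0, 0.05, -3.0):
    print(delta, observed_outcome(delta))
```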

Why it may not have been caught in CI

  • test profile uses 3 replicates merged into one (HepG2, A) group (group_count=3).
  • test_single_quant uses 1 sample.
  • test_full runs 2 samples in distinct groups on AWS Batch via Seqera Platform — but the test only checks that the workflow finishes successfully; there is no assertion on per-sample output presence.
  • Local nf-test runs (Docker / Singularity) execute without the AWS-Batch task-dispatch latency that we observed correlated with the drop.
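
A per-sample presence check of the kind missing from test_full could look roughly like the sketch below. The `missing_outputs` helper and the `<name>.tsv` file naming are assumptions made for illustration and do not reflect the pipeline's actual output layout; the demo directory and sample names are placeholders.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_outputs(expected_names: Iterable[str], outdir: str, suffix: str = ".tsv") -> List[str]:
    """Return the expected names that have no '<name><suffix>' file in outdir."""
    present = {p.name[: -len(suffix)] for p in Path(outdir).glob(f"*{suffix}")}
    return sorted(set(expected_names) - present)


# Demo with placeholder files: three samples expected, only two outputs written.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("S01_A", "S02_B"):
        (Path(tmp) / f"{name}.tsv").write_text("placeholder\n")
    demo_missing = missing_outputs(["S01_A", "S02_B", "S03_C"], tmp)

print(demo_missing)  # ['S03_C']
```

An assertion that this list is empty would fail on the runs described above even though every submitted task succeeded.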

Failure B — OpenMS 3.5.0 FAIMS regression in FeatureOverlapFilter::mergeFAIMSFeatures

Symptom

Restricted to FAIMS Lumos input. Samples that escape Failure A still fail at SUMMARIZE_RESULTS:

Traceback (most recent call last):
  File "/nextflow-bin/summarize_results.py", line 257, in <module>
    main()
  File "/nextflow-bin/summarize_results.py", line 250, in main
    process_file(args.input[0], …)
  File "/nextflow-bin/summarize_results.py", line 140, in process_file
    raise ValueError(f"The following required columns are missing: {missing_columns}")
ValueError: The following required columns are missing: {'COMET:xcorr'}

Direct comparison of the same FAIMS sample under v3.1.0 vs v3.2.0

Sample.featureXML from a single FAIMS Lumos input run through both versions:

| Metric | v3.1.0 (OpenMS 3.4.0) | v3.2.0 (OpenMS 3.5.0) |
|---|---|---|
| <feature> count | 16,206 | 15,921 |
| <PeptideHit> count | 7,515 | 0 |
| <PeptideIdentification> count | 7,376 | 0 |
| COMET:xcorr UserParam count | 7,515 | 0 |
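
Counts like these can be reproduced with a streaming pass over the featureXML. The sketch below uses only the Python standard library; the `count_featurexml` helper is hypothetical, the embedded document is a simplified stand-in rather than a schema-complete featureXML, and nested (subordinate) features would be counted as `feature` too, as a plain tag count would.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

COUNTED_TAGS = ("feature", "PeptideHit", "PeptideIdentification")


def count_featurexml(source) -> Counter:
    """Count selected elements and COMET:xcorr UserParams in a featureXML file or stream."""
    counts: Counter = Counter()
    for _, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop a namespace prefix if one is present
        if tag in COUNTED_TAGS:
            counts[tag] += 1
        elif tag == "UserParam" and elem.get("name") == "COMET:xcorr":
            counts["COMET:xcorr"] += 1
        elem.clear()  # release this element's children and text once it is processed
    return counts


# Simplified stand-in document: one feature carrying one identified peptide hit.
_DEMO = b"""<featureMap>
  <featureList count="1">
    <feature id="f_1">
      <PeptideIdentification>
        <PeptideHit sequence="AAA">
          <UserParam type="float" name="COMET:xcorr" value="1.5"/>
        </PeptideHit>
      </PeptideIdentification>
    </feature>
  </featureList>
</featureMap>"""

demo_counts = count_featurexml(io.BytesIO(_DEMO))
print(dict(demo_counts))
```

`count_featurexml` also accepts a file path, so the same call can be pointed at the v3.1.0 and v3.2.0 outputs for a side-by-side comparison.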

Pointer to the relevant code (OpenMS)

src/openms/source/PROCESSING/FEATURE/FeatureOverlapFilter.cpp in mergeFAIMSFeatures:

// Combine back: merged FAIMS features + untouched non-FAIMS features
feature_map.clear();
for (auto& f : faims_features)
{
  feature_map.push_back(std::move(f));
}

FeatureMap::clear() wipes the ProteinIdentifications array. The featureXML writer needs that array to render <PeptideIdentification> blocks, so the writer drops every assigned and unassigned peptide ID. The merged features are written; their ID payload is not.

The trigger is faims:merge_features = "true", the default in OpenMS 3.5.0 FeatureFinderIdentificationAlgorithm (around line 181 of the algorithm cpp). nf-core/mhcquant does not expose this parameter, so the merge runs whenever FAIMS data is detected. Every FAIMS sample with multiple compensation voltages loses its peptide IDs at the FeatureFinderIdentification (FFid) merge step under v3.2.0.

This bug is upstream of nf-core/mhcquant. Filing it here so it is on your radar.

Environment

  • Pipeline: nf-core/mhcquant 3.2.0
  • Containers: stock from quay.io/biocontainers and biocontainers (no overrides)
  • Nextflow: 25.10.4
  • Executor: awsbatch, S3 work-dir, Fusion mount

