Description of the bug
Hi @jonasscheid ,
We are very excited about the single-sample (no replicates) quantification in the new 3.2.0 version. I was testing it with 200 samples from different instruments and we hit two reproducible issues. Both are silent: the pipeline reports success.
Failure A — silent per-sample drop in QUANT subworkflow
Symptom
For multi-sample runs (each row in the samplesheet a distinct (Sample, Condition) group with group_count=1, quantify=true, fdr_level=psm_level_fdrs), all upstream stages complete normally for every sample:
OPENMSTHIRDPARTY_COMETADAPTER, OPENMS_PEPTIDEINDEXER, RESCORE:MS2RESCORE, RESCORE:OPENMS_PSMFEATUREEXTRACTOR, RESCORE:OPENMS_PERCOLATORADAPTER, RESCORE:OPENMS_IDFILTER_Q_VALUE, QUANT:OPENMS_IDRIPPER — N/N tasks submitted and completed.
Then only a subset of samples (often 0–6 of 10) reaches QUANT:OPENMS_IDFILTER_QUANT. The remaining samples disappear silently from the channel: no error, no warning, no log line indicating the drop. The pipeline exits 0 because every submitted task succeeded.
In our runs across multiple Parameter_Configurations (highres-HCD, highres-CID, timsTOF, EThcD), about 75% of samples per multi-sample run are lost this way. The same input samples processed under v3.1.0 produced complete outputs.
Process tally for one representative 10-sample run
| Process | Submitted | Completed |
|---|---|---|
| OPENMSTHIRDPARTY_COMETADAPTER | 10 | 10 |
| OPENMS_PEPTIDEINDEXER | 10 | 10 |
| OPENMS_IDMERGER | 10 | 10 |
| RESCORE:MS2RESCORE | 10 | 10 |
| RESCORE:OPENMS_PSMFEATUREEXTRACTOR | 10 | 10 |
| RESCORE:OPENMS_PERCOLATORADAPTER | 10 | 10 |
| RESCORE:OPENMS_IDFILTER_Q_VALUE | 10 | 10 |
| QUANT:OPENMS_IDRIPPER | 10 | 10 |
| QUANT:OPENMS_IDFILTER_QUANT | 2 | 2 ← drop |
| QUANT:MAP_ALIGNMENT:OPENMS_MAPALIGNERIDENTIFICATION | 2 | 2 |
| QUANT:PROCESS_FEATURE:OPENMS_FEATUREFINDERIDENTIFICATION | 2 | 2 |
| OPENMS_TEXTEXPORTER | 2 | 2 |
| SUMMARIZE_RESULTS | 2 | 2 |
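A tally like the one above can be rebuilt from the Nextflow trace file. The following is only a sketch of how we would do it, assuming the trace is tab-separated and has `name` and `status` columns, with task names of the form `PROCESS (tag)`. The function names and the `expected` argument are ours, not part of the pipeline.

```python
import csv
import io
from collections import Counter


def tally_by_process(trace_tsv_text):
    """Count COMPLETED tasks per process from Nextflow trace text (TSV).

    Assumes 'name' and 'status' columns; task names look like
    'SOME:PROCESS (sample_tag)', so the process is the part before ' ('.
    """
    reader = csv.DictReader(io.StringIO(trace_tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        if row["status"] != "COMPLETED":
            continue
        process = row["name"].split(" (", 1)[0]
        counts[process] += 1
    return counts


def find_drops(counts, expected):
    """Return processes that completed fewer tasks than `expected`.

    Only meaningful for per-sample processes; a process that never ran
    at all is absent from `counts` and is not reported here.
    """
    return {proc: n for proc, n in counts.items() if n < expected}
```

Calling `find_drops(tally_by_process(text), expected=10)` on the run above would be expected to flag `QUANT:OPENMS_IDFILTER_QUANT` and everything downstream of it, provided the process names in the trace match.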
Observed correlation: timing of upstream completions
The per-sample outcome correlates with the relative arrival timing of OPENMS_IDFILTER_Q_VALUE and OPENMS_IDRIPPER for that sample. Across two independent runs, every sample with IDFILTER_Q_VALUE finishing ≥10 s before its IDRipper survived; every sample with arrivals within 100 ms or with IDRipper first was dropped.
Sample-level pattern from a 10-sample run:
| Δ = t(IDRipper) − t(IDFILTER_Q_VALUE) | Outcome |
|---|---|
| +80 s, +30 s | SURVIVED (2 samples) |
| ≤ 0.1 s or negative | DROPPED (8 samples) |
Same pattern in another 10-sample run: all samples with Δ ≥ 10 s survived (6 samples), all samples with Δ ≤ 0.1 s dropped (4 samples).
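For anyone who wants to check the same correlation on their own run, here is a rough sketch of the Δ computation. It assumes you have already extracted per-task completion times in seconds (for example from the trace file) as `(sample, process, time)` tuples with short process names. The helper names are hypothetical, and the bucket boundaries simply restate what we observed; they are not known thresholds of the underlying mechanism.

```python
def ripper_minus_filter(completions):
    """Compute delta = t(IDRipper) - t(IDFILTER_Q_VALUE) per sample.

    completions: iterable of (sample, process, completion_time_s).
    Samples missing either of the two processes are skipped.
    """
    by_sample = {}
    for sample, process, t in completions:
        by_sample.setdefault(sample, {})[process] = t

    deltas = {}
    for sample, times in by_sample.items():
        if "OPENMS_IDRIPPER" in times and "OPENMS_IDFILTER_Q_VALUE" in times:
            deltas[sample] = (
                times["OPENMS_IDRIPPER"] - times["OPENMS_IDFILTER_Q_VALUE"]
            )
    return deltas


def observed_bucket(delta_s):
    """Map a delta onto the ranges seen in our two runs (not a prediction)."""
    if delta_s >= 10.0:
        return "survived in our runs"
    if delta_s <= 0.1:
        return "dropped in our runs"
    return "range not observed"
```

We have no samples with Δ between 0.1 s and 10 s, so that range is deliberately left unclassified.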
Why it may not have been caught in CI
- test profile uses 3 replicates merged into one (HepG2, A) group (group_count=3).
- test_single_quant uses 1 sample.
- test_full runs 2 samples in distinct groups on AWS Batch via Seqera Platform, but the test only checks that the workflow finishes successfully; there is no assertion on per-sample output presence.
- Local nf-test runs (Docker / Singularity) execute without the AWS-Batch task-dispatch latency that we observed correlated with the drop.
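To illustrate the kind of per-sample assertion we have in mind, here is a minimal sketch. It assumes each sample is expected to produce one result file named `<sample>.tsv`; that naming, the `suffix` default, and the function name are assumptions for illustration and may not match the pipeline's actual output layout.

```python
from pathlib import PurePath


def missing_samples(expected_samples, output_paths, suffix=".tsv"):
    """Return the sorted sample IDs that have no '<sample><suffix>' output.

    expected_samples: sample IDs taken from the samplesheet.
    output_paths: paths (as strings) found in the results directory.
    """
    produced = {
        PurePath(path).name[: -len(suffix)]
        for path in output_paths
        if path.endswith(suffix)
    }
    return sorted(set(expected_samples) - produced)
```

A test could then assert that `missing_samples(...)` is empty. That would fail on the runs described above even though the workflow itself exits 0.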
Failure B — OpenMS 3.5.0 FAIMS regression in FeatureOverlapFilter::mergeFAIMSFeatures
Symptom
Restricted to FAIMS Lumos input. Samples that escape Failure A still fail at SUMMARIZE_RESULTS:
Traceback (most recent call last):
  File "/nextflow-bin/summarize_results.py", line 257, in <module>
    main()
  File "/nextflow-bin/summarize_results.py", line 250, in main
    process_file(args.input[0], …)
  File "/nextflow-bin/summarize_results.py", line 140, in process_file
    raise ValueError(f"The following required columns are missing: {missing_columns}")
ValueError: The following required columns are missing: {'COMET:xcorr'}
Direct comparison of the same FAIMS sample under v3.1.0 vs v3.2.0
Sample.featureXML from a single FAIMS Lumos input run through both versions:
| Metric | v3.1.0 (OpenMS 3.4.0) | v3.2.0 (OpenMS 3.5.0) |
|---|---|---|
| <feature> count | 16,206 | 15,921 |
| <PeptideHit> count | 7,515 | 0 |
| <PeptideIdentification> count | 7,376 | 0 |
| COMET:xcorr UserParam count | 7,515 | 0 |
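Counts like these can be reproduced with a short script. The sketch below assumes the featureXML stores scores as `<UserParam name="COMET:xcorr" …>` elements and counts every `<feature>` element it encounters, nested ones included, so the numbers may differ slightly from other counting methods. The function name is ours.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_id_payload(source, score_name="COMET:xcorr"):
    """Stream a featureXML file and count features and ID-related elements.

    source: a filename or a binary file-like object.
    Returns a Counter keyed by 'feature', 'PeptideIdentification',
    'PeptideHit' and the score name.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop an XML namespace, if any
        if tag in ("feature", "PeptideIdentification", "PeptideHit"):
            counts[tag] += 1
        elif tag == "UserParam" and elem.get("name") == score_name:
            counts[score_name] += 1
        elem.clear()  # free memory; children have already been counted
    return counts
```

Running it on the same sample's featureXML from both pipeline versions gives a quick check of whether the ID payload survived.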
Pointer to the relevant code (OpenMS)
src/openms/source/PROCESSING/FEATURE/FeatureOverlapFilter.cpp in mergeFAIMSFeatures:
// Combine back: merged FAIMS features + untouched non-FAIMS features
feature_map.clear();
for (auto& f : faims_features)
{
  feature_map.push_back(std::move(f));
}
FeatureMap::clear() wipes the ProteinIdentifications array. The featureXML writer needs that array to render <PeptideIdentification> blocks, so the writer drops every assigned and unassigned peptide ID. The merged features are written; their ID payload is not.
The trigger is faims:merge_features = "true" — the default in OpenMS 3.5.0 FeatureFinderIdentificationAlgorithm (around line 181 of the algorithm cpp). nf-core/mhcquant does not expose this parameter, so the merge runs whenever FAIMS data is detected. Every FAIMS sample with multiple compensation voltages loses peptide IDs at FFid merge under v3.2.0.
This is an upstream OpenMS issue rather than a bug in nf-core/mhcquant itself. I am filing it here so it is on your radar.
Environment
- Pipeline: nf-core/mhcquant 3.2.0
- Containers: stock from quay.io/biocontainers and biocontainers (no overrides)
- Nextflow: 25.10.4
- Executor: awsbatch, S3 work-dir, Fusion mount
Command used and terminal output
Relevant files
No response
System information
No response