This workflow processes raw mass spectrometry data through peak detection, alignment, normalization, statistical analysis, and pathway interpretation.

Required packages (R):

    BiocManager::install(c('xcms', 'CAMERA'))
    # MetaboAnalystR is hosted on GitHub (xia-lab/MetaboAnalystR), not Bioconductor
    remotes::install_github('xia-lab/MetaboAnalystR')
    install.packages(c('metablastr', 'pheatmap'))

Tell your AI agent what you want to do:
- "Run the metabolomics pipeline on my mzML files"
- "Process my LC-MS data and find differential metabolites"
- "Analyze my lipidomics experiment"

Example prompts:
- "I have mzML files from an untargeted metabolomics study, run the full pipeline"
- "Process my LC-MS/MS data with XCMS and run differential analysis"
- "Apply QC-based batch correction to my metabolomics data"
- "Normalize my metabolomics data and check sample quality with PCA"
- "Find enriched metabolic pathways in my differential metabolites"
- "Annotate my significant features against HMDB and run pathway enrichment"

Use this workflow for:
- Untargeted metabolomics studies
- LC-MS/MS metabolite profiling
- Lipidomics analysis
- Metabolic biomarker discovery
- Treatment response studies

Required inputs:
- Raw MS data: mzML or mzXML format (converted from vendor formats)
- Sample metadata: CSV with sample names, conditions, batches
- QC samples: pooled QC samples (recommended)

Example sample metadata file:

    sample,condition,batch,injection_order
    Sample1.mzML,Control,1,1
    Sample2.mzML,Control,1,2
    QC1.mzML,QC,1,3
    Sample3.mzML,Treatment,1,4

Peak detection:
- Identifies chromatographic peaks in each sample
- CentWave algorithm for LC-MS data
- Adjust peakwidth based on chromatography

Retention time (RT) alignment:
- Corrects RT drift between samples
- Uses the Obiwarp or peak groups method
- Essential for feature matching
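
Correspondence (feature grouping):
- Groups peaks across samples into features
- Based on m/z and aligned RT
- minFraction controls stringency: the minimum fraction of samples in at least one group that must contain the peak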
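
To make the effect of minFraction concrete, here is a minimal base R sketch on a toy presence/absence matrix. The matrix, the group labels, and the helper `passes_min_fraction` are hypothetical; the rule it applies (keep a feature if at least one sample group reaches the fraction) follows the usual peak density grouping definition and is not xcms itself.

```r
# Toy presence/absence matrix: rows are candidate features, columns are samples.
# TRUE means a chromatographic peak was detected for that feature in that sample.
peak_present <- rbind(
  f1 = c(TRUE,  TRUE,  TRUE,  FALSE, FALSE, FALSE),
  f2 = c(TRUE,  FALSE, FALSE, TRUE,  FALSE, FALSE),
  f3 = c(FALSE, FALSE, FALSE, TRUE,  TRUE,  FALSE)
)
sample_groups <- c("Control", "Control", "Control",
                   "Treatment", "Treatment", "Treatment")

# Keep a feature if, in at least one group, the fraction of samples
# with a detected peak reaches min_fraction.
passes_min_fraction <- function(present, groups, min_fraction) {
  frac_by_group <- sapply(split(seq_along(groups), groups), function(idx) {
    rowMeans(present[, idx, drop = FALSE])
  })
  apply(frac_by_group, 1, max) >= min_fraction
}

passes_min_fraction(peak_present, sample_groups, 0.5)  # f1 and f3 pass
passes_min_fraction(peak_present, sample_groups, 0.8)  # only f1 passes
```

Raising the threshold from 0.5 to 0.8 drops f3, which is present in only two of three Treatment samples.
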

Gap filling:
- Recovers missing values
- Integrates signal at the expected m/z and RT locations
- Reduces false missing values
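
The four processing steps above (peak detection, RT alignment, correspondence, gap filling) could be chained roughly as follows. This is a sketch, not the workflow's actual script: it assumes an xcms 3-style interface with MSnbase for file reading, the function name `run_xcms_sketch` and its arguments are hypothetical, and the parameter values are starting points taken from the UPLC and "Typical" columns of this document's parameter tables.

```r
# Sketch only: calling this function requires the Bioconductor packages xcms and MSnbase.
# `files` is a character vector of mzML paths; `groups` holds one group label per file.
run_xcms_sketch <- function(files, groups) {
  raw <- MSnbase::readMSData(files, mode = "onDisk")

  # 1. Peak detection with centWave
  cwp <- xcms::CentWaveParam(peakwidth = c(5, 30), ppm = 15, snthresh = 10)
  xdata <- xcms::findChromPeaks(raw, param = cwp)

  # 2. Retention time alignment with Obiwarp (default settings)
  xdata <- xcms::adjustRtime(xdata, param = xcms::ObiwarpParam())

  # 3. Correspondence: group peaks across samples into features
  pdp <- xcms::PeakDensityParam(sampleGroups = groups, bw = 5,
                                minFraction = 0.5, binSize = 0.025)
  xdata <- xcms::groupChromPeaks(xdata, param = pdp)

  # 4. Gap filling: integrate signal where a feature has no detected peak
  xdata <- xcms::fillChromPeaks(xdata, param = xcms::ChromPeakAreaParam())

  # Feature-by-sample matrix of integrated peak areas
  xcms::featureValues(xdata, value = "into")
}

# Usage (paths and labels are placeholders):
# feature_matrix <- run_xcms_sketch(metadata$sample, metadata$condition)
```
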

Normalization:
- Corrects systematic variation
- Options: median, quantile, LOESS
- QC-based correction for batches
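
Of the options listed, median normalization is the simplest to show. The sketch below uses base R on a made-up matrix; `median_normalize` is a hypothetical helper, and QC-based batch correction is not covered here.

```r
# Toy feature matrix: rows are features, columns are samples.
intensities <- matrix(
  c(100, 200,  400,
    200, 400,  800,
    300, 600, 1200),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("f1", "f2", "f3"), c("S1", "S2", "S3"))
)

# Median normalization: rescale each sample so that its median
# equals the median of all sample medians.
median_normalize <- function(x) {
  sample_medians <- apply(x, 2, median, na.rm = TRUE)
  target <- median(sample_medians)
  sweep(x, 2, sample_medians / target, FUN = "/")
}

normalized <- median_normalize(intensities)
apply(normalized, 2, median)  # every sample now has the same median
```

In this toy case the samples differ only by a dilution-like factor, so the normalized columns become identical.
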

Statistical analysis:
- limma for differential analysis
- Handles missing values
- Multiple testing correction (FDR)
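
limma is the tool named above. As a dependency-free stand-in, the sketch below runs a Welch t-test per feature in base R and applies Benjamini-Hochberg correction; it is not limma's moderated statistic, and the data and the helper `test_feature` are invented for illustration.

```r
# Toy log2 intensity matrix: 4 features x 6 samples (3 control, 3 treatment).
log_int <- rbind(
  f1 = c(10.0, 10.1,  9.9, 12.0, 12.1, 11.9),  # clear increase
  f2 = c( 8.0,  8.2,  7.9,  8.1,  8.0,  8.1),  # no change
  f3 = c(15.0, 14.8, 15.1, 13.0, 13.2, 12.9),  # clear decrease
  f4 = c( 9.0,   NA,  9.2,  9.1,  9.0,  9.3)   # one missing value
)
condition <- factor(c(rep("Control", 3), rep("Treatment", 3)))

# Welch t-test for one feature; t.test drops NA values itself.
test_feature <- function(y) {
  ctrl <- y[condition == "Control"]
  trt  <- y[condition == "Treatment"]
  tt <- t.test(trt, ctrl)
  c(log2FC = mean(trt, na.rm = TRUE) - mean(ctrl, na.rm = TRUE),
    p = tt$p.value)
}

results <- as.data.frame(t(apply(log_int, 1, test_feature)))
results$padj <- p.adjust(results$p, method = "BH")  # FDR control
results[order(results$padj), ]
```

With real data, limma's empirical Bayes variance moderation is generally preferable when group sizes are this small.
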

Annotation:
- Match m/z to databases (HMDB, KEGG, LipidMaps)
- Consider adducts and isotopes
- MS/MS matching for confidence
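
Matching by m/z with adducts can be sketched in base R as below. The two-row table stands in for a real database export and is hypothetical, as are the helper `annotate_mz` and the 10 ppm default; only three positive-mode adducts are included and isotopes are ignored.

```r
# Hypothetical mini-database of neutral monoisotopic masses (stand-in for HMDB/KEGG/LipidMaps).
db <- data.frame(
  name = c("Glucose", "Citric acid"),
  mono_mass = c(180.06339, 192.02700)
)

# Mass shifts (Da) for a few common positive-mode adducts.
adducts <- c("[M+H]+" = 1.007276, "[M+Na]+" = 22.989218, "[M+NH4]+" = 18.033823)

# Return every database entry and adduct whose theoretical m/z lies within `ppm` of `mz`.
annotate_mz <- function(mz, db, adducts, ppm = 10) {
  hits <- expand.grid(db_row = seq_len(nrow(db)), adduct = names(adducts),
                      stringsAsFactors = FALSE)
  hits$name <- db$name[hits$db_row]
  hits$theoretical_mz <- db$mono_mass[hits$db_row] + unname(adducts[hits$adduct])
  hits$ppm_error <- (mz - hits$theoretical_mz) / hits$theoretical_mz * 1e6
  hits <- hits[abs(hits$ppm_error) <= ppm,
               c("name", "adduct", "theoretical_mz", "ppm_error")]
  rownames(hits) <- NULL
  hits
}

annotate_mz(203.0526, db, adducts)  # matches the sodium adduct of glucose
```

An m/z-only hit like this is a putative annotation; MS/MS matching is still needed for higher confidence.
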

Pathway analysis:
- Map to KEGG pathways
- Over-representation analysis
- Metabolite set enrichment
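
Over-representation analysis reduces to a hypergeometric test per pathway. The base R sketch below assumes placeholder pathway definitions and metabolite IDs (not real KEGG content); the helper `ora` is hypothetical.

```r
# Hypothetical pathway membership; IDs are placeholders, not real KEGG identifiers.
pathways <- list(
  pathway_A = c("m1", "m2", "m3", "m4", "m5"),
  pathway_B = c("m6", "m7", "m8", "m9", "m10")
)
background  <- paste0("m", 1:20)                  # all annotated metabolites measured
significant <- c("m1", "m2", "m3", "m4", "m11")   # differential metabolites

ora <- function(pathways, significant, background) {
  res <- do.call(rbind, lapply(names(pathways), function(pw) {
    members <- intersect(pathways[[pw]], background)
    k <- length(intersect(members, significant))  # significant metabolites in the pathway
    # P(X >= k) under the hypergeometric distribution
    p <- phyper(k - 1, length(members), length(background) - length(members),
                length(significant), lower.tail = FALSE)
    data.frame(pathway = pw, size = length(members), hits = k, p = p)
  }))
  res$padj <- p.adjust(res$p, method = "BH")
  res
}

ora(pathways, significant, background)
```

The choice of background matters: using only metabolites that were actually measured and annotated avoids inflating significance.
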

Peak detection parameters (centWave):

| Parameter | UPLC | Standard LC | GC-MS |
|---|---|---|---|
| peakwidth (s) | 5-30 | 10-60 | 2-10 |
| ppm | 15-25 | 25-50 | 10-20 |
| snthresh | 10 | 10 | 5 |

Correspondence parameters (peak density grouping):

| Parameter | Typical | Stringent |
|---|---|---|
| bw (s) | 5-10 | 2-3 |
| minFraction | 0.5 | 0.8 |
| binSize (m/z) | 0.025 | 0.01 |

Pooled QC samples:
- Pool equal volumes from all samples
- Inject QC every 5-10 samples
- Use for batch correction and quality assessment
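
An injection sequence with regularly spaced QCs can be generated with a few lines of base R. The helper `build_run_order` is hypothetical; placing a QC at the start and end of the run and randomizing sample order are assumptions of this sketch, not requirements stated above.

```r
# Build a run order with a pooled QC before the first sample,
# after every `qc_every` samples, and after the last sample.
build_run_order <- function(samples, qc_every = 5) {
  run <- "QC"
  for (i in seq_along(samples)) {
    run <- c(run, samples[i])
    if (i %% qc_every == 0 || i == length(samples)) run <- c(run, "QC")
  }
  is_qc <- run == "QC"
  run[is_qc] <- paste0("QC", seq_len(sum(is_qc)))  # number the QC injections
  data.frame(injection_order = seq_along(run), sample = run)
}

set.seed(1)  # reproducible randomization of sample order
samples <- sample(paste0("Sample", 1:12))
build_run_order(samples, qc_every = 5)
```

The resulting table can be joined to the sample metadata so that `injection_order` is available for drift and batch correction.
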

Quality metrics:

| Metric | Good | Acceptable | Poor |
|---|---|---|---|
| Features detected | >5000 | 2000-5000 | <2000 |
| QC CV | <20% | 20-30% | >30% |
| Blank ratio | >10x | 5-10x | <5x |
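
The QC CV row of the table can be checked per feature as sketched below in base R. The intensities are made up, and applying the thresholds feature by feature (rather than to a dataset-wide median CV) is an assumption of this example.

```r
# Toy intensities for three features across four pooled QC injections.
qc_int <- rbind(
  f1 = c(1000, 1050,  950, 1000),
  f2 = c( 500,  650,  400,  450),
  f3 = c( 200,  400,  100,  300)
)

# Coefficient of variation (%) per feature across the QC injections.
qc_cv <- apply(qc_int, 1, function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE) * 100)

# Classify with the table's thresholds: <20% good, 20-30% acceptable, >30% poor.
cv_class <- cut(qc_cv, breaks = c(-Inf, 20, 30, Inf),
                labels = c("Good", "Acceptable", "Poor"), right = FALSE)
data.frame(cv_percent = round(qc_cv, 1), class = cv_class)
```

Features classed as poor are common candidates for removal before statistical testing.
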

Few features detected:
- Adjust peak detection parameters
- Check raw data quality
- Lower snthresh carefully

Poor RT alignment:
- Check for RT drift pattern
- Use more reference peaks
- Consider subset alignment

Many missing values:
- Reduce minFraction
- Improve gap filling
- Check sample quality

No significant features:
- Check experimental design
- Consider effect sizes
- Adjust FDR threshold

Output files:

| File | Description |
|---|---|
| normalized_feature_matrix.csv | Processed feature intensities |
| differential_metabolites.csv | Statistical results |
| qc_pca.png | PCA quality check |
| volcano_metabolites.png | Differential analysis plot |
| pathway_overview.png | Enriched pathways |
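
The PCA quality check behind qc_pca.png rests on a PCA of the log-transformed feature matrix. The base R sketch below shows the computation on simulated data and writes the matrix to a temporary directory; plotting is left out, and the simulated values and the temporary output path are assumptions of this example.

```r
# Simulated feature matrix: 50 features x 6 samples (illustration only).
set.seed(42)
feat <- matrix(rlnorm(50 * 6, meanlog = 10, sdlog = 1), nrow = 50,
               dimnames = list(paste0("f", 1:50), paste0("S", 1:6)))

# Write the matrix in the same layout as normalized_feature_matrix.csv.
out_csv <- file.path(tempdir(), "normalized_feature_matrix.csv")
write.csv(feat, out_csv)

# PCA on log2 intensities; prcomp expects samples in rows, so transpose.
pca <- prcomp(t(log2(feat)), center = TRUE, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2) * 100

scores <- data.frame(sample = rownames(pca$x), PC1 = pca$x[, 1], PC2 = pca$x[, 2])
scores
round(var_explained[1:2], 1)
```

In a real run, pooled QC injections should cluster tightly in the PC1/PC2 scores; scattered QCs point to drift or batch effects.
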

Tips:
- QC samples: Inject pooled QC every 5-10 samples for batch correction
- Peak detection: Adjust peakwidth based on your chromatography (UPLC: 5-30 s, standard LC: 10-60 s)
- Missing values: A high proportion of missing values may indicate poor sample quality
- Annotation confidence: MS/MS matching provides higher confidence than m/z alone
- mzML conversion: Convert vendor files using ProteoWizard msConvert

References:
- XCMS: doi:10.1021/ac051437y
- MetaboAnalystR: doi:10.1093/bioinformatics/btaa123
- xcms3 workflow: doi:10.3390/metabo10120504