bigbio/quantmsdiann: Output

Introduction

This document describes the output produced by the pipeline. Most plots are taken from the pmultiqc report, which summarises results at the end of the pipeline.

The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

Pipeline overview

The pipeline is built using Nextflow and processes DIA data using the following steps:

(Optional) Raw files are downloaded from PRIDE Archive using pridepy
Thermo RAW files are converted to mzML using ThermoRawFileParser; SCIEX .wiff files are converted using WiffConverter; Bruker .d and .dia files are read natively
Peptides and proteins are identified and quantified using DIA-NN
The DIA-NN report is converted to an MSstats-compatible format
QC reports are generated using pmultiqc

Output structure

Output will be saved to the folder defined by the parameter --outdir.

Default Output Structure

results/
├── pipeline_info/             # Nextflow pipeline information
├── pridepy/                   # (Optional) Downloaded raw files from PRIDE Archive
├── sdrf/                      # SDRF files and configs
├── quant_tables/              # Quantification tables and results
│   ├── diann_report.{tsv,parquet}  # Main DIA-NN report
│   ├── diann_report.pg_matrix.tsv  # Protein group matrix
│   ├── diann_report.pr_matrix.tsv  # Precursor matrix
│   ├── diann_report.gg_matrix.tsv  # Gene group matrix
│   └── out_msstats_in.csv     # MSstats-compatible output
└── pmultiqc/                  # pmultiqc reports
    ├── multiqc_plots/
    │   ├── png/
    │   ├── svg/
    │   └── pdf/
    └── multiqc_data/
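
After a run it can be handy to confirm that the files in the tree above were actually published. The following is a minimal sketch, not part of the pipeline: the helper name missing_outputs is hypothetical, and the file list is copied from the default tree above, so adjust it if your run publishes a different set.

```python
from pathlib import Path

# Paths copied from the default output tree, relative to --outdir.
EXPECTED_FILES = [
    "quant_tables/diann_report.pg_matrix.tsv",
    "quant_tables/diann_report.pr_matrix.tsv",
    "quant_tables/diann_report.gg_matrix.tsv",
    "quant_tables/out_msstats_in.csv",
]

# The main report is either TSV or Parquet, depending on the DIA-NN version.
MAIN_REPORT_CANDIDATES = [
    "quant_tables/diann_report.parquet",
    "quant_tables/diann_report.tsv",
]


def missing_outputs(outdir):
    """Return the relative paths of expected outputs that are absent."""
    root = Path(outdir)
    missing = [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
    # The main report counts as present if either format exists.
    if not any((root / rel).is_file() for rel in MAIN_REPORT_CANDIDATES):
        missing.append("quant_tables/diann_report.{tsv,parquet}")
    return missing
```

Calling missing_outputs("results") returns an empty list when every expected file is present.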

Verbose Output Structure

To also publish all intermediate files, run the pipeline with -profile verbose_modules. This is useful for debugging or detailed analysis and produces the following structure:

results/
├── pipeline_info/
├── sdrf/
├── spectra/
│   ├── thermorawfileparser/         # Converted raw files
│   └── mzml_statistics/             # mzML file statistics
├── database_generation/
│   ├── insilico_library_generation/ # In silico library
│   └── assemble_empirical_library/  # Empirical library
├── diann_preprocessing/
│   ├── preliminary_analysis/        # Preliminary analysis results
│   └── individual_analysis/         # Individual analysis results
├── quant_tables/
└── pmultiqc/

Key Output Files

DIA-NN quantification results:
- quant_tables/diann_report.{tsv,parquet} - Main DIA-NN report with peptide and protein quantification
- quant_tables/diann_report.pr_matrix.tsv - Precursor quantification matrix
- quant_tables/diann_report.pg_matrix.tsv - Protein group quantification matrix
- quant_tables/diann_report.gg_matrix.tsv - Gene group quantification matrix
- quant_tables/diann_report.unique_genes_matrix.tsv - Unique gene quantification matrix
- quant_tables/out_msstats_in.csv - MSstats-compatible quantification table
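
The matrix files are tab-separated tables with one row per precursor, protein group, or gene group and one intensity column per run. As a rough sketch of how they can be inspected without extra dependencies, the function below counts the non-missing values in each run column. The annotation column names and the treatment of empty or NA cells as missing are assumptions; check the header of your own file and adjust the set accordingly.

```python
import csv

# Assumption: these annotation columns precede the per-run intensity columns.
# Adjust the set if your DIA-NN version writes different or additional ones.
ANNOTATION_COLUMNS = {
    "Protein.Group",
    "Protein.Ids",
    "Protein.Names",
    "Genes",
    "First.Protein.Description",
}


def quantified_per_run(matrix_path, annotation_columns=ANNOTATION_COLUMNS):
    """Count rows with a non-missing intensity for each run column."""
    with open(matrix_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = reader.fieldnames or []
        run_columns = [c for c in header if c not in annotation_columns]
        counts = {run: 0 for run in run_columns}
        for row in reader:
            for run in run_columns:
                value = (row[run] or "").strip()
                # Assumption: missing values appear as empty cells or NA/NaN.
                if value and value.upper() not in {"NA", "NAN"}:
                    counts[run] += 1
    return counts
```

For example, quantified_per_run("results/quant_tables/diann_report.pg_matrix.tsv") returns a dictionary that maps each run column to the number of protein groups quantified in that run.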

Parquet vs TSV Output

Starting with DIA-NN 2.0, the main report is produced in Apache Parquet format (diann_report.parquet) instead of the legacy TSV (diann_report.tsv). Parquet files are columnar, compressed, and significantly faster to load in downstream tools such as Python (pandas/pyarrow) or R (arrow).

| DIA-NN version | Main report format | Matrix format |
| --- | --- | --- |
| 1.8.1 | `diann_report.tsv` | `.tsv` |
| 2.1.0+ | `diann_report.parquet` | `.tsv` |

The pipeline detects the DIA-NN version and handles the output format automatically. Downstream steps (MSstats conversion, pmultiqc) accept both formats.

To read Parquet files:

# Python
import pandas as pd
df = pd.read_parquet("diann_report.parquet")

# R
library(arrow)
df <- read_parquet("diann_report.parquet")
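
Because the main report may be either Parquet or TSV, downstream scripts can check which one exists before loading it. The sketch below is one way to do that; the function names find_main_report and load_main_report are hypothetical and not part of the pipeline, and the Parquet branch assumes that pandas has a Parquet engine such as pyarrow installed.

```python
from pathlib import Path


def find_main_report(quant_dir):
    """Return the main DIA-NN report path, preferring Parquet over TSV."""
    quant_dir = Path(quant_dir)
    for name in ("diann_report.parquet", "diann_report.tsv"):
        candidate = quant_dir / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"no diann_report.parquet or diann_report.tsv in {quant_dir}"
    )


def load_main_report(quant_dir):
    """Load the main report into a pandas DataFrame, whatever its format."""
    # Imported here so that find_main_report works without pandas installed.
    import pandas as pd

    path = find_main_report(quant_dir)
    if path.suffix == ".parquet":
        return pd.read_parquet(path)
    return pd.read_csv(path, sep="\t")
```

With this in place, load_main_report("results/quant_tables") works the same way for DIA-NN 1.8.1 and 2.1.0+ runs.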

MSstats-Compatible Output

The pipeline produces quant_tables/out_msstats_in.csv, an MSstats-compatible quantification table generated by quantms-utils. This file contains long-format precursor-level intensities with the columns required by the MSstats R package for downstream statistical analysis (e.g. differential expression, sample-size estimation).

Key columns include: ProteinName, PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge, IsotopeLabelType, Condition, BioReplicate, Run, Intensity.

The condition and biological replicate assignments are derived from the SDRF factor columns.
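
Before passing the table to MSstats, it can be worth checking that the key columns are present and that each run was assigned to the intended condition. The following is a minimal sketch using only the Python standard library; the function name summarize_msstats_input is hypothetical, and the column list is the one given above.

```python
import csv
from collections import defaultdict

# Key columns as listed above.
KEY_COLUMNS = [
    "ProteinName", "PeptideSequence", "PrecursorCharge", "FragmentIon",
    "ProductCharge", "IsotopeLabelType", "Condition", "BioReplicate",
    "Run", "Intensity",
]


def summarize_msstats_input(csv_path):
    """Check the key columns and return {condition: sorted list of runs}."""
    runs_by_condition = defaultdict(set)
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [c for c in KEY_COLUMNS if c not in header]
        if missing:
            raise ValueError(f"missing MSstats columns: {missing}")
        for row in reader:
            runs_by_condition[row["Condition"]].add(row["Run"])
    return {cond: sorted(runs) for cond, runs in runs_by_condition.items()}
```

A condition that lists fewer runs than expected usually points back to the factor columns in the SDRF.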

Optional Output Files

These files are not published by default. Enable them with save_* parameters or ext.* config properties (see Usage: Optional outputs).

library_generation/*.tsv - TSV spectral library from in-silico library generation (--save_speclib_tsv)

QPX Export (Experimental, 2.1.0)

When --enable_qpx_export is set, the pipeline writes a QPX Parquet dataset and a MuData .h5mu file to results/qpx/. The <prefix> in the file names below defaults to diann and can be overridden with --project_accession.

<prefix>.feature.parquet — precursor-level features
<prefix>.pg.parquet — protein-group intensities per run
<prefix>.sample.parquet, <prefix>.run.parquet — SDRF-derived metadata
<prefix>.h5mu — MuData with precursors and proteins modalities

import mudata as mu

# Example path: replace PXD019909 with your own <prefix>
mdata = mu.read("results/qpx/PXD019909.h5mu")

Nextflow pipeline info

Nextflow generates reports on resource usage, task timing, and the workflow graph for each run. Together with the list of software versions, they document how the pipeline was executed.

pipeline_info/:

execution_report.html - Resource usage report
execution_timeline.html - Timeline visualization
execution_trace.txt - Detailed execution trace
pipeline_dag.html - DAG visualization
software_versions.yml - Software versions used
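
The execution trace is a tab-separated text file with one row per task, so it can be summarised with a few lines of Python. The sketch below assumes the default Nextflow trace fields, in particular the name and status columns; the function names are hypothetical, and the file name in your run may carry a timestamp suffix.

```python
import csv
from collections import Counter


def task_status_counts(trace_path):
    """Count tasks per status (for example COMPLETED, CACHED, FAILED)."""
    with open(trace_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row["status"] for row in reader)


def failed_task_names(trace_path):
    """Return the names of tasks whose status is FAILED."""
    with open(trace_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row["name"] for row in reader if row["status"] == "FAILED"]
```

For example, task_status_counts("results/pipeline_info/execution_trace.txt") gives a quick overview of how many tasks completed, were cached, or failed.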

pmultiqc

All QC results are generated by pmultiqc, a proteomics plugin for MultiQC. The interactive HTML report provides:

Identification and quantification metrics
Sample-level quality statistics
Pipeline software versions
