New pipeline: nf-core/midas #142

### Pipeline title/name

microbiome_differential_abudance_analyses

### Keywords

microbiome, differential abundance, consensus, simulation, benchmarking

### What is it about?

A microbiome differential abundance (DA) pipeline for 16S and metagenomic data supplied as phyloseq objects. It combines results from multiple DA tools into a single consensus call in one of two modes: (Path A) a _k_-intersection consensus that reports taxa called differentially abundant by at least _k_ out of _n_ tools, and (Path B) a simulation-trained soft consensus that uses MIDASim to generate ground-truth datasets, scores each tool's performance against them, and applies an optimally weighted threshold to the results from the real data.
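To make Path A concrete, here is a minimal Python sketch of a _k_-intersection vote. It is only an illustration of the rule "at least _k_ out of _n_ tools", not the pipeline's R implementation; the function name `k_intersection`, the input layout (one set of called taxa per tool), and the toy taxa are assumptions made up for this example.

```python
from collections import Counter


def k_intersection(calls_by_tool, k):
    """Return the taxa called DA by at least k tools, sorted by name.

    calls_by_tool is a hypothetical layout: tool name -> set of taxa
    that the tool called differentially abundant.
    """
    n = len(calls_by_tool)
    if not 1 <= k <= n:
        raise ValueError(f"k must be between 1 and {n}, got {k}")
    # One vote per tool per taxon.
    votes = Counter(
        taxon for calls in calls_by_tool.values() for taxon in set(calls)
    )
    return sorted(taxon for taxon, count in votes.items() if count >= k)


# Toy calls, invented purely for illustration.
calls = {
    "ADAPT": {"Lactobacillus", "Gardnerella"},
    "corncob": {"Gardnerella", "Prevotella"},
    "LinDA": {"Gardnerella", "Prevotella", "Atopobium"},
}

print(k_intersection(calls, 2))  # ['Gardnerella', 'Prevotella']
```

Setting `k = n` gives the strict intersection of all tools, while `k = 1` gives their union, so _k_ trades sensitivity against agreement.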

### Please provide a schematic diagram of the proposed pipeline

<img width="1649" height="601" alt="Image" src="https://github.com/user-attachments/assets/f50134c3-7799-4ed5-b1f2-aad68a1ef4bd" />

### What would a minimal first release of this pipeline include?

The pipeline is built entirely in R. General-purpose libraries for data handling and visualization: phyloseq, dplyr, tidyr, tibble, readr, stringr, purrr, ggplot2, ggrepel, openxlsx, optparse, jsonlite, fs. Simulation: MIDASim. Differential abundance tools: ADAPT, corncob, LinDA, LOCOM, and MaAsLin2 (metagenomeSeq is also included but disabled by default). All dependencies are bundled in a versioned Docker image (rocker/r-ver:4.5.1 base), which is also usable via Singularity.

### I confirm my proposed pipeline will follow nf-core guidelines. Most importantly, my pipeline will:

- [x] be built with Nextflow.
- [x] pass nf-core lint tests and use standardized parameters.
- [x] be community-owned and developed within the nf-core organization.
- [x] open source under the MIT license with proper credits and acknowledgments.
- [x] have a descriptive, all lowercase, and without punctuation name.
- [x] use the nf-core pipeline template and predominantly use official nf-core modules.
- [x] focus on a specific data/analysis type with appropriate scope.
- [x] have properly maintained documentation.
- [x] be bundled using versioned Docker/Singularity containers.

### Why do we need a new pipeline?

It is well established that individual DA tools produce substantially different results on the same dataset, which makes tool choice a major source of variability in microbiome studies. This pipeline addresses the problem by running five complementary DA tools (ADAPT, corncob, LinDA, LOCOM, MaAsLin2) in parallel on the same phyloseq input and combining their outputs into a single consensus call. In its simulation-trained soft consensus mode, MIDASim generates ground-truth datasets from the user's own control samples, each tool is scored against that truth, and an optimally weighted threshold is applied to the real data. To our knowledge, this mode is not available in any existing pipeline or R package within the nf-core ecosystem.
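The soft consensus idea can be sketched in a few lines of Python. The proposal does not say how tools are scored or how the threshold is optimised, so everything specific here is an assumption: tools are weighted by their F1 score on the simulated data, the threshold is picked from a small grid to maximise F1 on that same data, and all function names and toy labels are hypothetical.

```python
def f1(pred, truth):
    """F1 score of a predicted taxon set against the true DA set."""
    tp = len(pred & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


def tool_weights(sim_calls, truth):
    """Weight each tool by its F1 on simulated data (assumed scoring rule)."""
    scores = {tool: f1(calls, truth) for tool, calls in sim_calls.items()}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no tool recovered any true DA taxon")
    return {tool: s / total for tool, s in scores.items()}


def weighted_scores(calls_by_tool, weights):
    """Sum the weights of the tools that called each taxon."""
    scores = {}
    for tool, calls in calls_by_tool.items():
        for taxon in calls:
            scores[taxon] = scores.get(taxon, 0.0) + weights[tool]
    return scores


def best_threshold(sim_calls, truth, weights, grid):
    """Pick the grid threshold that maximises F1 on the simulated data."""
    scores = weighted_scores(sim_calls, weights)
    return max(grid, key=lambda t: f1({x for x, s in scores.items() if s >= t}, truth))


def soft_consensus(real_calls, weights, threshold):
    """Apply the trained weights and threshold to calls on real data."""
    scores = weighted_scores(real_calls, weights)
    return sorted(x for x, s in scores.items() if s >= threshold)


# Toy simulated truth and calls, invented for illustration.
truth = {"t1", "t2"}
sim_calls = {"A": {"t1", "t2"}, "B": {"t1", "t3"}, "C": {"t3"}}
weights = tool_weights(sim_calls, truth)
threshold = best_threshold(sim_calls, truth, weights, grid=[0.25, 0.5, 0.75])

real_calls = {"A": {"x", "y"}, "B": {"y", "z"}, "C": {"z", "w"}}
print(threshold, soft_consensus(real_calls, weights, threshold))
```

In this toy case tool C never recovers a true taxon, so it receives zero weight and its calls on the real data cannot pass the threshold on their own. An unweighted _k_-intersection would instead count its vote the same as any other tool's.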

### Who would be interested?

Microbiome researchers performing differential abundance analysis on 16S or metagenomic data. This includes:

- Computational biologists and bioinformaticians working on case-control microbiome studies who need a reproducible, tool-agnostic consensus call rather than relying on a single DA method.
- Wet-lab microbiome researchers who want a turnkey pipeline that handles the full DA workflow, from phyloseq input to a final list of consensus DA taxa, without requiring expertise in each individual statistical tool.
- Benchmarking and methods developers interested in evaluating DA tool performance on simulated data derived from real microbial communities (via MIDASim), or in comparing consensus strategies (k-intersection vs. simulation-trained soft consensus).

### What has been done so far

The pipeline is fully implemented and functional. Both execution paths have been validated end-to-end on public datasets from MicrobiomeBenchmarkData (bacterial vaginosis, sub- and supragingival plaque). The codebase includes: a main workflow (workflows/midas.nf), 6 DA tool modules (each with simulated and real variants), subworkflows for simulation, scoring, and consensus, 12 R scripts in bin/, a versioned Dockerfile, and Docker/Singularity profiles.

The nf-core scaffolding is in place: manifest block, .nf-core.yml, nextflow_schema.json, MIT license, CITATIONS.md, README.md, usage and output documentation, and versions.yml emission in all 17 modules.

What remains after proposal acceptance: running nf-core create to adopt the full template skeleton, wiring up nf-validation, setting up nf-test with a minimal test profile and public test data, CI via GitHub Actions, and publishing the container to a public registry.

The URL to the existing work is not given since the repository is not public yet; it will be transferred to the nf-core GitHub organisation upon acceptance.

### URL to existing work (if applicable)

_No response_

### Are there any similar existing nf-core pipelines?

There is the differentialabundance pipeline, but it was developed for transcriptomics data and is therefore not suited to microbiome data.
