
New pipeline: nf-core/midas #142

@MartinaCardinali

Pipeline title/name

microbiome_differential_abundance_analyses

Keywords

microbiome, differential abundance, consensus, simulation, benchmarking

What is it about?

A microbiome differential abundance (DA) pipeline for 16S or metagenomic data in phyloseq format. It combines the results of multiple DA tools into a single consensus call in two modes: (Path A) a k-intersection consensus that reports taxa called differentially abundant by at least k out of n tools, and (Path B) a simulation-trained soft consensus that uses MIDASim to generate ground-truth datasets, scores each tool's performance on them, and applies an optimally weighted threshold to the results from the real data.
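The Path A rule is simple enough to state in code. The Python sketch below is illustrative only, because the pipeline itself implements this step in R. The function name `k_intersection`, the dict-of-sets input format, and the tool and taxon labels are all hypothetical.

```python
from collections import Counter


def k_intersection(calls: dict[str, set[str]], k: int) -> set[str]:
    """Return the taxa called differentially abundant by at least k of the n tools.

    `calls` maps each (hypothetical) tool label to the set of taxa it called DA.
    """
    n = len(calls)
    if not 1 <= k <= n:
        raise ValueError(f"k must be between 1 and {n}, got {k}")
    # Count, for every taxon, how many tools called it.
    votes = Counter(taxon for taxa in calls.values() for taxon in taxa)
    return {taxon for taxon, count in votes.items() if count >= k}


# Toy example with five hypothetical tools and made-up taxon labels.
calls = {
    "tool_a": {"t1", "t2", "t3"},
    "tool_b": {"t1", "t2"},
    "tool_c": {"t1", "t4"},
    "tool_d": {"t1"},
    "tool_e": {"t1", "t2"},
}
print(sorted(k_intersection(calls, k=3)))  # ['t1', 't2']
```

Raising k trades sensitivity for agreement. With k = n, only taxa called by every tool are kept.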

Please provide a schematic diagram of the proposed pipeline

[Figure: schematic diagram of the proposed pipeline]

What would a minimal first release of this pipeline include?

All analysis steps are written in R. General-purpose libraries for data handling and visualization: phyloseq, dplyr, tidyr, tibble, readr, stringr, purrr, ggplot2, ggrepel, openxlsx, optparse, jsonlite, fs. Simulation: MIDASim. Differential abundance tools: ADAPT, corncob, LinDA, LOCOM, and MaAsLin2 (metagenomeSeq is also included but disabled by default). All dependencies are bundled in a versioned Docker image (based on rocker/r-ver:4.5.1), which can also be run with Singularity.

I confirm my proposed pipeline will follow nf-core guidelines. Most importantly, my pipeline will:

  • be built with Nextflow.
  • pass nf-core lint tests and use standardized parameters.
  • be community-owned and developed within the nf-core organization.
  • be open source under the MIT license with proper credits and acknowledgments.
  • have a descriptive, all-lowercase name without punctuation.
  • use the nf-core pipeline template and predominantly use official nf-core modules.
  • focus on a specific data/analysis type with appropriate scope.
  • have properly maintained documentation.
  • be bundled using versioned Docker/Singularity containers.

Why do we need a new pipeline?

It is well established that individual DA tools produce substantially different results on the same dataset, making tool choice a major source of variability in microbiome studies. This pipeline addresses the problem by running five complementary DA tools (ADAPT, corncob, LinDA, LOCOM, MaAsLin2) in parallel on the same phyloseq input and combining their outputs into a single consensus call. To our knowledge, its simulation-trained soft consensus mode is not available in any existing pipeline or R package within the nf-core ecosystem. In this mode, MIDASim generates ground-truth datasets from the user's own control samples, each tool is scored against that truth, and an optimally weighted threshold is then applied to the real data.
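Path B is described here only at a high level. The Python sketch below therefore fills in the unspecified details with explicit assumptions:

- Each tool is weighted by its mean F1 score against the simulated ground truth.
- A taxon's soft score is the summed weight of the tools that called it.
- The threshold is the candidate score that maximises mean consensus F1 across the simulations.

The pipeline's actual scoring metric, weighting scheme and threshold search may differ. All function names, tool labels and toy data are hypothetical.

```python
from statistics import mean

# Hypothetical data layout (an assumption, not the pipeline's actual format):
#   calls: dict mapping tool label -> set of taxa that tool called DA
#   sims:  list of (calls, truth) pairs, one per simulated dataset,
#          where truth is the set of taxa simulated as truly DA
EPS = 1e-12  # tolerance for floating-point score comparisons


def f1(called, truth):
    """F1 score of a set of DA calls against a known ground truth."""
    tp = len(called & truth)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(called), tp / len(truth)
    return 2 * precision * recall / (precision + recall)


def tool_weights(sims):
    """Weight each tool by its mean F1 across simulations, normalised to sum to 1."""
    tools = list(sims[0][0])
    raw = {t: mean(f1(calls[t], truth) for calls, truth in sims) for t in tools}
    total = sum(raw.values())
    if total == 0:  # no tool recovers any true taxon: fall back to equal weights
        return {t: 1 / len(tools) for t in tools}
    return {t: w / total for t, w in raw.items()}


def soft_scores(calls, weights):
    """Per-taxon soft score: summed weight of the tools that called the taxon."""
    scores = {}
    for tool, taxa in calls.items():
        for taxon in taxa:
            scores[taxon] = scores.get(taxon, 0.0) + weights[tool]
    return scores


def select(scores, threshold):
    """Taxa whose soft score reaches the threshold."""
    return {taxon for taxon, s in scores.items() if s >= threshold - EPS}


def best_threshold(sims, weights):
    """Threshold maximising mean consensus F1 on the simulations (ties -> stricter)."""
    scored = [(soft_scores(calls, weights), truth) for calls, truth in sims]
    candidates = sorted({s for scores, _ in scored for s in scores.values()})
    if not candidates:
        return float("inf")  # nothing was ever called: report no taxa

    def mean_f1(th):
        return mean(f1(select(scores, th), truth) for scores, truth in scored)

    return max(candidates, key=lambda th: (mean_f1(th), th))


def soft_consensus(real_calls, sims):
    """Train weights and threshold on simulated data, then apply them to real calls."""
    weights = tool_weights(sims)
    return select(soft_scores(real_calls, weights), best_threshold(sims, weights))


# Toy example: two simulated datasets with known truth, then one "real" run.
sims = [
    ({"tool_a": {"t1", "t2"}, "tool_b": {"t1", "t3"}, "tool_c": {"t4"}}, {"t1", "t2"}),
    ({"tool_a": {"t1", "t3"}, "tool_b": {"t1"}, "tool_c": {"t2"}}, {"t1", "t3"}),
]
real_calls = {"tool_a": {"t1", "t5"}, "tool_b": {"t5", "t6"}, "tool_c": {"t7"}}
print(sorted(soft_consensus(real_calls, sims)))  # ['t1', 't5']
```

In the toy run, `tool_c` never recovers a true taxon, so it gets zero weight. A taxon called only by `tool_b` falls below the learned threshold. The sketch breaks ties in mean F1 toward the stricter threshold; this is a design assumption, like the F1-based weighting.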

Who would be interested?

Microbiome researchers performing differential abundance analysis on 16S or metagenomic data. This includes:

  • Computational biologists and bioinformaticians working on case-control microbiome studies who need a reproducible, tool-agnostic consensus call rather than relying on a single DA method.
  • Wet-lab microbiome researchers who want a turnkey pipeline that handles the full DA workflow, from phyloseq input to a final list of consensus DA taxa, without requiring expertise in each individual statistical tool.
  • Benchmarking researchers and methods developers interested in evaluating DA tool performance on simulated data derived from real microbial communities (via MIDASim), or in comparing consensus strategies (k-intersection vs. simulation-trained soft consensus).

What has been done so far

The pipeline is fully implemented and functional. Both execution paths have been validated end-to-end on public datasets from MicrobiomeBenchmarkData (bacterial vaginosis, sub- and supragingival plaque). The codebase includes a main workflow (workflows/midas.nf), 6 DA tool modules (each with simulated and real variants), subworkflows for simulation, scoring, and consensus, 12 R scripts in bin/, a versioned Dockerfile, and Docker/Singularity profiles.

nf-core scaffolding is in place: manifest block, .nf-core.yml, nextflow_schema.json, MIT license, CITATIONS.md, README.md, usage and output documentation, and versions.yml emission in all 17 modules.

What remains after proposal acceptance: running nf-core create to adopt the full template skeleton, wiring up nf-validation, setting up nf-test with a minimal test profile and public test data, CI via GitHub Actions, and publishing the container to a public registry.

The URL to the existing work is not given since the repository is not public yet; it will be transferred to the nf-core GitHub organisation upon acceptance.

URL to existing work (if applicable)

No response

Are there any similar existing nf-core pipelines?

The closest is nf-core/differentialabundance, but it was developed for transcriptomics data and is not suited to microbiome data.
