MetaDiversity Nextflow Pipeline

A reproducible, modular amplicon metagenomics pipeline built with Nextflow and QIIME 2 for end-to-end processing of paired-end sequencing data: from raw reads through denoising, taxonomy assignment, and diversity analysis to export for downstream ecological analysis frameworks such as phyloseq. The workflow processes multiple samples automatically from a user-provided sample sheet and metadata file.

This pipeline stems from my desire to build a foundational understanding of metagenomics pipelines and tools in order to incorporate them into multi-omics workflows.

Table of Contents

Features
Workflow Overview
Requirements
Installation
Usage
Configuration
Example Output Figures
Contributing

Features

Read preprocessing
- Adapter trimming with Cutadapt
- Optional read QC with FastQC
QIIME 2–based microbiome analysis
- Import of paired-end reads
- Demultiplexing summaries
- Denoising and ASV inference (DADA2)
- Feature table and representative sequence merging across samples
- Taxonomic classification using a pretrained sklearn classifier
- Taxonomic filtering
- Rarefaction analysis
- Core diversity metrics (alpha & beta diversity)
Multi-sample aggregation
- Merging feature tables, representative sequences, and taxonomy
- Combined denoising statistics
Visualization & reporting
- Automated MultiQC report across preprocessing and denoising steps
Downstream compatibility
- Export of feature tables, taxonomy, and phylogenetic trees to phyloseq
Reproducible execution
- Conda-based environments
- Compatible with the BU HPC cluster and AWS cloud execution
Scalable & restartable
- Automatic logging
- Resume support for interrupted runs

Workflow Overview

Input paired-end reads + sample metadata
Adapter trimming (Cutadapt)
QIIME 2 import and demultiplexing summaries
DADA2 denoising and ASV inference
Merge feature tables and representative sequences
Taxonomic classification and filtering
Rarefaction depth assessment
Core diversity analysis
Export to phyloseq-compatible formats

Requirements

Nextflow (installed via Conda or module system)
Conda / Mamba
QIIME 2 (installed via Conda environment)
Access to an HPC cluster or an AWS cloud executor

Notes:

If running on BU SCC, required modules should already be available.
If running elsewhere, see the envs/ directory for exact software versions.
A pretrained QIIME 2 sklearn classifier is required for taxonomy assignment.

Installation

Clone the repository:

git clone https://github.com/<your-username>/<your-repo-name>.git

cd <your-repo-name>

Load Conda, then activate your Nextflow/QIIME 2 Conda environment (create it first if it does not exist):

module load miniconda

conda activate <your_nextflow_env>
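Before launching, it can help to confirm that the tools listed under Requirements are actually on your PATH. The following is a minimal sketch, not part of the pipeline itself: `check_tool` is a hypothetical helper name, and it relies only on the POSIX `command -v` builtin.

```shell
# check_tool is a hypothetical helper, not part of this repository.
# It reports whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Report on each tool without aborting at the first missing one.
for tool in nextflow conda; do
  check_tool "$tool" || true
done
```

A `missing:` line usually means the module or Conda environment has not been loaded in the current shell.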

Usage

Sample sheet format: see the example samplesheet.csv.

Basic execution on the cluster:

nextflow run main.nf -profile conda,cluster

or for cloud:

nextflow run main.nf -profile conda,aws

Configuration

Edit nextflow.config to:

Set paths to:
- samplesheet.csv
- QIIME 2 classifier
- Metadata file
Specify rarefaction and sampling depths:
- rarefaction_depth
- sampling_depth
Tune execution parameters:
- queueSize
- CPU and memory allocations
Optionally enable workflow resumption:
- resume = true
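Nextflow also accepts these settings at launch time: a double-dash option such as `--sampling_depth` overrides the matching `params` entry, and the single-dash `-resume` flag reuses cached results from an earlier run. The sketch below assumes that `rarefaction_depth` and `sampling_depth` are declared under `params` in nextflow.config (the README does not show this), and the depth values are placeholders. `launch_cmd` is a hypothetical helper that only prints the command so it can be reviewed before running.

```shell
# launch_cmd is a hypothetical helper: it prints the launch command
# instead of running it, so the arguments can be checked first.
# $1 = profile list, $2 = rarefaction depth, $3 = sampling depth
launch_cmd() {
  printf 'nextflow run main.nf -profile %s --rarefaction_depth %s --sampling_depth %s -resume\n' \
    "$1" "$2" "$3"
}

# Placeholder depths; pick real values from your own rarefaction curves.
launch_cmd conda,cluster 10000 5000
```

To execute the printed command, paste it into the shell or run `launch_cmd conda,cluster 10000 5000 | sh`.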

Example Output Figures

Contributing

Email me at jgsherry@bu.edu for more information or to discuss contributing.
