
MetaDiversity Nextflow Pipeline

A reproducible, modular amplicon metagenomics pipeline built with Nextflow and QIIME 2 for end-to-end processing of paired-end sequencing data: from raw reads through denoising, taxonomy assignment, and diversity analysis, to export for downstream ecological analysis frameworks such as phyloseq. The workflow processes multiple samples automatically using a user-provided sample sheet and metadata file.

This pipeline stems from my desire to build a foundational understanding of metagenomics pipelines and tools in order to incorporate them into multi-omics workflows.

Table of Contents

  1. Features
  2. Workflow Overview
  3. Requirements
  4. Installation
  5. Usage
  6. Configuration
  7. Example Output Figures
  8. Contributing

Features

  • Read preprocessing
    • Adapter trimming with Cutadapt
    • Optional read QC with FastQC
  • QIIME 2–based microbiome analysis
    • Import of paired-end reads
    • Demultiplexing summaries
    • Denoising and ASV inference (DADA2)
    • Feature table and representative sequence merging across samples
    • Taxonomic classification using a pretrained sklearn classifier
    • Taxonomic filtering
    • Rarefaction analysis
    • Core diversity metrics (alpha & beta diversity)
  • Multi-sample aggregation
    • Merging feature tables, representative sequences, and taxonomy
    • Combined denoising statistics
  • Visualization & reporting
    • Automated MultiQC report across preprocessing and denoising steps
  • Downstream compatibility
    • Export of feature tables, taxonomy, and phylogenetic trees to phyloseq
  • Reproducible execution
    • Conda-based environments
    • Compatible with the BU HPC cluster (SCC) and AWS cloud execution
  • Scalable & restartable
    • Automatic logging
    • Resume support for interrupted runs
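The pipeline's own implementation of the combined denoising statistics is not reproduced here. As a rough illustration of what that merge involves, here is a minimal bash sketch. It assumes each sample's DADA2 stats were exported from QIIME 2 as a tab-separated file with a single header row, possibly followed by a `#q2:types` directive line; the function name and file paths are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: concatenate per-sample DADA2 denoising-stats tables into one TSV.
# ASSUMPTION: every input file starts with the same header row and may contain
# "#q2:types" directive lines; merge_denoising_stats is a hypothetical helper.
set -euo pipefail

merge_denoising_stats() {
  awk '
    FNR == 1 { if (NR == 1) print; next }  # keep the header of the first file only
    /^#q2:/  { next }                      # drop QIIME 2 type directives
    { print }                              # keep every data row
  ' "$@"
}

# Usage (hypothetical paths):
#   merge_denoising_stats results/*/stats.tsv > combined_denoising_stats.tsv
```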

Workflow Overview

  1. Input paired-end reads + sample metadata
  2. Adapter trimming (Cutadapt)
  3. QIIME 2 import and demultiplexing summaries
  4. DADA2 denoising and ASV inference
  5. Merge feature tables and representative sequences
  6. Taxonomic classification and filtering
  7. Rarefaction depth assessment
  8. Core diversity analysis
  9. Export to phyloseq-compatible formats
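This README does not say how step 3 builds its QIIME 2 input. One common route is a paired-end manifest file, so here is a sketch that converts a sample sheet into that format. It assumes a CSV with a header row and the hypothetical columns `sample,fastq_1,fastq_2`; the manifest header and the `qiime tools import` call in the comments follow QIIME 2's manifest import format, but check them against the QIIME 2 version pinned in envs/.

```bash
#!/usr/bin/env bash
# Sketch: convert a sample sheet into a QIIME 2 paired-end manifest (TSV).
# ASSUMPTION: CSV with a header row and hypothetical columns
# sample,fastq_1,fastq_2; paths contain no commas.
set -euo pipefail

# Make a path absolute by prefixing the current directory if needed.
abs_path() {
  case "$1" in
    /*) printf '%s' "$1" ;;
    *)  printf '%s/%s' "$PWD" "$1" ;;
  esac
}

samplesheet_to_manifest() {
  local samplesheet="$1" sample r1 r2
  # Column names used by QIIME 2's PairedEndFastqManifestPhred33V2 format.
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
  # Skip the CSV header, then emit one tab-separated row per sample.
  tail -n +2 "$samplesheet" | while IFS=, read -r sample r1 r2 || [ -n "${sample:-}" ]; do
    r2="${r2%$'\r'}"              # tolerate CRLF line endings
    [ -n "$sample" ] || continue  # skip blank lines
    printf '%s\t%s\t%s\n' "$sample" "$(abs_path "$r1")" "$(abs_path "$r2")"
  done
}

# Usage (hypothetical file names):
#   samplesheet_to_manifest samplesheet.csv > manifest.tsv
#   qiime tools import \
#     --type 'SampleData[PairedEndSequencesWithQuality]' \
#     --input-path manifest.tsv \
#     --input-format PairedEndFastqManifestPhred33V2 \
#     --output-path demux.qza
```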

[Workflow diagram (Mermaid)]

Requirements

  • Nextflow (installed via Conda or module system)
  • Conda / Mamba
  • QIIME 2 (installed via Conda environment)
  • Access to an HPC cluster or an AWS cloud executor

Notes:

  • If running on BU SCC, required modules should already be available.
  • If running elsewhere, see the envs/ directory for exact software versions.
  • A pretrained QIIME 2 sklearn classifier is required for taxonomy assignment.
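Before a first run, a quick check that the required tools are on your PATH can save a failed job submission. A minimal sketch; the tool names are taken from the requirements above, `check_tools` is a hypothetical helper, and `qiime` will only be found once the QIIME 2 Conda environment is active.

```bash
#!/usr/bin/env bash
# Sketch: report which required command-line tools are missing from PATH.
# Returns non-zero if any tool is missing.
set -euo pipefail

check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_tools nextflow conda qiime || echo "Install or activate the missing tools first." >&2
```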

Installation

Clone the repository:

git clone https://github.com/<your-username>/<your-repo-name>.git

cd <your-repo-name>

Load Conda and activate a Conda environment that provides Nextflow and QIIME 2 (create it first if it does not exist):

module load miniconda

conda activate <your_nextflow_env>

Usage

Sample sheet format:

  • See the example samplesheet.csv.

Basic execution on a cluster:

nextflow run main.nf -profile conda,cluster

or for cloud:

nextflow run main.nf -profile conda,aws
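Since the actual column layout is defined by the example samplesheet.csv, the following pre-flight check is only a sketch. It assumes a header row and the hypothetical columns `sample,fastq_1,fastq_2`, and flags rows with missing or extra fields, duplicate sample IDs, or FASTQ paths that don't exist. Adjust the column handling to match the real sample sheet.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check a sample sheet before `nextflow run`.
# ASSUMPTION: header row plus hypothetical columns sample,fastq_1,fastq_2.
# Problems are printed to stderr; returns non-zero if any were found.
set -euo pipefail

validate_samplesheet() {
  local samplesheet="$1" errors=0 line_no=1 seen="|" header sample r1 r2 extra f
  {
    IFS= read -r header || { echo "empty sample sheet: $samplesheet" >&2; return 1; }
    while IFS=, read -r sample r1 r2 extra || [ -n "${sample:-}" ]; do
      line_no=$((line_no + 1))
      r2="${r2%$'\r'}"                                          # tolerate CRLF
      if [ -z "$sample" ] && [ -z "$r1" ]; then continue; fi    # blank line
      if [ -z "$sample" ] || [ -z "$r1" ] || [ -z "$r2" ] || [ -n "$extra" ]; then
        echo "line $line_no: expected exactly 3 non-empty fields" >&2
        errors=$((errors + 1)); continue
      fi
      # Track sample IDs in a delimited string (works in bash 3 as well).
      case "$seen" in
        *"|$sample|"*) echo "line $line_no: duplicate sample '$sample'" >&2; errors=$((errors + 1)) ;;
        *) seen="$seen$sample|" ;;
      esac
      for f in "$r1" "$r2"; do
        [ -f "$f" ] || { echo "line $line_no: file not found: $f" >&2; errors=$((errors + 1)); }
      done
    done
  } < "$samplesheet"
  [ "$errors" -eq 0 ]
}

# Usage:
#   validate_samplesheet samplesheet.csv && nextflow run main.nf -profile conda,cluster
```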

Configuration

Edit nextflow.config to:

  • Set paths to:
    • samplesheet.csv
    • QIIME 2 classifier
    • Metadata file
  • Specify rarefaction and sampling depths:
    • rarefaction_depth
    • sampling_depth
  • Tune execution parameters:
    • queueSize
    • CPU and memory allocations
  • Optionally enable workflow resumption:
    • resume = true
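Choosing `sampling_depth` is a trade-off: QIIME 2 rarefaction drops samples whose total count is below the depth, while a lower depth keeps fewer reads from every retained sample. Here is a small sketch for previewing that trade-off before editing nextflow.config. It assumes a header-less CSV of `sample,read_count` pairs (a hypothetical input you might build from a feature-table summary); `depth_report` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: show which samples survive a candidate rarefaction depth.
# ASSUMPTION: header-less CSV with lines of the form sample,read_count.
set -euo pipefail

depth_report() {
  local depth="$1" counts="$2"
  awk -F, -v depth="$depth" '
    { total++ }
    # Samples at or above the depth are kept and contribute exactly `depth` reads.
    $2 + 0 >= depth { kept++; retained += depth; next }
    # Everything else would be dropped by rarefaction.
    { dropped = dropped (dropped != "" ? " " : "") $1 }
    END {
      printf "depth=%d kept=%d/%d retained_reads=%d dropped=%s\n",
             depth, kept, total, retained, (dropped != "" ? dropped : "none")
    }
  ' "$counts"
}

# Usage (hypothetical file name):
#   for d in 1000 5000 10000; do depth_report "$d" sample_counts.csv; done
```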

Example Output Figures

Sample heatmap

Contributing

  • Email me at jgsherry@bu.edu for additional information or if you would like to contribute.