JackSherry6/MetaDiversity-Pipeline

MetaDiversity Nextflow Pipeline

A reproducible, modular metagenomics amplicon analysis pipeline built with Nextflow and QIIME 2 for end-to-end processing of paired-end sequencing data: raw reads through denoising, taxonomy assignment, diversity analysis, and export to downstream ecological analysis frameworks such as phyloseq. The workflow processes multiple samples automatically from a user-provided sample sheet and metadata file.

This pipeline stems from my desire to build a foundational understanding of metagenomics pipelines and tools in order to incorporate them into multi-omics workflows.

Table of Contents

  1. Features
  2. Workflow Overview
  3. Requirements
  4. Installation
  5. Usage
  6. Configuration
  7. Example Output Figures
  8. Contributing

Features

  • Read preprocessing
    • Adapter trimming with Cutadapt
    • Optional read QC with FastQC
  • QIIME 2–based microbiome analysis
    • Import of paired-end reads
    • Demultiplexing summaries
    • Denoising and ASV inference (DADA2)
    • Feature table and representative sequence merging across samples
    • Taxonomic classification using a pretrained sklearn classifier
    • Taxonomic filtering
    • Rarefaction analysis
    • Core diversity metrics (alpha & beta diversity)
  • Multi-sample aggregation
    • Merging feature tables, representative sequences, and taxonomy
    • Combined denoising statistics
  • Visualization & reporting
    • Automated MultiQC report across preprocessing and denoising steps
  • Downstream compatibility
    • Export of feature tables, taxonomy, and phylogenetic trees to phyloseq
  • Reproducible execution
    • Conda-based environments
    • Compatible with the BU HPC cluster and AWS cloud execution
  • Scalable & restartable
    • Automatic logging
    • Resume support for interrupted runs

Workflow Overview

  1. Input paired-end reads + sample metadata
  2. Adapter trimming (Cutadapt)
  3. QIIME 2 import and demultiplexing summaries
  4. DADA2 denoising and ASV inference
  5. Merge feature tables and representative sequences
  6. Taxonomic classification and filtering
  7. Rarefaction depth assessment
  8. Core diversity analysis
  9. Export to phyloseq-compatible formats

[Figure: workflow diagram (Mermaid plot)]

Requirements

  • Nextflow (installed via Conda or module system)
  • Conda / Mamba
  • QIIME 2 (installed via Conda environment)
  • Access to an HPC cluster or an AWS cloud executor

Notes:

  • If running on BU SCC, required modules should already be available.
  • If running elsewhere, see the envs/ directory for exact software versions.
  • A pretrained QIIME 2 sklearn classifier is required for taxonomy assignment.

Installation

Clone the repository:

git clone https://github.com/<your-username>/<your-repo-name>.git

cd <your-repo-name>

Load Miniconda and activate a Conda environment that provides Nextflow and QIIME 2 (create it first if it does not exist yet):

module load miniconda

conda activate <your_nextflow_env>

Usage

Sample Sheet format:

  • See the example samplesheet.csv

Basic execution for cluster:

nextflow run main.nf -profile conda,cluster

or for cloud:

nextflow run main.nf -profile conda,aws
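Before launching either command, a quick sanity check of the sample sheet can catch malformed rows early. The sketch below is only an illustration: it assumes a hypothetical three-column layout (sample,fastq_1,fastq_2) with a header row, which may not match the repository's example samplesheet.csv, and the function name check_samplesheet is made up for this example. Adjust the expected field count to the real format.

```shell
# Hypothetical helper: check_samplesheet <path-to-csv>
# ASSUMPTION: the sheet has a header row followed by rows of exactly
# three comma-separated, non-empty fields (sample,fastq_1,fastq_2).
check_samplesheet() {
  sheet="$1"
  if [ ! -r "$sheet" ]; then
    echo "cannot read $sheet" >&2
    return 2
  fi
  awk -F',' '
    NR == 1 { next }              # skip the header row
    /^[[:space:]]*$/ { next }     # ignore blank lines
    NF != 3 || $1 == "" || $2 == "" || $3 == "" {
      printf "line %d: expected 3 non-empty fields\n", NR
      bad = 1
    }
    END { exit (bad ? 1 : 0) }    # non-zero exit if any row failed
  ' "$sheet"
}
```

Calling check_samplesheet samplesheet.csv prints one message per offending row and returns a non-zero status if any row fails, so it can gate the nextflow run call in a submission script.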

Configuration

Edit nextflow.config to:

  • Set paths to:
    • samplesheet.csv
    • QIIME 2 classifier
    • Metadata file
  • Specify rarefaction and sampling depths:
    • rarefaction_depth
    • sampling_depth
  • Tune execution parameters:
    • queueSize
    • CPU and memory allocations
  • Optionally enable workflow resumption:
    • resume = true
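
Some of these settings can, in principle, also be supplied at launch time instead of being edited into nextflow.config. The sketch below assumes that sampling_depth is declared under the params scope, so that --sampling_depth overrides it; the README does not confirm this. The names run_pipeline and DRY_RUN are hypothetical and exist only in this example. The -resume flag is Nextflow's standard command-line switch for resuming a run.

```shell
# Hypothetical wrapper: run_pipeline <executor-profile> <sampling-depth>
# ASSUMPTION: sampling_depth is a pipeline parameter (params.sampling_depth),
# so it can be overridden with --sampling_depth on the command line.
run_pipeline() {
  profile="$1"
  depth="$2"
  # Rebuild the positional parameters as the command to run.
  set -- nextflow run main.nf -profile "conda,${profile}" -resume --sampling_depth "$depth"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"     # print the command instead of running it
  else
    "$@"          # launch Nextflow
  fi
}
```

Setting DRY_RUN=1 before calling run_pipeline cluster 1000 prints the assembled command without starting Nextflow, which is a cheap way to check the profile and depth before submitting a long run.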

Example Output Figures

[Figure: Sample heatmap]

Contributing

  • Email me at jgsherry@bu.edu for more information or to ask about contributing
