A reproducible, modular amplicon metagenomics pipeline built with Nextflow and QIIME 2 for end-to-end processing of paired-end sequencing data: from raw reads through denoising, taxonomy assignment, and diversity analysis to export for downstream ecological analysis frameworks such as phyloseq. The workflow processes multiple samples automatically from a user-provided sample sheet and metadata file.
This pipeline stems from my desire to build a foundational understanding of metagenomics pipelines and tools in order to incorporate them into multi-omics workflows.
- Features
- Workflow Overview
- Requirements
- Installation
- Usage
- Configuration
- Output Figures
- Contributing

Features:
- Read preprocessing
  - Adapter trimming with Cutadapt
  - Optional read QC with FastQC
- QIIME 2–based microbiome analysis
  - Import of paired-end reads
  - Demultiplexing summaries
  - Denoising and ASV inference (DADA2)
  - Feature table and representative sequence merging across samples
  - Taxonomic classification using a pretrained sklearn classifier
  - Taxonomic filtering
  - Rarefaction analysis
  - Core diversity metrics (alpha & beta diversity)
- Multi-sample aggregation
  - Merging feature tables, representative sequences, and taxonomy
  - Combined denoising statistics
- Visualization & reporting
  - Automated MultiQC report across preprocessing and denoising steps
- Downstream compatibility
  - Export of feature tables, taxonomy, and phylogenetic trees to phyloseq
- Reproducible execution
  - Conda-based environments
  - Compatible with the BU HPC cluster and AWS cloud execution
- Scalable & restartable
  - Automatic logging
  - Resume support for interrupted runs

Workflow Overview:
- Input paired-end reads + sample metadata
- Adapter trimming (Cutadapt)
- QIIME 2 import and demultiplexing summaries
- DADA2 denoising and ASV inference
- Merge feature tables and representative sequences
- Taxonomic classification and filtering
- Rarefaction depth assessment
- Core diversity analysis
- Export to phyloseq-compatible formats
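
The QIIME 2 portion of the steps above can be sketched as plain command-line calls. This is only an illustration, not the pipeline's actual process definitions: the function name `qiime_steps`, the output file names, the truncation lengths of 0 (which disable truncation), the taxa excluded during filtering, and the use of the non-phylogenetic `core-metrics` action are all assumptions. The sketch also assumes that reads were already trimmed and that every sample is imported through a single manifest, so the merge step is not needed here.

```shell
#!/usr/bin/env bash
# Illustrative QIIME 2 command sequence (assumed file names and parameter values).
# Set QIIME=echo to print the commands instead of running them.

qiime_steps() {
    local manifest="$1"           # QIIME 2 manifest (V2 format) listing trimmed FASTQ files
    local classifier="$2"         # pretrained sklearn classifier (.qza)
    local metadata="$3"           # sample metadata (.tsv)
    local rarefaction_depth="$4"  # maximum depth for the rarefaction curves
    local sampling_depth="$5"     # even sampling depth for core diversity metrics
    local q="${QIIME:-qiime}"

    # Steps are chained with && so that a failure stops the sequence.
    "$q" tools import \
        --type 'SampleData[PairedEndSequencesWithQuality]' \
        --input-format PairedEndFastqManifestPhred33V2 \
        --input-path "$manifest" --output-path demux.qza &&
    "$q" demux summarize --i-data demux.qza --o-visualization demux.qzv &&
    "$q" dada2 denoise-paired --i-demultiplexed-seqs demux.qza \
        --p-trunc-len-f 0 --p-trunc-len-r 0 \
        --o-table table.qza \
        --o-representative-sequences rep-seqs.qza \
        --o-denoising-stats denoising-stats.qza &&
    "$q" feature-classifier classify-sklearn \
        --i-classifier "$classifier" --i-reads rep-seqs.qza \
        --o-classification taxonomy.qza &&
    "$q" taxa filter-table --i-table table.qza --i-taxonomy taxonomy.qza \
        --p-exclude mitochondria,chloroplast \
        --o-filtered-table filtered-table.qza &&
    "$q" diversity alpha-rarefaction --i-table filtered-table.qza \
        --p-max-depth "$rarefaction_depth" --m-metadata-file "$metadata" \
        --o-visualization alpha-rarefaction.qzv &&
    "$q" diversity core-metrics --i-table filtered-table.qza \
        --p-sampling-depth "$sampling_depth" --m-metadata-file "$metadata" \
        --output-dir core-metrics &&
    "$q" tools export --input-path filtered-table.qza --output-path exported
}
```

For example, `QIIME=echo qiime_steps manifest.tsv classifier.qza metadata.tsv 10000 5000` prints the eight commands without requiring a QIIME 2 installation; the depth values are placeholders.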

Requirements:
- Nextflow (installed via Conda or module system)
- Conda / Mamba
- QIIME 2 (installed via Conda environment)
- Access to an HPC cluster or AWS cloud executor
Notes:
- If running on BU SCC, required modules should already be available.
- If running elsewhere, see the envs/ directory for exact software versions.
- A pretrained QIIME 2 sklearn classifier is required for taxonomy assignment.

Installation:
Clone the repository:

    git clone https://github.com/<your-username>/<your-repo-name>.git
    cd <your-repo-name>

Load Conda, then activate your Nextflow/QIIME 2 Conda environment (create it first if it does not exist yet):

    module load miniconda
    conda activate <your_nextflow_env>

Usage:
Sample sheet format:
- See the example samplesheet.csv.

Basic execution on the cluster:

    nextflow run main.nf -profile conda,cluster

or in the cloud:

    nextflow run main.nf -profile conda,aws
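
Before launching, a quick pre-flight check of the sample sheet can catch missing read files early. The sketch below is only illustrative: it assumes a header row followed by comma-separated `sample,fastq_1,fastq_2` rows, which may not match the layout of the example samplesheet.csv, and `check_samplesheet` is a hypothetical helper rather than part of the pipeline.

```shell
#!/usr/bin/env bash
# Pre-flight check for a sample sheet.
# Assumed layout (hypothetical): header row, then sample,fastq_1,fastq_2 per line.
# Quoted fields that contain commas are not handled.

check_samplesheet() {
    local sheet="$1"
    local status=0
    local header sample fastq_1 fastq_2 extra fastq

    if [ ! -r "$sheet" ]; then
        echo "cannot read sample sheet: $sheet" >&2
        return 1
    fi

    {
        # Discard the header row.
        read -r header
        # Confirm that both read files exist for every sample;
        # the || clause keeps a final line that lacks a trailing newline.
        while IFS=, read -r sample fastq_1 fastq_2 extra || [ -n "$sample" ]; do
            [ -z "$sample" ] && continue
            for fastq in "$fastq_1" "$fastq_2"; do
                if [ ! -f "$fastq" ]; then
                    echo "sample $sample: missing file '$fastq'" >&2
                    status=1
                fi
            done
        done
    } < "$sheet"

    return "$status"
}
```

Running `check_samplesheet samplesheet.csv` returns a non-zero status and lists the affected samples on stderr when any read file is missing.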

Configuration:
Edit nextflow.config to:
- Set paths to:
  - samplesheet.csv
  - QIIME 2 classifier
  - Metadata file
- Specify rarefaction and sampling depths:
  - rarefaction_depth
  - sampling_depth
- Tune execution parameters:
  - queueSize
  - CPU and memory allocations
- Optionally enable workflow resumption:
  - resume = true
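
The depth settings can also be passed at launch rather than edited into nextflow.config. A minimal sketch, assuming the pipeline reads `rarefaction_depth` and `sampling_depth` from Nextflow's `params` scope (so that `--name value` on the command line overrides them); the wrapper name `launch_pipeline` and the `NEXTFLOW` variable are hypothetical. `-resume` is Nextflow's command-line counterpart of `resume = true`.

```shell
#!/usr/bin/env bash
# Launch wrapper: profile and depths are supplied per run instead of in nextflow.config.

launch_pipeline() {
    local executor_profile="$1"   # "cluster" or "aws", matching the profiles used above
    local rarefaction_depth="$2"  # assumed to map to params.rarefaction_depth
    local sampling_depth="$3"     # assumed to map to params.sampling_depth

    # NEXTFLOW can point at another binary; NEXTFLOW=echo prints the arguments only.
    "${NEXTFLOW:-nextflow}" run main.nf \
        -profile "conda,${executor_profile}" \
        --rarefaction_depth "$rarefaction_depth" \
        --sampling_depth "$sampling_depth" \
        -resume
}
```

For example, `launch_pipeline cluster 10000 5000` starts (or resumes) a cluster run; the depth values are placeholders and should be chosen from the rarefaction curves.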

Contributing:
- Email me at jgsherry@bu.edu for more information or to discuss contributing.


