SARS-CoV-2 Surveillance Pipeline

A comprehensive Nextflow DSL2 pipeline for SARS-CoV-2 genomic surveillance using Oxford Nanopore Technologies (ONT) sequencing data.

This workflow was developed when the EPI2ME-lab ARTIC workflow was deprecated. It currently works for internal use, but it is not yet ready for external release: it will not run on another user's system without modification. Potential users should note the following issues:

Known Installation Issues

Nextflow Conda environment build failures

proovframe may fail to install automatically. If it does, activate the Conda environment that Nextflow created and install proovframe in it by hand.

The ARTIC component requires its models to be downloaded manually. This can be done with the command XXXXX, but only after activating the relevant Conda environment.

Limitations

Currently, only a single primer scheme is supported.

Overview

This pipeline processes ONT sequencing data to generate high-quality SARS-CoV-2 consensus genomes and performs downstream analyses, including variant calling, lineage assignment, and coverage analysis. It is designed specifically for surveillance activities at Can Ruti Hospital.

Features

Quality Control: Guppyplex filtering and quality assessment
Consensus Calling: ARTIC workflow for robust consensus genome generation
Frameshift Correction: ProovFrame integration to correct sequencing-induced frameshifts
Variant Analysis: Nextclade for clade assignment and mutation detection
Lineage Assignment: Pangolin for PANGO lineage classification
Coverage Analysis: Comprehensive coverage statistics and visualization
Multi-sample Processing: Batch processing with sample metadata management

Requirements

Software Dependencies

Nextflow (≥ 22.04.0)
Conda
Required tools (typically containerized):
- Guppy/Guppyplex
- ARTIC workflow tools
- ProovFrame
- Nextclade
- Pangolin
- Coverage analysis tools

System Requirements

Linux/macOS operating system
Minimum 8 GB RAM (16+ GB recommended)
50+ GB free disk space for intermediate files

Usage

Quick Start

```
nextflow run HUGTiP-SARS-COV-2.nf/main.nf \
   --runID 'run001' \
   --outDir 'path/to/outDir' \
   --workDir 'path/to/workDir' \
   --dataDir 'path/dataDir' \
   --metadata 'path/sample_sheet.csv' \
   -profile conda_on
```

Sample Metadata Format

Your metadata CSV file should contain the following headers:

```
sampleID,barcode
Sample001,barcode01
Sample002,barcode02
Sample003,barcode03
```
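
A malformed sample sheet is easier to catch before launching the run. The following is a minimal sketch of such a check and is not part of the pipeline: the function name `validate_sample_sheet` is hypothetical, and the `barcodeNN` pattern is an assumption taken from the example rows above, so adjust it if your barcodes are named differently.

```python
import csv
import io
import re

REQUIRED_HEADERS = ["sampleID", "barcode"]
# Assumption: barcodes look like "barcode01", as in the example sheet.
BARCODE_PATTERN = re.compile(r"^barcode\d{2}$")


def validate_sample_sheet(handle):
    """Return a list of problems found in the sheet (an empty list means OK)."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != REQUIRED_HEADERS:
        return [f"expected headers {REQUIRED_HEADERS}, found {reader.fieldnames}"]

    problems = []
    seen_samples, seen_barcodes = set(), set()
    # Data rows start on line 2 because line 1 holds the headers.
    for line_no, row in enumerate(reader, start=2):
        sample = (row["sampleID"] or "").strip()
        barcode = (row["barcode"] or "").strip()
        if not sample:
            problems.append(f"line {line_no}: empty sampleID")
        elif sample in seen_samples:
            problems.append(f"line {line_no}: duplicate sampleID {sample}")
        if not BARCODE_PATTERN.match(barcode):
            problems.append(f"line {line_no}: unexpected barcode '{barcode}'")
        elif barcode in seen_barcodes:
            problems.append(f"line {line_no}: duplicate barcode {barcode}")
        seen_samples.add(sample)
        seen_barcodes.add(barcode)
    return problems


if __name__ == "__main__":
    example = io.StringIO("sampleID,barcode\nSample001,barcode01\nSample002,barcode02\n")
    print(validate_sample_sheet(example))
```

To check a real file, pass an open handle instead of the in-memory example, for instance `validate_sample_sheet(open('path/sample_sheet.csv', newline=''))`.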

Workflow Steps

The pipeline executes the following main steps:

1. Data Preparation

Guppyplex: Collects the already demultiplexed reads for each barcode and filters them by length
Quality control and read filtering

2. Consensus Generation

ARTIC: Generates consensus sequences using the ARTIC workflow
Primer trimming and variant calling

3. Quality Correction

ProovFrame: Corrects frameshift errors introduced during sequencing
Maintains reading frame integrity
Requires the translated (protein) sequences of the reference genome
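
To illustrate why this step matters, the sketch below translates a short example coding sequence and then the same sequence with one base deleted, as a homopolymer error might do. It is not the pipeline's code and does not use ProovFrame. The `translate` helper is hypothetical and uses the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides):
    """Translate a coding sequence in frame 1; unknown codons become 'X'."""
    seq = nucleotides.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    cds = "ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGT"  # short example sequence
    shifted = cds[:7] + cds[8:]  # delete one T inside a run of Ts
    print("intact :", translate(cds))
    print("shifted:", translate(shifted))
```

In the shifted sequence every codon downstream of the deletion is read out of frame, so the protein diverges and a premature stop codon (`*`) appears. Comparing a consensus against reference protein sequences makes such errors detectable.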

4. Sequence Analysis

Alignment: Aligns corrected consensus sequences to the reference genome
Nextclade: Performs clade assignment and mutation analysis
Pangolin: Assigns PANGO lineages for epidemiological tracking

5. Coverage Analysis

Coverage: Calculates depth and breadth of coverage
Generates coverage plots and statistics
Produces summary coverage report (coverage_mean.csv)
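
The two statistics named above can be sketched as follows. This is an illustration rather than the pipeline's implementation. It assumes a three-column, tab-separated depth table (reference, position, depth) like the output of `samtools depth`, which may not match the pipeline's own coverage files. The 20x threshold and the function name `summarize_depth` are also assumptions.

```python
import csv
import io


def summarize_depth(handle, min_depth=20):
    """Compute mean depth and breadth of coverage from a per-position depth table."""
    # Column 3 holds the depth; blank lines are skipped.
    depths = [int(row[2]) for row in csv.reader(handle, delimiter="\t") if row]
    if not depths:
        return {"positions": 0, "mean_depth": 0.0, "breadth": 0.0}
    covered = sum(1 for depth in depths if depth >= min_depth)
    return {
        "positions": len(depths),
        "mean_depth": sum(depths) / len(depths),
        # Fraction of the listed positions at or above the threshold.
        "breadth": covered / len(depths),
    }


if __name__ == "__main__":
    # Made-up depth values for four positions, for illustration only.
    example = io.StringIO("ref\t1\t10\nref\t2\t30\nref\t3\t20\nref\t4\t0\n")
    print(summarize_depth(example))
```

Breadth is computed over the positions present in the table, so the table should list every reference position, including those with zero depth. Otherwise breadth is overestimated.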

Output Structure

```
results/
├── guppyplex/
│   └── [sample_id]/
│       └── filtered_reads.fastq
├── artic/
│   ├── consensus/
│   │   └── [sample_id].consensus.fasta
│   └── coverage/
│       └── [sample_id].coverage.txt
├── proovframe/
│   └── [sample_id]/
│       ├── corrected.fasta
│       └── corrections.tsv
├── alignment/
│   └── aligned_consensus.fasta
├── nextclade/
│   └── nextclade_results.tsv
├── pangolin/
│   └── pangolin_lineages.csv
├── coverage/
│   └── coverage_plots/
├── concatenated_consensus.fasta
└── coverage_mean.csv
```

Key Output Files

concatenated_consensus.fasta: All corrected consensus sequences in a single file
coverage_mean.csv: Summary coverage statistics for all samples
nextclade_results.tsv: Clade assignments and mutation profiles
pangolin_lineages.csv: PANGO lineage assignments
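
For reporting, it is often convenient to combine the clade and lineage assignments into one table per sample. The sketch below is a hypothetical helper, not part of the pipeline. It assumes the usual column names of the two tools (`seqName` and `clade` in the Nextclade TSV, `taxon` and `lineage` in the Pangolin CSV) and that both files use the same sequence names.

```python
import csv
import io


def merge_assignments(nextclade_tsv, pangolin_csv):
    """Join Nextclade clades and Pangolin lineages on the sequence name."""
    lineages = {row["taxon"]: row["lineage"] for row in csv.DictReader(pangolin_csv)}
    merged = []
    for row in csv.DictReader(nextclade_tsv, delimiter="\t"):
        name = row["seqName"]
        merged.append({
            "sample": name,
            "clade": row["clade"],
            # Sequences missing from the Pangolin file are flagged, not dropped.
            "lineage": lineages.get(name, "unassigned"),
        })
    return merged


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    nextclade = io.StringIO("seqName\tclade\nSample001\t23A\nSample002\t22B\n")
    pangolin = io.StringIO("taxon,lineage\nSample001,XBB.1.5\n")
    for record in merge_assignments(nextclade, pangolin):
        print(record)
```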

Configuration

Profile Configuration

Create a nextflow.config file for your environment:

```
profiles {
    docker {
        docker.enabled = true
        process {
            withName: 'guppyplex' {
                container = 'your-registry/guppyplex:latest'
            }
            withName: 'artic' {
                container = 'your-registry/artic:latest'
            }
            // Add other container configurations
        }
    }

    singularity {
        singularity.enabled = true
        // Singularity-specific configurations
    }
}
```

Resource Configuration

Adjust process resources based on your system:

```
process {
    withName: 'artic' {
        cpus = 4
        memory = '8 GB'
        time = '2h'
    }

    withName: 'coverage' {
        cpus = 2
        memory = '4 GB'
        time = '1h'
    }
}
```

Troubleshooting

Common Issues

Missing metadata file:
```
Error: Please provide a samplesheet XLSX file with --samplesheet
```
Solution: Ensure the --metadata parameter points to a valid CSV file.

Missing RunID:
```
Error: Please provide a RunID using --runID
```
Solution: Provide a numeric RunID with --runID 12345.

Data directory not found:
```
Error: Please provide full path to directory containing ONT results
```
Solution: Verify the --dataDir path exists and contains ONT data.
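
The three checks above can be run before launching Nextflow. The following is a minimal sketch and does not reproduce the pipeline's own validation. The function name `preflight` is hypothetical, and its arguments simply mirror the `--runID`, `--metadata`, and `--dataDir` parameters.

```python
from pathlib import Path


def preflight(run_id, metadata, data_dir):
    """Return a list of problems with the launch parameters (empty means OK)."""
    problems = []
    if not run_id:
        problems.append("--runID is missing")
    if not metadata or not Path(metadata).is_file():
        problems.append(f"--metadata does not point to a file: {metadata}")
    elif Path(metadata).suffix.lower() != ".csv":
        problems.append(f"--metadata is not a CSV file: {metadata}")
    if not data_dir or not Path(data_dir).is_dir():
        problems.append(f"--dataDir is not an existing directory: {data_dir}")
    return problems


if __name__ == "__main__":
    for problem in preflight("run001", "path/sample_sheet.csv", "path/dataDir"):
        print("Problem:", problem)
```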

Citation

If you use this pipeline in your research, please cite:

Nextflow: Di Tommaso, P., et al. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319.
ARTIC: Quick, J., et al. (2017). Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples. Nature Protocols, 12(6), 1261-1276. See also https://artic.network/about
ProovFrame: Hackl, T., et al. (2021). proovframe: frameshift-correction for long-read (meta)genomics. bioRxiv.
Nextclade: Aksamentov, I., et al. (2021). Nextclade: clade assignment, mutation calling and quality control for viral genomes. Journal of Open Source Software, 6(67), 3773.
Pangolin: O'Toole, Á., et al. (2021). Assignment of epidemiological lineages in an emerging pandemic using the pangolin tool. Virus Evolution, 7(2), veab064.
