BCCDC-PHL/tick-COI-amplicon-illumina-nf

tick-COI-amplicon-illumina-nf

This bioinformatics pipeline is based on the BCCDC-PHL/ncov2019-artic-nf pipeline (https://github.com/BCCDC-PHL/ncov2019-artic-nf), with the assembly step changed to de novo assembly. It has been adapted to identify ticks, specifically Ixodes and Dermacentor, by targeted sequencing of cytochrome oxidase subunit I (COI) (amplicon size: 658 bp).

The pipeline builds a consensus sequence from Illumina paired-end reads using the de novo assembler SPAdes. This COI sequence is then searched with BLAST against an in-house curated Ixodes/Dermacentor database (v1.0). Note: before assembly, human reads are removed by mapping the reads to a human reference genome, and bacterial reads are then removed using Kraken2 and KrakenTools. Any consensus sequence with a poor match (<97% sequence identity) to the in-house database is also searched with BLAST against the NCBI database.
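The 97% identity rule described above can be illustrated with a short sketch. This is not the pipeline's own code: it assumes the BLAST results are in the standard 12-column tabular format (`-outfmt 6`), and the function names `best_hits` and `needs_ncbi_blast` are hypothetical. Samples with no hit at all do not appear in this output and would need separate handling.

```python
# Sketch only: flag samples whose best in-house BLAST hit is below 97% identity.
# Assumes standard 12-column BLAST tabular output (-outfmt 6); the pipeline's
# actual output format is not stated in this README.

IDENTITY_THRESHOLD = 97.0  # from the text: matches below 97% go on to NCBI


def best_hits(blast_lines):
    """Return {query_id: (subject_id, percent_identity)} for the top-bitscore hit."""
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        pident, bitscore = float(fields[2]), float(fields[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return {q: (s, p) for q, (s, p, _) in best.items()}


def needs_ncbi_blast(blast_lines, threshold=IDENTITY_THRESHOLD):
    """Return sorted query IDs whose best hit falls below the identity threshold."""
    hits = best_hits(blast_lines)
    return sorted(q for q, (_, pident) in hits.items() if pident < threshold)
```

Ranking by bitscore before applying the threshold keeps a sample out of the NCBI step whenever its strongest hit passes, even if weaker hits for the same sample fall below 97%.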

flowchart TD
  composite_ref[composite_ref.fa]
  fastq[fastq_dir]
  fastq --> performHostFilter
  composite_ref --> performHostFilter
  performHostFilter(performHostFilter) --> readTrimming(readTrimming)
  readTrimming(readTrimming) --> filterResidualAdapters(filterResidualAdapters)
  filterResidualAdapters --> kraken2Reports
  ch_kraken_db --> kraken2Reports
  kraken2Reports --> removeBacterialReads
  removeBacterialReads --> subsample
  subsample --> denovoAssembly
  subsample --> readMapping
  indexReferences --> readMapping
  alignConsensusToReference --> readMapping
  readMapping --> makeQCCSV
  ch_refFasta_primers --> makeQCCSV
  ch_bedFile --> makeQCCSV
  ch_primerPairs --> makeQCCSV
  compileTopHits --> makeQCCSV
  denovoAssembly --> alignConsensusToReference
  ch_refFasta --> alignConsensusToReference
  alignConsensusToReference --> blastSpeciesID
  ch_blastndb --> blastSpeciesID
  blastSpeciesID --> compileTopHits
  compileTopHits --> writeTopHitsCSV
  makeQCCSV --> writeQCSummaryCSV
  writeQCSummaryCSV --> extractSampleIDs
  extractSampleIDs --> ncbiBlast
  ncbi_db --> ncbiBlast
  ncbiBlast --> reportNCBI

Usage

  Usage:
    nextflow run BCCDC-PHL/tick-COI-amplicon-illumina-nf -profile conda --cached ~/.conda/envs --prefix [prefix] [workflow-options]

  Description:
    Turn Illumina sequencing reads into consensus sequences

  Nextflow arguments (single DASH):
    -profile                      Allowed values: conda

  Workflow options:
    Mandatory:
      --prefix                    A (unique) string prefix for output files.
                                  Sequencing run name is a good choice, e.g. DDMMYY_MACHINEID_RUN_FLOWCELLID.
      --directory                 Path to a directory containing paired-end Illumina reads.
                                  Reads will be found and paired RECURSIVELY beneath this directory.
      --db                        Path to blastn database (e.g., in-house tick COI database or supply your specific database)
      --kraken_db                 Path to kraken2 database
      --ncbi_db                   Path to ncbi database
      --outdir                    Output directory (Default: ./results)
      --bed                       Path to primer bed file, also requires --ref_with_primers
      --ref                       Path to reference fasta file (without primers)
      --ref_with_primers          Path to reference fasta file (with primers), also requires --bed
      --primer_pairs_tsv          File showing which primers are paired.
      --composite_ref             Human_and_composite_ref sequence
    Optional:
      --length                    Threshold for keeping contigs from denovoAssembly (default: 600)
      --subsampleReads            Number of reads to downsample (default: 1000)
      --max_target_seqs_blast     Number of maximum target sequences as output using blast (default: 100)
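
As an illustration of how the options above fit together, the following sketch assembles a `nextflow run` command line and checks that the required options are present. It is not part of the pipeline: `build_command` is a hypothetical helper and any paths passed to it are placeholders. Because the help text gives `--outdir` a default, the sketch treats it as optional, which is an assumption. The conda cache option shown in the usage line is left out because it is not described in the option list.

```python
# Sketch only: assemble the nextflow command from the options in the help text.
# build_command is a hypothetical helper, not something the pipeline provides.

PIPELINE = "BCCDC-PHL/tick-COI-amplicon-illumina-nf"

# Options listed under "Mandatory" in the help text (except --outdir, see below).
MANDATORY = [
    "prefix", "directory", "db", "kraken_db", "ncbi_db", "bed",
    "ref", "ref_with_primers", "primer_pairs_tsv", "composite_ref",
]
# --outdir has a documented default, so it is treated as optional here (assumption).
OPTIONAL = ["outdir", "length", "subsampleReads", "max_target_seqs_blast"]


def build_command(options, profile="conda"):
    """Return the command as an argument list; raise ValueError on bad options."""
    missing = [name for name in MANDATORY if name not in options]
    if missing:
        raise ValueError("missing mandatory options: " + ", ".join(missing))
    unknown = [name for name in options if name not in MANDATORY + OPTIONAL]
    if unknown:
        raise ValueError("unknown options: " + ", ".join(unknown))
    command = ["nextflow", "run", PIPELINE, "-profile", profile]
    for name in MANDATORY + OPTIONAL:
        if name in options:
            command += ["--" + name, str(options[name])]
    return command
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting problems with paths that contain spaces.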
