BCCDC-PHL/hcv-nf

hcv-nf

This pipeline, partially adapted from the FluViewer tool, genotypes HCV amplicon sequencing data. Samples are amplified only in the core (361-764) and ns5b (8803-9191) regions. The approach is purely assembly-based: reads are assembled with SPAdes, and the resulting contigs are searched against the database with blastn to report the top 10 genotype matches.
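
To make the two amplicon windows concrete, the sketch below slices them out of a full-length genome string. It assumes the coordinates quoted above are 1-based and inclusive relative to a full-length reference; the README does not state the convention. The names `REGIONS`, `extract_region`, and `region_length` are hypothetical and are not part of the pipeline.

```python
# Amplicon windows quoted in this README. Treating them as 1-based,
# inclusive coordinates is an assumption, not something the README states.
REGIONS = {"core": (361, 764), "ns5b": (8803, 9191)}


def extract_region(genome: str, start: int, end: int) -> str:
    """Return the 1-based, inclusive slice [start, end] of a sequence."""
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates: {start}-{end}")
    if end > len(genome):
        raise ValueError(f"sequence too short ({len(genome)} nt) for end={end}")
    # Convert the 1-based inclusive start to Python's 0-based half-open slice.
    return genome[start - 1:end]


def region_length(name: str) -> int:
    """Length in nucleotides of a named window under the same convention."""
    start, end = REGIONS[name]
    return end - start + 1
```

Under this convention the core window spans 404 nt and the ns5b window spans 389 nt.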

graph TD

A[Input FASTQ sequences] --> B[Sequence quality control]
B --> C[Assembly]
C --> D[BLAST search]
D --> DA[HCV database]
D --> DB[Core NT database]
DA --> E[Build consensus sequences]
E --> F[Map reads to consensus]
F --> G[Qualimap]
E --> J[Build tree]
G --> H[Collect Reports]

Usage

When running the Nextflow pipeline, select the conda environment by adding -profile conda --cache ~/.conda/envs to the command.

nextflow run BCCDC-PHL/hcv-nf \
  --fastq_input <path/to/fastq/dirs> \
  --db <path/to/ref/db> \
  --ref_core <path/to/ref_core/db> \
  --ref_ns5b <path/to/ref_ns5b/db> \
  --nt_dir </path/to/blast_nt_db_dir> \
  --outdir <path/to/output_dir>
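
For scripted runs, the invocation can be assembled programmatically. The following is a minimal sketch, not part of the pipeline: `build_command` is a hypothetical helper, the example paths are placeholders, and the only flags it emits are the ones shown in this README.

```python
import os
import shlex


def build_command(pipeline, fastq_input, db, ref_core, ref_ns5b, nt_dir,
                  outdir, use_conda=True, conda_cache="~/.conda/envs"):
    """Assemble the nextflow invocation as an argument list."""
    cmd = ["nextflow", "run", pipeline]
    if use_conda:
        # Expand "~" here because no shell will do it for an argument list.
        cmd += ["-profile", "conda", "--cache", os.path.expanduser(conda_cache)]
    options = {
        "fastq_input": fastq_input,
        "db": db,
        "ref_core": ref_core,
        "ref_ns5b": ref_ns5b,
        "nt_dir": nt_dir,
        "outdir": outdir,
    }
    for flag, value in options.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    cmd = build_command(
        pipeline="BCCDC-PHL/hcv-nf",  # repository name as shown in the page header
        fastq_input="data/fastq",
        db="refs/hcv_db",
        ref_core="refs/core_db",
        ref_ns5b="refs/ns5b_db",
        nt_dir="blast/core_nt",
        outdir="results",
    )
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the run without going through a shell.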

Input

The required inputs are:

  • --fastq_input: FASTQ input directory
  • --db: path to the full-length HCV reference database
  • --ref_core: path to the reference database with the core region extracted
  • --ref_ns5b: path to the reference database with the ns5b region extracted
  • --nt_dir: path to the directory containing the BLAST core_nt database
  • --outdir: directory in which to store the results
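
Because a missing path only surfaces once the run has started, a pre-flight check can save time. This is a minimal sketch under stated assumptions: `check_inputs` and `INPUT_PARAMS` are hypothetical names, and the check only tests that each input path exists (it does not validate FASTQ or BLAST database contents). The output directory is only required to be set, since it may not exist before the run.

```python
from pathlib import Path

# Parameters that must point at something that already exists.
INPUT_PARAMS = ("fastq_input", "db", "ref_core", "ref_ns5b", "nt_dir")


def check_inputs(params: dict) -> list:
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    for name in INPUT_PARAMS:
        value = params.get(name)
        if not value:
            problems.append(f"--{name} is not set")
        elif not Path(value).expanduser().exists():
            problems.append(f"--{name} does not exist: {value}")
    # The output directory may be created by the run, so only require a value.
    if not params.get("outdir"):
        problems.append("--outdir is not set")
    return problems
```

A caller would typically print the returned problems and exit before invoking nextflow if the list is non-empty.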

Output

| Output | Description |
|---|---|
| run_summary_report.csv | Combined summary of the consensus report, genotype calls, QC stats, demixing results, and a check column |
| consensus_seqs.fa | Consensus sequences for core and/or ns5b |
| genotype_calls.csv | blastn results from searching the consensus sequences against the core_nt database; some columns also appear in run_summary_report.csv |
| demix.csv | Proportions of the different subtypes present in the sample; also included in run_summary_report.csv |
| parsed_genome_results.csv | QC stats: mean coverage, total mapped reads, median coverage, depth, and percent completeness at different depths; also included in run_summary_report.csv |
| mapped_to_db.bam | Raw reads mapped to all references in the database |
| mapped_to_ref.bam | Raw reads mapped to the assembly |
| RAxML_bestTree.1Ao4_core | Tree containing the samples of interest and the core references |
| RAxML_bestTree.1Ao4_ns5b | Tree containing the samples of interest and the ns5b references |
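
The CSV outputs can be inspected with the standard library alone. The sketch below makes no assumption about column names, since this README does not list them; it simply returns whatever header the file contains. `load_table` is a hypothetical helper, and where the files sit inside the output directory is also not specified here.

```python
import csv


def load_table(csv_path):
    """Read a CSV file and return (column names, list of row dicts)."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        columns = list(reader.fieldnames or [])
    return columns, rows
```

For example, `columns, rows = load_table("run_summary_report.csv")` gives the header to check against before selecting specific fields.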

About

Genomic analysis pipeline for genotyping of Hepatitis C Virus
