The process is partially adapted from FluViewer tool for genotype HCV amplicon sequencing data. Samples are only amplied in the core (361-764) and ns5b region (8803-9191). This is a purely assembly-based approach. The assembly is done using SPades. Top 10 genotypes are produced when blast (blastn) the contigs to the database.
graph TD
A[Input FASTA sequences] --> B[Sequence quality control]
B --> C[Assembly]
C --> D[BLAST search]
D --> DA[HCV database]
D --> DB[Core NT database]
DA --> E[Build consensus sequences]
E --> F[Map reads to consensus]
F --> G[Qualimap]
E --> J[Build tree]
G --> H[Collect Reports]
When using nextflow pipeline, specify the environment by adding -profile conda --cache ~/.conda/envs
nextflow run BCCDC-PHL/hcv_nf \
--fastq_input <path/to/fastq/dirs> \
--db <path/to/ref/db> \
--ref_core <path/to/ref_core/db> \
--ref_ns5b <path/to/ref_ns5b/db> \
--nt_dir </path/to/blast_nt_db_dir> \
--outdir <path/to/output_dir> \
The required inputs are:
- fastq input directory.
- path to the full length HCV reference database
- path to reference database that have core side extraced
- path to reference database that have ns5b side extraced
- path to directory containing the BLAST
core_ntdatabase - outdir directory to store the results
| outputs | description |
|---|---|
| run_summary_report.csv | the combined summary for consensus report, genotype, qc stats, demixming results and check column |
| consensus_seqs.fa | consensus sequences for core and/or ns5b |
| genotype_calls.csv | blastn results after blast the consensus sequences to the core_nt database, some columns are in the run_summary_report.csv |
| demix.csv | proportions of different subtypes present in the sample, are also in the run_summary_report.csv |
| parsed_genome_results.csv | qc stats for mean coverage, total mapped reads, median coverage, depth, percent completeness at different depth. also in the run_summary_report |
| mapped_to_db.bam | mapping raw reads to all references in the database |
| mapped_to_ref.bam | mapping raw reads to the assembly |
| RAxML_bestTree.1Ao4_core | Tree with sample of interests and the core references |
| RAxML_bestTree.1Ao4_ns5b | Tree with sample of interests and the ns5b references |