Bash scripts for calling genomic single nucleotide polymorphisms (SNPs).
This workflow was modified from existing variant-calling workflows (1-5) to use short-read sequences from genomic DNA isolates of Phytophthora cinnamomi, a soil-borne water mould, for SNP discovery and isolate comparisons. The Bash scripts are run in numeric order to process these data.
Linux, or any other system that can run command-line programs through a Bash shell.
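Because the scripts are run in numeric order, keep in mind that plain lexical sorting places script 10 before script 2. The sketch below lists scripts in numeric order without executing them. It assumes hypothetical file names that begin with the script number (for example `2_qc.sh`) and a `sort` that supports `-V`, as GNU coreutils does; the function name `list_scripts_in_order` is illustrative and not part of this repository.

```bash
#!/usr/bin/env bash
# Print numbered scripts in numeric (version) order, so that
# 2_*.sh is listed before 10_*.sh.
# Assumption: script file names start with their number and end in .sh.
list_scripts_in_order() {
    local dir="${1:-.}"
    find "$dir" -maxdepth 1 -type f -name '[0-9]*.sh' | sort -V
}

# Show the order for the given directory (default: current directory).
list_scripts_in_order "${1:-.}"
```

Reviewing the printed order before running anything is a cheap way to confirm that each script's inputs will exist when it starts.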
- SRA Toolkit (6)
- FastQC (7)
- MultiQC (8)
- Trimmomatic (9)
- SAMtools/BCFtools (10-11)
- BWA-MEM (12)
- BEDTools (13)
- Picard toolkit (14)
- ANGSD (15)
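
Before starting, it can help to confirm that the software listed above is on the `PATH`. The following is a minimal sketch, not part of the repository's scripts. The executable names are assumptions and vary between installations: Trimmomatic and Picard, for example, are often run as `java -jar ...` rather than through a wrapper command, and `bwa-mem2` is assumed here because reference 12 describes BWA-MEM2.

```bash
#!/usr/bin/env bash
# Report which of the given commands are not found on PATH.
# Prints one missing name per line and returns 1 if any are missing.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}

# Assumed executable names; adjust to match your installation.
tools=(prefetch fasterq-dump fastqc multiqc trimmomatic
       samtools bcftools bwa-mem2 bedtools picard angsd)

if ! missing="$(missing_tools "${tools[@]}")"; then
    printf 'Missing tools:\n%s\n' "$missing" >&2
fi
```
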
- Reference genome is available from JGI (16)
- Phytophthora resequenced genomes from NCBI (17)
- Phytophthora SNP dataset GBS-Pcinnamomi
- Novel data for this study were produced at Genome Quebec and will be available from NCBI soon.
1. Download published data
2. QC reads (scripts 1 & 2)
3. Trim reads (script 2 & NEBnext_dual.fasta)
4. Repeat step 2 on the trimmed reads
5. Read mapping (scripts 4-11)
6. Calling SNPs (scripts 12-17)
7. Extracting specific SNPs (script 18)
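
The steps above can be sketched for a single paired-end sample as follows. This is an illustrative outline under stated assumptions, not the repository's numbered scripts. The function names, the `<sample>_R1.fastq.gz` naming scheme, the `reference.fasta` file name, the Trimmomatic settings, and the use of BCFtools as the SNP caller are all assumptions made for the example. Steps handled by Picard, BEDTools, and ANGSD are left out, and the reference is assumed to be indexed already with `bwa-mem2 index`.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; these are NOT the repository's numbered scripts.
set -euo pipefail

# Hypothetical defaults; override them through the environment.
REF="${REF:-reference.fasta}"              # assumed reference file name
ADAPTERS="${ADAPTERS:-NEBnext_dual.fasta}" # adapter file named in the step list
THREADS="${THREADS:-4}"

# Derive a sample name, assuming files are named <sample>_R1.fastq.gz.
sample_name() {
    local base
    base="$(basename "$1")"
    printf '%s\n' "${base%_R1.fastq.gz}"
}

# Download published reads for one SRA accession.
download_reads() {
    local accession="$1" outdir="$2"
    prefetch "$accession"
    fasterq-dump --split-files --threads "$THREADS" --outdir "$outdir" "$accession"
}

# QC: FastQC on each file, then one MultiQC summary of the output directory.
qc_reads() {
    local outdir="$1"; shift
    mkdir -p "$outdir"
    fastqc --threads "$THREADS" --outdir "$outdir" "$@"
    multiqc --outdir "$outdir" "$outdir"
}

# Trim adapters and low-quality bases; thresholds are example values only.
trim_reads() {
    local r1="$1" r2="$2" prefix="$3"
    trimmomatic PE -threads "$THREADS" "$r1" "$r2" \
        "${prefix}_R1.paired.fastq.gz" "${prefix}_R1.unpaired.fastq.gz" \
        "${prefix}_R2.paired.fastq.gz" "${prefix}_R2.unpaired.fastq.gz" \
        "ILLUMINACLIP:${ADAPTERS}:2:30:10" SLIDINGWINDOW:4:20 MINLEN:36
}

# Map reads and write a coordinate-sorted, indexed BAM.
map_reads() {
    local r1="$1" r2="$2" sample="$3"
    bwa-mem2 mem -t "$THREADS" \
        -R "@RG\tID:${sample}\tSM:${sample}\tPL:ILLUMINA" "$REF" "$r1" "$r2" \
        | samtools sort -@ "$THREADS" -o "${sample}.sorted.bam" -
    samtools index "${sample}.sorted.bam"
}

# Call variants with BCFtools (an assumed choice of caller for this sketch).
call_snps() {
    local out="$1"; shift
    bcftools mpileup -f "$REF" -Ou "$@" | bcftools call -mv -Oz -o "$out"
    bcftools index "$out"
}

# Keep only SNPs that fall inside the regions listed in a BED or region file.
extract_snps() {
    local vcf="$1" regions="$2" out="$3"
    bcftools view --types snps --regions-file "$regions" -Oz -o "$out" "$vcf"
}

# Run one sample end to end only when both read files are supplied.
if [[ $# -eq 2 ]]; then
    sample="$(sample_name "$1")"
    qc_reads qc_raw "$1" "$2"
    trim_reads "$1" "$2" "$sample"
    qc_reads qc_trimmed "${sample}_R1.paired.fastq.gz" "${sample}_R2.paired.fastq.gz"
    map_reads "${sample}_R1.paired.fastq.gz" "${sample}_R2.paired.fastq.gz" "$sample"
    call_snps "${sample}.vcf.gz" "${sample}.sorted.bam"
fi
```

Saved under a hypothetical name such as `sketch.sh`, it would be invoked as `bash sketch.sh isolateA_R1.fastq.gz isolateA_R2.fastq.gz`. Running QC both before and after trimming mirrors the repeated QC step in the list.
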
1. Poplin et al. 2017. bioRxiv, 201178.
2. Van der Auwera et al. 2013. Current Protocols in Bioinformatics 43 (1). https://doi.org/10.1002/0471250953.bi1110s43.
3. Van der Auwera et al. 2020. O’Reilly Media. ISBN: 9781491975190.
4. Fraser et al. 2020. Genome Biology and Evolution 12 (10): 1789–805. https://doi.org/10.1093/gbe/evaa187.
5. Moran et al. 2023. Nature Communications 14 (1): 2557. https://doi.org/10.1038/s41467-023-37909-8.
6. SRA Toolkit Development Team. https://github.com/ncbi/sra-tools
7. Babraham Bioinformatics. 2024. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
8. Ewels et al. 2016. Bioinformatics 32 (19): 3047–48. https://doi.org/10.1093/bioinformatics/btw354.
9. Bolger et al. 2014. Bioinformatics 30 (15): 2114–20. https://doi.org/10.1093/bioinformatics/btu170.
10. Danecek et al. 2021. GigaScience 10 (2): giab008. https://doi.org/10.1093/gigascience/giab008.
11. Li et al. 2009. Bioinformatics 25 (16): 2078–79. https://doi.org/10.1093/bioinformatics/btp352.
12. Vasimuddin et al. 2019. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May, 314–24. https://doi.org/10.1109/IPDPS.2019.00041.
13. Quinlan and Hall. 2010. Bioinformatics 26 (6): 841–42. https://doi.org/10.1093/bioinformatics/btq033.
14. “Picard Toolkit.” 2019. Broad Institute. https://broadinstitute.github.io/picard/
15. Korneliussen et al. 2014. BMC Bioinformatics 15 (1): 356. https://doi.org/10.1186/s12859-014-0356-4.
16. Shakya et al. 2021. Molecular Ecology 30 (20): 5164–78. https://doi.org/10.1111/mec.16109.
17. McDougal et al. 2025. Data in Brief 60 (June): 111655. https://doi.org/10.1016/j.dib.2025.111655.
Rhiannon Peery: rhiannon.peery@nrcan-rncan.gc.ca
The first available version of the workflow "Variant calling workflow for short-read genome resequencing" was developed by Natural Resources Canada and is licensed under CC BY-NC 4.0
© His Majesty the King in Right of Canada, as represented by the Minister of Natural Resources, 2026.
© Sa Majesté le Roi du Canada, représentée par le ministre des Ressources naturelles, 2026.