NRCan/callSNPs_genomeResequencing

Bash scripts for calling genomic single nucleotide polymorphisms (SNPs).

Overview

This workflow was adapted from previously published work (1-5) to use short-read sequences from genomic DNA isolates of Phytophthora cinnamomi, a soil-borne water mould, for SNP discovery and isolate comparisons. Run the Bash scripts in numeric order to process the data.
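
Running the scripts in numeric order can be sketched as follows. This is a minimal illustration, not part of the repository: the `<number>_<name>.sh` naming pattern and both function names are assumptions, so adjust the glob to the actual script names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the <number>_<name>.sh naming pattern is an assumption.

# List the scripts in a directory, sorted by their leading number.
list_scripts_in_order() {
    local dir="${1:-.}"
    local f
    for f in "$dir"/[0-9]*.sh; do
        [ -e "$f" ] || continue      # glob matched nothing
        printf '%s\n' "${f##*/}"     # file name without the directory part
    done | sort -n                   # numeric sort, so 2 comes before 10
}

# Run each script in that order and stop at the first one that fails.
run_scripts_in_order() {
    local dir="${1:-.}"
    local script
    list_scripts_in_order "$dir" | while IFS= read -r script; do
        echo "Running $script" >&2
        # </dev/null keeps a script from consuming the list on stdin
        bash "$dir/$script" </dev/null || return 1
    done
}
```

A plain glob would sort `10_...` before `2_...`, which is why the list is passed through `sort -n`. In practice most steps need review between runs (for example, inspecting the QC reports before trimming), so listing the order may be more useful than running everything unattended.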

Requirements

Environment

Linux, or any system that can run command-line programs through a Bash shell.

Software

  • SRA Toolkit (6)
  • FastQC (7)
  • MultiQC (8)
  • Trimmomatic (9)
  • SAMtools/BCFtools (10-11)
  • BWA-MEM (12)
  • BEDTools (13)
  • Picard toolkit (14)
  • ANGSD (15)
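
Before starting, it can help to confirm that the tools above are on the `PATH`. The sketch below is one way to do that. The executable names are assumptions (for example, `bwa-mem2` is inferred from reference 12, and Picard and Trimmomatic are often run as `java -jar ...` rather than through a wrapper command), so edit the list to match your installation.

```bash
#!/usr/bin/env bash
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed executable names for the software listed above; adjust as needed.
REQUIRED_TOOLS="fasterq-dump fastqc multiqc trimmomatic samtools bcftools bwa-mem2 bedtools picard angsd"
```

Calling `missing_tools $REQUIRED_TOOLS` (unquoted, so the list splits into words) prints one missing tool per line and prints nothing when everything is installed.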

Published data

  • Reference genome is available from JGI (16)
  • Phytophthora resequenced genomes are available from NCBI (17)
  • Phytophthora SNP dataset: GBS-Pcinnamomi
  • Novel data for this study was produced at Genome Quebec and is coming soon to NCBI!

Workflow overview

  1. Download published data
  2. QC reads (scripts 1 & 2)
  3. Trim reads (script 2 & NEBnext_dual.fasta)
  4. Repeat step 2 on the trimmed reads
  5. Read mapping (scripts 4-11)
  6. Calling SNPs (scripts 12-17)
  7. Extracting specific SNPs (script 18)
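
To give a feel for steps 2 to 5, the sketch below prints typical commands for one paired-end sample instead of running them. It is not taken from the repository's scripts: the function name, directory layout, file suffixes, thread count, and Trimmomatic settings are all assumptions, and the SNP-calling steps are left out because they depend on choices made in scripts 12-17.

```bash
#!/usr/bin/env bash
# Print (not run) one command per line for QC, trimming, and mapping of a
# single paired-end sample. Paths, suffixes, and Trimmomatic settings are
# illustrative assumptions, not the repository's values.
print_sample_commands() {
    local sample="$1"          # hypothetical sample prefix, e.g. isolate1
    local ref="$2"             # reference FASTA, already indexed with `bwa-mem2 index`
    local threads="${3:-4}"
    local r1="raw/${sample}_R1.fastq.gz"
    local r2="raw/${sample}_R2.fastq.gz"
    local t1="trimmed/${sample}_R1.paired.fastq.gz"
    local t2="trimmed/${sample}_R2.paired.fastq.gz"
    local u1="trimmed/${sample}_R1.unpaired.fastq.gz"
    local u2="trimmed/${sample}_R2.unpaired.fastq.gz"
    cat <<EOF
fastqc -t $threads -o qc_raw $r1 $r2
trimmomatic PE -threads $threads $r1 $r2 $t1 $u1 $t2 $u2 ILLUMINACLIP:NEBnext_dual.fasta:2:30:10 SLIDINGWINDOW:4:20 MINLEN:36
fastqc -t $threads -o qc_trimmed $t1 $t2
bwa-mem2 mem -t $threads $ref $t1 $t2 | samtools sort -@ $threads -o bam/${sample}.sorted.bam -
samtools index bam/${sample}.sorted.bam
EOF
}
```

For example, `print_sample_commands isolate1 ref.fa 8` prints five commands that can be reviewed first and then piped to `bash`, provided the output directories exist and the names contain no spaces.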

References

  1. Poplin et al. 2017. bioRxiv, 201178.
  2. Van der Auwera et al. 2013. Current Protocols in Bioinformatics 43 (1). https://doi.org/10.1002/0471250953.bi1110s43.
  3. Van der Auwera et al. 2020. O’Reilly Media. ISBN: 9781491975190.
  4. Fraser et al. 2020. Genome Biology and Evolution 12 (10): 1789–805. https://doi.org/10.1093/gbe/evaa187.
  5. Moran et al. 2023. Nature Communications 14 (1): 2557. https://doi.org/10.1038/s41467-023-37909-8.
  6. SRA Toolkit Development Team. https://github.com/ncbi/sra-tools
  7. Babraham Bioinformatics. 2024. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
  8. Ewels et al. 2016. Bioinformatics 32 (19): 3047–48. https://doi.org/10.1093/bioinformatics/btw354.
  9. Bolger et al. 2014. Bioinformatics 30 (15): 2114–20. https://doi.org/10.1093/bioinformatics/btu170.
  10. Danecek et al. 2021. GigaScience 10 (2): giab008. https://doi.org/10.1093/gigascience/giab008.
  11. Li et al. 2009. Bioinformatics 25 (16): 2078–79. https://doi.org/10.1093/bioinformatics/btp352.
  12. Vasimuddin et al. 2019. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May, 314–24. https://doi.org/10.1109/IPDPS.2019.00041.
  13. Quinlan and Hall. 2010. Bioinformatics 26 (6): 841–42. https://doi.org/10.1093/bioinformatics/btq033.
  14. “Picard Toolkit.” 2019. Broad Institute. https://broadinstitute.github.io/picard/
  15. Korneliussen et al. 2014. BMC Bioinformatics 15 (1): 356. https://doi.org/10.1186/s12859-014-0356-4.
  16. Shakya et al. 2021. Molecular Ecology 30 (20): 5164–78. https://doi.org/10.1111/mec.16109.
  17. McDougal et al. 2025. Data in Brief 60 (June): 111655. https://doi.org/10.1016/j.dib.2025.111655.

Contact

Rhiannon Peery: rhiannon.peery@nrcan-rncan.gc.ca

License

The first available version of the workflow "Variant calling workflow for short-read genome resequencing" was developed by Natural Resources Canada and is licensed under CC BY-NC 4.0.

© His Majesty the King in Right of Canada, as represented by the Minister of Natural Resources, 2026.
© Sa Majesté le Roi du Canada, représentée par le ministre des Ressources naturelles, 2026.
