Whole Genome Sequence Analysis
This course is both theoretical and hands-on. We will learn the main tools used for alignment, variant calling, annotation, and visualization, starting from raw FASTQ reads and ending with annotated variants (VCF files).
This is an intermediate workshop in the Computational Genomics series. Prior experience with command-line programming is required. See introductory workshop:
In preparation for the WGS workshop, we have used VirtualBox to create a virtual machine (a small Linux machine that runs inside your own computer, whether it is a Mac or a Windows PC) preloaded with sequence data, bioinformatics tools, and databases.
If you don't have access to a machine with these bioinformatics tools installed, please follow the steps further down to download the virtual machine.
If you want to set up your own local environment, download and install the following tools:
- bwa - http://bio-bwa.sourceforge.net/bwa.shtml
- Picard - https://broadinstitute.github.io/picard/command-line-overview.html#Overview
- Samtools - http://www.htslib.org/doc/samtools.html
- GATK - https://gatk.broadinstitute.org/
- SnpEff - http://snpeff.sourceforge.net/SnpEff_manual.html
- SnpSift - http://snpeff.sourceforge.net/SnpSift.html
- IGV - https://software.broadinstitute.org/software/igv/download
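Once the tools are installed, a quick check that they are reachable from your shell can save time later. The sketch below is only an illustration: `check_tools` is a hypothetical helper, and the command names in the example call (`bwa`, `samtools`, `gatk`, `java`) are assumptions about how the tools are exposed on your system. Picard, SnpEff, and SnpSift are usually run as Java jars, so the example checks for `java` rather than for those names.

```shell
# Report which of the given commands can be found on PATH.
# Returns 0 when all of them are found, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (command names are assumptions, adjust to your install):
# check_tools bwa samtools gatk java
```

IGV is a desktop application and is typically launched from its own installer or bundle, so it is not included in the example call.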
Download and prepare chromosome 19 (hg38)
wget https://hgdownload.cse.ucsc.edu/goldenpath/hg38/chromosomes/chr19.fa.gz
bwa index chr19.fa.gz
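Some downstream tools, GATK in particular, generally expect an uncompressed FASTA accompanied by a `.fai` index and a `.dict` sequence dictionary. The following is a minimal sketch of that extra preparation, not part of the original instructions: `run` and `prepare_reference` are hypothetical helpers, and the sketch assumes `bwa`, `samtools`, and `gatk` (GATK4) are on your PATH and that your `gunzip` supports `-k`. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
# Dry-run helper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Prepare an uncompressed reference and the indexes commonly needed downstream.
# $1 is the gzipped FASTA, e.g. chr19.fa.gz
prepare_reference() {
    gz="$1"
    fa="${gz%.gz}"                                   # chr19.fa.gz -> chr19.fa
    run gunzip -k "$gz" &&                           # keep the .gz, write the .fa
    run bwa index "$fa" &&                           # bwa index on the same path GATK will use
    run samtools faidx "$fa" &&                      # writes chr19.fa.fai
    run gatk CreateSequenceDictionary -R "$fa"       # writes chr19.dict
}

# Example calls:
# DRY_RUN=1 prepare_reference chr19.fa.gz   # preview the commands
# prepare_reference chr19.fa.gz             # actually run them
```

Indexing the uncompressed file lets the aligner and the variant caller share one reference path. If you only need the alignment step, the `bwa index` command above is sufficient on its own.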
Now you are good to go!
If you prefer to download the virtual machine with all the data and tools set up, please follow these steps (make sure you have at least 10GB of free space):
- Download VirtualBox (https://www.virtualbox.org/wiki/Downloads) for your operating system (Windows hosts or OS X hosts).
- Download the OVA file ("WGS_workshop_VM.ova"), which is the image of a Linux machine (CentOS) with the tools and data we are going to use: https://www.dropbox.com/s/yrlt4ggfgymvo4t/WGS_workshop_VM.ova?dl=0
- Download IGV (https://software.broadinstitute.org/software/igv/download), which is used to visualize the alignment data.
- [Windows users only] Download and install PuTTY: https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html
- Install VirtualBox by following the installer's prompts. Please note: macOS may ask you for security permission. You will be able to install VirtualBox only after you grant it.
- Load the "WGS_workshop_VM.ova" file, which contains everything you need for the course: open VirtualBox, click "File" -> "Import Appliance", select the file "WGS_workshop_VM.ova", and click "Import".
- Start the virtual machine headless: right-click on the virtual machine ("WGS_workshop_VW" on the left of the menu as shown above), then choose "Start" -> "Headless Start".
- Then use your terminal (on Mac) or PuTTY (on Windows) to connect to the virtual machine via ssh.
- On Mac:
Open your terminal (under Applications -> Utilities) and type
ssh student@localhost -p 2222
(the password is workshop)
- On Windows (figure below):
Open PuTTY. The Host Name is localhost and the Port is 2222. Click Open.
Log in as student (the password is workshop).
If that works, run the following command:
ls -l /root/data
You should see a list of FASTQ files for 3 different patients.
Now you only have to download IGV (https://software.broadinstitute.org/software/igv/download).
You can access these materials remotely at any time and go through them at your own pace.
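If you prefer the command line to the VirtualBox window, the import and headless-start steps above can be sketched with VirtualBox's `VBoxManage` tool. This is an illustration under assumptions rather than part of the official setup: `import_vm` and `start_vm_headless` are hypothetical wrapper names, the `VBOXMANAGE` variable is introduced here so you can point to the binary if it is not on your PATH, and the VM name is the one quoted in the steps above (`VBoxManage list vms` shows the names registered on your machine).

```shell
# Path to the VirtualBox CLI; override it if VBoxManage is not on your PATH.
VBOXMANAGE="${VBOXMANAGE:-VBoxManage}"

# Import an appliance file (same effect as "File" -> "Import Appliance").
import_vm() {
    "$VBOXMANAGE" import "$1"
}

# Start a registered VM without opening a window ("Headless Start").
start_vm_headless() {
    "$VBOXMANAGE" startvm "$1" --type headless
}

# Example calls (the VM name is an assumption taken from the steps above):
# import_vm WGS_workshop_VM.ova
# start_vm_headless "WGS_workshop_VW"
# ssh student@localhost -p 2222 'ls -l /root/data'
```

The final commented line combines the login and the data check into one command; it prompts for the same password as the interactive login.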