Whole Genome Sequence Analysis

Alexander Pico edited this page Nov 17, 2023 · 15 revisions

Description

This course combines theory with hands-on practice. We will learn the main tools used for alignment, variant calling, annotation, and visualization, starting from raw FASTQ reads and ending with annotated variants (VCF files).

Learning Path

Intermediate: This is an intermediate workshop in the Computational Genomics series. Prior experience with command-line programming is required. See introductory workshop:

Materials

Pre-workshop Instructions

In preparation for the WGS workshop, we have used VirtualBox to create a Virtual Machine (a mini-Linux machine that runs inside your own machine, whether it is Mac or Windows) containing some sequence data, bioinformatics tools, and databases.

If you don't have access to a machine with these bioinformatics tools installed, please follow the steps below to download the virtual machine.

If you want to set up your own local environment: Download and install the following tools:

Download and prepare chromosome 19 (hg38)

wget https://hgdownload.cse.ucsc.edu/goldenpath/hg38/chromosomes/chr19.fa.gz
bwa index chr19.fa.gz

Now you are good to go!
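Before moving on, it can be worth confirming that the indexing step finished. The following is a minimal sketch, assuming `bwa index` was run on `chr19.fa.gz` in the current directory as shown above; the function name `check_bwa_index` is hypothetical. It checks for the five index files that `bwa index` writes next to its input (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`).

```shell
# Report whether every BWA index file exists (and is non-empty)
# next to the given reference file.
check_bwa_index() {
    local ref="$1"
    local missing=0
    local ext
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${ref}.${ext}" ]; then
            echo "missing: ${ref}.${ext}" >&2
            missing=1
        fi
    done
    # Returns 0 when all five files are present, 1 otherwise.
    return "$missing"
}

# Usage:
#   check_bwa_index chr19.fa.gz && echo "index looks complete"
```

If any file is reported missing, rerunning `bwa index chr19.fa.gz` is the simplest fix.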

If you prefer to download the virtual machine with all the data and tools set up, please follow these steps (make sure you have at least 10GB of free space):

  1. Download VirtualBox (https://www.virtualbox.org/wiki/Downloads) for your operating system (Windows hosts or OS X hosts)

  2. Download the OVA file ("WGS_workshop_VM.ova"), which is the image of a Linux machine (CentOS) with the tools and data we are going to use: (https://www.dropbox.com/s/yrlt4ggfgymvo4t/WGS_workshop_VM.ova?dl=0)

  3. Download IGV (https://software.broadinstitute.org/software/igv/download), which is used to visualize the alignment data

  4. [Windows users only] Download and install Putty: https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html

  5. Install VirtualBox by following the on-screen instructions. Please note: macOS may ask you for security permission. You will be able to install VirtualBox only after you grant it.

  6. Load the "WGS_workshop_VM.ova" file that contains everything you need for the course:

Open VirtualBox, click "File" -> "Import Appliance", select the file "WGS_workshop_VM.ova" and click "Import".

  7. Start the virtual machine:

[Figure: VirtualBox]

Start it headless:

  • Right-click on the virtual machine ("WGS_workshop_VM", in the list on the left as shown above)

  • Select "Start"

  • Select "Headless Start"

  • Then use your terminal (on Mac) or Putty (on Windows) to connect to the virtual machine via ssh.

  • On Mac:

Open your terminal (under Applications -> Utilities) and type

ssh student@localhost -p 2222

(the password is workshop)

  • On Windows (figure below):

Open Putty. The Host Name is localhost and the Port is 2222. Click Open.

[Figure: Putty]

Login as: student (the password is workshop)

If that works, try to run the following command:

ls -l /root/data

You should see a list of FASTQ files for three different patients.
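As an optional sanity check on the listed files, you can count the reads in each one. This is a minimal sketch, assuming the standard FASTQ layout of four lines per record; the function name `count_fastq_reads` and the file pattern in the usage comment are hypothetical, since the exact file names in `/root/data` are not given here.

```shell
# Count the reads in a FASTQ file (plain or gzip-compressed).
# Assumes four lines per record, so reads = lines / 4.
count_fastq_reads() {
    local fq="$1"
    local lines
    case "$fq" in
        *.gz) lines=$(gzip -dc "$fq" | wc -l | tr -d '[:space:]') ;;
        *)    lines=$(wc -l < "$fq" | tr -d '[:space:]') ;;
    esac
    echo $(( lines / 4 ))
}

# Usage (file pattern is an assumption):
#   for f in /root/data/*.fastq*; do
#       echo "$f: $(count_fastq_reads "$f") reads"
#   done
```

A line count that is not a multiple of four usually means the file is truncated or was not fully downloaded.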

If you have not already done so in step 3, download IGV (https://software.broadinstitute.org/software/igv/download).
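If you prefer the command line to the VirtualBox GUI, the import and headless start can probably be done with VirtualBox's `VBoxManage` tool instead. This is a sketch only, not part of the original workshop steps. It assumes `VBoxManage` is on your `PATH` and that the imported machine is named "WGS_workshop_VM" (run `VBoxManage list vms` to check the actual name). The helper names `run` and `setup_vm` are hypothetical.

```shell
# Run a command, or just print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Import the OVA and start the resulting VM without a display window.
setup_vm() {
    local ova="$1"      # path to the downloaded OVA file
    local vm_name="$2"  # VM name after import (an assumption; verify it)
    run VBoxManage import "$ova"
    run VBoxManage startvm "$vm_name" --type headless
}

# Usage:
#   DRY_RUN=1 setup_vm WGS_workshop_VM.ova WGS_workshop_VM   # preview only
#   setup_vm WGS_workshop_VM.ova WGS_workshop_VM
#   ssh student@localhost -p 2222
```

The dry-run mode lets you preview the two commands before anything is imported.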

Online Learning

You can access these materials remotely at any time and go through them at your own pace.

[Commands] [Slides] (previous workshop)