
Comparative Genomics Exercise 0: Genome assembly and annotation

Jordi edited this page Jan 24, 2024 · 6 revisions

Log in to the GDAV server

$ ssh youruser@IP

Create a directory called compgenomics_ex0 in your home folder

$ mkdir compgenomics_ex0

and enter the directory

$ cd compgenomics_ex0

The files needed for this exercise are available on the GDAV server at /home/compgenomics/assembly/. Make sure you can list them, and take a few seconds to understand what they contain.
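One way to look at the input files is sketched below. The helper name count_reads is hypothetical (it is not part of any installed tool), and it assumes standard four-line FASTQ records, so the read count is the line count divided by four.

```shell
# Directory given in this exercise.
data_dir=/home/compgenomics/assembly

# count_reads: hypothetical helper, prints the number of reads in a gzipped FASTQ file.
# Assumes 4 lines per record (header, sequence, '+', qualities).
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Only runs where the exercise data is actually present.
if [ -d "$data_dir" ]; then
    ls -lh "$data_dir"
    for fq in "$data_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue
        printf '%s\t%s reads\n' "$fq" "$(count_reads "$fq")"
    done
fi
```

Both paired files should report the same number of reads. If they do not, the pairs are out of sync and the assembly step is likely to complain.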

Tools needed (already installed on the GDAV server)

  • SPAdes
  • Prokka
  • BWA

Exercise

Goal 1

Using SPAdes, assemble the two FASTQ files in /home/compgenomics/assembly, then predict ORFs in the resulting assembly with Prokka.

Protocol

1. Run assembly

$ spades.py \
      -1 /home/compgenomics/assembly/SRR292770_1.fastq.gz \
      -2 /home/compgenomics/assembly/SRR292770_2.fastq.gz \
      -o my_assembly
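Once the assembly finishes, a quick sanity check is to count the scaffolds and sum their lengths. The sketch below uses a hypothetical helper named fasta_stats and assumes the scaffolds are in my_assembly/scaffolds.fasta, the file used in the next step.

```shell
# fasta_stats: hypothetical helper, prints sequence count and total length of a FASTA file.
fasta_stats() {
    awk '
        /^>/ { n++; next }            # header line: one more sequence
        { len += length($0) }         # sequence line: add its length
        END { printf "%d sequences, %d bp\n", n, len }
    ' "$1"
}

# Only runs if the assembly output exists in the current directory.
if [ -f my_assembly/scaffolds.fasta ]; then
    fasta_stats my_assembly/scaffolds.fasta
fi
```

A total length far from the expected genome size, or a very large number of short scaffolds, is a hint that the assembly deserves a closer look.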

2. Annotate ORFs and other elements with Prokka

Use Prokka to get a list of predicted genes in your assembly.

$ conda activate roary  # we have a working prokka in the roary environment
$ prokka my_assembly/scaffolds.fasta
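To see how many protein-coding genes were predicted, you can count the CDS features in the GFF file that Prokka writes. This is only a sketch: count_cds is a hypothetical helper, and the PROKKA_* directory pattern assumes Prokka was run without --outdir, so adjust the path if your output landed elsewhere.

```shell
# count_cds: hypothetical helper, prints the number of CDS features in a GFF3 file.
# Column 3 of a GFF feature line holds the feature type.
count_cds() {
    awk -F'\t' '$3 == "CDS" { n++ } END { print n + 0 }' "$1"
}

# Assumes the default auto-named output directory in the current folder.
for gff in PROKKA_*/*.gff; do
    [ -e "$gff" ] || continue
    printf '%s\t%s CDS\n' "$gff" "$(count_cds "$gff")"
done
```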

3. Map reads back to the assembly and genes

$ conda activate base  # we have bwa installed in the base environment
$ bwa index my_assembly/scaffolds.fasta
$ bwa mem my_assembly/scaffolds.fasta \
      /home/compgenomics/assembly/SRR292770_1.fastq.gz \
      /home/compgenomics/assembly/SRR292770_2.fastq.gz > SRR292770.sam
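A rough way to check how well the reads map back is to inspect the FLAG column of the SAM file, where bit 0x4 marks an unmapped read. The sketch below uses a hypothetical helper, sam_mapping_summary, and plain awk. It counts alignment records rather than unique reads, so secondary or supplementary alignments, if present, are counted too.

```shell
# sam_mapping_summary: hypothetical helper, summarises mapped vs unmapped records in a SAM file.
sam_mapping_summary() {
    awk -F'\t' '
        /^@/ { next }                                   # skip header lines
        {
            total++
            if (int($2 / 4) % 2 == 1) unmapped++        # FLAG bit 0x4 set: read unmapped
        }
        END { printf "%d records, %d mapped, %d unmapped\n", total, total - unmapped, unmapped }
    ' "$1"
}

# Only runs if the mapping output exists in the current directory.
if [ -f SRR292770.sam ]; then
    sam_mapping_summary SRR292770.sam
fi
```

Since the reads are mapped to an assembly built from those same reads, most records should be mapped. A large unmapped fraction suggests that part of the data did not make it into the scaffolds.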
