Comparative Genomics Exercise 1: Orthology inference using BLAST
Log in to the GDAV server:
$ ssh youruser@IP
Create a directory in your home folder called compgenomics_ex1
$ mkdir compgenomics_ex1
and enter the directory
$ cd compgenomics_ex1
All the files needed for this exercise have been copied to the GDAV server
at /home/compgenomics/4proteomes/. Make sure you can see them and
take a few seconds to understand what they contain:
$ ls /home/compgenomics/4proteomes/
They are:
4proteomes.faa -> all protein sequences from 4 species: Human, Elephant, Zebrafish and Ciona intestinalis
G3T0S8_LOXAF.faa -> protein sequence of the elephant gene called G3T0S8
TPH1A_rerio.faa -> protein sequence of the Zebrafish gene called TPH1A
TPH2_human.faa -> protein sequence of Human TPH2
scripts/ -> a directory with ad-hoc programs and scripts
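A quick way to get a feel for a FASTA file is to count its records and look at a few headers. The sketch below does this on a small toy file, so the IDs and sequences are hypothetical; on the server you would point the same two grep commands at one of the files listed above.

```shell
# Work in a scratch directory so nothing in the exercise folder is touched.
tmpdir=$(mktemp -d)

# Toy FASTA file standing in for a real proteome (hypothetical IDs and sequences).
cat > "$tmpdir/toy.faa" <<'EOF'
>prot1_SPECA
MKTAYIAKQR
QISFVKSHFS
>prot2_SPECB
MSDNELKQ
EOF

# Count sequences: every FASTA record starts with a ">" header line.
n_seqs=$(grep -c '^>' "$tmpdir/toy.faa")
echo "sequences: $n_seqs"

# Show the first few headers to see how the sequences are named.
grep '^>' "$tmpdir/toy.faa" | head -n 5
```

Knowing how the headers are written matters later, because the sequence names are what appear in the BLAST output.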
- BLAST+
Using just BLAST reciprocal searches, could you identify which is the
human ortholog of the Loxodonta protein G3T0S8? (Remember that the
protein sequence of G3T0S8 is available in
/home/compgenomics/4proteomes/G3T0S8_LOXAF.faa.)
Make a BLAST database containing the 4 input proteomes (Human, Danio
rerio, Ciona intestinalis, and Loxodonta africana) and name it
4proteomes.blastdb. All proteomes are already merged
into a single FASTA file:
/home/compgenomics/4proteomes/4proteomes.faa
$ makeblastdb \
-dbtype prot \
-in /home/compgenomics/4proteomes/4proteomes.faa \
-out 4proteomes.blastdb
You should now see something like this in your exercise home folder:
$ ls -l
total 48604
-rw-rw-r--. 1 test test 7434160 oct 15 15:58 4proteomes.blastdb.phr
-rw-rw-r--. 1 test test 683920 oct 15 15:58 4proteomes.blastdb.pin
-rw-rw-r--. 1 test test 41649539 oct 15 15:58 4proteomes.blastdb.psq
Use the blastp command to search for all homologs of the G3T0S8
sequence. Use an evalue threshold of 0.001.
$ blastp \
-task blastp \
-query /home/compgenomics/4proteomes/G3T0S8_LOXAF.faa \
-db 4proteomes.blastdb \
-outfmt 6 \
-evalue 0.001
- How many homologs of G3T0S8 are in human?
- Which is the closest one?
- Are they orthologs?
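With `-outfmt 6`, blastp prints one tab-separated line per alignment, with the default columns qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below shows one way to summarise such a table with awk. It runs on a hypothetical toy table, and it assumes that human sequence names end in `_HUMAN`; check the headers in 4proteomes.faa before relying on that.

```shell
tmpdir=$(mktemp -d)

# Hypothetical hits in the default -outfmt 6 column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
printf 'queryX\thit1_HUMAN\t85.0\t400\t60\t0\t1\t400\t1\t400\t1e-200\t700\n'   > "$tmpdir/hits.tsv"
printf 'queryX\thit2_HUMAN\t65.0\t390\t136\t2\t5\t395\t8\t398\t1e-150\t520\n' >> "$tmpdir/hits.tsv"
printf 'queryX\thit3_DANRE\t70.0\t398\t119\t1\t1\t398\t1\t398\t1e-170\t580\n' >> "$tmpdir/hits.tsv"
# A second, weaker alignment to the same human protein (one subject can give several lines).
printf 'queryX\thit1_HUMAN\t40.0\t50\t30\t0\t1\t50\t410\t460\t1e-5\t45\n'      >> "$tmpdir/hits.tsv"

# Number of distinct human subjects (assumes the "_HUMAN" suffix convention).
n_human=$(awk -F'\t' '$2 ~ /_HUMAN$/ { print $2 }' "$tmpdir/hits.tsv" | sort -u | wc -l | tr -d ' ')

# Human subject with the highest bitscore (column 12).
best_human=$(awk -F'\t' '
    $2 ~ /_HUMAN$/ && $12 + 0 > max + 0 { max = $12; best = $2 }
    END { print best }
' "$tmpdir/hits.tsv")

echo "distinct human hits: $n_human"
echo "best human hit: $best_human"
```

To apply this to the real search, redirect the blastp output to a file and use that file in place of the toy table. Counting distinct subject names rather than lines avoids counting the same protein twice when it produces more than one alignment.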
Extract the sequence of the closest homolog of G3T0S8_LOXAF in human
and save it in a new file called G3T0S8_best_human_hit.faa.
There are many different ways to do this. You can open the FASTA file containing all proteomes, search by the sequence name and extract it manually, or you can (as you should be able to) do it from the command line.
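Note that the `grep -A 1` command in this step only returns the full sequence if each record occupies exactly two lines (header plus one sequence line), and that grep matches substrings, so a name that is a prefix of another name can match twice. If either is a concern, an awk filter that matches the ID exactly and prints every line up to the next header is one alternative. This is a minimal sketch on a hypothetical toy file; the ID `seqA_HUMAN` is a placeholder.

```shell
tmpdir=$(mktemp -d)

# Toy multi-line FASTA (hypothetical IDs and sequences).
cat > "$tmpdir/toy.faa" <<'EOF'
>seqA_HUMAN some description
MKTAYIAKQR
QISFVKSHFS
>seqB_HUMAN
MSDNELKQ
>seqA_HUMAN2
MTTT
EOF

target="seqA_HUMAN"   # placeholder; use the ID taken from your BLAST result

# On each header, compare the first word after ">" with the target ID;
# keep printing lines until the next header resets the flag.
awk -v id="$target" '
    /^>/ { split(substr($0, 2), parts, " "); keep = (parts[1] == id) }
    keep { print }
' "$tmpdir/toy.faa" > "$tmpdir/one_seq.faa"

cat "$tmpdir/one_seq.faa"
```

Here `seqA_HUMAN2` is skipped even though its name contains the target string, and both sequence lines of `seqA_HUMAN` are kept.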
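$ grep -A 1 \
[HUMAN_homolog_in_blast_result] \
/home/compgenomics/4proteomes/4proteomes.faa > G3T0S8_best_human_hit.faa
Then use the extracted human sequence as the query for a reverse search against the same database:
$ blastp \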
-task blastp \
-query [HUMAN_seq_file] \
-db 4proteomes.blastdb \
-outfmt 6 \
-evalue 0.001
- Are they reciprocal hits?
- Are they orthologous with each other?
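The reciprocal test itself can be written down as a small check: take the best human hit of the elephant query, then ask whether the best elephant hit of that human protein is the original query. The sketch below does this on two hypothetical toy tables. It assumes sequence names carry a species suffix (`_HUMAN`, `_LOXAF`), and the helper names `row` and `best_hit_in` are made up for this illustration. Because the database contains all four proteomes, each query also hits itself, which is why the best hit is looked up per species.

```shell
tmpdir=$(mktemp -d)

# Write one 12-column row; only qseqid, sseqid and bitscore matter here,
# so columns 3 to 11 are filled with placeholder zeros.
row() { printf '%s\t%s\t0\t0\t0\t0\t0\t0\t0\t0\t0\t%s\n' "$1" "$2" "$3"; }

# Forward search: elephant query against the merged database (toy values).
{ row E1_LOXAF E1_LOXAF 900; row E1_LOXAF H1_HUMAN 700; row E1_LOXAF H2_HUMAN 520; } > "$tmpdir/forward.tsv"
# Reverse search: best human hit used as query (toy values).
{ row H1_HUMAN H1_HUMAN 910; row H1_HUMAN E1_LOXAF 700; row H1_HUMAN E2_LOXAF 500; } > "$tmpdir/reverse.tsv"

# Print the subject with the highest bitscore among hits from one species.
# $1 = tabular BLAST file, $2 = species suffix without the underscore.
best_hit_in() {
    awk -F'\t' -v suffix="_$2" '
        $2 ~ (suffix "$") && $12 + 0 > max + 0 { max = $12; best = $2 }
        END { print best }
    ' "$1"
}

query="E1_LOXAF"
fwd_best=$(best_hit_in "$tmpdir/forward.tsv" HUMAN)
rev_best=$(best_hit_in "$tmpdir/reverse.tsv" LOXAF)

if [ "$rev_best" = "$query" ]; then
    verdict="reciprocal best hits"
else
    verdict="not reciprocal"
fi
echo "$query -> $fwd_best -> $rev_best: $verdict"
```

A reciprocal best hit is evidence for orthology rather than proof of it, so the result still needs to be interpreted alongside the other hits in both tables.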
Repeat the previous protocol with the Zebrafish sequence found in the
file TPH1A_rerio.faa (Danio rerio homolog).
- What are the Zebrafish homologs in human?
- Are they the same as in the Loxodonta example?
- What is the ortholog in human (based on reciprocal blast)?
- What could you tell about the gene TPH1B in Danio rerio?