Calculating site-specific evolutionary rates at the amino-acid or codon level yields similar rate estimates
This repository contains all the scripts and data to reproduce the results of:
D. K. Sydykova, C. O. Wilke (2017). Calculating site-specific evolutionary rates at the amino-acid or codon level yields similar rate estimates. PeerJ 5:e3391. https://doi.org/10.7717/peerj.3391
mech_codon contains results for the alignments simulated with the dN/dS model.
- assigned_rates contains true site-wise dN/dS.
- filtered_sites contains information on all sites without any amino acid substitutions for each simulated alignment.
- inferred_rates contains inferred site-wise dN/dS.
- processed_rates contains tables with all site-wise rates: true dN/dS, inferred dN/dS, and inferred Rate4Site.
- r4s_rates contains inferred site-wise Rate4Site rates.
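To make the idea behind filtered_sites concrete, here is a minimal Python sketch that finds the columns of an amino acid alignment in which no substitution is observed. It is an illustration only and not the repository's own script: the function name, the 1-based site numbering, and the choice to ignore gap characters are all assumptions.

```python
def sites_without_substitutions(aa_seqs):
    """Return 1-based column indices at which every sequence has the same amino acid.

    Assumption: gap characters ("-") are ignored when comparing residues.
    """
    if not aa_seqs:
        return []
    length = len(aa_seqs[0])
    if any(len(seq) != length for seq in aa_seqs):
        raise ValueError("all aligned sequences must have the same length")

    invariant = []
    for col in range(length):
        # Collect the distinct residues seen in this column, excluding gaps.
        residues = {seq[col] for seq in aa_seqs} - {"-"}
        if len(residues) <= 1:
            invariant.append(col + 1)
    return invariant


# Toy alignment (hypothetical data): only the second column varies.
alignment = ["MKTA", "MRTA", "MKTA"]
print(sites_without_substitutions(alignment))  # [1, 3, 4]
```

Sites flagged this way carry no amino acid substitutions, so a rate inferred for them reflects a lack of observed change rather than a measured rate.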
mut_sel contains results for the alignments simulated with the mutation-selection (MutSel) model. The MutSel alignments were simulated by Spielman et al. (2016). The true and inferred site-wise dN/dS for their alignments can be found in their repository, https://github.com/sjspielman/dnds_1rate_2rate.
- filtered_sites contains information on all sites without any amino acid substitutions for each simulated alignment.
- processed_rates contains tables with all site-wise rates: true dN/dS, inferred dN/dS, and inferred Rate4Site.
- r4s_rates contains inferred site-wise Rate4Site rates.
natural_prot contains results for the natural alignments from Spielman and Wilke (2013) and Meyer and Wilke (2015). The data we used can be found at https://github.com/sjspielman/mammalian_gpcr_selection and https://github.com/ausmeyer/hiv_structural_determinants, respectively.
- aln contains the HIV-1 and GPCR protein sequences used in our analysis:
  - aligned_seqs contains the amino acid sequences we aligned.
  - back_translated_aln contains codon alignments that were translated back from the amino acid alignments.
  - raw_aln contains raw FASTA files from the repositories mentioned above.
  - reforematted_aln contains nucleotide alignments with sequence IDs reformatted. These were used as input for HyPhy.
- filtered_sites contains information on all sites without any amino acid substitutions for each alignment.
- inferred_dNdS contains inferred site-wise dN/dS.
- processed_rates contains tables with site-wise inferred dN/dS and inferred Rate4Site.
- r4s_rates contains inferred site-wise Rate4Site rates.
- trees contains trees inferred from the amino acid alignment for each protein. This directory also contains trees with reformatted sequence IDs to be used as input for HyPhy.
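The back-translation that produces codon alignments from amino acid alignments can be sketched as follows. This is a hedged illustration in Python, not the repository's script: the function name is hypothetical, and it assumes that the unaligned nucleotide sequence contains exactly one codon per residue, with no trailing stop codon.

```python
def back_translate(aligned_aa, nucleotides):
    """Thread the original codons of one sequence onto its aligned amino acid sequence.

    Assumptions: gaps are written as "-", and `nucleotides` holds exactly one
    codon per non-gap residue of `aligned_aa`.
    """
    if len(nucleotides) % 3 != 0:
        raise ValueError("nucleotide sequence length must be a multiple of 3")
    codons = [nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3)]

    n_residues = sum(1 for aa in aligned_aa if aa != "-")
    if n_residues != len(codons):
        raise ValueError("number of codons does not match number of residues")

    # Each gap becomes a three-character gap; each residue consumes the next codon.
    codon_iter = iter(codons)
    return "".join("---" if aa == "-" else next(codon_iter) for aa in aligned_aa)


# Toy example (hypothetical data).
print(back_translate("M-KT", "ATGAAAACC"))  # ATG---AAAACC
```

Because the codons are copied from the original nucleotide sequence rather than chosen from a codon table, synonymous differences between sequences are preserved, which is why this step needs the original nucleotide sequences.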
plots contains final figures used in the publication.
src contains all of the scripts used to analyze the data and plot the figures. The usage of each script is described in the section below.
The analysis in this section (alignments simulated with the dN/dS model, mech_codon) requires a copy of https://github.com/sjspielman/dnds_1rate_2rate located next to the current repository, in the same parent directory.
- Copy trees from https://github.com/sjspielman/dnds_1rate_2rate using the command line: cp ../dnds_1rate_2rate/trees/n*_bl*.tre ./trees/.
- Simulate alignments using ./src/write_run_sim_aln.sh. This script will write run_sim_aln.sh to simulate dN/dS alignments.
- Translate simulated nucleotide alignments to amino acids using ./src/write_run_translate_aln.sh.
- Infer site-wise dN/dS with HyPhy using the script ./src/dnds_inference/submit_run_inference.sh. This script was copied from https://github.com/sjspielman/dnds_1rate_2rate and modified for this analysis.
- Infer site-wise Rate4Site scores using ./src/write_run_r4s_mech_codon.sh. This script will write run_r4s_mech_codon.sh, which uses r4s_pipeline.sh to run Rate4Site on simulated alignments.
- Concatenate all rates into a table with ./src/concatenate_mech_codon_rates.r.
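Several of the steps above use a script that writes a second script, which then runs one command per input file. The Python sketch below illustrates that pattern only. The repository's writer scripts are shell scripts whose contents are not shown here, so the function name and the echo placeholder command are assumptions; the n*_bl*.tre pattern is taken from the cp command above.

```python
import tempfile
from pathlib import Path


def write_run_script(tree_dir, out_script, command_template):
    """Write a shell script containing one command per tree file.

    `command_template` is a hypothetical placeholder: "{tree}" is replaced by
    the tree path and "{name}" by the file name without its extension.
    """
    trees = sorted(Path(tree_dir).glob("n*_bl*.tre"))
    lines = ["#!/bin/bash"]
    for tree in trees:
        lines.append(command_template.format(tree=tree.as_posix(), name=tree.stem))
    Path(out_script).write_text("\n".join(lines) + "\n")
    return len(trees)


# Demonstration in a throwaway directory with made-up, empty tree files.
with tempfile.TemporaryDirectory() as tmp:
    tree_dir = Path(tmp, "trees")
    tree_dir.mkdir()
    for file_name in ("n128_bl0.01.tre", "n256_bl0.1.tre", "README.txt"):
        (tree_dir / file_name).write_text("")
    script = Path(tmp, "run_sim_aln.sh")
    # "echo simulate" stands in for the real simulation command (assumption).
    n_commands = write_run_script(tree_dir, script, "echo simulate {tree} {name}.fasta")
    script_text = script.read_text()

print(n_commands)  # 2
```

Generating the run script instead of writing it by hand keeps the list of commands in sync with whatever tree files are present in the directory.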
The analysis in this section (alignments simulated with the MutSel model, mut_sel) requires a copy of https://github.com/sjspielman/dnds_1rate_2rate located next to the current repository, in the same parent directory.
- Translate simulated nucleotide alignments from Spielman et al. (2016) to amino acids using ./src/write_run_translate_aln.sh.
- Infer site-wise Rate4Site scores using ./src/write_run_r4s_mut_sel.sh. This script will write run_r4s_mut_sel.sh, which uses r4s_pipeline.sh to run Rate4Site on simulated alignments.
- Concatenate all rates into a table with ./src/concatenate_mut_sel_rates.r.
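The translation step, which turns a codon alignment into the amino acid alignment that Rate4Site needs, can be sketched in Python as below. This is an illustration under stated assumptions, not the repository's translation script: it uses the standard genetic code, maps a gap codon (---) to a single gap, and writes X for any codon it does not recognize.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(codon_seq):
    """Translate one aligned codon sequence into an aligned amino acid sequence."""
    if len(codon_seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    residues = []
    for i in range(0, len(codon_seq), 3):
        codon = codon_seq[i:i + 3].upper()
        if codon == "---":
            residues.append("-")  # keep alignment gaps (assumption)
        else:
            residues.append(CODON_TABLE.get(codon, "X"))  # X for unknown codons
    return "".join(residues)


# Toy example (hypothetical data).
print(translate("ATGAAA---ACC"))  # MK-T
```

Translating codon by codon keeps the alignment columns in register, so site i in the amino acid alignment corresponds to codon i in the nucleotide alignment and the two sets of rates can be compared site by site.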
The analysis of the natural alignments (natural_prot) consists of the following steps:
- Align amino acid sequences using ./src/write_run_align_natural_prot.sh.
- Back translate amino acid alignments into codon alignments with ./src/run_translate_aln_aa_to_codon.sh. This script requires original nucleotide sequences.
- Infer trees from the amino acid sequences with RAxML. The script ./src/write_run_raxml.sh will write run_raxml.sh, which will run the inference.
- Infer site-wise dN/dS with HyPhy using the script ./src/dnds_inference/submit_run_inference_nat_prot.sh. This script was copied from https://github.com/sjspielman/dnds_1rate_2rate and modified for this analysis.
- Infer site-wise Rate4Site scores using ./src/write_run_r4s_natural_prot.sh. This script will write run_r4s_natural_prot.sh, which will run Rate4Site on natural alignments.
- Concatenate all rates into a table with ./src/concatenate_natural_prot_rates.r.
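The final concatenation step combines the site-wise rates from the different methods into one table. The repository does this with R scripts. The Python sketch below only illustrates the idea of joining two per-site tables on the site index, and the column names (site, dnds, r4s) and the example values are hypothetical.

```python
import csv
import io


def merge_site_rates(dnds_csv, r4s_csv):
    """Join two per-site CSV tables on the "site" column.

    Assumed (hypothetical) layouts: "site,dnds" and "site,r4s".
    Only sites present in both tables are kept.
    """
    dnds = {row["site"]: row["dnds"] for row in csv.DictReader(io.StringIO(dnds_csv))}
    r4s = {row["site"]: row["r4s"] for row in csv.DictReader(io.StringIO(r4s_csv))}

    shared_sites = sorted(set(dnds) & set(r4s), key=int)
    return [
        {"site": int(site), "dnds": float(dnds[site]), "r4s": float(r4s[site])}
        for site in shared_sites
    ]


# Made-up values for illustration only.
dnds_text = "site,dnds\n1,0.12\n2,0.80\n3,0.05\n"
r4s_text = "site,r4s\n1,-0.40\n2,1.30\n"

merged = merge_site_rates(dnds_text, r4s_text)
for row in merged:
    print(row)
```

Keeping only the sites present in both tables is one possible design choice; a table that also includes sites with missing values would need an outer join instead.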