This code accompanies our Nature Ecology & Evolution paper and provides the computational tools for investigating properties of genotype-phenotype (GP) maps, specifically:
- Estimating GP map properties such as robustness and evolvability
- Estimate navigability of GP maps
- Simulate evolutionary dynamics
To install and compile the required tools, packages and binaries, follow the instructions in INSTALL.md.
Utilising GP map models anlaysed and published in 10.1371/journal.pcbi.1004773, we generate the following small GP maps and store on disk to facilitate faster computation:
S_{2,8}: Polyomino GP map with two tiles and eight coloursRNA12: RNA GP map with sequence length L=12HP5x5: Compact HP GP map on a 5x5 grid (L=25)HP3x3x3: Compact HP GP map on a 3x3x3 grid (L=27)HP20: Non-compact HP GP map of length L=20HP25: Non-compact HP GP map of length L=25
The GP maps are available after extracting:
tar -zxf gp_maps/gp_maps.tar.gz -C gp_maps
cd gp_mapsThe GP maps have two files:
geno_list*txt: linenis the phenotype of the the genotype that integernis when represented in baseKof the system.pheno_list*txt: the integerningeno_list*.txtcorresponds to the phenotype that is written onnth line OR for polyomino the filepheno*n*.txtwhich contains a diagrammatic representation of the polyomino.
The following numerical maps are used to convert character sequences to base K
integer sequences:
- S_{2,8} : {'0': 0, '1': 1, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7': 7}
- RNA12 : {'A': 0, 'U': 1, 'C': 2, 'G': 3}
- HP5x5 : {'H': 0, 'P': 1}
- HP3x3x3 : {'H': 0, 'P': 1}
- HP20 : {'H': 1, 'P': 0}
- HP25 : {'H': 1, 'P': 0}
- fRNA : {'C': 0, 'G': 1, 'A': 2, 'U': 3}
For example, the phenotype on line 12557964 of gp_maps/RNA_12/geno_list0.txt
is phenptype index 56. This corresponds to phenotype on line 57 of
gp_maps/RNA_12/pheno_list0.txt which is ((.((...)))). The mapping can be
expressed as:
line of geno file = 12557964 -> genotype index = 12557963
genotype index = 12557963 -> base K=4 (reverse) = '320223123332'
base K=4 (reverse) = '320223123332' -> RNA sequence = 'GCACCGUCGGGC'
RNA sequence = 'GCACCGUCGGGC' -> RNA dot-bracket = '((.((...))))'
The above mapping can be applied to all GP maps on disk in the same fashion
using the corresponding char to int maps described above.
The S_{3,8}, RNA15 and fRNA GP maps are larger and the phenotype of a
given genotype is algorithmically estimated as it is not computationally
feasible to work with these larger GP maps on disk.
The functional RNA database (fRNAdb) is available to download.
We extracted all genotype sequences of length L with 20 <= L <= 140 from the
file frnadb_summary.csv, outputting each sequence of a given <LENGTH> to
fRNA/data/cs_l<LENGTH>.txt. Each sequence in fRNA/data/cs_l<LENGTH>.txt was
then folded using ViennaRNA 1.8.5, with the dot-bracket structures saved in the
same line by line order to fRNA/data/ps_l<LENGTH>.txt.
The properties of the GP maps on disk can be measured with the binary:
./bazel-bin/src/properties/gp_map_stats --helpFor example, with RNA12 and component level estimates:
./bazel-bin/src/properties/gp_map_stats \
--geno_fname gp_maps/RNA_12/geno_list0.txt \
--outpath scratch/ \
--base 4 \
--length 12 \
--nd_ref 0 \
--nd_include 0 \
--nd_threshold 1 \
--estimate_component_stats \
--verboseTo measure properties given random genotype swaps (e.g. 100,000):
./bazel-bin/src/properties/gp_map_stats \
--geno_fname gp_maps/RNA_12/geno_list0.txt \
--outpath scratch/ \
--base 4 \
--length 12 \
--nd_ref 0 \
--swaps 100000 \
--seed 1 \
--verbose
scripts/run_gp_map_stats.sh script to run the experiments for Table 1 and Fig. 2.
Navigability can be estimated with any GP map in gp_maps/ with:
./bazel-bin/src/landscape/nav_disk --helpFor example, with RNA12:
./bazel-bin/src/landscape/nav_disk \
--geno_fname gp_maps/RNA_12/geno_list0.txt \
--outpath scratch/ \
--base 4 \
--length 12 \
--targets 50 \
--sources 20 \
--nd_ref 0 \
--seed 1 \
--dimension 4 \
--swaps 100000 \
--verboseWe estimate the navigability for larger GP maps that are not stored on disk by
algorithmically folding/assembling. These are available for S_{3,8} and
RNA15 here:
./bazel-bin/src/landscape/nav_alg --helpE.g. for S_{3,8}:
./bazel-bin/src/landscape/nav_alg \
--base 8 \
--length 12 \
--file_name s38 \
--nd_ref 1 \
--neutral_mutations \
--gp_map polyomino \
--pheno_fname gp_maps/s_3_8/phenotypes/ \
--assembly_tests 20 \
--seed 1 \
--sources $sources \
--targets $targets \
--threshold $threshold \
--outpath scratch/scripts/run_nav_swaps_and_dim.sh provides a bash script to perform the experiments exploring how navigability changes under removing neutral correlations and restricting dimensionality is provided.
A binary for performing the navigability estimation in fRNA is available running:
./bazel-bin/src/landscape/nav_rna_dfs --helpComplete bash scripts for performing all fRNA experiments are available at scripts/run_nav_rna_dfs.sh. For example,
./scripts/run_nav_rna_dfs.sh 20000 hamming outswill perform navigability estimation on fRNA at all lengths with and without
neutral mutations with a threshold of 20000 total genotypes considered with
Hamming fitness function applied.
A binary to run monormorphic evolutionary dynamics on RNA is available with:
./bazel-bin/src/evolution/nav_rna_mono --helpAn example run is given below:
./bazel-bin/src/evolution/nav_rna_mono \
--length 40 \
--outpath scratch/mono/ \
--fitness_measure hamming \
--threshold 10000 \
--sources 1 \
--targets 1 \
--pheno_fname "frna/data/ps_l40.txt" \
--population_size 10000 \
--norecord_diffs_only \
--seed 1 \
--neutral_mutations \
--shape_level 0A complete script for running the experiments is available at scripts/run_all_mono.sh which utilises a python subscript scripts/run_evolution_mono.py.
A binary to run monormorphic evolutionary dynamics on RNA is available with:
./bazel-bin/src/evolution/nav_rna --helpAn example run is given below:
./bazel-bin/src/evolution/nav_rna \
--length 40 \
--outpath scratch/poly/ \
--generations 10000 \
--sources 1 \
--targets 1 \
--pheno_fname "frna/data/ps_l40.txt" \
--population_size 100 \
--seed 1A complete script for running the experiments is available at scripts/run_all_poly.sh which utilises a python subscript scripts/run_evolution_polymorphic.py.
The outputs of the evolutionary dynamics are in the form of sequential mutants (monomorphic case) or population summary statistics (polymorphic case). To derive estimates of the navigability additional analysis must be run on the output files to measure navigability in each source-target pair.
Scripts to produce these outputs across batches of paths containing evolutionary ouputs can be run with, for example:
python scripts/evo_analysis/run_analysis_script_batch.py\
--paths \
data/raw_output/mono \
data/raw_output/poly \
--executable scripts/evo_analysis/run_analysis_script.py \
--by allcalling the subscripts
scripts/evo_analysis/run_analysis_script.py
and
scripts/evo_analysis/analysis_script.py.
with the outputs written within each of --paths to subpath analysis/.
The plotting notebooks and scripts used to generate the figures and tables are available in plotting.