Neural network extrapolation to distant regions of the protein fitness landscape

Installation

The code in this module builds on the nn4dms machine learning models and module. It has the same requirements as nn4dms, plus a few additional dependencies. Use conda to set up the environments from the provided env_inference.yml and env_notebook.yml files.

The model code lives in the linked nn4dms repository and is required to reproduce some of our inference results. Code and models for generating model predictions are included as the nn4dms_nn-extrapolate and nn-extrapolation-models submodules. Cloning with submodules can take a long time, so we provide clone commands for nn-extrapolation both with and without submodules.

# clone repo without submodules
gh repo clone RomeroLab/nn-extrapolation
cd nn-extrapolation

# check out submodules individually (--init registers a submodule before fetching it)
git submodule update --init nn4dms_nn-extrapolate
git submodule update --init nn-extrapolation-models

# clone full repo including submodules
gh repo clone RomeroLab/nn-extrapolation -- --recurse-submodules
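
After cloning, it can help to confirm that both submodules were actually fetched, since git leaves an empty directory in the working tree for a submodule that has not been checked out. The following is a minimal sketch of such a check. The helper name missing_submodules is hypothetical and not part of this repository; only the two submodule directory names are taken from the text above.

```python
from pathlib import Path

# Submodule directory names as listed in this README.
SUBMODULES = ["nn4dms_nn-extrapolate", "nn-extrapolation-models"]


def missing_submodules(repo_root, names=SUBMODULES):
    """Return the submodule directories that are absent or empty under repo_root.

    Hypothetical helper for illustration; it only inspects the file system
    and does not call git.
    """
    root = Path(repo_root)
    missing = []
    for name in names:
        path = root / name
        # A submodule that was never checked out is either missing or an empty directory.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing


# Run from the repository root; an empty list means both submodules are populated.
print(missing_submodules("."))
```

If the list is not empty, running the git submodule update command for the named directory should populate it.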

We provide two conda environments to reproduce our analysis. gb1_inf is used to run all Makefile commands; gb1_notebook is used to reproduce the data analysis in the Jupyter notebooks.

# download and install environments
conda env create -f env_inference.yml
conda env create -f env_notebook.yml

# activate the environment you need (only one is active at a time)
conda activate gb1_inf
conda activate gb1_notebook

Installation takes ~5 minutes.

Software Requirements

This module has been tested on the following systems:

macOS: Mojave (10.14.6)
Linux: Rocky Linux 8.8 (Green Obsidian)

Data Analysis

Extrapolating learned protein fitness landscapes

Generate predictions for the Wu et al. 1-4 mutant GB1 fitness dataset.

python 01_extrapolation_predictions.py
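
To make the "1-4 mutant" terminology concrete, the sketch below groups variants by the number of positions at which they differ from a reference sequence. This is only an illustration under stated assumptions: the function names, the four-letter reference, and the variants are all hypothetical toy values, not GB1 sequences, and this is not the code in 01_extrapolation_predictions.py.

```python
from collections import defaultdict


def count_mutations(variant, reference):
    """Number of positions where variant differs from reference (Hamming distance)."""
    if len(variant) != len(reference):
        raise ValueError("variant and reference must have the same length")
    return sum(a != b for a, b in zip(variant, reference))


def group_by_mutation_count(variants, reference):
    """Map each mutation count to the list of variants with that many substitutions."""
    groups = defaultdict(list)
    for variant in variants:
        groups[count_mutations(variant, reference)].append(variant)
    return dict(groups)


# Toy reference and variants (hypothetical, chosen only to show the grouping).
reference = "ACDE"
variants = ["ACDE", "ACDF", "GCDF", "GHDF", "GHKF"]
groups = group_by_mutation_count(variants, reference)

for n_mut in sorted(groups):
    print(n_mut, groups[n_mut])
```

Grouping predictions this way makes it possible to compare model behavior at each mutational distance from the reference.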

Generate extrapolation trajectories.

python 01_extrapolation_trajectories.py

Generate plots for Fig 1 and Fig S1 in 01_extrapolation_analysis.ipynb

ML-guided protein design for deep exploration of the fitness landscape

Design sequences using 02_run_sa.py. See example below. Each design can take minutes to hours, depending on the model; this can be accelerated by running on a GPU.

python 02_run_sa.py data/config_example.txt
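
For readers unfamiliar with the search procedure, the sketch below shows a generic simulated annealing loop over sequences. It assumes that "sa" in 02_run_sa.py refers to simulated annealing, which this README does not state explicitly. Everything in the block is hypothetical: toy_score stands in for a trained model's predicted fitness, and the step count, cooling schedule, and single-substitution proposal are illustrative choices, not the settings read from data/config_example.txt.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(seq, target):
    """Hypothetical stand-in for a model prediction: number of positions matching target."""
    return sum(a == b for a, b in zip(seq, target))


def simulated_annealing(start, score_fn, n_steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Maximize score_fn over sequences using single-substitution proposals."""
    rng = random.Random(seed)
    current, current_score = start, score_fn(start)
    best, best_score = current, current_score
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(n_steps - 1, 1))
        # Propose a substitution at one random position.
        pos = rng.randrange(len(current))
        candidate = current[:pos] + rng.choice(AMINO_ACIDS) + current[pos + 1:]
        candidate_score = score_fn(candidate)
        delta = candidate_score - current_score
        # Metropolis criterion: always accept improvements, sometimes accept worse moves.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


# Toy run: search for a sequence that matches a hypothetical target.
target = "MKWVTF"
best_seq, best_score = simulated_annealing("AAAAAA", lambda s: toy_score(s, target))
print(best_seq, best_score)
```

In a real design run the scoring function is a neural network forward pass, which is why each design can take minutes to hours and why a GPU helps.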

Generate plots for Fig 2 and Fig S2-3 in 02_designs_analysis.ipynb

Large-scale experimental characterization of ML designed GB1 variants

Starting from the raw fastq files, preprocess the results and determine counts for each variant in the library. The fastq files will be downloadable from the SRA; save them in a directory named fastq_files. This step can take hours to days.

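mkdir -p merged_reads
for path in fastq_files/* ; do
    d=$(basename "$path")   # entry name; its first six characters are passed as the second argument
    python 03_preprocessing.py "fastq_files/${d}" "${d:0:6}" merged_reads/ designs.csv designs_counts.csv
done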
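
Conceptually, determining counts for each variant means tallying how many reads correspond to each designed sequence. The sketch below illustrates that idea with exact string matching. It is an assumption made for illustration, not a description of 03_preprocessing.py: the function name, the design names, and the short reads are hypothetical, and the real script's read merging, matching rules, and file formats are not described in this README.

```python
from collections import Counter


def count_designs(reads, designs):
    """Count the reads that exactly match each design sequence.

    reads: iterable of read sequences (strings).
    designs: mapping from design name to design sequence.
    """
    read_counts = Counter(reads)
    return {name: read_counts.get(seq, 0) for name, seq in designs.items()}


# Toy data (hypothetical): two designs and four reads, one of which matches neither design.
designs = {"design_1": "ACGT", "design_2": "TTGA"}
reads = ["ACGT", "TTGA", "ACGT", "GGGG"]
counts = count_designs(reads, designs)
print(counts)
```

Tallying all reads once with a Counter and then looking up each design keeps the work linear in the number of reads, which matters when the fastq files are large.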

Generate plots for Fig 3 and Fig S4-8 in 03_design_experimental_analysis.ipynb

ML designed GB1s show improved display and IgG binding

Generate plots for Fig 4 and Fig S9 in 03_design_experimental_analysis.ipynb
