Associated with the paper:
A resource of RNA-binding protein motifs across eukaryotes reveals evolutionary dynamics and gene-regulatory function
Please use the actively maintained implementation of JPLE here:
https://github.com/morrislab/jple
If you want to use JPLE to train your own model for predicting sequence specificity profiles from protein sequence, or to infer RNA sequence specificity for RRM- or KH-domain RNA-binding proteins, please refer to: https://github.com/morrislab/jple
All the measured and inferred RNA sequence specificities can be found at:
cisbp.org
This repository is an archival collection of the scripts and command-line tools used for the analyses in the PhD thesis:
Inferring RNA Sequence Specificities from Protein Sequences to Characterize Post-Transcriptional Regulation in Eukaryotes
Maintenance is limited, but the repository contains the scripts used to reproduce figures and results presented in the thesis.
Most scripts run in a virtual environment with Python 2.7 plus the dependencies listed in dependencies.txt.
Some scripts may require python3 or other dependencies. Please install as required:
- python3 and scikit-learn==0.23.2 (for agglomerative_clustering.py)
- Hmmer (http://hmmer.org/)
- conservation_code (Capra JA and Singh M. Predicting functionally important residues from sequence conservation. Bioinformatics, 23(15):1875-82, 2007. (https://compbio.cs.princeton.edu/conservation/))
- pymol (https://pymol.org/2/) to visualize individual pdbs
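Before running the scripts, it can help to confirm that the external tools listed above are reachable from the shell. The following is a minimal sketch, not part of the repository: the executable names (`hmmsearch` for HMMER, `pymol` for PyMOL) are assumptions, and the repository's scripts may call other HMMER programs.

```python
import shutil

# Assumed executable names for the external tools listed above.
# The repository's scripts may rely on other HMMER programs (e.g. hmmscan).
REQUIRED_TOOLS = {
    "python3": "python3",
    "HMMER": "hmmsearch",
    "PyMOL": "pymol",
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All external tools found on PATH.")
```

The `which` argument is only there so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.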
To reconstruct the figures (fig1.sh to fig5.sh), run the preprocessing and figure scripts in the following order:
- rncmpt_data.sh
- fig1.sh
- performance_calc.sh
- interface_importance.sh
- fig2.sh
- jple_reconstruction.sh
- fig3.sh
- cisbp-recstats.sh
- arabidopsis.sh
- fig4.sh
- fig5.sh
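The run order above could be automated with a small driver. This is a hedged sketch rather than a tool shipped with the repository: it assumes that all scripts sit in the repository root, that they are run with `bash` from that directory, and that a failing script should stop the run. The function name `run_pipeline` and its parameters are hypothetical.

```python
import subprocess
from pathlib import Path

# Script order as listed in this README.
SCRIPT_ORDER = [
    "rncmpt_data.sh",
    "fig1.sh",
    "performance_calc.sh",
    "interface_importance.sh",
    "fig2.sh",
    "jple_reconstruction.sh",
    "fig3.sh",
    "cisbp-recstats.sh",
    "arabidopsis.sh",
    "fig4.sh",
    "fig5.sh",
]


def run_pipeline(repo_dir=".", scripts=SCRIPT_ORDER, runner=subprocess.run, dry_run=False):
    """Run each script in order from repo_dir (assumed layout); stop on the first failure."""
    executed = []
    for script in scripts:
        path = Path(repo_dir) / script
        if dry_run:
            # Only report what would be executed.
            print("would run:", path)
        else:
            # check=True raises CalledProcessError if the script exits non-zero.
            runner(["bash", path.name], cwd=repo_dir, check=True)
        executed.append(script)
    return executed
```

Calling `run_pipeline(dry_run=True)` first prints the planned order without executing anything; `run_pipeline()` then runs the scripts from the current directory.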
The repository is licensed under the BSD 3-Clause License. See LICENSE for details.
If you use this code, please cite the following:
- Primary paper:
Sasse A., Ray D., Laverty K.U. et al. A resource of RNA-binding protein motifs across eukaryotes reveals evolutionary dynamics and gene-regulatory function. Nature Biotechnology (2025). https://doi.org/10.1038/s41587-025-02733-6
- PhD Thesis:
Sasse A. Inferring RNA Sequence Specificities from Protein Sequences to Characterize Post-Transcriptional Regulation in Eukaryotes. PhD Thesis, University of Toronto (2021). Available on ProQuest.