DeepPScan-m6A

DeepPScan-m6A is a deep learning model based on convolutional neural networks (CNNs). It quantitatively predicts the m6A preference score of protein sequences and can be used to discover potential m6A reader proteins or polypeptides. Users who have preference scores for other modifications or constructs can retrain the model on their own data, extending it to predict other kinds of protein preference scores.
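A CNN on protein sequences usually takes each peptide as a numeric matrix rather than as text. This README does not describe the encoding DeepPScan-m6A uses, so the sketch below is only an illustration that assumes a one-hot encoding over the 20 standard amino acids. The alphabet order, the names `AMINO_ACIDS`, `AA_INDEX` and `one_hot_encode`, and the all-zero row for non-standard residues such as X are hypothetical choices.

```python
import numpy as np

# Hypothetical alphabet: the 20 standard amino acids in one-letter alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide):
    """Return a (len(peptide), 20) float32 matrix with one 1.0 per known residue.

    Residues outside AMINO_ACIDS (e.g. 'X') are left as all-zero rows; this is an
    assumption for illustration, not necessarily what DeepPScan-m6A does.
    """
    matrix = np.zeros((len(peptide), len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(peptide.upper()):
        idx = AA_INDEX.get(aa)
        if idx is not None:
            matrix[pos, idx] = 1.0
    return matrix


if __name__ == "__main__":
    x = one_hot_encode("MKTAYX")
    print(x.shape)  # (6, 20)
```

Under this assumption a 300-residue peptide becomes a 300 x 20 matrix. A fixed input shape like this is one reason the peptide length used for prediction must match the length used during training.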

System Requirements

Hardware requirements

All the scripts in this package have been tested on a Dell 940xa server equipped with NVIDIA Tesla V100 GPUs. The amount of system memory and GPU memory required depends on the volume of data.

Software requirements

OS Requirements

All the scripts and code in this package have been tested on CentOS 7.9 Linux.

Python Dependencies

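logging (Python standard library)
argparse (Python standard library)
random (Python standard library)
numpy
pandas
scipy
sklearn (installed as scikit-learn)
tensorflow
keras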

Environment Deployment

mamba create --name DeepPScan python=3.7 tensorflow-gpu=1.14.0 keras-gpu=2.2.4
mamba activate DeepPScan
mamba install ipykernel=5.4.3 numpy=1.16.2 pandas=0.25.3 scipy=1.7.3 scikit-learn=1.0.2 matplotlib=3.1.1
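After activating the environment, it can help to confirm that each package resolved to the version pinned above, since the model was built against TensorFlow 1.14.0 and Keras 2.2.4. The script below is a minimal sketch: `EXPECTED` simply copies the versions from the mamba commands, and the function names `installed_version` and `compare_versions` are hypothetical.

```python
import importlib

# Versions pinned by the mamba commands above (import name -> version).
EXPECTED = {
    "tensorflow": "1.14.0",
    "keras": "2.2.4",
    "numpy": "1.16.2",
    "pandas": "0.25.3",
    "scipy": "1.7.3",
    "sklearn": "1.0.2",  # installed as scikit-learn, imported as sklearn
    "matplotlib": "3.1.1",
}


def installed_version(name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except Exception:  # a broken install can raise more than ImportError
        return None
    return getattr(module, "__version__", None)


def compare_versions(expected):
    """Return {name: (expected, found)} for every package whose version differs."""
    mismatches = {}
    for name, want in expected.items():
        found = installed_version(name)
        if found != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    problems = compare_versions(EXPECTED)
    for name, (want, found) in problems.items():
        print(f"{name}: expected {want}, found {found or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

Saved as, for example, check_env.py (a hypothetical file name), it can be run with `python check_env.py` inside the DeepPScan environment.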

Installation and Usage

# Install from GitHub
git clone https://github.com/WenjuSun/DeepPScan-m6A.git
cd DeepPScan-m6A
mamba activate DeepPScan

Run predictions for new sequences from the command line
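The prediction script reads a plain-text input file in which each line holds a protein ID and its sequence, separated by a tab. If your sequences are in FASTA format, a small converter such as the one below can produce that file. It is a sketch: the function names are hypothetical, it assumes the file has no header line (the help text describes only the two data columns), and it uses the first word of each FASTA header as the protein ID.

```python
def read_fasta(path):
    """Parse a FASTA file into {protein_id: sequence}.

    The ID is the first word after '>'; sequence lines are joined and upper-cased.
    """
    records = {}
    current_id, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if current_id is not None:
                    records[current_id] = "".join(chunks)
                current_id, chunks = line[1:].split()[0], []
            else:
                chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def write_prediction_input(records, path):
    """Write one tab-separated '<protein_id> <protein_sequence>' line per record, no header."""
    with open(path, "w") as handle:
        for protein_id, sequence in records.items():
            handle.write(f"{protein_id}\t{sequence}\n")
```

For example, `write_prediction_input(read_fasta("proteins.fasta"), "my_sequences.txt")` writes a file that can then be passed to the script with `-i my_sequences.txt` (both file names are placeholders).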

# Predict scores for new protein sequences with the saved model; -m takes the shared prefix
# of the model files (RBP_m6A_preference_CNN_model_DeepPScan.json and .h5)
python 4_prediction_for_custom_sequences.py -m RBP_m6A_preference_CNN_model_DeepPScan -i 4_merged_data_for_predict_test.txt -o 4_merged_data_for_predict_test

# Run with -h to see all parameters and what they do.
python 4_prediction_for_custom_sequences.py -h
# usage: 4_prediction_for_custom_sequences.py [-h] -m MODEL -i INPUT [-l LENGTH]
#                                             [-s STEP] -o OUTPUT [-v]
# optional arguments:
#   -h, --help            show this help message and exit
#   -m MODEL, --model MODEL
#                         The prefix name of model files.
#   -i INPUT, --input INPUT
#                         The sequence input file for prediction. Each line
#                         consists of two columns, with the format being
#                         <protein_id>\tab<protein_sequence>
#   -l LENGTH, --length LENGTH
#                         The length of the peptides used for predicted,
#                         default: 300. It needs to be consistent with the
#                         length used during model training.
#   -s STEP, --step STEP  The step size for sliding window on the protein
#                         sequence, default: 30.
#   -o OUTPUT, --output OUTPUT
#                         The prefix name of output files.
#   -v, --verbose         Enable verbose mode
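The -l and -s options describe a sliding window: each protein is cut into peptides of -l residues (default 300), starting every -s residues (default 30), and those peptides are what the model scores. A minimal sketch of that windowing follows. How the script treats sequences shorter than one window, or a C-terminal tail that does not fill a full window, is not documented here; this sketch simply drops them, which is an assumption, and `sliding_windows` is a hypothetical helper name.

```python
def sliding_windows(sequence, length=300, step=30):
    """Yield (start, peptide) pairs for every full-length window.

    Windows start at 0, step, 2*step, ...; a sequence shorter than `length`
    yields nothing, and residues past the last full window are not covered
    (an assumption; the real script may pad or add a final window).
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    for start in range(0, len(sequence) - length + 1, step):
        yield start, sequence[start:start + length]


# Example: a 400-residue protein with the defaults -l 300 -s 30.
protein = "A" * 400
starts = [start for start, _ in sliding_windows(protein)]
print(starts)  # [0, 30, 60, 90]
```

With the defaults, neighbouring peptides overlap by 270 residues, so a smaller -s gives denser coverage of each protein at the cost of more peptides to score.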

Retrain the model with custom data from the command line

# Alternatively, retrain the model on your own data
python 3_retrain_DeepPScan_use_custom_data.py -i 0_your_data_for_retrain.txt -o retrain_your_CNN_model_DeepPScan

# Run with -h to see all parameters and what they do.
python 3_retrain_DeepPScan_use_custom_data.py -h
# usage: 3_retrain_DeepPScan_use_custom_data.py [-h] -i INPUT [-l LENGTH]
#                                               [-s STEP] -o OUTPUT [-v]
# optional arguments:
#   -h, --help            show this help message and exit
#   -i INPUT, --input INPUT
#                         The sequences and scores input file is used for model
#                         training. Each line consists of three columns, with
#                         the format being <protein_id>\tab<protein_sequence>
#                         \tab<preference_score>
#   -l LENGTH, --length LENGTH
#                         The length of the peptides used for training, default:
#                         300.
#   -s STEP, --step STEP  The step size for sliding window on the protein
#                         sequence, default: 30.
#   -o OUTPUT, --output OUTPUT
#                         The prefix name of output model files.
#   -v, --verbose         Enable verbose mode
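The retraining script expects three tab-separated columns per line: protein ID, protein sequence, and preference score. Checking the file before a long training run can save time. The validator below is a minimal sketch with a hypothetical function name; it only checks the column count, that the ID is non-empty, that the sequence is non-empty and made of letters, and that the score parses as a finite number. Whether the script tolerates a header line or non-standard residues is not documented here.

```python
import math


def validate_training_file(path):
    """Return (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    with open(path) as handle:
        for lineno, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 3:
                problems.append((lineno, f"expected 3 tab-separated columns, got {len(fields)}"))
                continue
            protein_id, sequence, score = fields
            if not protein_id:
                problems.append((lineno, "empty protein ID"))
            if not sequence.isalpha():
                problems.append((lineno, "sequence is empty or contains non-letter characters"))
            try:
                if not math.isfinite(float(score)):
                    problems.append((lineno, "score is not finite"))
            except ValueError:
                problems.append((lineno, f"score {score!r} is not a number"))
    return problems
```

For example, calling `validate_training_file("0_your_data_for_retrain.txt")` before running the command above (the file name is the placeholder used there) lists any malformed lines by number.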

Examples and Tutorial Notebook

To follow the model training process in detail, open the notebook 0_train_CNN_model_for_RBP_m6A_log2OR_predicting.ipynb and run its example code step by step.

To follow in detail how the model is used to predict scores for new sequences, open the notebook 1_predict_RBP_m6A_log2OR_for_AAseq.ipynb and run it step by step.

Citing `DeepPScan-m6A`

#TODO

doi: xxxx

License

This project is licensed under the MIT License.
