This repository is a code wrapper for projects 6-10 of CBB520 assignment 2 on protein pattern discovery.
- Before running the code, please install the following Python packages on your machine: biopython, matplotlib, numpy, pandas, scipy, tqdm.
- Projects 8 and 9 also require stride; please follow its installation instructions.
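A quick way to confirm the Python packages listed above are available is to ask the import system for each one. The following is a minimal sketch, not part of the repository; the helper name missing_packages is hypothetical. Note that biopython is imported under the module name Bio.

```python
from importlib.util import find_spec

# Map each pip package name to the module name it is imported as.
# biopython is imported as "Bio"; the others share their pip name.
REQUIRED = {
    "biopython": "Bio",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "pandas": "pandas",
    "scipy": "scipy",
    "tqdm": "tqdm",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that cannot be imported."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; try: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only looks up top-level modules, so it does not verify package versions.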
The data/ folder contains the input files for the projects. S288c_proteins and gossypii_protein_sequences.csv are the input and metadata for project 6; S288c_proteins, Ashbya_gossypii_proteome.faa.fasta and ashbya_Sc_orthologs are the input and metadata for project 7. The UP000002311_559292_YEAST_v4 folder holds the AlphaFold protein structures from S. cerevisiae. The PDB files in it need to be unzipped with the command:
gzip -d *.gz
The stride_output folder holds the output produced by stride. The test folder holds the testing input data for project 10.
data/
├── Ashbya_gossypii_proteome.faa.fasta
├── S288c_proteins
├── UP000002311_559292_YEAST_v4
├── ashbya_Sc_orthologs
├── gossypii_protein_sequences.csv
├── stride_output
└── test
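Where the gzip command is not available, the unzip step for the structure files can be approximated with the Python standard library. This is a minimal sketch under assumptions: the function name decompress_gz_files is hypothetical, the folder path is taken from the layout above, and, unlike gzip -d, it overwrites an existing output file.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz_files(folder):
    """Decompress every *.gz file in `folder` and delete the archive afterwards."""
    written = []
    for gz_path in sorted(Path(folder).glob("*.gz")):
        out_path = gz_path.with_suffix("")  # e.g. model.pdb.gz -> model.pdb
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        gz_path.unlink()  # mirror gzip -d, which removes the compressed file
        written.append(out_path)
    return written


if __name__ == "__main__":
    # Assumed location, taken from the data/ layout above.
    files = decompress_gz_files("data/UP000002311_559292_YEAST_v4")
    print(f"Decompressed {len(files)} files")
```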
The src/ folder contains the original source code and utility functions for each group.
src/
├── protein1D_pattern_group6.py
├── protein1D_pattern_group7.py
├── protein2D_pattern_group8.py
├── protein2D_pattern_group9.py
├── protein3D_pattern_group10.py
├── srcGroup10
│   └── util.py
├── srcGroup6
│   ├── core.py
│   ├── find_ortholog.py
│   ├── read_protein.py
│   ├── replaceaa.py
│   ├── seqfind.py
│   └── seqgen.py
├── srcGroup7
│   └── util.py
├── srcGroup8
│   └── util.py
├── srcGroup9
└── stride_preprocess.py
Each sub-directory of the results/ folder contains the results for one project.
results/
├── group10
├── group6
├── group7
├── group8
└── group9
To run the code for all projects:
bash run.sh
This runs the source code for each project, including the stride processing step.
Note that you need to change the stride binary path in stride.sh to the location of the binary on your own machine; stride.sh uses /Users/mac/Downloads/stride/stride as an example.
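One way to avoid editing a hard-coded binary path is to read it from the environment. The sketch below is an illustration only and is not how run.sh or stride.sh work: the STRIDE_BIN variable, the function names, and the .stride output naming are all assumptions. It relies on stride printing its assignment to standard output when given a PDB file as its only argument.

```python
import os
import subprocess
from pathlib import Path

# Example path from this README; STRIDE_BIN is a hypothetical override.
DEFAULT_STRIDE = "/Users/mac/Downloads/stride/stride"


def plan_stride_jobs(pdb_dir, out_dir, stride_bin):
    """Return (command, output_path) pairs, one per *.pdb file in pdb_dir."""
    jobs = []
    for pdb in sorted(Path(pdb_dir).glob("*.pdb")):
        # Output file naming is an assumption made for this sketch.
        out_path = Path(out_dir) / (pdb.stem + ".stride")
        jobs.append(([str(stride_bin), str(pdb)], out_path))
    return jobs


def run_stride_jobs(jobs):
    """Run each command and save its standard output to the paired file."""
    for cmd, out_path in jobs:
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with open(out_path, "w") as fh:
            subprocess.run(cmd, stdout=fh, check=True)


if __name__ == "__main__":
    stride_bin = os.environ.get("STRIDE_BIN", DEFAULT_STRIDE)
    if Path(stride_bin).is_file():
        run_stride_jobs(plan_stride_jobs(
            "data/UP000002311_559292_YEAST_v4", "data/stride_output", stride_bin))
    else:
        print(f"stride binary not found at {stride_bin}; set STRIDE_BIN")
```

Separating the planning step from the execution step makes it possible to inspect the commands before anything is run.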