woweizhi/CBB-520-assignment3

CBB520 Assignment 3 - Group 1

This repository wraps the code for CBB520 assignment-2 projects 6-10 on protein pattern discovery.

Prerequisites: Python packages needed to run the code

  • Before running the code, install the following Python packages on your machine: biopython, matplotlib, numpy, pandas, scipy, tqdm.

  • Projects 8 and 9 also require stride; please follow its installation instructions.
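A quick way to confirm the Python packages listed above are available is to look up each one's import module. The sketch below is not part of the repository; the function name `find_missing` is hypothetical. The only non-obvious mapping is that biopython is imported as `Bio`.

```python
import importlib.util

# Package name -> module name used in "import ...".
# biopython is the only one whose import name differs from its package name.
REQUIRED_MODULES = {
    "biopython": "Bio",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "pandas": "pandas",
    "scipy": "scipy",
    "tqdm": "tqdm",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Any package reported as missing can then be installed with your usual package manager before running the projects.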

Code & Folder structure

The data/ folder contains the input files for the projects:

  • S288c_proteins and gossypii_protein_sequences.csv: input and metadata for project 6.

  • S288c_proteins, Ashbya_gossypii_proteome.faa.fasta and ashbya_Sc_orthologs: input and metadata for project 7.

  • UP000002311_559292_YEAST_v4: the AlphaFold protein structures for S. cerevisiae. The pdb files in this folder need to be unzipped with the command: gzip -d *.gz

  • stride_output: the output folder written by the stride processing.

  • test: the test input data for project 10.

data/
├── Ashbya_gossypii_proteome.faa.fasta
├── S288c_proteins
├── UP000002311_559292_YEAST_v4
├── ashbya_Sc_orthologs
├── gossypii_protein_sequences.csv
├── stride_output
└── test
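If gzip is not available on your machine, the pdb archives can also be decompressed from Python. This is a minimal sketch using only the standard library, not code from the repository; `gunzip_all` is a hypothetical helper name. Like `gzip -d`, it removes each archive after extracting it.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(folder):
    """Decompress every *.gz file in `folder` and delete the archive."""
    extracted = []
    for gz_path in sorted(Path(folder).glob("*.gz")):
        out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        gz_path.unlink()  # mirror `gzip -d`, which removes the archive
        extracted.append(out_path)
    return extracted


# Example call, run from the repository root:
# gunzip_all("data/UP000002311_559292_YEAST_v4")
```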

The src/ folder contains the original source code and utility functions for each group.

src/
├── protein1D_pattern_group6.py
├── protein1D_pattern_group7.py
├── protein2D_pattern_group8.py
├── protein2D_pattern_group9.py
├── protein3D_pattern_group10.py
├── srcGroup10
│   └── util.py
├── srcGroup6
│   ├── core.py
│   ├── find_ortholog.py
│   ├── read_protein.py
│   ├── replaceaa.py
│   ├── seqfind.py
│   └── seqgen.py
├── srcGroup7
│   └── util.py
├── srcGroup8
│   └── util.py
└── srcGroup9
    └── stride_preprocess.py

Each sub-directory in the results/ folder contains the results for one project.

results/
├── group10
├── group6
├── group7
├── group8
└── group9

Running

To run the code for all projects:

bash run.sh

This runs the source code for each project and also performs the stride processing. Note that you need to change the path of the stride binary to its location on your machine; stride.sh uses /Users/mac/Downloads/stride/stride as an example.
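For readers who prefer to drive stride from Python rather than edit stride.sh, the following is a rough sketch of the same idea. It is not the contents of stride.sh, which are not shown here. It assumes that stride prints its assignment to standard output when given a pdb file, and the function names and the `.stride` output extension are hypothetical.

```python
import subprocess
from pathlib import Path


def plan_stride_jobs(pdb_dir, out_dir, stride_bin):
    """Pair each .pdb file with the command to run and the file to write."""
    jobs = []
    for pdb in sorted(Path(pdb_dir).glob("*.pdb")):
        out_file = Path(out_dir) / (pdb.stem + ".stride")  # hypothetical naming
        jobs.append(([str(stride_bin), str(pdb)], out_file))
    return jobs


def run_stride_jobs(jobs):
    """Run each planned command and save its standard output."""
    for cmd, out_file in jobs:
        out_file.parent.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if stride exits with an error.
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        out_file.write_text(result.stdout)


# Example call, with the binary path replaced by your own:
# jobs = plan_stride_jobs("data/UP000002311_559292_YEAST_v4",
#                         "data/stride_output",
#                         "/Users/mac/Downloads/stride/stride")
# run_stride_jobs(jobs)
```

Separating planning from execution makes it easy to print the job list and check the binary path before any structure is processed.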
