The source code for building the Yeast Mitochondrial PTM Database at http://compbio.fmph.uniba.sk/y-mtptm/
This repository contains the scripts necessary to build the y-mtPTM database located at http://compbio.fmph.uniba.sk/y-mtptm/. As such, it is not really meant to be run by others, but it may help in setting up similar projects. Once generated, the whole website consists only of HTML, CSS and JavaScript files and does not require server-side computation.
Prerequisites:
- The software was tested only on Linux systems
- Python 3.8 or later is required
- Data files are downloaded via wget, but alternatives such as curl can also be used
- Other necessary libraries and prerequisites are installed by conda
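The prerequisites above can be checked before starting. The following is only a minimal sketch and not part of the repository: the function name `check_prerequisites` and the tool lists are assumptions made for illustration (sqlite3 is included because it is invoked on the command line in a later step).

```python
import shutil
import sys

# Command-line tools used in this README (assumed list, adjust as needed).
REQUIRED_TOOLS = ["conda", "sqlite3"]
# Any one of these is enough for downloading the data files.
DOWNLOAD_TOOLS = ["wget", "curl"]
MIN_PYTHON = (3, 8)


def check_prerequisites(which=shutil.which, version=None, platform=None):
    """Return a list of problems found; an empty list means all checks passed.

    The keyword arguments exist only so the checks can be exercised
    without depending on the machine the function runs on.
    """
    version = sys.version_info[:2] if version is None else tuple(version)
    platform = sys.platform if platform is None else platform
    problems = []
    if not platform.startswith("linux"):
        problems.append(f"untested platform: {platform}")
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} is older than 3.8")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"missing tool: {tool}")
    if not any(which(tool) for tool in DOWNLOAD_TOOLS):
        problems.append("missing download tool: need wget or curl")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Running the file prints one warning per failed check and nothing if everything was found.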
Downloading repository and setting up Python libraries:
git clone https://github.com/fmfi-compbio/y-mtptm.git
# change to src dir
cd y-mtptm/src
# install further prerequisites by conda
conda env create -f ../environment.yml
conda activate ymtptm
# all subsequent commands should be run in src folder
# and with conda environment activated
Downloading necessary data files:
# S. cerevisiae proteome fasta from Uniprot
cd ../data/uniprot
wget https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/Eukaryota/UP000002311/UP000002311_559292.fasta.gz
gunzip *.gz
# proteome fasta and gene names for SGD
cd ../sgd
wget http://sgd-archive.yeastgenome.org/sequence/S288C_reference/orf_protein/archive/orf_trans_all_R64-3-1_20210421.fasta.gz
wget http://sgd-archive.yeastgenome.org/curation/literature/archive/gene_association.sgd.20210510.gaf.gz
gunzip *.gz
cd ../../src
Building the SQLite database from the Excel file:
- This overwrites the database provided in the repository and also downloads the necessary pdb files
# remove the original database
rm ../data/excel/ymtptm.db
# create database schema
sqlite3 ../data/excel/ymtptm.db < create_db.sql
# fill database with data, download pdb files
# this takes some time
python3 excel_parser.py 2> excel.err > excel.log
# pdb files will be in ../data/pdb
# excel.err will contain some warnings
Building website from the SQLite database:
# this command also takes a long time
python3 html_builder.py
This final step creates HTML files in ../web; these files can then be viewed locally in a browser or placed on a web server.
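Because the generated site is purely static, any simple file server is enough to preview it. The sketch below uses only Python's standard `http.server` module; the helper name `serve_directory` and the default port 8000 are assumptions for illustration and are not part of the repository.

```python
import functools
import http.server
import threading


def serve_directory(directory="../web", port=8000):
    """Serve static files from `directory` on localhost in a background thread.

    Returns the server object; call .shutdown() and .server_close() to stop it.
    Passing port=0 lets the operating system pick a free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


# Example use from the src folder (hypothetical session):
#   server = serve_directory()
#   ... open http://127.0.0.1:8000/ in a browser ...
#   server.shutdown(); server.server_close()
```

The server binds to 127.0.0.1 only, so it is suitable for local preview and not for public hosting.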
- data/excel/*.xlsx: human-readable Excel file with PTM data
- data/excel/ymtptm.db: data in SQLite format
- data/excel/modifications.csv: configuration file with all considered PTM types
- data/pdb: folder for pdb files downloaded by excel_parser.py
- data/sgd: folder for files from the SGD database
- data/uniprot: folder for files from the Uniprot database
- web: folder for the resulting website
- src/config.py: filenames of data files
- src/create_db.sql: SQLite database schema
- src/excel_parser.py: script for converting database from Excel to SQLite
- src/html_builder.py: script for building website from SQLite database
- src/templates: HTML templates for jinja library
- src/web_include: images and CSS files used on the website directly
- src/pdb_to_html: files needed to convert pdb files for GLmol via pymol
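The SQLite file data/excel/ymtptm.db listed above can be inspected with Python's built-in `sqlite3` module. Since the table names are defined in src/create_db.sql and are not listed here, this sketch makes no assumption about them and reads them from the database itself; the function name `list_tables` is hypothetical.

```python
import pathlib
import sqlite3


def list_tables(db_path="../data/excel/ymtptm.db"):
    """Return {table_name: row_count} for every user table in the SQLite file."""
    # Open read-only so that inspecting the database cannot modify it.
    uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        # Table names come from the database itself, so quoting them is enough here.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


if __name__ == "__main__":
    import os

    if os.path.exists("../data/excel/ymtptm.db"):
        for table, count in list_tables().items():
            print(f"{table}: {count} rows")
```

Run from the src folder, this prints one line per table, which gives a quick check that excel_parser.py filled the database.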
The following files contain code originating from the GLmol package:
- src/pdb_to_html/pymol2glmol.py
- src/templates/protein_page.html
- src/web_include/glmol.js
- src/web_include/create_structure.js
The database content, source code and design of the website were created by Bronislava Brejová, Veronika Vozáriková, Ivan Agarský, Hana Derková, Matej Fedor, Dominika Harmanová, Lukáš Kiss, Andrej Korman, Martin Pašen, Filip Brázdovič, Jozef Nosek, Tomáš Vinař, Ľubomír Tomáška.
All of the authors were affiliated with the Faculty of Natural Sciences or the Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava.
Several files listed above contain modified code from the GLmol package by biochem_fan, which is distributed under the GNU Lesser General Public License. Therefore the whole repository y-mtptm is also distributed under this license.