This repo contains the code and environments to generate results for ulcerative colitis.
Run the environment installation shell script ./temporal_env.sh in a Linux/Unix environment to install the packages required for running the finetuning and temporal models. The finetuning model requires an NVIDIA A100 GPU (on Colab) with at least 80 GB of RAM. The temporal model was trained on an NVIDIA V100-SXM2-16GB.
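Before starting a long run, it can help to confirm which GPU the session actually received. The following is a minimal sketch, not part of the repo: the helper names parse_gpu_listing and list_gpus are hypothetical, and it assumes the nvidia-smi command-line tool is on the PATH. It only reports what it finds and does not enforce the 80 GB figure, since that number may refer to system RAM rather than GPU memory.

```python
import shutil
import subprocess


def parse_gpu_listing(text):
    """Parse 'name, memory_in_MiB' lines into (name, memory_mib) tuples."""
    gpus = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so the GPU name stays in one piece.
        name, _, memory = line.rpartition(",")
        try:
            gpus.append((name.strip(), int(memory.strip())))
        except ValueError:
            # Memory was not reported as a number; skip this entry.
            continue
    return gpus


def list_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_listing(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No NVIDIA GPU detected (nvidia-smi not found or failed).")
    for name, memory_mib in gpus:
        print(f"{name}: {memory_mib / 1024:.0f} GiB")
```

On Colab the same script can be pasted into a notebook cell; an empty result usually means the runtime has no GPU attached.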
The supporting data is under the data/ folder. Some very large files, such as the foundational embeddings, are not included. The data files that the user needs to download are listed below:
- Human PPI interactions from StringDB - https://stringdb-downloads.org/download/protein.physical.links.v12.0/9606.protein.physical.links.v12.0.txt.gz
- Protein to Gene symbol mapping - https://stringdb-downloads.org/download/protein.info.v12.0/9606.protein.info.v12.0.txt.gz
- Regulatory Protein-DNA interactions - https://cdn.netbiol.org/tflink/download_files/TFLink_Homo_sapiens_interactions_SS_simpleFormat_v1.0.tsv
- Foundation gene embeddings - https://zenodo.org/records/10833191 (GenePT_gene_embedding_ada_text.pickle)
- GO Biological Process human gene mapping from Ensembl (columns: 'HGNC symbol', 'GO term accession', 'GO domain', 'GO term name')
- Protein coding genes from Ensembl (HGNC Gene symbols)
Notebooks:
- dataPrep_colon.ipynb : Prepares data to be used in dataprep_formodel_colon.ipynb for finetuning foundational embeddings
- dataprep_formodel_colon.ipynb : Prepares the graph data for finetuning
- finetuning_model_colon.ipynb : Finetuning model for colon samples
- TissueExpressionEnrichment_colon.ipynb : Tissue enrichment of embeddings (to be run in an R environment)
- temporal_DataPrep_UC.ipynb : Prepares graph data objects for the temporal model for responder patients in ulcerative colitis
- temporal_model_UC_Responders.ipynb : Temporal model for responder patients in ulcerative colitis
Utility File: src/utils/utils.py - Contains all classes and methods that are used by the notebooks
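
The files with direct URLs in the download list can be fetched and joined with a short script. This is a minimal sketch under stated assumptions, not the repo's own data loader: the function names are hypothetical, the data/ target folder and the kept file names are assumptions, and the file layouts (a space-separated links file with a header and three columns, a tab-separated info file with the protein ID and gene symbol in the first two columns) are assumed from the StringDB downloads and should be checked against the files you receive. The GenePT pickle and the two Ensembl exports are left out because no direct file URL is given for them.

```python
import gzip
import urllib.request
from pathlib import Path

# URLs copied from the download list in this README.
DOWNLOAD_URLS = [
    "https://stringdb-downloads.org/download/protein.physical.links.v12.0/9606.protein.physical.links.v12.0.txt.gz",
    "https://stringdb-downloads.org/download/protein.info.v12.0/9606.protein.info.v12.0.txt.gz",
    "https://cdn.netbiol.org/tflink/download_files/TFLink_Homo_sapiens_interactions_SS_simpleFormat_v1.0.tsv",
]


def download_missing(target_dir="data", urls=DOWNLOAD_URLS):
    """Download each URL into target_dir unless the file already exists."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    fetched = []
    for url in urls:
        # Keep the file name from the URL (an assumption about naming).
        destination = target / url.rsplit("/", 1)[-1]
        if destination.exists():
            continue
        urllib.request.urlretrieve(url, str(destination))
        fetched.append(destination)
    return fetched


def load_protein_to_gene(info_path):
    """Map STRING protein IDs to gene symbols (first two tab-separated columns)."""
    mapping = {}
    with gzip.open(info_path, "rt", encoding="utf-8") as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                mapping[fields[0]] = fields[1]
    return mapping


def iter_gene_pairs(links_path, protein_to_gene, min_score=0):
    """Yield (gene1, gene2, score) for links whose proteins both map to a gene."""
    with gzip.open(links_path, "rt", encoding="utf-8") as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            parts = line.split()
            if len(parts) != 3:
                continue
            protein1, protein2, score = parts[0], parts[1], int(parts[2])
            if score < min_score:
                continue
            gene1 = protein_to_gene.get(protein1)
            gene2 = protein_to_gene.get(protein2)
            if gene1 and gene2:
                yield gene1, gene2, score
```

A typical use would be to call download_missing() once, then build the mapping with load_protein_to_gene("data/9606.protein.info.v12.0.txt.gz") and stream the edges with iter_gene_pairs("data/9606.protein.physical.links.v12.0.txt.gz", mapping). The min_score argument is an illustrative filter only; the notebooks may use a different threshold or none.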