Paper: https://www.frontiersin.org/journals/systems-biology/articles/10.3389/fsysb.2025.1651930/full
Preprint: https://arxiv.org/abs/2501.01644
- Accepted for publication at Frontiers in Systems Biology
- Presented at BIOKDD 2025 in Toronto, Canada
Biomedical Knowledge Graphs (BKGs) integrate diverse datasets to elucidate complex relationships within the biomedical field. Effective link prediction on these graphs can uncover valuable connections, such as potential novel drug-disease relations. We introduce a novel multimodal approach that unifies embeddings from specialized Language Models (LMs) with Graph Contrastive Learning (GCL) to enhance intra-entity relationships while employing a Knowledge Graph Embedding (KGE) model to capture inter-entity relationships for effective link prediction. To address limitations in existing BKGs, we present PrimeKG++, an enriched knowledge graph incorporating multimodal data, including biological sequences and textual descriptions for each entity type. By combining semantic and relational information in a unified representation, our approach demonstrates strong generalizability, enabling accurate link predictions even for unseen nodes. Experimental results on PrimeKG++ and the DrugBank drug-target interaction dataset demonstrate the effectiveness and robustness of our method across diverse biomedical datasets.
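To make the link-prediction idea concrete, here is a minimal sketch of how a KGE model can rank candidate tails for a query such as (drug, indication, ?). It assumes a DistMult-style scoring function purely for illustration; the KGE model actually used in this repository may differ. The entity names, the relation name "indication", and all embedding values are hypothetical toy data.

```python
from typing import Dict, List

Vector = List[float]


def distmult_score(h: Vector, r: Vector, t: Vector) -> float:
    # DistMult (an assumed stand-in scorer): sum_i h_i * r_i * t_i.
    # Higher scores mean the triple (h, r, t) is judged more plausible.
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def rank_tails(head: str, relation: str,
               entities: Dict[str, Vector],
               relations: Dict[str, Vector]) -> List[str]:
    # Score every other entity as a candidate tail for (head, relation, ?)
    # and return candidate names sorted from most to least plausible.
    h, r = entities[head], relations[relation]
    scores = {name: distmult_score(h, r, vec)
              for name, vec in entities.items() if name != head}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy embeddings; in the framework these would come from the
# LM and contrastive-learning stages rather than hand-written numbers.
entities = {
    "drug_A": [1.0, 0.5, -0.2],
    "disease_X": [0.9, 0.4, -0.1],
    "disease_Y": [-0.8, 0.1, 0.7],
    "protein_P": [0.1, -0.9, 0.3],
}
relations = {"indication": [1.0, 0.5, 2.0]}

print(rank_tails("drug_A", "indication", entities, relations))
# prints ['disease_X', 'protein_P', 'disease_Y']
```

Because the score depends only on embedding vectors, a node that was not seen during training can still be ranked in a setup like this, as long as an embedding can be produced for it from its attributes.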

Overview of our proposed framework.
A. Modality Embedding: Creating node attribute embeddings through domain-specific LMs.
B. Contrastive Learning: Enhancing LM-derived embeddings of node attributes within the same node type through a Fusion Module and Contrastive Learning.
C. Link Prediction on KG Embedding: Utilizing the enhanced embeddings to perform link prediction tasks through a Knowledge Graph Embedding (KGE) model that learns relationships and enhances semantic information across distinct node types.
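Step B can be pictured with the following sketch, which fuses per-modality embeddings and applies an InfoNCE-style contrastive loss between two views of the same nodes. This is an illustrative assumption rather than the BioMedKG implementation: the fusion here is a plain element-wise mean (the repository's fusion choice appears to be configurable through GCL_FUSE_METHOD and may work differently), the second view is a hand-written stand-in for an augmented graph view, and all values are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def fuse_mean(modalities: List[Vector]) -> Vector:
    # Hypothetical fusion: element-wise mean of one node's modality embeddings
    # (e.g. an LM embedding of its sequence and one of its text description).
    n = len(modalities)
    return [sum(vals) / n for vals in zip(*modalities)]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity between two non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(view1: List[Vector], view2: List[Vector], tau: float = 0.5) -> float:
    # For node i, (view1[i], view2[i]) is the positive pair and every
    # view2[j] with j != i is a negative. Returns the mean cross-entropy loss.
    losses = []
    for i, anchor in enumerate(view1):
        logits = [cosine(anchor, other) / tau for other in view2]
        m = max(logits)  # subtract the max before exp() for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Two toy nodes, each with two modality embeddings (hypothetical values).
node_modalities = [
    [[1.0, 0.0], [0.8, 0.2]],  # node 0: e.g. sequence view, text view
    [[0.0, 1.0], [0.2, 0.8]],  # node 1
]
view1 = [fuse_mean(m) for m in node_modalities]
# Stand-in for a second (augmented) view of the same nodes.
aligned = [[0.85, 0.15], [0.15, 0.85]]
swapped = [aligned[1], aligned[0]]  # positives deliberately mismatched

print(f"aligned views: {info_nce(view1, aligned):.3f}")
print(f"swapped views: {info_nce(view1, swapped):.3f}")
```

The aligned views yield a lower loss than the swapped ones, which is what the objective rewards: embeddings of the same node should agree across views while staying distinguishable from other nodes of the same type.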
Install huggingface-cli:
pip install -U "huggingface_hub[cli]"
Download the PrimeKG++ and DrugBank DTI datasets, which include multimodal features and processed triplets, from the Hugging Face Hub:
huggingface-cli download tienda02/BioMedKG --repo-type=dataset --local-dir ./data
Create a Conda environment and install dependencies:
conda create --name biokg python=3.10
conda activate biokg
make
- Review the configuration files in the configs directory and the training scripts in the scripts directory.
- Set your Comet API key to track experiments:
export COMET_API_KEY=<your-comet-api-key>
- Modify the following variables as needed for different experiments:
NODE_TYPE, MODEL_NAME, NODE_INIT_METHOD, GCL_MODEL, GCL_FUSE_METHOD, PRETRAINED_PATH
Train the model with graph contrastive learning to enhance relationships within each node type:
bash scripts/gcl.sh
Train the KGE model for link prediction tasks:
bash scripts/kge.sh
Fine-tune the KGE model on the Drug-Protein Interaction (DPI) benchmark:
bash scripts/dpi.sh
Evaluate link prediction on the PrimeKG++ dataset:
bash scripts/test_kge.sh
Evaluate link prediction on the DrugBank DTI dataset:
bash scripts/test_dpi.sh
To reproduce the results, download the pre-trained checkpoints from the Hugging Face Hub:
huggingface-cli download tienda02/BioMedKG --repo-type=model --local-dir ./ckpt
Then set the PRETRAINED_PATH variable in the test scripts to the downloaded checkpoint.
- This project builds upon the PrimeKG dataset introduced in the paper:
Building a knowledge graph to enable precision medicine. Payal Chandak, Kexin Huang, Marinka Zitnik. Published in Nature Scientific Data, 2023.
- This project leverages the DrugBank drug-target interaction dataset:
DrugBank 6.0: the DrugBank Knowledgebase for 2024. Craig Knox, Mike Wilson, Christen M. Klinger, et al. Published in Nucleic Acids Research, 2023.
@ARTICLE{10.3389/fsysb.2025.1651930,
AUTHOR={Dang, Tien and Nguyen, Viet Thanh Duy and Le, Minh Tuan and Hy, Truong-Son},
TITLE={BioMedKG: multimodal contrastive representation learning in augmented BioMedical knowledge graphs},
JOURNAL={Frontiers in Systems Biology},
VOLUME={Volume 5 - 2025},
YEAR={2025},
URL={https://www.frontiersin.org/journals/systems-biology/articles/10.3389/fsysb.2025.1651930},
DOI={10.3389/fsysb.2025.1651930},
ISSN={2674-0702},
ABSTRACT={Biomedical Knowledge Graphs (BKGs) integrate diverse datasets to elucidate complex relationships within the biomedical field. Effective link prediction on these graphs can uncover valuable connections, such as potential new drug-disease relations. We introduce a novel multimodal approach that unifies embeddings from specialized Language Models (LMs) with Graph Contrastive Learning (GCL) to enhance intra-entity relationships while employing a Knowledge Graph Embedding (KGE) model to capture inter-entity relationships for effective link prediction. To address limitations in existing BKGs, we present PrimeKG++, an enriched knowledge graph incorporating multimodal data, including biological sequences and textual descriptions for each entity type. By combining semantic and relational information in a unified representation, our approach demonstrates strong generalizability, enabling accurate link predictions even for unseen nodes. Experimental results in PrimeKG++ and the DrugBank drug-target interaction dataset demonstrate the effectiveness and robustness of our method in diverse biomedical datasets. Our source code, pre-trained models, and data are publicly available at https://github.com/HySonLab/BioMedKG.}
}

@misc{dang2025multimodalcontrastiverepresentationlearning,
title={Multimodal Contrastive Representation Learning in Augmented Biomedical Knowledge Graphs},
author={Tien Dang and Viet Thanh Duy Nguyen and Minh Tuan Le and Truong-Son Hy},
year={2025},
eprint={2501.01644},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.01644},
}