The PBMF (Publised in Cancer cell ) is an automated neural network framework based on contrastive learning. This general-purpose framework explores potential predictive biomarkers in a systematic and unbiased manner.
Under the hood, the PBMF searches for a biomarker that maximizes the benefit under treatment of interest while at the same time minimizes the effect of the control treatment.
The PBMF runs as follows:
from PBMF.attention.model_zoo.SimpleModel import Net
from PBMF.attention.model_zoo.Ensemble import EnsemblePBMF
# Setup ensemble
pbmf = EnsemblePBMF(
time=time,
event=event,
treatment=treatment,
stratify=treatment,
features = features,
discard_n_features=1, # discard n features on each PBMF model
architecture=Net, # Architecrture to use, we are using a simple NN.
**params
)
# Train ensemble model
pbmf.fit(
data_train, # Dataframe with the processed data
num_models=10, # number of PBMF models used in the ensemble
n_jobs=4,
test_size=0.2, # Discard this fraction (randomly) of patients when fiting a PBMF model
outdir='./runs/experiment_0/',
save_freq=100,
)
Once the model is trained, get the predictive biomarker scores and labels is as simple as:
# Load the ensemble PBMF
pbmf = EnsemblePBMF()
pbmf.load(
architecture=Net,
outdir='./runs/experiment_0/',
num_models=10,
)
# Retrieve scores for predictive biomarker positive / negative
data_test['predictive_biomarker_risk'] = pbmf.predict(data_test, epoch=500)
# Generate biomarker positive and negative labels
data_test['predicted_label'] = (data_test['predictive_biomarker_risk'] > 0.5).replace([False, True], ['B-', 'B+'])
- Under
./demos/
you will find a complete guide on how to use the framework. - under
./demos/app
you can find the app for visualizing the distilation trees and interpretability. - under
./demos/simulation
we have an example on how to build synthetic survival datasets.
The PBMF
can be run in standard computers with enough RAM memory. PBMF is efficient when running on multiple cores to perform parallel trainings when setting a large number of models (num_models
).
The PBMF runs in Python > 3
and has been tested on MacOS and Linux Ubuntu distributions.
This python package is supported for macOS and Linux. The PBMF has been tested on the following systems using docker and singularity containers:
- macOS: Sonoma
- Linux: Ubuntu 18.04 LTS
- Windows: WSL2 / ubuntu / x86_64
PBMF was extensively tested using the following libraries:
tensorflow==2.6.0
scipy==1.5.4
numpy==1.19.5
scikit-learn==0.24.1
pandas==1.1.5
seaborn==0.11.1
The PBMF has been also tested with latest updates of the listed libraries.
pip install tensorflow==2.6.0
pip install scipy==1.5.4
pip install numpy==1.19.5
pip install scikit-learn==0.24.1
pip install pandas==1.1.5
pip install seaborn==0.11.1
pip install --no-cache-dir git+https://github.com/gaarangoa/samecode.git
pip install --no-cache-dir git+https://github.com/gaarangoa/pbmf.git
The easiest way to get started with the PBMF is to run it through a docker container. We have created an image with all necessary libraries and these containers should seamlessly work.
# Download the PBMF repository
git clone https://github.com/gaarangoa/pbmf.git
cd ./pbmf/
# Build the docker image
docker pull gaarangoa/ml:v2.1.0.1_ARM
docker build -f Dockerfile.arm . --tag pbmf
# Launch a jupyter notebook
docker run -it --rm -p 8888:8888 pbmf jupyter notebook --NotebookApp.default_url=/lab/ --ip=0.0.0.0 --port=8888 --allow-root
# Download the PBMF repository
git clone https://github.com/gaarangoa/pbmf.git
cd ./pbmf/
# Build the docker image
docker pull gaarangoa/dsai:version-2.0.3_tf2.6.0_pt1.9.0
docker build -f Dockerfile.x86-64 . --tag pbmf
# Launch a jupyter notebook
docker run -it --rm -p 8888:8888 pbmf jupyter notebook --NotebookApp.default_url=/lab/ --ip=0.0.0.0 --port=8888 --allow-root
All experiments in the manuscript were performend in our internal HCP. We used multiple nodes with 100 cores for running the PBMF in parallel. No GPU acceleration was enabled. The HCP used Ubuntu 18.04
. For each run we deployed docker containers using singularity version=3.7.1
the image used is available at docker hub (gaarangoa/dsai:version-2.0.3_tf2.6.0_pt1.9.0
).
The code is freely available under the MIT License
If you use this work in any form, please cite as follows:
@article{arango2025ai,
title={AI-driven predictive biomarker discovery with contrastive learning to improve clinical trial outcomes},
author={Arango-Argoty, Gustavo and Bikiel, Damian E and Sun, Gerald J and Kipkogei, Elly and Smith, Kaitlin M and Pro, Sebastian Carrasco and Choe, Elizabeth Y and Jacob, Etai},
journal={Cancer Cell},
year={2025},
publisher={Elsevier}
}