Predictive Biomarker Modeling Framework (PBMF)

The PBMF (published in Cancer Cell) is an automated neural network framework based on contrastive learning. This general-purpose framework explores potential predictive biomarkers in a systematic and unbiased manner.

Under the hood, the PBMF searches for a biomarker that maximizes the benefit under the treatment of interest while at the same time minimizing the effect of the control treatment.
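
One way to picture this objective is to compare the treatment effect inside each biomarker group. The sketch below is a toy illustration only, not the PBMF loss: it estimates a crude hazard ratio (events per unit of follow-up in the treated arm divided by the same rate in the control arm) for biomarker-positive (B+) and biomarker-negative (B-) patients. The function name crude_hazard_ratio, the column names, and the toy data are all assumptions made for this example.

```python
import pandas as pd


def crude_hazard_ratio(df, time="time", event="event", treatment="treatment",
                       treated_value=1):
    """Ratio of event rates (events / total follow-up), treated over control.

    This is a rough exponential-model estimate used for illustration;
    it is not how the PBMF scores biomarkers internally.
    """
    treated = df[df[treatment] == treated_value]
    control = df[df[treatment] != treated_value]
    rate_treated = treated[event].sum() / treated[time].sum()
    rate_control = control[event].sum() / control[time].sum()
    return rate_treated / rate_control


# Hypothetical toy cohort: treated B+ patients live longer, B- patients do not.
toy = pd.DataFrame({
    "time":      [10, 12, 4, 5, 6, 7, 6, 7],
    "event":     [1, 1, 1, 1, 1, 1, 1, 1],
    "treatment": [1, 1, 0, 0, 1, 1, 0, 0],
    "label":     ["B+", "B+", "B+", "B+", "B-", "B-", "B-", "B-"],
})

hr_by_label = {label: crude_hazard_ratio(group)
               for label, group in toy.groupby("label")}
print(hr_by_label)
```

A useful predictive biomarker would show a hazard ratio well below 1 in the B+ group and a ratio close to 1 in the B- group, which is the pattern this toy cohort was built to have.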

Quick tour

The PBMF runs as follows:

from PBMF.attention.model_zoo.SimpleModel import Net
from PBMF.attention.model_zoo.Ensemble import EnsemblePBMF

# Setup ensemble
pbmf = EnsemblePBMF(
    time=time, 
    event=event,
    treatment=treatment,
    stratify=treatment,
    features=features,
    discard_n_features=1, # discard n features on each PBMF model
    architecture=Net, # Architecture to use, we are using a simple NN.
    **params
)

# Train ensemble model
pbmf.fit(
    data_train, # Dataframe with the processed data
    num_models=10, # number of PBMF models used in the ensemble
    n_jobs=4,
    test_size=0.2, # Discard this fraction (randomly) of patients when fitting a PBMF model
    outdir='./runs/experiment_0/',
    save_freq=100,
)
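
The comments on discard_n_features and test_size suggest that each model in the ensemble sees a different random subset of features and patients. The sketch below shows one way such subsets could be drawn. It is an assumption about the general idea, not the PBMF internals: the helper draw_ensemble_subsets and its inputs are hypothetical, and it ignores the stratify argument.

```python
import numpy as np


def draw_ensemble_subsets(patient_ids, features, num_models=10,
                          discard_n_features=1, test_size=0.2, seed=0):
    """Return one (patients, features) pair per ensemble member.

    Each member keeps a random (1 - test_size) fraction of the patients
    and drops discard_n_features randomly chosen features.
    """
    rng = np.random.default_rng(seed)
    n_patients_kept = int(round(len(patient_ids) * (1 - test_size)))
    n_features_kept = len(features) - discard_n_features
    subsets = []
    for _ in range(num_models):
        patients = rng.choice(patient_ids, size=n_patients_kept, replace=False)
        kept = rng.choice(features, size=n_features_kept, replace=False)
        subsets.append((sorted(patients.tolist()), sorted(kept.tolist())))
    return subsets


# Hypothetical inputs: 100 patients and four feature names.
example_subsets = draw_ensemble_subsets(
    patient_ids=list(range(100)),
    features=["f1", "f2", "f3", "f4"],
)
print(len(example_subsets), example_subsets[0][1])
```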

Once the model is trained, getting the predictive biomarker scores and labels is as simple as:

# Load the ensemble PBMF
pbmf = EnsemblePBMF()
pbmf.load(
    architecture=Net,
    outdir='./runs/experiment_0/',
    num_models=10,
)

# Retrieve scores for predictive biomarker positive / negative
data_test['predictive_biomarker_risk'] = pbmf.predict(data_test, epoch=500)
# Generate biomarker positive and negative labels
data_test['predicted_label'] = (data_test['predictive_biomarker_risk'] > 0.5).replace([False, True], ['B-', 'B+'])
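
The labeling step above relies on Series.replace with boolean values. An equivalent formulation, sketched here with numpy.where, keeps the threshold as an explicit parameter. The helper name label_from_score and the toy scores are hypothetical; only the 0.5 cutoff and the B+ / B- labels come from the snippet above.

```python
import numpy as np
import pandas as pd


def label_from_score(scores, threshold=0.5):
    """Map scores strictly above the threshold to 'B+', all others to 'B-'."""
    labels = np.where(scores > threshold, "B+", "B-")
    return pd.Series(labels, index=scores.index)


# Hypothetical scores for three patients.
toy_scores = pd.Series([0.9, 0.5, 0.1], index=["p1", "p2", "p3"])
toy_labels = label_from_score(toy_scores)
print(toy_labels)
```

Because the comparison is strict, a score exactly equal to the threshold is labeled B-, which matches the behavior of the original expression.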

PBMF demos

  • Under ./demos/ you will find a complete guide on how to use the framework.
  • Under ./demos/app you can find the app for visualizing the distillation trees and interpretability.
  • Under ./demos/simulation there is an example of how to build synthetic survival datasets.
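
To give a feel for what a synthetic survival dataset with a predictive biomarker can look like, here is a minimal sketch. It is not the code in ./demos/simulation: the function simulate_trial, its parameters, and the exponential survival model are assumptions chosen for brevity.

```python
import numpy as np
import pandas as pd


def simulate_trial(n=400, seed=0, base_hazard=0.1, benefit=0.4,
                   censor_time=24.0):
    """Simulate a two-arm trial where only treated, biomarker-positive
    patients have a reduced hazard (hazard multiplied by `benefit`)."""
    rng = np.random.default_rng(seed)
    treatment = rng.integers(0, 2, size=n)   # 1 = treatment of interest
    biomarker = rng.integers(0, 2, size=n)   # 1 = true biomarker positive
    noise = rng.normal(size=n)               # uninformative feature
    responder = (treatment == 1) & (biomarker == 1)
    hazard = base_hazard * np.where(responder, benefit, 1.0)
    true_time = rng.exponential(1.0 / hazard)
    # Administrative censoring at a fixed follow-up time.
    observed_time = np.minimum(true_time, censor_time)
    event = (true_time <= censor_time).astype(int)
    return pd.DataFrame({
        "time": observed_time,
        "event": event,
        "treatment": treatment,
        "biomarker": biomarker,
        "noise": noise,
    })


synthetic = simulate_trial()
print(synthetic.groupby(["biomarker", "treatment"])["time"].median())
```

In a dataset like this, the biomarker column is the ground truth that a predictive biomarker model should recover, while the noise column should be ignored.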

System Requirements

Hardware requirements

The PBMF can be run on standard computers with enough RAM. It benefits from multiple cores, which allow models to be trained in parallel when the ensemble size (num_models) is large.

The PBMF runs on Python 3 and has been tested on macOS and Ubuntu Linux distributions.

Software requirements

This Python package is supported on macOS and Linux. The PBMF has been tested on the following systems using Docker and Singularity containers:

OS requirements

  • macOS: Sonoma
  • Linux: Ubuntu 18.04 LTS
  • Windows: WSL2 / Ubuntu / x86_64

Python dependencies

PBMF was extensively tested using the following libraries:

tensorflow==2.6.0
scipy==1.5.4
numpy==1.19.5
scikit-learn==0.24.1
pandas==1.1.5
seaborn==0.11.1

The PBMF has also been tested with the latest releases of the listed libraries.
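
To see how an existing environment compares with the versions listed above, a small check along the following lines may help. This is a convenience sketch, not part of the PBMF package: the helper compare_versions is hypothetical, and it relies only on the standard-library importlib.metadata module (Python 3.8 or later).

```python
from importlib import metadata

# Versions copied from the list above.
PINNED = {
    "tensorflow": "2.6.0",
    "scipy": "1.5.4",
    "numpy": "1.19.5",
    "scikit-learn": "0.24.1",
    "pandas": "1.1.5",
    "seaborn": "0.11.1",
}


def compare_versions(pinned, installed_version=metadata.version):
    """Return {package: (pinned, installed or None, versions_match)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (wanted, found, found == wanted)
    return report


for name, (wanted, found, matches) in compare_versions(PINNED).items():
    status = "ok" if matches else "differs"
    print(f"{name}: pinned {wanted}, installed {found} ({status})")
```

A mismatch is not necessarily a problem, since newer releases have also been tested, but the report is a quick starting point when results cannot be reproduced.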

Installation guide

Basic installation

pip install tensorflow==2.6.0
pip install scipy==1.5.4
pip install numpy==1.19.5
pip install scikit-learn==0.24.1
pip install pandas==1.1.5
pip install seaborn==0.11.1 
pip install --no-cache-dir git+https://github.com/gaarangoa/samecode.git
pip install --no-cache-dir git+https://github.com/gaarangoa/pbmf.git

Docker container

The easiest way to get started with the PBMF is to run it in a Docker container. We have created images with all the necessary libraries, and these containers should work seamlessly.

For macOS ARM processors:

    # Download the PBMF repository
    git clone https://github.com/gaarangoa/pbmf.git
    cd ./pbmf/

    # Build the docker image
    docker pull gaarangoa/ml:v2.1.0.1_ARM
    docker build -f Dockerfile.arm . --tag pbmf

    # Launch a jupyter notebook
    docker run -it --rm -p 8888:8888 pbmf jupyter notebook --NotebookApp.default_url=/lab/ --ip=0.0.0.0 --port=8888 --allow-root

For x86-64 processors:

    # Download the PBMF repository
    git clone https://github.com/gaarangoa/pbmf.git
    cd ./pbmf/

    # Build the docker image
    docker pull gaarangoa/dsai:version-2.0.3_tf2.6.0_pt1.9.0
    docker build -f Dockerfile.x86-64 . --tag pbmf

    # Launch a jupyter notebook
    docker run -it --rm -p 8888:8888 pbmf jupyter notebook --NotebookApp.default_url=/lab/ --ip=0.0.0.0 --port=8888 --allow-root

Dependencies for manuscript experiments

All experiments in the manuscript were performed on our internal HPC cluster. We used multiple nodes with 100 cores to run the PBMF in parallel. No GPU acceleration was enabled. The cluster ran Ubuntu 18.04. For each run we deployed Docker containers using Singularity version 3.7.1. The image used is available on Docker Hub (gaarangoa/dsai:version-2.0.3_tf2.6.0_pt1.9.0).

License

The code is freely available under the MIT License.

Citation

If you use this work in any form, please cite as follows:

@article{arango2025ai,
  title={AI-driven predictive biomarker discovery with contrastive learning to improve clinical trial outcomes},
  author={Arango-Argoty, Gustavo and Bikiel, Damian E and Sun, Gerald J and Kipkogei, Elly and Smith, Kaitlin M and Pro, Sebastian Carrasco and Choe, Elizabeth Y and Jacob, Etai},
  journal={Cancer Cell},
  year={2025},
  publisher={Elsevier}
}
