This repository contains the code to reproduce the results of the paper M. Bohl, M. Esteban-Medina, and K. Lenhof, Can deep learning models for drug sensitivity prediction truly transfer knowledge from bulk to single-cell data?, bioRxiv (2025).
To reproduce the results, you need to have conda installed. You can create the conda environment with the following command:

```bash
conda env create -f environment.yaml
```

This will create a conda environment named `benchmark` with all the necessary packages.
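A quick sanity check after creating the environment is sketched below; it assumes PyTorch is among the pinned packages in `environment.yaml` (the frameworks are implemented as PyTorch/Lightning modules):

```bash
# Activate the environment created above
conda activate benchmark

# Optional sanity check: PyTorch is assumed to be installed by environment.yaml
python -c "import torch; print(torch.__version__)"
```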
Additionally, you need a Weights & Biases (wandb) account. The section below explains the minimal configuration needed to run the `code/hyper_tuning.py` and `code/independent_evaluation.py` scripts.
Furthermore, you need to download the datasets used in the benchmark paper from Zenodo and unzip them into `datasets/processed/`.
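A minimal sketch of this step is shown below; the record URL and archive name are placeholders, so substitute the actual Zenodo record referenced by the paper:

```bash
# Placeholders: replace <zenodo-record-url> and <archive>.zip with the actual
# Zenodo record and file name referenced by the paper
mkdir -p datasets/processed
wget <zenodo-record-url>/<archive>.zip
unzip <archive>.zip -d datasets/processed/
```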
- The hyperparameter tuning script logs the results to Weights & Biases (wandb).
- Create or use an existing wandb account at https://wandb.ai and copy your API key (Profile → Settings → API Keys).
- Activate the `benchmark` environment and authenticate by running `wandb login <your_api_key>` (or set `WANDB_API_KEY=<your_api_key>` in the shell before starting the script); a full example is shown after this list.
- After authentication, `code/hyper_tuning.py` will create a run group per drug/target/model and store the local cache under `code/wandb/` automatically; no further setup is required.
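A minimal sketch of the authentication flow (either option works; fill in your own API key):

```bash
conda activate benchmark

# Option 1: interactive login (wandb stores the key in ~/.netrc)
wandb login <your_api_key>

# Option 2: set the key for the current shell session only
export WANDB_API_KEY=<your_api_key>

# Then start the tuning script as usual
python code/hyper_tuning.py --drugs Afatinib --n_trials 10 --model SCAD
```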
To reproduce the hyperparameter tuning, you can run the `code/hyper_tuning.py` script.
The script takes the following arguments:
- `--drugs`: A list of drugs to process.
- `--n_trials`: The number of Optuna trials.
- `--model`: The name of the model to tune.
For example, to run the hyperparameter tuning for the SCAD model on the drug Afatinib with 10 hyperparameter combinations (Optuna trials), you can run the following command:
bash -c "conda activate benchmark && python code/hyper_tuning.py --drugs Afatinib --n_trials 10 --model SCAD"code/: Contains the source code for the models and experiments.datasets/: Contains the datasets used in the paper.
The `code/` folder gathers data processing helpers, experiment orchestration scripts, and the implementation of each transfer learning framework.
- `code/data_utils.py`: Shared data processing helpers for the harmonization of datasets, gene vocabularies (including scATD-specific alignment), etc. Also builds PyTorch dataloaders such as `CombinedDataLoader` for paired source/target batches and `create_shot_dataloaders` for semi-supervised setups.
- `code/training_utils.py`: Shared training helpers and baselines. It sets global seeds, defines callbacks (e.g., delayed early stopping), computes model metrics, and wraps framework-specific runners (`run_scad_benchmark`, `run_scdeal_benchmark`, etc.) alongside classical baselines such as CatBoost and RandomForest.
- `code/hyper_tuning.py`: Runs Optuna sweeps per drug/domain and logs trials to Weights & Biases. It standardizes preprocessing, constructs framework argument objects, and dispatches to the runners above.
- `code/independent_evaluation.py`: Repeats the preprocessing pipeline for held-out target datasets and launches framework benchmarks/few-shot baselines with consistent defaults, enabling cross-dataset comparisons (see the example invocation after this list).
- `code/frameworks/`: Houses the Lightning implementations of each domain adaptation method:
  - `SCAD/`: Domain-adversarial Lightning module that couples a shared encoder, response predictor, and gradient-reversal discriminator with a tunable weight lambda.
  - `scATD/`: Wraps a pre-trained Dist-VAE encoder and classifier head. `setup` loads checkpoints, aligning gene vocabularies by padding/truncation; fine-tuning alternates between frozen-classifier warm-up and optional encoder unfreezing, optimizing cross-entropy plus an RBF MMD penalty via manual optimization.
  - `scDeal/`: Implements the three-stage scDEAL workflow. Autoencoder/predictor pretraining is followed by a DaNN domain adaptation step with BCE, MMD, and Louvain-cluster similarity regularizers, orchestrated through manual optimization. Utilities in `scDEAL_utils.py` construct target KNN graphs and Louvain assignments.
  - `SSDA4Drug/`: Lightning module that implements SSDA4Drug, with a shared encoder and classifier to which adversarial perturbations can optionally be applied. Training mixes supervised cross-entropy (source + few-shot target) with alternating entropy minimization and maximization on unlabeled target batches via `utils.adentropy`.
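A hypothetical invocation of the independent evaluation is sketched below; the flags are assumptions that mirror `hyper_tuning.py`, so check the script's argument parser for the actual interface:

```bash
# Assumed flags, mirroring hyper_tuning.py; consult the argparse definitions
# in code/independent_evaluation.py for the exact interface
conda activate benchmark
python code/independent_evaluation.py --drugs Afatinib --model SCAD
```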
- To reproduce the results, you will need to download and unzip the processed datasets used in the paper into `datasets/processed/`. The datasets can be downloaded from Zenodo.
- To run scATD, the pre-trained model weights (file `checkpoint_fold1_epoch_30.pth`) need to be downloaded from figshare (https://figshare.com/articles/software/scATD/27908847) and placed in `code/frameworks/scATD/pretrained_models/` (see the sketch after this list).
- If the original URL of the pretrained scATD model doesn't work, we provide a copy of the model weights on Zenodo.
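A short sketch of placing the scATD checkpoint, assuming it has already been downloaded manually (e.g., into the current directory) from the figshare page above:

```bash
# Assumes checkpoint_fold1_epoch_30.pth was downloaded from figshare
mkdir -p code/frameworks/scATD/pretrained_models/
mv checkpoint_fold1_epoch_30.pth code/frameworks/scATD/pretrained_models/
```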