Most BBB predictors give a binary label. This pipeline goes further: it estimates how much free drug reaches the brain target (Kp,uu,brain), whether P-gp is actively pumping the compound out (NER classification), what happens when a co-administered drug inhibits that efflux (DDI simulation), and which structural features drive BBB failure (SHAP). The result is not just a prediction; it is a full CNS drug profile.
| Model | AUC-ROC | AUC-PR | F1 | MCC | Balanced Acc |
|---|---|---|---|---|---|
| Extra Trees (best) | 0.9540 ± 0.0059 | 0.9725 | 0.9054 | 0.7441 | 0.8712 |
| Random Forest | 0.9524 ± 0.0063 | 0.9714 | 0.9038 | 0.7354 | 0.8638 |
| XGBoost | 0.9496 ± 0.0071 | 0.9680 | 0.9054 | 0.7454 | 0.8726 |
| LightGBM | 0.9490 ± 0.0068 | 0.9671 | 0.9054 | 0.7462 | 0.8735 |
| Gradient Boosting | 0.9402 ± 0.0074 | 0.9607 | 0.8982 | 0.7130 | 0.8466 |
| Logistic Regression | 0.8614 ± 0.0098 | 0.8926 | 0.8468 | 0.5784 | 0.7862 |
Trained on 8,115 compounds (BBBP + B3DB merged). 10-fold stratified CV. Best model: Extra Trees (AUC-ROC 0.9540).
| Rank | Descriptor | r | Direction |
|---|---|---|---|
| 1 | CNS_MPO | +0.525 | ↑ promotes BBB+ |
| 2 | TPSA | −0.517 | ↓ inhibits BBB+ |
| 3 | NOCount | −0.504 | ↓ inhibits BBB+ |
| 4 | BBB_Score | +0.495 | ↑ promotes BBB+ |
| 5 | NumHeteroatoms | −0.492 | ↓ inhibits BBB+ |
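The r values in this table appear to be point-biserial correlations between each descriptor and the binary BBB label (the plot and worksheet lists in this README name that statistic). As a rough illustration of what it measures, here is a standard-library sketch on made-up TPSA values. The function name and the toy data are assumptions for illustration, not the notebook's code.

```python
import math


def point_biserial(values, labels):
    """Point-biserial correlation between a continuous descriptor and a 0/1 label."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    n = len(values)
    mean_all = sum(values) / n
    # Population standard deviation of the descriptor over all compounds.
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("descriptor is constant")
    p = len(pos) / n  # fraction of BBB+ compounds
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    return (mean_pos - mean_neg) / sd * math.sqrt(p * (1 - p))


# Hypothetical toy data: TPSA values with labels 1 = BBB+, 0 = BBB-.
toy_tpsa = [40.0, 55.0, 60.0, 110.0, 130.0, 95.0]
toy_label = [1, 1, 1, 0, 0, 0]
r_tpsa = point_biserial(toy_tpsa, toy_label)
print(f"r = {r_tpsa:+.3f}")  # negative sign: higher TPSA goes with BBB-
```

A negative r, as for TPSA in the table, means the descriptor tends to be higher in BBB− compounds.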
View the full pipeline diagram: `BBB_Pipeline_Diagram.html`
A full end-to-end computational pipeline for predicting Blood-Brain Barrier (BBB) permeability of drug-like compounds, combining:
- 66 molecular descriptors (physicochemical, topological, electrostatic, binary flags)
- 6 ML classifiers with 10-fold stratified cross-validation
- SHAP explainability (XGBoost; TreeExplainer)
- Mechanistic PK decomposition: P-gp efflux, fup, fubrain, Kp,brain, Kp,uu,brain
- 2-compartment PBPK simulation with P-gp inhibition DDI scenarios
- Automated Excel reporting (9 worksheets, colour-coded)
Designed for CNS drug discovery: upload a CSV of SMILES and get a full tiered decision report.
The Blood-Brain Barrier (BBB) is a selective semipermeable membrane that restricts entry of most molecules into the brain. Predicting BBB permeability is a critical early-stage filter in CNS drug discovery.
This pipeline integrates three tiers of analysis:
| Tier | Analysis | Key Output |
|---|---|---|
| Level 1 | ML classification + EDA | BBB+/BBB− probability |
| Level 2 | Mechanistic PK decomposition | Kp,uu,brain, P-gp NER, fup, fubrain |
| Level 3/4 | PBPK ODE simulation | Brain AUC, DDI risk ratio |
Four descriptor classes are computed from SMILES strings using RDKit:
| Class | Examples | Count |
|---|---|---|
| A: Physicochemical | MW, LogP, TPSA, HBD, HBA, QED, CNS MPO, BBB Score | ~14 |
| B: Topological/Structural | Chi indices, Kappa indices, BertzCT, ring counts | ~25 |
| C: Electrostatic | Partial charges, VSA contributions, LabuteASA | ~13 |
| D: Binary/Categorical | Lipinski flags, ionization class (one-hot), functional groups | ~12 |
- BBB Score (Gupta et al. 2019): 0–6 scale; ≥4 = BBB-penetrant
- CNS MPO (Wager et al. 2010, Pfizer): 0–6 scale; ≥4 = CNS-favourable
All models are trained on the merged BBBP + B3DB benchmark (8,115 compounds after deduplication: 1,975 from BBBP and 6,140 from B3DB) with 10-fold stratified cross-validation:
| Model | Notes |
|---|---|
| Random Forest (n=300) | Class-balanced, sqrt features |
| XGBoost (n=300) | Scale-pos-weight for imbalance |
| LightGBM (n=300) | num_leaves=63, class-balanced |
| Extra Trees (n=300) | Class-balanced; best model |
| Gradient Boosting (n=200) | Subsample=0.8 |
| Logistic Regression | Pipeline with StandardScaler |
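As a rough sketch of the evaluation protocol (10-fold stratified CV with out-of-fold probabilities), the snippet below runs the Extra Trees configuration from the table on synthetic data from scikit-learn's `make_classification`. The synthetic dataset, the random seeds, and the variable names are assumptions; the real pipeline uses the descriptor matrix computed from SMILES.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score

# Synthetic stand-in for the descriptor matrix (illustrative only).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8,
    weights=[0.35, 0.65], random_state=0,
)

# Settings taken from the model table: 300 trees, class-balanced.
model = ExtraTreesClassifier(n_estimators=300, class_weight="balanced", random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Per-fold AUC-ROC, which gives the "mean ± std" style of summary.
fold_auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

# Out-of-fold (OOF) probabilities: every sample is scored by a model that never saw it.
oof_proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
oof_auc = roc_auc_score(y, oof_proba)

print(f"AUC-ROC per fold: {fold_auc.mean():.4f} ± {fold_auc.std():.4f}")
print(f"AUC-ROC on pooled OOF predictions: {oof_auc:.4f}")
```

Pooled OOF predictions are what a single ROC or precision-recall curve per model would be drawn from, while the per-fold scores feed boxplots and the ± spread.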
- Variance threshold (removes near-zero variance features)
- Correlation filter (removes |r| > 0.90 pairs)
- Mutual information (top 39 features with MI > 0.005)
- VIF check (variance inflation factor; multicollinearity audit)
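The first three steps above can be sketched as follows with NumPy and scikit-learn. The 0.90 correlation cut-off, the MI floor of 0.005, and the cap of 39 features come from the list; the variance threshold of 0.01, the rule for which member of a correlated pair is dropped, the function name, and the toy data are assumptions. The VIF audit is left out.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold, mutual_info_classif


def select_features(X, y, names, var_thresh=0.01, corr_thresh=0.90,
                    mi_min=0.005, top_k=39, seed=0):
    """Variance filter, then correlation filter, then mutual-information ranking."""
    # 1) Drop near-constant descriptors.
    keep = VarianceThreshold(threshold=var_thresh).fit(X).get_support()
    X = X[:, keep]
    names = [n for n, k in zip(names, keep) if k]

    # 2) For each pair with |r| above the cut-off, drop the later column.
    corr = np.atleast_2d(np.abs(np.corrcoef(X, rowvar=False)))
    dropped = set()
    for i in range(corr.shape[0]):
        if i in dropped:
            continue
        for j in range(i + 1, corr.shape[0]):
            if j not in dropped and corr[i, j] > corr_thresh:
                dropped.add(j)
    cols = [i for i in range(X.shape[1]) if i not in dropped]
    X = X[:, cols]
    names = [names[i] for i in cols]

    # 3) Rank by mutual information with the label; keep the top_k above mi_min.
    mi = mutual_info_classif(X, y, random_state=seed)
    order = np.argsort(mi)[::-1]
    chosen = [i for i in order if mi[i] > mi_min][:top_k]
    return [names[i] for i in chosen], {names[i]: float(mi[i]) for i in chosen}


# Toy data: f1 nearly duplicates f0, f2 is constant, f3 is pure noise.
rng = np.random.default_rng(0)
f0 = rng.normal(size=300)
X_demo = np.column_stack([
    f0,
    f0 + rng.normal(scale=0.05, size=300),
    np.ones(300),
    rng.normal(size=300),
])
y_demo = (f0 > 0).astype(int)
selected, mi_scores = select_features(X_demo, y_demo, ["f0", "f1", "f2", "f3"])
print(selected)
```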
| Plot | Description |
|---|---|
| `plot_01_distributions.png` | KDE distributions of key descriptors (BBB+ vs BBB−) |
| `plot_02_correlations.png` | Top-30 point-biserial correlations with BBB label |
| `plot_03_corr_heatmap.png` | Inter-descriptor correlation heatmap (top 15 features) |
| `plot_04_roc_pr.png` | ROC + Precision-Recall curves for all models (OOF) |
| `plot_05_model_comparison.png` | Grouped bar chart: 6 metrics × 6 models |
| `plot_06_confusion.png` | Confusion matrices for top-3 models |
| `plot_07_cv_boxplots.png` | 10-fold CV score distributions (boxplots) |
| `plot_08_shap_*.png` | SHAP summary plots (beeswarm) per model |
| `plot_09_feature_importance.png` | Gini importance: RF vs XGBoost |
| `plot_10_pbpk_curves.png` | Brain vs plasma PK curves: normal vs P-gp inhibited |
| `plot_11_decision_dashboard.png` | Unified CNS decision dashboard (7 panels) |
| `plot_12_train_vs_pred.png` | KDE comparison: training set vs query compounds |
| `plot_13_radar.png` | Normalised radar chart of descriptor profiles |
| Sheet | Contents |
|---|---|
| `1_Predictions` | Full predictions with colour-coded BBB class, all PK metrics |
| `2_Model_Statistics` | CV stats + per-fold AUC table with Excel formulas |
| `3_Descriptor_Stats` | Mann-Whitney U + point-biserial per descriptor |
| `4_Feature_Selection` | MI scores + VIF table |
| `5_SHAP_Summary` | Mean \|SHAP\| values + direction per model |
| `6_Mechanistic_PK` | P-gp class, fup, fubrain, Kp,brain, Kp,uu,brain |
| `7_PBPK_Results` | Brain AUC normal vs inhibited, DDI ratio |
| `8_Training_Data` | BBBP + B3DB merged benchmark with descriptors + source column |
| `9_Thresholds_Reference` | Complete BBB decision rules from literature |
Open the notebook directly in Google Colab:
https://colab.research.google.com/github/Akshay-Krishnamurthy/structure-brain-link/blob/main/Blood_Brain_Barrier_Penetration_Prediction.ipynb
Run all cells top to bottom. Cell 1 installs all dependencies automatically (~3–4 min on first run).

To run locally instead:

    git clone https://github.com/Akshay-Krishnamurthy/structure-brain-link.git
    cd structure-brain-link
    pip install -r requirements.txt
    jupyter notebook Blood_Brain_Barrier_Penetration_Prediction.ipynb

Repository layout:

    structure-brain-link/
    │
    ├── Blood_Brain_Barrier_Penetration_Prediction.ipynb   ← Main pipeline notebook
    ├── BBB_Pipeline_Diagram.html                          ← Interactive pipeline diagram
    ├── requirements.txt                                   ← All Python dependencies
    ├── README.md                                          ← This file
    ├── .gitignore                                         ← Git ignore rules
    │
    ├── sample_data/
    │   └── demo_compounds.csv                             ← 14 demo SMILES with known BBB status
    │
    └── results/                                           ← Example outputs (gitignored by default)
        ├── SBL_Complete_Results.xlsx
        └── plots/

Upload a CSV with a column named SMILES (or smiles). Any additional columns (Name, ID, CAS, etc.) are preserved in output.
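A minimal standard-library sketch of that input contract: find the `SMILES` (or `smiles`) column, skip rows without a structure, and carry every other column through unchanged. The function name `load_compounds` and the decision to skip blank rows are assumptions; the notebook may handle this differently.

```python
import csv
import io


def load_compounds(handle):
    """Read compound rows from an open CSV handle that has a SMILES/smiles column."""
    reader = csv.DictReader(handle)
    fields = reader.fieldnames or []
    smiles_col = next((c for c in ("SMILES", "smiles") if c in fields), None)
    if smiles_col is None:
        raise ValueError("CSV needs a column named SMILES or smiles")
    compounds = []
    for row in reader:
        smiles = (row[smiles_col] or "").strip()
        if not smiles:
            continue  # skip blank rows rather than passing empty SMILES downstream
        # Keep all other columns (Name, ID, CAS, ...) exactly as supplied.
        extras = {k: v for k, v in row.items() if k != smiles_col}
        compounds.append({"SMILES": smiles, **extras})
    return compounds


sample = io.StringIO(
    "smiles,Name,Reference\n"
    "CN1CCC[C@H]1c2cccnc2,Nicotine,Known BBB+\n"
    "CC(=O)Oc1ccccc1C(=O)O,Aspirin,Known BBB-\n"
)
compounds = load_compounds(sample)
print(compounds[0])
```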
Example input:

    SMILES,Name,Reference
    CN1CCC[C@H]1c2cccnc2,Nicotine,Known BBB+
    CC(=O)Oc1ccccc1C(=O)O,Aspirin,Known BBB-
    COc1ccc2[nH]cc(CC(N)C(=O)O)c2c1,5-Methoxytryptophan,Test

Based on the J. Med. Chem. 2021 tiered framework:

    BBB+ Probability (ML)
            ↓
    P-gp Efflux Class (Low/Medium/High) → NER value
            ↓
    fup (plasma unbound fraction)
    fubrain (brain unbound fraction)
            ↓
    Kp,brain (brain-to-plasma partition, Rodgers & Rowland 2006)
            ↓
    Kp,uu,brain = (Kp,brain / NER) × (fup / fubrain)

Kp,uu,brain interpretation:
| Value | Interpretation |
|---|---|
| > 1.0 | Net brain accumulation |
| 0.3–1.0 | Good CNS exposure |
| 0.1–0.3 | Efflux-limited (moderate) |
| < 0.1 | Poor CNS exposure |
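A small sketch of the last step of the decomposition together with this interpretation table. It implements the Kp,uu,brain expression exactly as written in this README. In the literature Kp,uu,brain is often written as Kp,brain × fu,brain / fu,plasma, so the notebook remains the authority on which form the pipeline actually uses. Function names, the input values, and the handling of values that fall exactly on a band edge are assumptions.

```python
def kp_uu_brain(kp_brain, ner, fup, fubrain):
    """Kp,uu,brain as given in this README: (Kp,brain / NER) * (fup / fubrain)."""
    if ner <= 0 or fubrain <= 0:
        raise ValueError("NER and fubrain must be positive")
    return (kp_brain / ner) * (fup / fubrain)


def interpret_kp_uu(value):
    """Map a Kp,uu,brain value onto the bands of the interpretation table."""
    if value > 1.0:
        return "Net brain accumulation"
    if value >= 0.3:  # assumption: 0.3 itself counts as the upper band
        return "Good CNS exposure"
    if value >= 0.1:  # assumption: 0.1 itself counts as efflux-limited
        return "Efflux-limited (moderate)"
    return "Poor CNS exposure"


# Hypothetical inputs for a P-gp substrate with NER = 4.
value = kp_uu_brain(kp_brain=1.0, ner=4.0, fup=0.25, fubrain=0.125)
print(value, interpret_kp_uu(value))
```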
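Two-compartment ODE (plasma ↔ brain) with IV bolus assumption:

    dC_plasma/dt = -(CL_systemic + CL_passive)/Vp × Cp + (CL_passive + CL_efflux)/Vp × Cb
    dC_brain/dt  = (CL_passive/Vb) × Cp - (CL_passive + CL_efflux)/Vb × Cb

Here Cp = C_plasma and Cb = C_brain are the plasma and brain concentrations, and Vp and Vb are the corresponding compartment volumes.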
Scenarios simulated:
- Normal conditions
- P-gp inhibited (90% efflux inhibition; the DDI scenario)
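The two scenarios can be sketched by integrating the ODE above twice, once with the full efflux clearance and once with 10% of it. This is a standard-library RK4 sketch, not the notebook's solver. Every parameter value below (clearances in L/h, volumes in L, dose in mg) is a hypothetical placeholder, and `simulate_auc` is an illustrative name.

```python
def simulate_auc(cl_systemic, cl_passive, cl_efflux, vp, vb, dose,
                 t_end=24.0, dt=0.01):
    """Integrate the two-compartment ODE with RK4; return (plasma AUC, brain AUC)."""

    def deriv(cp, cb):
        dcp = -(cl_systemic + cl_passive) / vp * cp + (cl_passive + cl_efflux) / vp * cb
        dcb = (cl_passive / vb) * cp - (cl_passive + cl_efflux) / vb * cb
        return dcp, dcb

    cp, cb = dose / vp, 0.0  # IV bolus: the whole dose starts in plasma
    auc_p = auc_b = 0.0
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(cp, cb)
        k2 = deriv(cp + 0.5 * dt * k1[0], cb + 0.5 * dt * k1[1])
        k3 = deriv(cp + 0.5 * dt * k2[0], cb + 0.5 * dt * k2[1])
        k4 = deriv(cp + dt * k3[0], cb + dt * k3[1])
        cp_new = cp + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        cb_new = cb + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        # Trapezoidal accumulation of the area under each curve.
        auc_p += 0.5 * dt * (cp + cp_new)
        auc_b += 0.5 * dt * (cb + cb_new)
        cp, cb = cp_new, cb_new
    return auc_p, auc_b


# Hypothetical parameters, chosen only to make the example run.
params = dict(cl_systemic=10.0, cl_passive=1.0, vp=3.0, vb=1.2, dose=10.0)
cl_efflux = 4.0

_, auc_brain_normal = simulate_auc(cl_efflux=cl_efflux, **params)
_, auc_brain_inhibited = simulate_auc(cl_efflux=cl_efflux * (1 - 0.90), **params)
ddi_ratio = auc_brain_inhibited / auc_brain_normal
print(f"Brain AUC ratio (inhibited / normal): {ddi_ratio:.2f}")
```

For this linear model the infinite-time brain AUC ratio works out to (CL_passive + CL_efflux) / (CL_passive + 0.1 × CL_efflux), which is a handy sanity check on the integrator.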
DDI Risk Classification:
- 🔴 High: AUC ratio > 5×
- 🟡 Moderate: 2–5×
- 🟢 Low: < 2×
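These bands translate into a small helper. The function name is hypothetical, and assigning ratios of exactly 2 or 5 to the Moderate band is an assumption, since the list does not say which side the boundaries fall on.

```python
def classify_ddi(auc_normal, auc_inhibited):
    """Return (ratio, risk class) for brain AUC with and without P-gp inhibition."""
    if auc_normal <= 0:
        raise ValueError("auc_normal must be positive")
    ratio = auc_inhibited / auc_normal
    if ratio > 5:
        return ratio, "High"
    if ratio >= 2:  # assumption: boundary values 2 and 5 count as Moderate
        return ratio, "Moderate"
    return ratio, "Low"


# Hypothetical brain AUC values (arbitrary units).
ratio, risk = classify_ddi(auc_normal=2.0, auc_inhibited=7.0)
print(f"{ratio:.1f}x -> {risk}")
```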
| Descriptor | BBB+ Favoured | BBB− Risk |
|---|---|---|
| TPSA | < 90 Å² | > 120 Å² |
| LogP | 1.0–5.0 | < 0 or > 5 |
| MW | < 450 Da | > 500 Da |
| HBD | ≤ 3 | > 5 |
| HBA | ≤ 8 | > 10 |
| BBB Score | ≥ 4 | < 3 |
| CNS MPO | ≥ 4 | < 3 |
| Kp,uu,brain | > 0.3 | < 0.1 |
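The threshold table can be applied mechanically, as in the sketch below. Values that fall between the two columns (for example TPSA between 90 and 120 Å²) are labelled "intermediate", which is an assumption because the table does not name that zone. The rule dictionary, the function name, and the example descriptor values are illustrative.

```python
# Per descriptor: (test for "BBB+ favoured", test for "BBB- risk"), from the table above.
RULES = {
    "TPSA": (lambda v: v < 90, lambda v: v > 120),
    "LogP": (lambda v: 1.0 <= v <= 5.0, lambda v: v < 0 or v > 5),
    "MW": (lambda v: v < 450, lambda v: v > 500),
    "HBD": (lambda v: v <= 3, lambda v: v > 5),
    "HBA": (lambda v: v <= 8, lambda v: v > 10),
    "BBB Score": (lambda v: v >= 4, lambda v: v < 3),
    "CNS MPO": (lambda v: v >= 4, lambda v: v < 3),
    "Kp,uu,brain": (lambda v: v > 0.3, lambda v: v < 0.1),
}


def flag_descriptors(descriptors):
    """Label each known descriptor as favoured, risk, or intermediate."""
    flags = {}
    for name, value in descriptors.items():
        if name not in RULES:
            continue  # ignore descriptors the table does not cover
        favoured, risk = RULES[name]
        if favoured(value):
            flags[name] = "BBB+ favoured"
        elif risk(value):
            flags[name] = "BBB- risk"
        else:
            flags[name] = "intermediate"
    return flags


# Illustrative (not measured) descriptor values for a hypothetical compound.
example = {"TPSA": 105.0, "LogP": 2.1, "MW": 520.0, "HBD": 2, "HBA": 9}
flags = flag_descriptors(example)
for name, label in flags.items():
    print(f"{name}: {label}")
```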
- Martins, I.F. et al. (2012). A Bayesian Approach to in Silico Blood-Brain Barrier Penetration Modeling. J. Chem. Inf. Model., 52(6), 1686–1697. Primary source of the BBBP benchmark dataset used for training.
- Meng, J. et al. (2021). B3DB: A Curated Blood-Brain Barrier Database. Scientific Data, 8, 289. B3DB dataset merged with BBBP for training (8,115 compounds after deduplication).
- Wu, Z. et al. (2018). MoleculeNet: A Benchmark for Molecular Machine Learning. Chemical Science, 9(2), 513–530. MoleculeNet benchmark suite (BBBP is part of this collection).
- DeepChem (https://deepchem.io). Open-source platform hosting the BBBP dataset.
- Gupta, M. et al. (2019). BBB Score: A Composite Score for Predicting Blood-Brain Barrier Permeation. J. Med. Chem., 62(19), 9134–9141.
- Wager, T.T. et al. (2010). Moving beyond Rules: The Development of a Central Nervous System Multiparameter Optimization (CNS MPO) Approach To Enable Alignment of Druglike Properties. ACS Chem. Neurosci., 1(6), 435–449.
- Rodgers, T. & Rowland, M. (2006). Mechanistic approaches to volume of distribution predictions: understanding the processes. J. Pharm. Sci., 95(6), 1238–1257.
- Fridén, M. et al. (2010). Prediction of drug brain concentrations using an in vivo steady state brain slice model. Drug Metab. Dispos., 38(6), 1087–1093.
- Lobell, M. & Sivarajah, V. (2003). In silico prediction of aqueous solubility, human plasma protein binding and the volume of distribution of compounds from calculated pKa and AlogP98 values. Mol. Divers., 7(1), 69–87. Source of the fup (plasma unbound fraction) approximation method.
Akshay Krishnamurthy Hegde
- Field: Computational Drug Discovery / Machine Learning / Cheminformatics
- Tools: RDKit, scikit-learn, XGBoost, SHAP, PBPK modelling
MIT License: free to use and adapt with attribution.