BBB Predictor Benchmark Report

Generated: 2025-12-22 01:46

Correction (2026)

The "state-of-the-art, 0.9612 external AUC" claim below does not hold up under leakage-controlled evaluation and is retained here only for transparency. The 0.9612 was measured by training on BBBP and testing on B3DB, but B3DB overlaps BBBP by ~22%, so it is not true external validation. The competitor numbers it is compared against were also measured under different (scaffold) protocols, so the comparison is not like-for-like. Under a matched scaffold split the model scores ~0.83-0.84 AUC, is competitive with but does not beat a simple ECFP + random-forest baseline, is over-confident (ECE ~0.11), and is functionally stereo-blind (identical predictions for enantiomers). See the full reproducible audit: https://github.com/abinittio/bbb-honest-eval
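ECE (expected calibration error) measures how far predicted probabilities sit from observed frequencies. The exact ECE variant used in the audit is not stated here, so the following is a minimal sketch of one common form, assuming 10 equal-width bins over the predicted P(BBB+) and labels with 1 = BBB+; the function name is illustrative.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binned ECE for a binary classifier.

    probs  : predicted P(BBB+) per compound
    labels : 1 for BBB+, 0 for BBB-
    Each equal-width bin contributes |mean predicted P(BBB+) - observed
    BBB+ fraction|, weighted by the share of compounds falling in that bin.
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equal length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(mean_p - frac_pos)
    return ece
```

For example, a model that outputs 0.95 for five compounds of which only four are BBB+ gets an ECE of about 0.15: it is over-confident by 15 points in that bin.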

Executive Summary

StereoGNN-BBB V2 achieves state-of-the-art performance on external validation (B3DB, 7,807 compounds):

Metric	Our V2	Best Competitor	Difference
External AUC	0.9612	0.91 (ADMETlab 2.0)	+0.051 (+5.6% relative)
Specificity	65.25%	72% (DeepBBB)	−6.75 pp (lower)
Sensitivity	97.96%	93% (SwissADME)	+4.96 pp
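A difference between two scores can be read as an absolute gain (in the metric's own units: AUC points or percentage points) or as a relative gain (a percentage of the competitor's value), and the two are easy to confuse. A minimal sketch of both calculations, using the values from the table above; the helper names are illustrative:

```python
def absolute_gain(ours: float, theirs: float) -> float:
    """Difference in the metric's own units (AUC points or percentage points)."""
    return ours - theirs


def relative_gain_pct(ours: float, theirs: float) -> float:
    """Gain expressed as a percentage of the competitor's value."""
    return (ours - theirs) / theirs * 100.0


# metric: (our V2, best competitor), values copied from the summary table
rows = {
    "External AUC": (0.9612, 0.91),
    "Specificity (%)": (65.25, 72.0),
    "Sensitivity (%)": (97.96, 93.0),
}

for metric, (ours, theirs) in rows.items():
    print(
        f"{metric}: {absolute_gain(ours, theirs):+.4f} absolute, "
        f"{relative_gain_pct(ours, theirs):+.1f}% relative"
    )
```

For the AUC row the two views give +0.051 absolute and +5.6% relative; for sensitivity, about +5.0 percentage points absolute and +5.3% relative.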

Head-to-Head Comparison

Rank	Model	AUC	Year	Method
1 🥇	StereoGNN-BBB V2 (Ours)	0.961	2025	GATv2 + Stereo + Focal Loss +
2 🥈	ADMETlab 2.0	0.910	2021	Multi-task DNN
3 🥉	AttentiveFP	0.910	2020	Graph Attention Network
4	admetSAR 2.0	0.900	2018	Random Forest + fingerprints
5	ChemBERTa-77M	0.900	2022	Transformer (SMILES)
6	pkCSM	0.890	2015	Graph-based signatures + SVM
7	B3clf (XGBoost)	0.890	2021	XGBoost + RDKit descriptors
8	StereoGNN-BBB V1 (Ours)	0.884	2025	GATv2 + Stereo features
9	DeepBBB	0.880	2021	GCN + molecular descriptors
10	SwissADME (BOILED-Egg)	0.840	2016	WLOGP + TPSA rule-based
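Every entry above is ranked by ROC AUC, which is the probability that a randomly chosen BBB+ compound receives a higher score than a randomly chosen BBB- compound. Because it is a ranking statistic computed on one specific test set, AUCs obtained on different datasets or splits are not directly comparable. A minimal sketch of the Mann-Whitney formulation, assuming labels with 1 = BBB+ (scikit-learn's `roc_auc_score` computes the same quantity for binary labels); the function name is illustrative:

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney U formulation).

    Counts, over all (BBB+, BBB-) pairs, how often the BBB+ compound is
    scored higher; tied scores count as half a win.
    O(n_pos * n_neg), which is fine for a small illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```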

Key Differentiators

1. Stereo-Awareness

Among the models compared here, only StereoGNN-BBB enumerates stereoisomers at inference time. For molecules with unspecified stereocenters it therefore reports a prediction range rather than a single value, which matters in drug discovery because R/S enantiomers can have different activities.
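The idea of enumerating stereoisomers at inference time and reporting a range can be sketched as follows. This is a simplification, assuming stereocenters are independent R/S choices; a real pipeline would use a cheminformatics toolkit (for example RDKit's `EnumerateStereoisomers` module) to handle meso forms, E/Z bonds and duplicates. `enumerate_assignments`, `prediction_range` and `toy_predict` are hypothetical names, and `toy_predict` stands in for the trained model.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Assignment = Tuple[str, ...]  # one CIP label ("R" or "S") per stereocenter


def enumerate_assignments(centers: Sequence[Optional[str]]) -> List[Assignment]:
    """Expand each unspecified stereocenter (None) into both R and S.

    Simplified: treats stereocenters as independent, so it ignores meso
    forms, E/Z double bonds and symmetry-equivalent duplicates.
    """
    options = [("R", "S") if c is None else (c,) for c in centers]
    return [tuple(combo) for combo in product(*options)]


def prediction_range(
    centers: Sequence[Optional[str]],
    predict: Callable[[Assignment], float],
) -> Tuple[float, float]:
    """Return (min, max) predicted P(BBB+) over all enumerated stereoisomers."""
    probs = [predict(a) for a in enumerate_assignments(centers)]
    return min(probs), max(probs)


def toy_predict(assignment: Assignment) -> float:
    # Hypothetical stand-in for a trained model: each R center adds 0.05.
    return 0.70 + 0.05 * assignment.count("R")


# One specified R center plus two unspecified ones -> four candidate isomers.
low, high = prediction_range(["R", None, None], toy_predict)
```

Here the range spans roughly 0.75 to 0.85. A model whose output does not change across the enumerated isomers returns a zero-width range, which is a quick check on whether stereo features actually reach the prediction.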

2. Multi-Task Learning

Unlike most of the competitors above, which return only a binary BBB+/BBB- classification, we provide:

Classification probability (BBB+/BBB-)
Continuous LogBB value for quantitative ranking
Threshold flexibility for different use cases
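A minimal sketch of how a two-headed output and a user-chosen threshold could be used, assuming the model returns both a P(BBB+) and a LogBB value (log10 of the brain/blood concentration ratio) per compound. `BBBPrediction`, `classify`, `sens_spec` and `rank_by_logbb` are hypothetical names, not the project's API, and the 0.5 default threshold is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class BBBPrediction:
    smiles: str
    p_bbb_plus: float  # classification head: predicted P(BBB+)
    logbb: float       # regression head: predicted log10(C_brain / C_blood)


def classify(pred: BBBPrediction, threshold: float = 0.5) -> str:
    """Binary call from the classification head at a chosen threshold."""
    return "BBB+" if pred.p_bbb_plus >= threshold else "BBB-"


def sens_spec(
    probs: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Sensitivity (BBB+ recall) and specificity (BBB- recall) at a threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < threshold)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < threshold)
    fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one BBB+ and one BBB- label")
    return tp / (tp + fn), tn / (tn + fp)


def rank_by_logbb(preds: List[BBBPrediction]) -> List[BBBPrediction]:
    """Order compounds from most to least brain-penetrant by predicted LogBB."""
    return sorted(preds, key=lambda p: p.logbb, reverse=True)
```

Raising the threshold trades sensitivity for specificity and lowering it does the reverse, so the same probabilities can serve, for example, a permissive early screen or a stricter shortlist.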

3. Class Imbalance Handling

Focal loss (α=0.75, γ=2.0) is used to address the roughly 80/20 BBB+/BBB- class imbalance:

V1 Specificity: 42.1%
V2 Specificity: 65.25% (+23.15 pp, i.e. +55% relative)
Sensitivity maintained at 97.96%
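For reference, the standard binary focal loss is FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the probability the model assigns to the true class; the (1 - p_t)^γ factor shrinks the loss on examples the model already gets right. The sketch below implements that textbook form with the report's α=0.75 and γ=2.0 as defaults. Which class the report treats as label 1 (and therefore which class receives α) is not stated, so that mapping is an assumption here; `binary_focal_loss` is an illustrative name.

```python
import math


def binary_focal_loss(
    p: float, y: int, alpha: float = 0.75, gamma: float = 2.0, eps: float = 1e-7
) -> float:
    """Focal loss for a single example, standard binary form.

    p     : predicted probability that y == 1
    y     : true label, 0 or 1
    alpha : weight on the y == 1 class; the y == 0 class gets 1 - alpha
    gamma : focusing parameter; gamma = 0 recovers alpha-weighted cross-entropy
    """
    p = min(max(p, eps), 1.0 - eps)         # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With γ=2.0, a confidently correct example (p_t=0.9) keeps only 1% of its cross-entropy before α weighting, while a badly misclassified one (p_t=0.1) keeps 81%, so training effort shifts toward the hard cases.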

4. External Validation

Our metrics are computed on the B3DB external dataset (7,807 unseen compounds). Most competitors report internal cross-validation, which is less rigorous.
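An external test set only measures generalisation if it shares no compounds with the training data. A minimal sketch of that check, assuming every compound in both datasets has already been mapped to a canonical identifier (for example an InChIKey generated by the same toolkit and settings); `overlap_fraction` and `deduplicated_test` are hypothetical names.

```python
from typing import Iterable, List, Set


def overlap_fraction(train_ids: Iterable[str], test_ids: Iterable[str]) -> float:
    """Fraction of test compounds whose identifier also appears in training data."""
    train: Set[str] = set(train_ids)
    test = list(test_ids)
    if not test:
        raise ValueError("empty test set")
    return sum(1 for t in test if t in train) / len(test)


def deduplicated_test(train_ids: Iterable[str], test_ids: Iterable[str]) -> List[str]:
    """Keep only test compounds that never appear in the training set."""
    train = set(train_ids)
    return [t for t in test_ids if t not in train]
```

Exact-identifier matching only catches duplicates; close analogues that share a scaffold with training compounds still leak information, which is what a scaffold split guards against.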
