Generated: 2025-12-22 01:46
Correction (2026)
The "state-of-the-art, 0.9612 external AUC" claim below does not hold up under leakage-controlled evaluation and is retained here only for transparency. The 0.9612 was measured by training on BBBP and testing on B3DB, but B3DB overlaps BBBP by ~22%, so it is not true external validation, and the competitor numbers it is compared against were measured under different (scaffold) protocols, so the comparison is not like-for-like. Under a matched scaffold split the model scores ~0.83-0.84 AUC, is competitive with but does not beat a simple ECFP + random-forest baseline, is over-confident (ECE ~0.11), and is functionally stereo-blind (identical predictions for enantiomers). See the full reproducible audit: https://github.com/abinittio/bbb-honest-eval
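For readers unfamiliar with the calibration figure quoted above (ECE ~0.11), the following is a minimal sketch of expected calibration error for a binary classifier. It assumes equal-width bins over the predicted BBB+ probability and compares each bin's mean prediction with its observed fraction of positives. The function name, the 10-bin default, and the toy inputs are illustrative assumptions; the linked audit may bin or weight differently.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width bins over the predicted positive-class probability.

    probs  : predicted P(BBB+) values in [0, 1]
    labels : ground-truth labels, 1 for BBB+ and 0 for BBB-
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to a bin; p == 1.0 falls into the last bin.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        frac_positive = sum(y for _, y in members) / len(members)
        # Weight each bin's calibration gap by its share of the samples.
        ece += (len(members) / n) * abs(mean_prob - frac_positive)
    return ece


# Toy example: the model says 0.9 four times but is right only three times.
toy_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

In the toy example the single occupied bin has a mean prediction of 0.9 and an observed positive rate of 0.75, so the gap reflects over-confidence of the kind the correction describes.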
StereoGNN-BBB V2 achieves state-of-the-art performance on external validation (B3DB, 7,807 compounds):
| Metric | Our V2 | Best Competitor | Difference |
|---|---|---|---|
| External AUC | 0.9612 | 0.91 (ADMETlab 2.0) | +0.051 (+5.6% relative) |
| Specificity | 65.25% | 72% (DeepBBB) | -6.75 pp (lower than competitor) |
| Sensitivity | 97.96% | 93% (SwissADME) | +4.96 pp |
| Rank | Model | AUC | Year | Method |
|---|---|---|---|---|
| 1 🥇 | StereoGNN-BBB V2 (Ours) | 0.961 | 2025 | GATv2 + Stereo + Focal Loss + |
| 2 🥈 | ADMETlab 2.0 | 0.910 | 2021 | Multi-task DNN |
| 3 🥉 | AttentiveFP | 0.910 | 2020 | Graph Attention Network |
| 4 | admetSAR 2.0 | 0.900 | 2018 | Random Forest + fingerprints |
| 5 | ChemBERTa-77M | 0.900 | 2022 | Transformer (SMILES) |
| 6 | pkCSM | 0.890 | 2015 | Graph-based signatures + SVM |
| 7 | B3clf (XGBoost) | 0.890 | 2021 | XGBoost + RDKit descriptors |
| 8 | StereoGNN-BBB V1 (Ours) | 0.884 | 2025 | GATv2 + Stereo features |
| 9 | DeepBBB | 0.880 | 2021 | GCN + molecular descriptors |
| 10 | SwissADME (BOILED-Egg) | 0.840 | 2016 | WLOGP + TPSA rule-based |
Only StereoGNN-BBB enumerates stereoisomers at inference time, providing:
- Prediction ranges for molecules with unspecified stereocenters
- Separate predictions per stereoisomer, critical for drug discovery where R/S enantiomers can have different activities
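A prediction range of this kind can be sketched as a simple aggregation over per-isomer scores. The snippet below assumes the stereoisomer SMILES have already been enumerated (for example with RDKit's `EnumerateStereoisomers`, which is not used here so the snippet stays dependency-free). `predict_proba` is a hypothetical callable standing in for the model, and the toy scores are invented for illustration, not model output.

```python
from typing import Callable, Dict, Iterable


def prediction_range(
    isomer_smiles: Iterable[str],
    predict_proba: Callable[[str], float],
) -> Dict[str, object]:
    """Summarize per-isomer BBB+ probabilities as a min/max range."""
    scores = {smi: predict_proba(smi) for smi in isomer_smiles}
    if not scores:
        raise ValueError("at least one stereoisomer is required")
    values = list(scores.values())
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
        # A spread of exactly 0.0 means every isomer got the same score.
        "spread": max(values) - min(values),
        "per_isomer": scores,
    }


# Toy stand-in for the model: a lookup table with made-up probabilities
# for the two enantiomers of alanine.
toy_scores = {
    "C[C@H](N)C(=O)O": 0.62,
    "C[C@@H](N)C(=O)O": 0.71,
}
result = prediction_range(toy_scores.keys(), toy_scores.__getitem__)
```

Reporting the spread alongside the range is a cheap sanity check: if it is zero for every pair of enantiomers, the stereo features are not influencing the output.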
Unlike competitors (binary classification only), we provide:
- Classification probability (BBB+/BBB-)
- Continuous LogBB value for quantitative ranking
- Threshold flexibility for different use cases
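To illustrate what threshold flexibility means in practice, here is a minimal sketch that computes sensitivity and specificity at a chosen probability cutoff. The function name and the toy probabilities and labels are assumptions made for the example; they are not taken from the model or from B3DB.

```python
def sensitivity_specificity(probs, labels, threshold=0.5):
    """Return (sensitivity, specificity) when predicting BBB+ for p >= threshold."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    # Guard against classes that are absent from the sample.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data: label 1 = BBB+, label 0 = BBB-.
toy_probs = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30]
toy_labels = [1, 1, 1, 0, 1, 0]

# Sweep a few cutoffs to see the sensitivity/specificity trade-off.
sweep = {t: sensitivity_specificity(toy_probs, toy_labels, t) for t in (0.35, 0.5, 0.6)}
```

On this toy data, lowering the cutoff to 0.35 recovers every BBB+ compound at the cost of a false positive, while raising it to 0.6 removes the false positive but still misses one BBB+ compound. A screening use case might prefer the former and a late-stage filter the latter.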
Focal Loss (α=0.75, γ=2.0) addresses the 80/20 BBB+/BBB- class imbalance:
- V1 Specificity: 42.1%
- V2 Specificity: 65.25% (+23.15 pp, +55% relative)
- Sensitivity maintained at 97.96%
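For reference, the standard binary focal loss with the stated hyperparameters can be sketched as below. This follows the usual formulation, in which α weights the class labeled 1 and (1 − α) weights the class labeled 0. Which of BBB+ or BBB- the training code labels as 1 is not stated here, so that mapping, the function name, and the `eps` clamp are all assumptions.

```python
import math


def focal_loss(p, y, alpha=0.75, gamma=2.0, eps=1e-7):
    """Per-sample binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p : predicted probability of the class labeled 1
    y : ground-truth label (0 or 1)
    """
    p = min(max(p, eps), 1.0 - eps)        # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p         # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)**gamma shrinks the loss for well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)   # confident and correct: heavily down-weighted
hard = focal_loss(0.6, 1)   # less confident: contributes more to the loss
```

With γ=0 the modulating factor disappears and the expression reduces to α-weighted cross-entropy, which makes it easy to check an implementation against a known baseline.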
Our metrics are reported on the B3DB external dataset (7,807 unseen compounds). Most competitors report internal cross-validation, which is less rigorous.
- Quantum Features (Gaussian 3D conformers) - Expected +5% AUC
- 2M+ Molecule Pretraining - Expected +3% AUC
- GPU Training - Faster iteration
If using these benchmarks, please cite:
- StereoGNN-BBB: [Your paper]
- B3DB: Meng et al., Scientific Data 2021
- Competitor papers as listed above