This repository was archived by the owner on Mar 22, 2026. It is now read-only.

💊 Darwin PBPK Platform

"Ciência rigorosa. Resultados honestos. Impacto real."

AI-Powered PBPK Prediction Platform

State-of-the-art deep learning platform for physiologically-based pharmacokinetic (PBPK) parameter prediction using multi-modal molecular representations.

🚀 Features

  • ✅ Multi-Modal Embeddings: ChemBERTa 768d + Molecular graphs + RDKit descriptors
  • ✅ Advanced Architectures: GNN (GAT + TransformerConv), Multi-task learning
  • ✅ Large Dataset: 44,779 compounds (ChEMBL + TDC + KEC)
  • ✅ 3 PBPK Parameters: Fraction unbound (Fu), Volume of distribution (Vd), Clearance (CL)
  • ✅ PhysioQM Integration: Physics-informed constraints
  • ✅ Production Ready: Trained models, API endpoints

📊 Performance

  • Baseline MLP: R² > 0.30 (ChemBERTa only)
  • GNN Model: R² > 0.45 (Graphs + Attention)
  • Ensemble: R² > 0.55 (Multi-modal fusion)
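
For readers comparing against the thresholds above, the following is a minimal sketch of how a coefficient of determination (R²) is computed from held-out predictions. The function name `r2_score` and the sample values are illustrative assumptions, not code or results from this repository, and the README does not say whether its figures were computed on raw or transformed targets.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out values (illustrative numbers, not project results).
observed = [0.10, 0.45, 0.80, 0.30]
predicted = [0.20, 0.40, 0.70, 0.35]
print(round(r2_score(observed, predicted), 3))
```

An R² of 0 corresponds to always predicting the mean of the observed values, so a threshold such as R² > 0.30 means the model explains more than 30% of the variance in the held-out targets.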

🧬 Architecture

Embeddings:

  • ChemBERTa: seyonec/ChemBERTa-zinc-base-v1 (768d)
  • Pre-trained on ~100M molecules (ZINC, PubChem)

Graphs:

  • PyTorch Geometric
  • 20 node features (atom type, charge, aromaticity, etc.)
  • 7 edge features (bond type, conjugation, ring, etc.)
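
As a rough illustration of the graph layout PyTorch Geometric expects (a node feature matrix `x`, an `edge_index` of shape 2 × num_edges, and per-edge attributes), here is a dependency-free sketch for ethanol. The reduced `ATOM_TYPES` and `BOND_TYPES` vocabularies and the `build_graph` helper are hypothetical; the project's actual 20 node features and 7 edge features are not reproduced here.

```python
# Hypothetical, reduced feature layout: the project's actual 20 node and
# 7 edge features are not reproduced here.
ATOM_TYPES = ["C", "N", "O"]
BOND_TYPES = ["single", "double", "triple", "aromatic"]


def one_hot(value, vocabulary):
    """Return a one-hot list marking the position of value in vocabulary."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def build_graph(atoms, bonds):
    """atoms: list of element symbols; bonds: list of (i, j, bond_type)."""
    x = [one_hot(symbol, ATOM_TYPES) for symbol in atoms]
    sources, targets, edge_attr = [], [], []
    for i, j, bond_type in bonds:
        # Store each undirected bond as two directed edges (i->j and j->i).
        for src, dst in ((i, j), (j, i)):
            sources.append(src)
            targets.append(dst)
            edge_attr.append(one_hot(bond_type, BOND_TYPES))
    return {"x": x, "edge_index": [sources, targets], "edge_attr": edge_attr}


# Ethanol ("CCO"): heavy atoms C0-C1-O2 joined by two single bonds.
ethanol = build_graph(["C", "C", "O"], [(0, 1, "single"), (1, 2, "single")])
print(ethanol["edge_index"])
```

In a real pipeline the three lists would be converted to tensors and wrapped in a PyTorch Geometric `Data` object, with the atom and bond properties typically read from an RDKit molecule rather than typed by hand.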

Descriptors:

  • RDKit: 25 molecular descriptors
  • MW, LogP, TPSA, QED, HBA, HBD, etc.
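
The README does not state how the three modalities are combined. One common approach, sketched below purely as an assumption, is to standardize the descriptor columns and concatenate them with the ChemBERTa embedding and a pooled graph embedding. The helpers `standardize_columns` and `fuse`, the `GRAPH_DIM` value, and all numeric inputs are hypothetical placeholders.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Z-score each descriptor column across the dataset.

    Columns with zero variance are mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


def fuse(text_embedding, graph_embedding, descriptors):
    """Concatenate the three modality vectors into one feature vector."""
    return list(text_embedding) + list(graph_embedding) + list(descriptors)


# 768 (ChemBERTa) and 25 (RDKit descriptors) come from the README;
# GRAPH_DIM = 128 is a hypothetical pooled-GNN output size.
GRAPH_DIM = 128
# Placeholder descriptor rows for two molecules (not real descriptor values).
raw_descriptors = [[1.0, 10.0] + [0.0] * 23, [3.0, 30.0] + [0.0] * 23]
scaled = standardize_columns(raw_descriptors)
fused = fuse([0.0] * 768, [0.0] * GRAPH_DIM, scaled[0])
print(len(fused))
```

Standardizing first keeps large-magnitude descriptors such as molecular weight from dominating the concatenated vector; the scaling statistics should be fitted on the training split only.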

Models:

  • GAT: 4 attention heads, 3 layers
  • TransformerConv: 4 heads, 3 layers
  • Multi-task: Weighted loss (Fu, Vd, CL)
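
To make the weighted multi-task loss concrete, here is a minimal sketch that sums per-task mean squared errors for Fu, Vd, and CL. The weights in `TASK_WEIGHTS` are hypothetical because the README does not state the values used, and skipping missing labels is an assumption motivated by merged datasets where not every compound has all three measurements.

```python
import math

TASKS = ("fu", "vd", "cl")
# Hypothetical task weights; the README does not state the values used.
TASK_WEIGHTS = {"fu": 1.0, "vd": 0.5, "cl": 0.5}


def multitask_loss(predictions, targets, weights=TASK_WEIGHTS):
    """Weighted sum of per-task MSE, skipping missing (None/NaN) labels."""
    total = 0.0
    for task in TASKS:
        errors = [
            (p - t) ** 2
            for p, t in zip(predictions[task], targets[task])
            if t is not None and not math.isnan(t)
        ]
        if errors:  # a task with no labels in this batch contributes nothing
            total += weights[task] * sum(errors) / len(errors)
    return total


# Illustrative batch of two compounds; None marks a missing measurement.
preds = {"fu": [0.2, 0.6], "vd": [1.0, 2.0], "cl": [0.5, 0.5]}
labels = {"fu": [0.0, 0.6], "vd": [1.0, None], "cl": [None, None]}
print(multitask_loss(preds, labels))
```

In a PyTorch training loop the same idea would be expressed with tensor masks so that gradients flow only through labeled entries.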

📚 Citation

If you use this software in your research, please cite:

Agourakis, D.C. (2025). Darwin PBPK Platform: AI-Powered Pharmacokinetic 
Prediction. Version 1.0.0 [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.17536674

📖 Dataset

Large datasets (embeddings, graphs, ~1.7 GB) available at:

🚀 Quick Start

# Install dependencies
pip install -r requirements.txt

# Train baseline MLP
python apps/training/01_baseline_mlp.py

# Train GNN model
python apps/training/02_gnn_model.py

# Make predictions (after training)
python apps/prediction/pbpk_predictor.py --smiles "CCO"

📊 Data Sources

  • ChEMBL: Bioactivity and PK data
  • TDC (Therapeutics Data Commons): ADMET benchmark datasets
  • KEC: Curated literature extractions

📄 License

MIT License - See LICENSE file

🙏 Acknowledgments

Developed at PUCRS for computational drug discovery, aiming for the scientific rigor expected by Q1 (top-quartile) journals.


"Rigorous science. Honest results. Real impact."