
shreyash-goli/smiles-nn

SMILES-GCN: Dual-Stream Molecular Property Predictor

A dual-stream neural network that combines a 1D CNN (over SMILES sequences) with a Graph Convolutional Network (over molecular graphs) to predict multiple molecular properties simultaneously.
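
Before a 1D CNN can process a SMILES string, the string has to become a fixed-length sequence of integer token ids, which an embedding layer then maps to vectors. The sketch below shows one common way to do that with a regex tokenizer. The token pattern, the `<pad>`/`<unk>` ids, and the function names are illustrative assumptions, not code taken from the `smiles_gcn` package.

```python
import re

# Hypothetical token pattern: bracket atoms such as [NH4+], the two-letter
# halogens Br and Cl, two-digit ring closures (%NN), otherwise one character.
SMILES_TOKEN_PATTERN = re.compile(r"(\[[^\]]+\]|Br|Cl|%\d{2}|.)")

PAD, UNK = "<pad>", "<unk>"


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into chemically meaningful tokens."""
    return SMILES_TOKEN_PATTERN.findall(smiles)


def build_vocab(smiles_list: list[str]) -> dict[str, int]:
    """Give every token seen in the corpus an integer id (0 = padding, 1 = unknown)."""
    vocab = {PAD: 0, UNK: 1}
    for smi in smiles_list:
        for tok in tokenize(smi):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(smiles: str, vocab: dict[str, int], max_len: int) -> list[int]:
    """Map a SMILES string to exactly max_len ids (truncate, or right-pad with 0)."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(smiles)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


vocab = build_vocab(["CCO", "CC(=O)Cl"])
print(tokenize("CC(=O)Cl"))
print(encode("CBr", vocab, max_len=4))  # "Br" was never seen, so it maps to <unk>
```

Tokenizing `Cl`, `Br`, and bracket atoms as single units keeps the CNN from seeing chlorine as a carbon followed by a stray `l`.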

Datasets

This project uses 5 MoleculeNet datasets:

  1. BBBP - Blood-Brain Barrier Permeability (Classification)
  2. Lipophilicity - Lipophilicity prediction (Regression)
  3. SAMPL (FreeSolv) - Hydration (solvation in water) free energy (Regression)
  4. ESOL (Delaney) - Aqueous solubility (Regression)
  5. Tox21 - 12 toxicity endpoints (Multi-task Classification)
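
Predicting all five datasets with one model means the prediction heads together emit 1 + 1 + 1 + 1 + 12 = 16 outputs. The sketch below shows one hypothetical way to lay those outputs out as contiguous slices of a single output vector; the dictionary keys, ordering, and helper names are assumptions for illustration, not read from the package.

```python
# Hypothetical task layout: (task type, number of outputs) per dataset.
TASKS = {
    "BBBP": ("classification", 1),
    "Lipophilicity": ("regression", 1),
    "SAMPL": ("regression", 1),
    "ESOL": ("regression", 1),
    "Tox21": ("classification", 12),
}


def total_outputs(tasks: dict[str, tuple[str, int]]) -> int:
    """Total number of scalar predictions across all datasets."""
    return sum(n for _, n in tasks.values())


def head_slices(tasks: dict[str, tuple[str, int]]) -> dict[str, slice]:
    """Assign each dataset a contiguous slice of the shared output vector."""
    slices, start = {}, 0
    for name, (_, n_outputs) in tasks.items():  # dicts keep insertion order
        slices[name] = slice(start, start + n_outputs)
        start += n_outputs
    return slices


print(total_outputs(TASKS))         # 16
print(head_slices(TASKS)["Tox21"])  # slice(4, 16, None)
```

The classification slices would typically be trained with a sigmoid plus binary cross-entropy, and the regression slices with a squared-error loss.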

Project Structure

sMLles/
├── smiles_gcn/           # Main package
│   ├── models/           # Model architectures (CNN, GCN, fusion, prediction heads)
│   ├── data/             # Data loading and preprocessing
│   ├── training/         # Training pipeline (trainer, losses, metrics)
│   └── utils/            # Utility functions (config, visualization, chemistry)
├── data/                 # MoleculeNet datasets (BBBP, Lipophilicity, SAMPL, ESOL, Tox21)
├── configs/              # Configuration files
├── scripts/              # Training scripts
│   ├── train.py          # Main training script
│   └── test_data_loading.py  # Data loading validation
├── sMLles_training/      # Training outputs
│   ├── checkpoints/      # Saved model checkpoints
│   └── results/          # Training curves and metrics
├── train_on_colab.ipynb  # Colab notebook for training
└── requirements.txt      # Python dependencies

Installation

# Create conda environment
conda create -n smiles-gcn python=3.9
conda activate smiles-gcn

# Install PyTorch (the default macOS arm64 wheels include the MPS backend for Apple Silicon)
pip install torch torchvision

# Install PyTorch Geometric
pip install torch-geometric

# Install RDKit
conda install -c conda-forge rdkit

# Install other dependencies
pip install -r requirements.txt
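
After installation, a quick check that the three core libraries are importable and that a GPU backend is visible can save a failed training run. This is a minimal standalone sketch, not a script shipped with the repository. It assumes the import names `torch`, `torch_geometric`, and `rdkit`, and only imports PyTorch once all three are found.

```python
import importlib.util

# Import names for the packages installed above.
REQUIRED = {"torch": "PyTorch", "torch_geometric": "PyTorch Geometric", "rdkit": "RDKit"}


def missing_packages(required=REQUIRED) -> list[str]:
    """Return the display names of packages that cannot be found on the import path."""
    return [label for module, label in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        import torch
        # Prefer CUDA, then Apple Silicon's MPS backend, then fall back to CPU.
        if torch.cuda.is_available():
            device = "cuda"
        elif torch.backends.mps.is_available():
            device = "mps"
        else:
            device = "cpu"
        print("All packages found; training device:", device)
```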

Quick Start

Training

# Train on a local machine (requires a GPU or Apple Silicon)
python scripts/train.py --config configs/default_config.yaml

# Or use the Colab notebook for cloud GPU training
# Open train_on_colab.ipynb in Google Colab

The training script automatically:

  • Loads and preprocesses all 5 MoleculeNet datasets
  • Validates SMILES and builds molecular graphs
  • Trains the dual-stream model with multi-task learning
  • Evaluates on the validation set after each epoch
  • Saves the best model checkpoint based on validation performance
  • Generates training curves and metrics visualizations
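
Multi-task training across these datasets has to cope with missing labels: a BBBP molecule has no Tox21 labels, and Tox21 itself leaves many endpoint labels empty. A common approach is to compute the loss only over observed labels. The plain-Python sketch below assumes, hypothetically, that missing labels are stored as `None`, and pairs a masked loss with a tracker that decides when a new best checkpoint should be saved. The real `smiles_gcn/training` code likely works on PyTorch tensors and may differ.

```python
import math


def masked_multitask_loss(preds, targets, task_types):
    """Mean loss over the observed labels of one molecule.

    preds:      raw model outputs, one per task (logits for classification tasks)
    targets:    labels, one per task; None marks a missing label (assumed encoding)
    task_types: "classification" or "regression", one per task
    """
    total, count = 0.0, 0
    for p, y, kind in zip(preds, targets, task_types):
        if y is None:  # missing label: this task contributes nothing
            continue
        if kind == "classification":
            # Numerically stable binary cross-entropy on a logit.
            total += max(p, 0.0) - p * y + math.log1p(math.exp(-abs(p)))
        else:
            total += (p - y) ** 2  # squared error for regression targets
        count += 1
    return total / count if count else 0.0


class BestCheckpointTracker:
    """Tracks the best validation score so far; update() says when to save."""

    def __init__(self, mode="min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.mode = mode
        self.best = None

    def update(self, score):
        """Return True when `score` improves on the best score seen so far."""
        if self.best is None or (score < self.best if self.mode == "min" else score > self.best):
            self.best = score
            return True
        return False


# Usage inside a (hypothetical) epoch loop with made-up validation losses:
tracker = BestCheckpointTracker(mode="min")  # minimise validation loss
for epoch, val_loss in enumerate([0.92, 0.81, 0.85, 0.77]):
    if tracker.update(val_loss):
        print(f"epoch {epoch}: new best {val_loss:.2f}, save checkpoint here")
```

Use `mode="max"` instead when the validation criterion is a score such as ROC-AUC.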

Model Architecture

SMILES Input → CNN Stream (1D Conv layers) →
                                              → Fusion Layer → Multi-Task Heads → Predictions
Molecular Graph → GCN Stream (Graph Conv) →
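
The graph stream can be illustrated with the widely used GCN propagation rule of Kipf and Welling, H' = D^-1/2 (A + I) D^-1/2 H W, where A is the bond adjacency matrix and D is the degree matrix of A + I. The README does not say which graph-convolution variant, graph readout, or fusion operation the package uses. This dependency-free sketch omits the learned weights W and the nonlinearity, and its mean-pooling readout and concatenation fusion are assumptions chosen for simplicity.

```python
import math


def gcn_propagate(edges, features):
    """One GCN aggregation step: D^-1/2 (A + I) D^-1/2 H (weights and activation omitted).

    edges:    undirected bonds as (i, j) atom-index pairs
    features: per-atom feature vectors (list of equal-length lists)
    """
    n = len(features)
    neighbors = [{i} for i in range(n)]  # self-loops: the "+ I" term
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    degree = [len(nb) for nb in neighbors]
    out = []
    for i in range(n):
        row = [0.0] * len(features[0])
        for j in neighbors[i]:
            w = 1.0 / math.sqrt(degree[i] * degree[j])  # symmetric normalisation
            for k, x in enumerate(features[j]):
                row[k] += w * x
        out.append(row)
    return out


def mean_pool(node_features):
    """Graph-level readout: average the atom vectors."""
    n = len(node_features)
    return [sum(col) / n for col in zip(*node_features)]


def fuse(cnn_embedding, graph_embedding):
    """Concatenation fusion of the two stream embeddings."""
    return list(cnn_embedding) + list(graph_embedding)


# Heavy atoms of ethanol (C-C-O) with a single dummy feature per atom.
h = gcn_propagate([(0, 1), (1, 2)], [[1.0], [1.0], [1.0]])
graph_vec = mean_pool(h)
print(fuse([0.5, -0.5], graph_vec))  # 2 CNN dims + 1 graph dim
```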

Evaluation

python scripts/evaluate.py --checkpoint checkpoints/best_model.pt
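
The README does not list which metrics `evaluate.py` reports. MoleculeNet's recommended metrics for these datasets are ROC-AUC for BBBP and Tox21, and RMSE for ESOL, FreeSolv, and Lipophilicity. The dependency-free sketch below computes both as an illustration, not as the repository's implementation. ROC-AUC is computed with the rank-sum (Mann-Whitney U) formulation, with tied scores given average ranks.

```python
import math


def rmse(preds, targets):
    """Root-mean-square error for regression tasks."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets))


def roc_auc(scores, labels):
    """ROC-AUC from 1-based ranks of the scores; ties receive their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs both positive and negative labels")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

For Tox21, MoleculeNet reports the mean ROC-AUC over the 12 endpoints, with each endpoint scored only on molecules that have a label for it.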
