raulorteg/molminer

MolMiner  —  Towards Controllable, 3D-Aware, Fragment-Based Molecular Design

MolMiner is a multi-property-conditioned, geometry-aware transformer that builds molecules fragment by fragment while taking their 3D geometry into account.
It supports:

  • Order-agnostic roll-outs with symmetry-aware attachment handling
  • Up to 12 simultaneous property conditions (logP, QED, …)
  • End-to-end scripts for vocab extraction, preprocessing, GMM, starter model and full MolMiner training

1. Installation

git clone https://github.com/raulorteg/molminer.git
cd molminer
pip install -r requirements.txt

1.2 Downloading Model Checkpoints

Pre-trained model checkpoints for MolMiner are available via Zenodo.

The checkpoints contain trained weights for the MolMiner model, the Gaussian Mixture Model (GMM), and the Fragment-Starter model.
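
Before running the generation commands in Section 3, it can help to confirm that the downloaded files are where those commands expect them. The following is a minimal sketch, assuming the checkpoints are unpacked into ../checkpoints under the file names used in Section 3 (best_molminer.pth, best_starter.pth, gmm_model.pkl). The helper name missing_checkpoints is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the generation commands in Section 3; the actual
# names inside the Zenodo archive may differ (assumption).
EXPECTED_CHECKPOINTS = ("best_molminer.pth", "best_starter.pth", "gmm_model.pkl")


def missing_checkpoints(ckpt_dir, expected=EXPECTED_CHECKPOINTS):
    """Return the expected checkpoint file names not found in ckpt_dir."""
    ckpt_dir = Path(ckpt_dir)
    return [name for name in expected if not (ckpt_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints("../checkpoints")
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All expected checkpoints found.")
```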

2. Training

Below is the minimal happy path from a raw CSV to a trained MolMiner model. Every script accepts --help for full CLI details.

2.1 Build fragment vocabularies

python extract_vocabulary.py \
  --dataset ../data/test/example.csv
# -> vocab_anchors.csv, vocab_attachments.csv, vocab_fragments.csv, stats.json

2.2 Train / validation / test split

python dataset_split.py \
  --dataset ../data/test/example.csv
# -> train.csv, valid.csv, test.csv
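
For readers who want to see what such a split amounts to, here is a minimal sketch using only the standard library. It is not the implementation in dataset_split.py: the 80/10/10 fractions, the fixed seed, and the function names split_rows and split_csv are assumptions chosen for illustration.

```python
import csv
import random


def split_rows(rows, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle rows deterministically and cut them into train/valid/test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_valid = int(len(rows) * valid_frac)
    n_test = int(len(rows) * test_frac)
    valid = rows[:n_valid]
    test = rows[n_valid:n_valid + n_test]
    train = rows[n_valid + n_test:]
    return train, valid, test


def split_csv(src, train_out, valid_out, test_out, **kwargs):
    """Read a CSV with a header row and write three CSVs that share that header."""
    with open(src, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)
        rows = list(reader)
    parts = split_rows(rows, **kwargs)
    for path, part in zip((train_out, valid_out, test_out), parts):
        with open(path, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            writer.writerows(part)
```

Fixing the seed keeps the three files reproducible across runs, which matters when the later preprocessing steps read them from the same directory.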

2.3 Pre-process for the Fragment-Starter

python preprocess_starter.py \
  --data_dir ../data/test
# -> train_starter.pkl, valid_starter.pkl, test_starter.pkl

2.4 Pre-process for MolMiner

python preprocess_molminer.py \
  --data_dir ../data/test \
  --total_epochs 10 \
  --max_workers 2
# -> steps/test, steps/valid, steps/{epoch}, ...

2.5 Train auxiliary models

Stage             Command
GMM               python train_gmm.py --data_dir ../data/test --model_out ../checkpoints/test_gmm_model.pkl
Fragment-Starter  python train_starter.py --data_dir ../data/test --ckpt_dir ../checkpoints

2.6 Train MolMiner

# --fixedrollout: remove this flag to use adaptive roll-outs
python train_molminer.py \
  --data_dir ../data/test \
  --ckpt_dir ../checkpoints \
  --fixedrollout \
  --total_epochs 10
# -> ckpt_dir/best_molminer.pth, ckpt_dir/last_molminer.pth
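
The steps in Sections 2.1 to 2.6 can be chained from a single Python driver. The sketch below only reuses the script names and flags shown above. The function names training_commands and run_pipeline, and the dry_run switch, are hypothetical conveniences rather than part of the repository, and the sketch assumes it is run from the directory that contains the scripts.

```python
import subprocess
import sys


def training_commands(data_dir="../data/test", dataset="../data/test/example.csv",
                      ckpt_dir="../checkpoints", total_epochs=10, max_workers=2,
                      fixed_rollout=True):
    """Build the Section 2 commands, in order, as argument lists."""
    py = sys.executable
    epochs = str(total_epochs)
    train = [py, "train_molminer.py", "--data_dir", data_dir,
             "--ckpt_dir", ckpt_dir, "--total_epochs", epochs]
    if fixed_rollout:
        train.append("--fixedrollout")  # omit for adaptive roll-outs
    return [
        [py, "extract_vocabulary.py", "--dataset", dataset],
        [py, "dataset_split.py", "--dataset", dataset],
        [py, "preprocess_starter.py", "--data_dir", data_dir],
        [py, "preprocess_molminer.py", "--data_dir", data_dir,
         "--total_epochs", epochs, "--max_workers", str(max_workers)],
        [py, "train_gmm.py", "--data_dir", data_dir,
         "--model_out", f"{ckpt_dir}/test_gmm_model.pkl"],
        [py, "train_starter.py", "--data_dir", data_dir, "--ckpt_dir", ckpt_dir],
        train,
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; unless dry_run, execute it and stop on the first failure."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(training_commands(), dry_run=True)
```

With dry_run=True the driver only prints the commands, which makes it easy to check paths before committing to a long training run.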

3. Generating

3.1 Create calibration

python postprocess_calibration.py \
  --samples=10 \
  --ckpt_molminer='../checkpoints/best_molminer.pth' \
  --ckpt_starter='../checkpoints/best_starter.pth' \
  --ckpt_gmm='../checkpoints/gmm_model.pkl' \
  --stats_path='../data/zinc/stats.json' \
  --vocab_fragments='../data/zinc/vocab_fragments.csv' \
  --vocab_attachments='../data/zinc/vocab_attachments.csv' \
  --vocab_anchors='../data/zinc/vocab_anchors.csv' \
  --device=cpu \
  --weighted
# -> data/calibration/{prop}_calibration.txt

3.2 Generate unconditionally

python generate_random.py \
  --samples=10 \
  --ckpt_molminer='../checkpoints/best_molminer.pth' \
  --ckpt_starter='../checkpoints/best_starter.pth' \
  --ckpt_gmm='../checkpoints/gmm_model.pkl' \
  --stats_path='../data/zinc/stats.json' \
  --vocab_fragments='../data/zinc/vocab_fragments.csv' \
  --vocab_attachments='../data/zinc/vocab_attachments.csv' \
  --vocab_anchors='../data/zinc/vocab_anchors.csv' \
  --device=cpu \
  --weighted
# -> data/generated.txt
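
postprocess_calibration.py and generate_random.py take the same long list of paths. A small helper can build that list once. The sketch below reuses only the flags from the two commands above; the helper name generation_args is hypothetical, and the directory layout (../checkpoints, ../data/zinc) simply mirrors those commands.

```python
import shlex


def generation_args(samples=10, ckpt_dir="../checkpoints", data_dir="../data/zinc",
                    device="cpu", weighted=True):
    """Return the CLI arguments shared by the calibration and generation scripts."""
    args = [
        f"--samples={samples}",
        f"--ckpt_molminer={ckpt_dir}/best_molminer.pth",
        f"--ckpt_starter={ckpt_dir}/best_starter.pth",
        f"--ckpt_gmm={ckpt_dir}/gmm_model.pkl",
        f"--stats_path={data_dir}/stats.json",
        f"--vocab_fragments={data_dir}/vocab_fragments.csv",
        f"--vocab_attachments={data_dir}/vocab_attachments.csv",
        f"--vocab_anchors={data_dir}/vocab_anchors.csv",
        f"--device={device}",
    ]
    if weighted:
        args.append("--weighted")
    return args


if __name__ == "__main__":
    for script in ("postprocess_calibration.py", "generate_random.py"):
        # shlex.join quotes arguments so each line can be pasted into a shell.
        print(shlex.join(["python", script, *generation_args()]))
```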

4. Analyze results

4.1 Compute statistics on molecules generated unconditionally

python postprocess_generated_statistics.py
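
As an illustration of the kind of statistic one might compute on the generated set, the sketch below reports how many generated entries are distinct. It is not what postprocess_generated_statistics.py does. It assumes data/generated.txt holds one molecule string (for example a SMILES) per line, which this README does not state, and the function name uniqueness is hypothetical. Comparing raw strings also overcounts distinct molecules when the same molecule is written in more than one way.

```python
from pathlib import Path


def uniqueness(lines):
    """Return (total, unique, unique_fraction) over non-empty, stripped lines."""
    entries = [line.strip() for line in lines if line.strip()]
    total = len(entries)
    unique = len(set(entries))
    fraction = unique / total if total else 0.0
    return total, unique, fraction


if __name__ == "__main__":
    path = Path("../data/generated.txt")  # assumed location and format
    if path.is_file():
        total, unique, fraction = uniqueness(path.read_text().splitlines())
        print(f"{unique}/{total} unique ({fraction:.1%})")
    else:
        print(f"{path} not found")
```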

4.2 Create the calibration plot for each of the twelve properties

python postprocess_calibration_plot.py \
  --calibration_dir='../data/calibration' \
  --stats_path='../data/zinc/stats.json' \
  --figure_savepath='../figures/calibration.png'
# -> figures/calibration.png

5. Citation

If you use MolMiner in academic work, please cite:

@misc{ortegaochoa2025molminercontrollable3dawarefragmentbased,
      title={MolMiner: Towards Controllable, 3D-Aware, Fragment-Based Molecular Design}, 
      author={Raul Ortega-Ochoa and Tejs Vegge and Jes Frellsen},
      year={2025},
      eprint={2411.06608},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2411.06608}, 
}

6. License

MolMiner is released under the Apache 2.0 License – see LICENSE for details. Contributions are welcome via pull requests or issues!
