This repository studies the effect of including stereoisomeric information in generative models for molecules when optimizing stereochemistry-sensitive properties. We perform optimization on (1) rediscovery of R-albuterol and mestranol, (2) protein-ligand docking, and (3) a stereochemistry-specific CD peak spectra score.
The preprint is available on ChemRxiv: Stereochemistry-aware string-based molecular generation. Data files are available on Zenodo.
Initialize a Python environment (here we use conda) and install the required packages:
git clone git@github.com:aspuru-guzik-group/stereogeneration.git
cd stereogeneration
conda create -n stereogeneration python=3.8
conda activate stereogeneration
pip install -r requirements.txt
xtb is installed as part of requirements.txt. Alternatively, you can install xtb from source from the Grimme Lab, or install it using conda. Set the following environment variables:
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1,1
export OMP_STACKSIZE=4G
ulimit -s unlimited
The CD spectra task requires stda and xtb4stda from the Grimme Lab. The binary files are found in the stereogeneration/stda directory. They must be made executable and added to the $PATH variable:
cd stereogeneration/stda
chmod +x g_spec stda_v1.6.3 xtb4stda
# set file paths which will be used by stda
export PATH=$PATH:$PWD
export XTB4STDAHOME=$PWD
Docking requires the smina binary to be executable:
chmod +x stereogeneration/docking/smina.static
The script for running each model (main.py) is found in the respective folder: reinvent, janus, group-janus. Command-line arguments select the fitness function task and set some of the model parameters.
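The command line described above can also be assembled from Python, which is convenient when scripting many runs. The following is a minimal sketch that uses only the two flags named in this README (`--target` and `--stereo`). The helper `build_command` is hypothetical and not part of the repository, and it assumes that omitting `--stereo` gives a run without stereo-awareness.

```python
import sys

# Targets listed in this README for the --target flag.
TARGETS = ("1SYH", "1OYT", "6Y2F", "cd", "fp-albuterol", "fp-mestranol")


def build_command(target, stereo=True):
    """Return the argument list for one main.py run (hypothetical helper)."""
    if target not in TARGETS:
        raise ValueError(f"unknown target {target!r}; expected one of {TARGETS}")
    cmd = [sys.executable, "main.py", f"--target={target}"]
    if stereo:
        # Assumption: leaving the flag out runs the model without stereo-awareness.
        cmd.append("--stereo")
    return cmd


# Dry run: print the commands for one target, with and without --stereo.
for stereo in (True, False):
    print(" ".join(build_command("1SYH", stereo)))
```

To launch a run rather than print it, the list can be passed to `subprocess.run(cmd, cwd=model_dir, check=True)`, where `model_dir` is the path to one of the model folders.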
# --target selects the task: one of 1SYH, 1OYT, 6Y2F, cd, fp-albuterol, fp-mestranol
# --stereo turns on stereo-awareness
python main.py \
--target=1SYH \
--stereo
The experiments were repeated 10 times for each model and each task. The result files are available on Zenodo. The individual runs for each task are saved in folders {i}_stereo and {i}_nonstereo, which analysis_all.py reads. The analysis also requires the zinc.csv file (available on Zenodo) to be located in the repo directory:
# --target: one of 1SYH, 1OYT, 6Y2F, cd, fp-albuterol, fp-mestranol
# --root_dir: where the dataset and `stereogeneration` import are found
# --label: name for target property label (defaults to 1SYH)
# --horizontal: toggles horizontal subplots, exclude for vertical subplots
python analysis_all.py \
--target=1SYH \
--root_dir='.' \
--label='1SYH' \
--horizontal
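Before running the analysis, it may help to confirm that the inputs described above are in place. The sketch below checks for zinc.csv and for the {i}_stereo and {i}_nonstereo run folders. The function `missing_inputs` is hypothetical and not part of the repository. Where the run folders live and whether the index i starts at 0 or 1 are not stated here, so both are exposed as parameters with assumed defaults.

```python
from pathlib import Path


def missing_inputs(root_dir=".", runs_dir=".", n_runs=10, first_index=0):
    """Return the expected analysis inputs that do not exist.

    Assumptions (not confirmed by the repository): run folders sit directly
    under runs_dir and are numbered first_index .. first_index + n_runs - 1.
    """
    expected = [Path(root_dir) / "zinc.csv"]
    for i in range(first_index, first_index + n_runs):
        expected.append(Path(runs_dir) / f"{i}_stereo")
        expected.append(Path(runs_dir) / f"{i}_nonstereo")
    return [p for p in expected if not p.exists()]


if __name__ == "__main__":
    # Report anything missing relative to the current directory.
    for path in missing_inputs("."):
        print(f"missing: {path}")
```

An empty list means every expected file and folder was found. If the runs are numbered from 1, pass `first_index=1`.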