Stereogeneration

This repository studies the effect of including stereoisomeric information in generative models for molecules when optimizing stereochemistry-sensitive properties. We perform optimization on (1) rediscovery of R-albuterol and mestranol, (2) protein-ligand docking, and (3) a stereochemistry-specific CD peak spectra score.

The preprint is available on ChemRxiv: Stereochemistry-aware string-based molecular generation. The data files are available on Zenodo.

Getting started

Initialize a Python environment (here we use conda) and install the required packages.

git clone git@github.com:aspuru-guzik-group/stereogeneration.git
cd stereogeneration

conda create -n stereogeneration python=3.8
conda activate stereogeneration
pip install -r requirements.txt

Use of XTB

XTB is installed through requirements.txt. Alternatively, you can build xtb from source (from the Grimme Lab) or install it with conda. Set the following environment variables:

export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1,1
export OMP_STACKSIZE=4G
ulimit -s unlimited
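
If XTB is launched from a Python process rather than from a shell where these exports are active, the same variables can be passed explicitly. The following is a minimal sketch, not part of the repository: the helper name `xtb_environment` is hypothetical, and it does not reproduce `ulimit -s unlimited`, which still has to be set in the launching shell.

```python
import os

# Values copied from the export lines above.
XTB_ENV = {
    "MKL_NUM_THREADS": "1",
    "OMP_NUM_THREADS": "1,1",
    "OMP_STACKSIZE": "4G",
}


def xtb_environment(base=None):
    """Return a copy of the environment with the XTB variables applied.

    `base` defaults to the current process environment; hypothetical helper.
    """
    env = dict(os.environ if base is None else base)
    env.update(XTB_ENV)
    return env
```

The returned dictionary can be handed to `subprocess.run(..., env=xtb_environment())` so that the settings apply only to the child process.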

CD spectra setup

The CD spectra task requires stda and xtb4stda from the Grimme Lab. The binaries are found in the stereogeneration/stda directory. They must be made executable and added to the $PATH variable:

cd stereogeneration/stda
chmod +x g_spec stda_v1.6.3 xtb4stda

# set file paths which will be used by stda
export PATH=$PATH:$PWD
export XTB4STDAHOME=$PWD
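
Before starting a long CD run, it can help to confirm that the setup above took effect. The sketch below is an assumption-laden convenience check, not part of the repository: `check_stda_setup` is a hypothetical helper, and it only verifies that the three binaries named above resolve on `PATH` and that `XTB4STDAHOME` is set.

```python
import os
import shutil

# Binary names taken from the chmod line above.
STDA_BINARIES = ("g_spec", "stda_v1.6.3", "xtb4stda")


def check_stda_setup(binaries=STDA_BINARIES):
    """Return a list of problems; an empty list means the setup looks fine."""
    problems = []
    if not os.environ.get("XTB4STDAHOME"):
        problems.append("XTB4STDAHOME is not set")
    for name in binaries:
        # shutil.which only returns a path for files that are on PATH
        # and executable by the current user.
        if shutil.which(name) is None:
            problems.append(f"{name} is not an executable on PATH")
    return problems
```

Calling `check_stda_setup()` in the same shell session where the exports were made should return an empty list.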

Docking setup

Docking requires the smina binary to be executable:

chmod +x stereogeneration/docking/smina.static
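
The same permission change can be made from Python, for example inside a setup script. This is a rough sketch with a hypothetical helper name, `make_executable`; unlike `chmod +x`, it ignores the umask and always adds the execute bit for user, group, and others.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group, and others.

    Returns True if the file is executable by the current user afterwards.
    """
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return os.access(path, os.X_OK)
```

For the docking setup this would be called as `make_executable("stereogeneration/docking/smina.static")` from the repository root.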

Running the models

Scripts (main.py) for running each model are found in the respective folders: reinvent, janus, group-janus. The scripts have command-line arguments that control the fitness function task and some of the model parameters.

# --target selects the task: one of 1SYH, 1OYT, 6Y2F, cd, fp-albuterol, fp-mestranol
# --stereo turns on stereo-awareness
python main.py \
  --target=1SYH \
  --stereo
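
To cover every task with and without stereo-awareness, the command lines can be generated programmatically. The sketch below only builds the argument lists and does not run anything. `build_commands` is a hypothetical helper, and it assumes that omitting `--stereo` gives the non-stereo variant of a run.

```python
import itertools
import sys

# Task names taken from the --target options listed above.
TARGETS = ["1SYH", "1OYT", "6Y2F", "cd", "fp-albuterol", "fp-mestranol"]


def build_commands(targets=TARGETS, script="main.py"):
    """Build one command per (target, stereo on/off) combination."""
    commands = []
    for target, stereo in itertools.product(targets, (True, False)):
        cmd = [sys.executable, script, f"--target={target}"]
        if stereo:
            # Assumption: leaving this flag out runs the non-stereo variant.
            cmd.append("--stereo")
        commands.append(cmd)
    return commands
```

Each command can then be passed to `subprocess.run(cmd, cwd="reinvent", check=True)`, with `cwd` set to the folder of the model being run.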

Analysis of results

The experiments were repeated 10 times for each model and each task. The result files are available on Zenodo. The individual runs for each task are saved in folders {i}_stereo and {i}_nonstereo for $i \in \{0, \dots, 9\}$. The figures and statistics were generated using analysis_all.py, which also requires the zinc.csv file (available on Zenodo) to be located in the repo directory:

# --target: one of 1SYH, 1OYT, 6Y2F, cd, fp-albuterol, fp-mestranol
# --root_dir: where the dataset and `stereogeneration` import are found
# --label: name for target property label (defaults to 1SYH)
# --horizontal: toggles horizontal subplots, exclude for vertical subplots
python analysis_all.py \
  --target=1SYH \
  --root_dir='.' \
  --label='1SYH' \
  --horizontal
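
Since the analysis expects ten stereo and ten non-stereo runs per task, it can be useful to check that all run folders are present after downloading the results. This is a minimal sketch using hypothetical helpers; where the {i}_stereo and {i}_nonstereo folders sit relative to each model and task is an assumption, so `results_dir` must point at whichever directory directly contains them.

```python
from pathlib import Path


def expected_run_dirs(results_dir, n_runs=10):
    """List the {i}_stereo and {i}_nonstereo folders for i = 0..n_runs-1."""
    root = Path(results_dir)
    return [
        root / f"{i}_{mode}"
        for i in range(n_runs)
        for mode in ("stereo", "nonstereo")
    ]


def missing_run_dirs(results_dir, n_runs=10):
    """Return the expected run folders that do not exist on disk."""
    return [d for d in expected_run_dirs(results_dir, n_runs) if not d.is_dir()]
```

An empty list from `missing_run_dirs(...)` means all twenty folders for that task were found.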
