A Snakemake-based RBFE end-to-end workflow using BioSimSpace and sire, with modular minimisation and equilibration staging and support for multiple engines.

The included example is a 3-ligand subset of the TYK2 dataset, with minimisation/equilibration based on the protocol described in [Daniel R. Roe, Bernard R. Brooks; A protocol for preparing explicitly solvated systems for stable molecular dynamics simulations. J. Chem. Phys. 7 August 2020; 153 (5): 054123](https://doi.org/10.1063/5.0013849). The full rulegraph and DAG of the workflow are:
First, ensure that you have a conda environment with all required packages installed. An example environment is included in `workflow/envs/RBFE_workflow_environment.yml`. This can be used to create an environment by running:

```shell
mamba env create --file workflow/envs/RBFE_workflow_environment.yml
mamba activate openbiosim
```
Once this environment has been created, the workflow can be run in a few ways. To run locally, use the command:

```shell
snakemake --cores all --resources gpu=1 {extra options}
```

See the Snakemake docs for a full list of available command line options.

Note that this assumes that only a single GPU is available on your machine. If you have more than one GPU available, `--resources gpu=X` can be scaled up; however, this workflow does not contain any internal GPU management, meaning that all simulations will end up being run on GPU 0.
For multi-GPU machines or clusters this workflow should be run via SLURM, with accounting enabled. This requires an additional package: https://github.com/snakemake/snakemake-executor-plugin-slurm. It is also advised to use a SLURM-specific profile, which defines the per-job resources. A basic profile is included in `profiles/config.yaml`, but this will need to be updated for your machine-specific requirements.
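As a rough illustration, a SLURM profile of this kind might look like the sketch below. The partition and account names are placeholders, and the exact keys and values should be taken from the shipped `profiles/config.yaml` rather than from here:

```yaml
# profiles/config.yaml — illustrative sketch only; adapt to your cluster
executor: slurm
jobs: 100
default-resources:
  slurm_partition: "gpu"       # placeholder partition name
  slurm_account: "my_account"  # placeholder accounting group
  runtime: 720                 # per-job wall time, in minutes
  mem_mb: 16000
```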
Once this is all set up, the workflow can be run using something like:

```shell
snakemake --executor slurm --profile profiles --use-conda --jobs 100 --resources gpu={max number GPUs} {extra options}
```

with `--jobs` defining the maximum number of jobs that Snakemake will place in the SLURM queue at any given time, and `--resources gpu` capping the number of GPUs those jobs may occupy concurrently.
Managing resources within molecular dynamics workflows is hardware and engine dependent, and should be adjusted on a machine-to-machine basis. Within this workflow some options are provided, namely the `simulation_threads` config option and the `cpus_per_gpu` option within `profiles/config.yaml`.

The `simulation_threads` option defines the number of threads assigned to each rule which involves running one or more simulations. This option can have a large impact on performance, especially when using GROMACS, and as such careful consideration should be given to finding an optimal value for your specific hardware. The `cpus_per_gpu` option is specific to SLURM and should be set to a value greater than or equal to `simulation_threads` when running the workflow using SLURM.
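For example, on a node where each simulation performs best with 8 threads, the two settings might be kept in step as follows (illustrative values only; `simulation_threads` lives in `config/config.yaml` and `cpus_per_gpu` in `profiles/config.yaml`):

```yaml
# config/config.yaml (fragment) — illustrative value
simulation_threads: 8
```

```yaml
# profiles/config.yaml (fragment) — illustrative value
default-resources:
  cpus_per_gpu: 8   # should be >= simulation_threads
```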
All settings for each stage of this workflow are defined within `config/config.yaml`; this includes settings for network preparation, solvation, minimisation, equilibration and production. Full descriptions of each available setting can be found in `doc/workflow_settings.md`.
All ligands should be placed in `inputs/ligands`; these can be in any BioSimSpace-supported format, and currently are assumed to NOT be parameterised. The protein of interest should be placed in the `inputs/protein` folder, and is assumed to be parameterised. All ligands should be in their correct docking pose.
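For reference, an input layout consistent with the description above might look like this (the ligand and protein file names are placeholders; any BioSimSpace-supported ligand format is fine):

```
inputs/
├── ligands/
│   ├── lig_a.sdf
│   ├── lig_b.sdf
│   └── lig_c.sdf
└── protein/
    └── protein.pdb
```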
The workflow includes rules and scripts for the preparation of the network, namely `scripts/prepare_network.py`. This script is part of the workflow, and will be run as the first step within it. If you wish to run this script in isolation in order to fine-tune your network, it can be run prior to running the workflow: if a folder called `network` is detected, and it has a file named `network.dat` within it, then the `prep_network` rule will be skipped by the workflow. If you wish to create your own network, simply create the `network` folder and include an appropriately formatted `network.dat` file.
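The authoritative reference for the `network.dat` format is a file generated by `scripts/prepare_network.py` itself; as a rough sketch only, each line describes one perturbation between a pair of ligands (the ligand names below are purely illustrative, and any additional per-edge columns should be copied from a generated file):

```
# network/network.dat — illustrative sketch only
lig_a lig_b
lig_b lig_c
lig_a lig_c
```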
If network preparation is run as part of the workflow, its settings can be defined in the `network_settings` portion of the config.
The second step in the workflow is the preparation of the free and bound systems. Included in this step is ligand parameterisation, creation of the protein-ligand bound system, and solvation, defined by the settings given within `config.yaml`.
Minimisation and equilibration steps can be added and removed according to user requirements, and will execute in the order defined in the config file, regardless of naming. Take for example this simple two-step protocol:
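A sketch of what such a two-step protocol could look like in the config is given below. The stage names and keys here are illustrative assumptions, not the exact schema; consult `doc/workflow_settings.md` for the real setting names:

```yaml
# config/config.yaml (fragment) — illustrative sketch only
equilibration:
  01_nvt:
    ensemble: nvt
    runtime: 100ps
    temperature: 300K
  02_npt:
    ensemble: npt
    runtime: 200ps
    temperature: 300K
    pressure: 1atm
```

Because stages run in the order they appear in the config regardless of naming, reordering the entries reorders the protocol.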
If additional equilibration is needed, for instance at a higher timestep or in a different regime, additional steps can be added to the config and will subsequently become part of the workflow. Here we've added an additional NPT step to the bound leg equilibration:
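A sketch of how such an addition might look (again, the stage names and keys are illustrative assumptions rather than the exact schema; see `doc/workflow_settings.md`):

```yaml
# config/config.yaml (fragment) — illustrative sketch only
equilibration_bound:
  01_nvt:
    ensemble: nvt
    runtime: 100ps
  02_npt:
    ensemble: npt
    runtime: 200ps
  03_npt_long:          # newly added NPT stage
    ensemble: npt
    runtime: 500ps
    timestep: 4fs       # e.g. a higher timestep regime
```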
Once all systems have been created and the end states minimised and equilibrated, the next step is to create the merged systems. This takes place in the `fep_prep` step. Each perturbation defined within `network/network.dat` will be generated for both free and bound legs. Settings for this stage (outside of those defined in `network.dat`) are taken directly from the `production-settings` section of `config.yaml`.
Now that everything is prepared, production can be run, with all settings for production runs defined by the `production-settings` section of `config.yaml`; most of these settings can be defined independently for bound and free legs. Each perturbation defined within `network/network.dat` will be run `num-replicas` times for both bound and free legs.
Once production runs are finished, automated analysis will be run on all successful legs. This analysis can be configured within the `analysis-settings` section of `config.yaml`. By default the workflow will attempt to use openFE-cinnabar for network plotting, but if this is not found it will fall back to a native method.
Note that any experimental data is assumed to be given in the same layout as the provided example, that is, three columns: `ligand`, `experimental DG`, `experimental error`. Experimental values must be given in one of three forms: `kcal/mol`, `kj/mol`, or `Ki(μM)`.
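An experimental data file in that three-column layout might look like the sketch below. The ligand names and values are placeholders, and the delimiter should match whatever the provided example file uses (shown here as comma-separated):

```
ligand,experimental DG,experimental error
lig_a,-9.2,0.4
lig_b,-8.7,0.3
lig_c,-10.1,0.5
```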
Results for the example cycle, run using the SOMD engine, are as follows: