1. Introduction
PIA - short for Protein Interaction Analyzer - is a tool for automatic identification of important interactions and interaction-frequency-based scoring in protein-ligand complexes. PIA was developed as part of the master's thesis "Automatic identification of important interactions and interaction-frequency-based scoring in protein-ligand complexes" in cooperation with the Institute of Pharmacy of the Paracelsus Medical Private University Salzburg.
PIA aims to provide insight for three major objectives:
- To identify interactions between a target protein and its possible ligands that are important for binding by looking at the frequencies of the interactions taking place between the target and its ligands.
- To define a scoring function based on the interaction frequencies to assess if a protein-ligand complex is active (ligand successfully inhibits or activates the target protein) or not.
- To subsequently apply the newly found scoring function to new and unseen protein-ligand complexes to determine whether they are active, and therefore whether they are useful for further research.
More information on how these approaches are conceptually realized can be found in the thesis in the section "Methods - 2.3 PIA".
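The idea behind the first two objectives can be illustrated with a small, self-contained sketch. It does not use PIA's API: the interaction labels, the function names, and the scoring rule (sum of active-minus-inactive frequency differences with a cutoff of 0) are assumptions made for illustration only. The thesis describes four scoring functions, and this sketch does not claim to reproduce any of them.

```python
from collections import Counter


def interaction_frequencies(complexes):
    """Return the fraction of complexes in which each interaction occurs.

    `complexes` is a list of iterables of interaction labels (hypothetical
    format, e.g. "Hydrogen_Bond:ASP86").
    """
    n = len(complexes)
    if n == 0:
        return {}
    counts = Counter()
    for interactions in complexes:
        # Count each interaction at most once per complex.
        counts.update(set(interactions))
    return {name: count / n for name, count in counts.items()}


def frequency_difference_score(interactions, freq_active, freq_inactive):
    """Sum, over a ligand's interactions, of (active freq - inactive freq)."""
    return sum(
        freq_active.get(name, 0.0) - freq_inactive.get(name, 0.0)
        for name in set(interactions)
    )


# Toy training data: interactions observed for known actives and inactives.
actives = [
    {"Hydrogen_Bond:ASP86", "Pi-Stacking:TYR124"},
    {"Hydrogen_Bond:ASP86", "Hydrophobic:LEU30"},
]
inactives = [
    {"Hydrophobic:LEU30"},
    {"Hydrophobic:LEU30", "Salt_Bridge:LYS12"},
]

freq_active = interaction_frequencies(actives)
freq_inactive = interaction_frequencies(inactives)

# Score an unseen ligand; a positive score is read as "active" in this sketch.
new_ligand = {"Hydrogen_Bond:ASP86", "Hydrophobic:LEU30"}
score = frequency_difference_score(new_ligand, freq_active, freq_inactive)
predicted_active = score > 0.0
print(score, predicted_active)
```

In this toy example the hydrogen bond to ASP86 occurs in every active and in no inactive complex, so it dominates the score. In practice the cutoff would be chosen on training data rather than fixed at 0.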
PIA is a Python package that largely builds upon BioPandas and RDKit for chemical structure manipulation and on PLIP for the detection and characterization of protein-ligand interactions, as well as on many other Python packages. Although PIA is implemented entirely in Python, its dependencies require some C/C++ libraries to be installed. PIA therefore comes with an `environment.yml` file to set up a conda environment. Alternatively, it is available as a Docker image via `michabirklbauer/pia:latest`. More information on how to install PIA can be found in the corresponding chapter.
Abstract: Molecular docking is an important tool in virtual screening for the discovery and design of new active agents for drug usage. The docking process is influenced by how well molecules fit in the binding site and which interactions occur between the protein and the ligand. Detection of these interactions can be automated with tools like the Protein-Ligand Interaction Profiler (PLIP) by PharmAI. However, identification and assessment of the importance of the different interactions in a protein-ligand complex is still a manual task that requires additional experimental data or domain knowledge about the target. The goals of this thesis are twofold: Firstly, to automatically identify those interactions that have a significant influence on ligand binding, and secondly, to develop a novel scoring function which is able to discriminate active molecules from inactive ones if possible. The underlying data basis were selected targets of the Directory of Useful Decoys: Enhanced (DUD-E) and available structures from the Protein Data Bank (PDB). Specifically 11 targets were analysed: 11-Beta-Hydroxysteroid Dehydrogenase 1 (HSD11B1), Acetylcholinesterase (ACHE), Coagulation Factor XA (FXA), Cyclooxygenase 1 and 2 (COX1/COX2), Dipeptidyl Peptidase IV (DPP4), Monoamine Oxidase B (MAOB), P38 Mitogen-Activated Protein Kinase 14 (MAPK14), Phosphodiesterase 5 (PDE5A), Protein-Tyrosine Phosphatase 1B (PTP1B) and Soluble Epoxide Hydrolase (SEH). PLIP is used to extract interactions present in a protein-ligand complex and the respective interaction’s frequency is measured across all target structures. Cofactors were excluded from the analysis and hydrophobic interactions were only counted once per residue. Additionally, when analysing docking poses only the pose that had the most interactions contributed to the calculation. 
Furthermore, four different scoring functions that are based on the differences in frequencies between active and inactive compounds were established and their performance was assessed on an independent test partition containing unseen ligands. The results show that interactions which are known from literature to be important for ligand binding are found for all targets except ACHE, in many cases among the top ranked interactions in terms of frequency. This behaviour implies a relationship between interaction frequency and the interaction’s significance in ligand binding. Interaction-frequency-based scoring was tested in five targets and performed above baseline accuracy in four of the five targets. In all targets scoring led to an enrichment of active compounds and false positive rates fluctuated between 0 and 33%. Interaction frequency analysis and interaction-frequency-based scoring could therefore be used as supporting tools in virtual screening to further enhance results.
The full thesis can be read here.
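Two counting rules from the abstract (hydrophobic interactions are counted once per residue, and only the docking pose with the most interactions contributes) can be sketched in plain Python. This is not PIA's implementation: the `Interaction` type and both function names are hypothetical, and whether poses are compared before or after collapsing hydrophobic contacts is an assumption (here: after).

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    kind: str     # e.g. "Hydrophobic" or "Hydrogen_Bond" (hypothetical labels)
    residue: str  # e.g. "LEU30"


def collapse_hydrophobic(interactions):
    """Keep at most one hydrophobic interaction per residue.

    All other interaction types are kept unchanged, in their original order.
    """
    seen_residues = set()
    result = []
    for interaction in interactions:
        if interaction.kind == "Hydrophobic":
            if interaction.residue in seen_residues:
                continue
            seen_residues.add(interaction.residue)
        result.append(interaction)
    return result


def best_pose(poses):
    """Return the collapsed interaction list of the pose with the most interactions.

    Assumption: poses are compared after hydrophobic contacts are collapsed.
    Ties are resolved in favour of the earlier pose.
    """
    collapsed = [collapse_hydrophobic(pose) for pose in poses]
    return max(collapsed, key=len)


# Two toy docking poses of the same ligand.
poses = [
    [
        Interaction("Hydrophobic", "LEU30"),
        Interaction("Hydrophobic", "LEU30"),
    ],
    [
        Interaction("Hydrophobic", "LEU30"),
        Interaction("Hydrogen_Bond", "ASP86"),
        Interaction("Hydrogen_Bond", "ASP86"),
    ],
]

selected = best_pose(poses)
print(len(selected))
```

Only the selected pose would then feed into the frequency calculation for that ligand.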
As mentioned above, PIA is a Python package and forms the core of both PIAScript and PIAWeb. PIA provides functions for reading files in various formats such as PDB and SDF, extracting interactions and their frequencies across multiple protein-ligand complexes, calculating scoring and classification models from interaction frequencies, and classifying new compounds.
If you want to use PIA directly in your projects, some Python knowledge is strongly recommended!
Extensive documentation of PIA can be found here: PIA - Wiki.
PIAScript is a Python script with predefined workflows for executing basic tasks with PIA. PIAScript can be run from the command line without any programming knowledge.
An exhaustive list of available workflows, along with usage notes and documentation, can be found here: PIA - Wiki
PIAWeb is a graphical wrapper for PIAScript and allows workflows to be executed without using the command line. PIAWeb runs a Streamlit web server, and the graphical user interface can be accessed via any web browser.
Documentation for PIAWeb can be found here: PIAWeb - Wiki
The following table highlights possible use cases for the different implementations of PIA:
Use case | PIA | PIAScript | PIAWeb |
---|---|---|---|
I want to run a predefined workflow | | ✅ | ✅ |
I want to run a predefined workflow using a GUI | | | ✅ |
I want to build my own workflows | ✅ | | |
I want to incorporate PIA into my own pipelines | ✅ | | |
I want to host PIA on my own server to quickly run analysis | | | ✅ |
I want to analyze 10 000+ structures | ✅ | ✅ | 🟡* |
*Possible but not recommended!
Stuck while setting up PIA, have a workflow that doesn't run as intended, or found a bug? Please open an Issue or a Discussion, or contact us directly!
- Mail - Micha Birklbauer: [email protected]