AyehBlk/Complete-Protein-Protein-Interaction-Analysis-Workflow

Complete Protein-Protein Interaction Analysis Workflow

From AlphaFold3 Prediction to Validation

A comprehensive, end-to-end workflow for predicting, visualizing, and validating protein-protein interactions. Includes AlphaFold3 integration, PyMOL visualization scripts, Arpeggio interaction detection, and quantitative validation against experimental structures.


Workflow Overview

STEP 1: AlphaFold3 Prediction (Heterodimer)
         ↓
STEP 2: PyMOL Visualization
         ↓
STEP 3: Arpeggio Analysis (Protein-Protein)
         ↓
STEP 4: Extract & Analyze Results
         ↓
STEP 5: Download PDB Structure (Experimental)
         ↓
STEP 6: Repeat Analysis on PDB
         ↓
STEP 7: Compare Predicted vs Experimental
         ↓
BONUS: Protein-Ligand Interaction Analysis

Example System: MDM2/p53

Why this system?

  • Clinical relevance: Major cancer drug target
  • p53: Tumor suppressor ("Guardian of the genome")
  • MDM2: Inhibits p53 → cancers overexpress MDM2
  • Goal: Blocking MDM2-p53 → reactivates p53 → cancer therapy
  • PDB reference: 1YCR (experimental structure)

Sequences:

  • Chain A (MDM2): Residues 17-125 (109 residues)
  • Chain B (p53): Residues 15-29 (15 residues)

Directory Structure

protein-interaction-workflow/
├── README.md                          # This file
├── examples/
│   └── mdm2_p53/
│       ├── sequences/
│       │   ├── mdm2.fasta            # MDM2 sequence
│       │   └── p53.fasta             # p53 sequence
│       └── results/                   # Analysis results
│
├── scripts/
│   ├── 01_run_alphafold.sh          # AlphaFold prediction
│   ├── 02_visualize_pymol.py        # PyMOL visualization
│   ├── 03_run_arpeggio.sh           # Arpeggio analysis
│   ├── 04_analyze_interactions.py   # Parse Arpeggio results
│   ├── 05_download_pdb.py           # Get experimental structure
│   ├── 06_compare_structures.py     # Validation
│   └── 07_protein_ligand_example.sh # Bonus: Ligand analysis
│
├── utils/
│   ├── arpeggio_parser.py           # Arpeggio result parser
│   ├── structure_tools.py           # Structure manipulation
│   └── visualization.py             # Plotting functions
│
└── requirements.txt                  # Python dependencies

Quick Start

Prerequisites

# 1. AlphaFold3 (via AlphaFold Server or local installation)
# 2. PyMOL
conda install -c conda-forge pymol-open-source

# 3. Arpeggio
git clone https://github.com/PDBeurope/arpeggio.git
cd arpeggio && python setup.py install

# 4. Python packages
pip install -r requirements.txt

Run Complete Workflow

# Navigate to examples
cd examples/mdm2_p53

# Run all steps
bash ../../scripts/run_complete_workflow.sh

Detailed Step-by-Step Guide

STEP 1: AlphaFold3 Prediction

Script: scripts/01_run_alphafold.sh

# Prepare input sequences
# Run AlphaFold3 prediction
# Output: predicted structure PDB file

Manual alternative: Use AlphaFold Server (https://alphafoldserver.com/)
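Before submitting sequences, it can help to sanity-check the FASTA input. The sketch below is not the repository's 01_run_alphafold.sh; it is a minimal illustration with hypothetical helper names (parse_fasta, check_sequence) that parses FASTA text and reports any characters outside the 20 standard amino-acid letters. The demo sequence is a placeholder, not the real MDM2 or p53 sequence.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if not header:
                raise ValueError("empty FASTA header")
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def check_sequence(seq):
    """Return the sorted list of characters that are not standard residues."""
    return sorted(set(seq) - VALID_RESIDUES)


# Placeholder input for illustration only
demo = ">placeholder_chain\nACDEFGHIKL\nMNPQRSTVWY\n"
for name, seq in parse_fasta(demo).items():
    problems = check_sequence(seq)
    print(name, len(seq), problems if problems else "ok")
```

An empty list from check_sequence means every residue letter is standard; anything else is worth fixing before running a prediction.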

STEP 2: PyMOL Visualization

Script: scripts/02_visualize_pymol.py

Generates:

  • Interface visualization
  • Interaction highlights
  • Quality assessment figures
  • Session file for interactive exploration
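As a rough illustration of how such an interface view could be produced (this is a sketch, not the contents of scripts/02_visualize_pymol.py), the function below writes a PyMOL command script as plain text. The function name build_pml, the object name, the colors, the output file names, and the 4.0 Å cutoff are all assumptions chosen for the example; chain IDs A and B follow the Sequences section.

```python
def build_pml(structure_path, chain_a="A", chain_b="B", cutoff=4.0,
              image_path="interface.png", session_path="interface.pse"):
    """Return PyMOL command-script text that highlights the chain interface."""
    # Whole residues of either chain with any atom within `cutoff` of the other
    iface = (
        f"byres ((chain {chain_a} within {cutoff} of chain {chain_b}) "
        f"or (chain {chain_b} within {cutoff} of chain {chain_a}))"
    )
    lines = [
        f"load {structure_path}, complex",
        "hide everything",
        "show cartoon",
        f"color marine, chain {chain_a}",
        f"color orange, chain {chain_b}",
        f"select interface, {iface}",
        "show sticks, interface",
        "orient interface",
        f"png {image_path}, dpi=300, ray=1",   # ray-traced figure
        f"save {session_path}",                # session for interactive use
    ]
    return "\n".join(lines) + "\n"


# Write the script next to the structure (file name is illustrative)
with open("interface.pml", "w") as handle:
    handle.write(build_pml("predicted_structure.pdb"))
```

The generated file can then be run headlessly with `pymol -cq interface.pml`, or opened from the PyMOL GUI.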

STEP 3: Arpeggio Analysis

Script: scripts/03_run_arpeggio.sh

Analyzes protein-protein interactions:

  • Hydrogen bonds
  • Van der Waals contacts
  • Aromatic interactions
  • Ionic interactions
  • And more...

STEP 4: Extract Results

Script: scripts/04_analyze_interactions.py

Parses Arpeggio output:

  • Interaction counts by type
  • Hot spot residues
  • Interface characteristics
  • CSV export
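The counting step can be sketched as follows. This is not the repository's parser: it assumes each Arpeggio JSON record has `bgn` and `end` atoms carrying `auth_asym_id`, `auth_seq_id`, and `label_comp_id`, plus a `contact` list of type names, which should be checked against the actual output of your Arpeggio version. The helper names and the demo records are invented for illustration and are not real results.

```python
from collections import Counter


def residue_id(atom):
    """Build a label such as 'B:TRP23' from one end of a contact record."""
    return f"{atom['auth_asym_id']}:{atom['label_comp_id']}{atom['auth_seq_id']}"


def summarize(records, ignore=("proximal",)):
    """Count inter-chain contact types and contacts per residue."""
    by_type = Counter()
    by_residue = Counter()
    for rec in records:
        bgn, end = rec["bgn"], rec["end"]
        if bgn["auth_asym_id"] == end["auth_asym_id"]:
            continue  # same chain: not part of the protein-protein interface
        kinds = [k for k in rec["contact"] if k not in ignore]
        if not kinds:
            continue  # only ignored contact types were present
        by_type.update(kinds)
        by_residue[residue_id(bgn)] += 1
        by_residue[residue_id(end)] += 1
    return by_type, by_residue


# Invented records in the assumed layout (illustration only)
demo = [
    {"bgn": {"auth_asym_id": "A", "auth_seq_id": 54, "label_comp_id": "LEU"},
     "end": {"auth_asym_id": "B", "auth_seq_id": 23, "label_comp_id": "TRP"},
     "contact": ["hbond", "proximal"]},
    {"bgn": {"auth_asym_id": "A", "auth_seq_id": 61, "label_comp_id": "ILE"},
     "end": {"auth_asym_id": "B", "auth_seq_id": 23, "label_comp_id": "TRP"},
     "contact": ["hydrophobic"]},
]
types, residues = summarize(demo)
print(types.most_common())
print(residues.most_common(3))  # candidate hot spot residues
```

Ranking residues by contact count is one simple way to shortlist hot spots; the real script may weight interaction types differently.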

STEP 5: Download PDB Structure

Script: scripts/05_download_pdb.py

Downloads experimental structure (1YCR) for validation
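A download step along these lines could look like the sketch below, which uses only the standard library and the public RCSB file-download URL pattern. It is an illustration rather than the actual 05_download_pdb.py; the function names and the output location are assumptions.

```python
import urllib.request
from pathlib import Path

RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_url(pdb_id):
    """Return the RCSB download URL for a 4-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id)


def download_pdb(pdb_id, out_dir="."):
    """Fetch the PDB-format file and return the path it was written to."""
    out_path = Path(out_dir) / f"{pdb_id.strip().upper()}.pdb"
    with urllib.request.urlopen(pdb_url(pdb_id), timeout=30) as response:
        out_path.write_bytes(response.read())
    return out_path


# Example call (requires network access):
#   download_pdb("1YCR", out_dir="results")
```

The example call is left as a comment so that importing the module does not trigger a network request.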

STEP 6: Repeat Analysis

Run Arpeggio on experimental structure using same parameters

STEP 7: Comparison

Script: scripts/06_compare_structures.py

Validation metrics:

  • Structural: RMSD, TM-score
  • Interactions: Precision, Recall, F1-score
  • Per-type analysis: Compare each interaction type
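For the structural part, RMSD after optimal superposition is commonly computed with the Kabsch algorithm. The following is a minimal NumPy sketch of that technique, not necessarily what 06_compare_structures.py does; it assumes the two coordinate arrays already contain the same atoms (for example matched C-alpha atoms) in the same order. TM-score is not covered here.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centering both point sets
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation mapping Pc onto Qc
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Self-check: a rotated and shifted copy should superpose almost exactly
rng = np.random.default_rng(0)
coords = rng.normal(size=(10, 3))
angle = 0.7
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle),  np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([1.0, -2.0, 3.0])
print(kabsch_rmsd(coords, moved))
```

Restricting the input to interface residues gives an interface RMSD with the same function.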

BONUS: Protein-Ligand

Script: scripts/07_protein_ligand_example.sh

Analyze protein-ligand interactions using Arpeggio


Expected Results

Structural Quality

  • C-alpha RMSD: ~0.85 Å (excellent)
  • Interface RMSD: ~0.62 Å (outstanding)
  • TM-score: ~0.91 (high similarity)

Interaction Analysis

  • Total predicted: ~28 interactions
  • Total experimental: ~26 interactions
  • Overlap: ~23 (88.5% of experimental interactions)
  • F1-score: ~0.85 (excellent)
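The interaction numbers above are mutually consistent, which can be checked with a small set-based sketch. The function name overlap_metrics is hypothetical, and the integer IDs below merely stand in for contact identifiers so that the set sizes match the approximate counts listed (28 predicted, 26 experimental, 23 shared).

```python
def overlap_metrics(predicted, experimental):
    """Return (precision, recall, F1) for two collections of contacts."""
    predicted, experimental = set(predicted), set(experimental)
    shared = len(predicted & experimental)
    precision = shared / len(predicted) if predicted else 0.0
    recall = shared / len(experimental) if experimental else 0.0
    f1 = 2 * precision * recall / (precision + recall) if shared else 0.0
    return precision, recall, f1


# Stand-in IDs: 28 predicted, 26 experimental, 23 in common (IDs 5..27)
predicted = set(range(28))
experimental = set(range(5, 31))
p, r, f1 = overlap_metrics(predicted, experimental)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
# prints: precision=0.821 recall=0.885 F1=0.852
```

With these counts, recall is 23/26 (the 88.5% overlap) and F1 is 46/54, which rounds to the ~0.85 quoted above. In practice each contact would be a hashable key, for example a pair of residue labels plus an interaction type.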

Key Residues

  • Phe19 (p53): 4 interactions, hydrophobic anchor
  • Trp23 (p53): 6 interactions, π-π stacking
  • Leu26 (p53): 3 interactions, hydrophobic core

🔬 Understanding the Results

What Makes a Good Prediction?

Structural metrics:

  • RMSD < 2.0 Å → Good
  • RMSD < 1.0 Å → Excellent
  • TM-score > 0.5 → Same fold
  • TM-score > 0.9 → Nearly identical

Interaction metrics:

  • F1-score > 0.6 → Good agreement
  • F1-score > 0.8 → Excellent agreement
  • Precision → Are predicted interactions real?
  • Recall → Did we find all real interactions?

Citation

If you use this workflow, please cite:

AlphaFold3:

Abramson et al. (2024). Accurate structure prediction of biomolecular 
interactions with AlphaFold 3. Nature, 630:493-500.

Arpeggio:

Jubb et al. (2017). Arpeggio: A Web Server for Calculating and 
Visualising Interatomic Interactions in Protein Structures. 
J Mol Biol, 429(3):365-371.

PyMOL:

The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.

🆘 Troubleshooting

Common Issues

1. AlphaFold fails

  • Check sequence length (not too long)
  • Verify FASTA format
  • Check internet connection (for MSA)

2. Arpeggio not found

  • Install: python setup.py install
  • Add to PATH

3. PyMOL import error

  • Install: conda install -c conda-forge pymol-open-source
  • Or use PyMOL GUI

4. Missing dependencies

pip install biopython numpy pandas matplotlib seaborn

Tips for Your Own Proteins

Using Different Proteins

  1. Prepare sequences:

    # Create FASTA files
    echo ">ProteinA" > proteinA.fasta
    echo "SEQUENCE..." >> proteinA.fasta
  2. Run workflow:

    bash scripts/run_complete_workflow.sh \
        --seq1 proteinA.fasta \
        --seq2 proteinB.fasta \
        --output my_results
  3. Find PDB reference (if available):

Without Experimental Structure

If no PDB structure exists:

  • Skip comparison steps
  • Focus on AlphaFold confidence metrics
  • Use multiple predictions (models 1-5)
  • Compare with known homologs

Learning Resources

Understanding Protein Interactions

  • Hydrogen bonds: Stabilize structure
  • Van der Waals: Packing interactions
  • Aromatic: π-π stacking
  • Ionic: Salt bridges
  • Hydrophobic: Core interactions

AlphaFold Confidence Metrics

  • pLDDT: Per-residue confidence (0-100)

    • >90: Very high confidence
    • 70-90: Good confidence
    • <70: Low confidence
  • PAE (Predicted Aligned Error): Inter-domain/chain confidence

    • <5 Å: Excellent
    • 5-10 Å: Good
    • >10 Å: Uncertain
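To put the pLDDT bands into practice, the sketch below averages pLDDT per chain from a PDB-format model. It relies on the convention that AlphaFold models written as PDB files store pLDDT in the B-factor column, and it assumes the prediction is available as a PDB file (as with predicted_structure.pdb in this workflow) and that the top band means pLDDT above 90. Function names are hypothetical.

```python
def plddt_band(score):
    """Map a pLDDT value to a confidence band (top band assumed to be >90)."""
    if score > 90:
        return "very high"
    if score >= 70:
        return "good"
    return "low"


def mean_plddt_by_chain(pdb_text):
    """Average the B-factor column of C-alpha atoms for each chain."""
    totals = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, chain 22, B-factor 61-66
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]
        total, count = totals.get(chain, (0.0, 0))
        totals[chain] = (total + float(line[60:66]), count + 1)
    return {chain: total / count for chain, (total, count) in totals.items()}


def report(pdb_text):
    """Return one summary line per chain."""
    return [
        f"chain {chain}: mean pLDDT {value:.1f} ({plddt_band(value)})"
        for chain, value in sorted(mean_plddt_by_chain(pdb_text).items())
    ]
```

Calling report(open("predicted_structure.pdb").read()) would list each chain with its mean pLDDT and band. A short peptide chain with a low mean is a hint to inspect the PAE between the chains before trusting the interface.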


🔧 Advanced Usage

Customize Arpeggio Parameters

Edit scripts/03_run_arpeggio.sh:

# Change distance cutoffs
# Modify selection criteria
# Add custom atom types

Batch Processing

Process multiple structures:

for pdb in structures/*.pdb; do
    bash scripts/03_run_arpeggio.sh "$pdb"
done

Custom Visualization

Modify scripts/02_visualize_pymol.py:

  • Change colors
  • Add labels
  • Highlight specific residues
  • Create movies

Output Files

Generated Files

results/
├── predicted_structure.pdb          # AlphaFold prediction
├── experimental_structure.pdb       # PDB reference
├── arpeggio_predicted/
│   ├── predicted.json              # Arpeggio results
│   ├── predicted.contacts          # Contact list
│   └── predicted.rings             # Aromatic rings
├── arpeggio_experimental/
│   ├── experimental.json
│   └── experimental.contacts
├── analysis/
│   ├── interactions_summary.csv    # All interactions
│   ├── hot_spots.txt              # Key residues
│   └── statistics.json            # Summary stats
├── comparison/
│   ├── validation_report.txt      # Detailed comparison
│   ├── rmsd_analysis.txt          # Structural metrics
│   └── interaction_comparison.csv # Interaction overlap
└── figures/
    ├── structure_overlay.png      # Structural alignment
    ├── interaction_comparison.png # Bar charts
    ├── contact_map.png           # Contact maps
    └── validation_metrics.png    # Summary metrics

Next Steps

After completing this workflow:

  1. Analyze your results

    • Identify key interactions
    • Find hot spot residues
    • Assess prediction quality
  2. Iterate if needed

    • Try different AlphaFold models
    • Adjust parameters
    • Test alternative structures
  3. Biological interpretation

    • Literature search
    • Functional implications
    • Design experiments
  4. Share your findings

    • Publication
    • GitHub repository
    • Collaborate

Support

Questions? Check:


Ready to analyze protein interactions! 🧬🔬
