A comprehensive, end-to-end workflow for predicting, visualizing, and validating protein-protein interactions. Includes AlphaFold3 integration, PyMOL visualization scripts, Arpeggio interaction detection, and quantitative validation against experimental structures.
STEP 1: AlphaFold3 Prediction (Heterodimer)
↓
STEP 2: PyMOL Visualization
↓
STEP 3: Arpeggio Analysis (Protein-Protein)
↓
STEP 4: Extract & Analyze Results
↓
STEP 5: Download PDB Structure (Experimental)
↓
STEP 6: Repeat Analysis on PDB
↓
STEP 7: Compare Predicted vs Experimental
↓
BONUS: Protein-Ligand Interaction Analysis
Why this system?
- Clinical relevance: Major cancer drug target
- p53: Tumor suppressor ("Guardian of the genome")
- MDM2: Inhibits p53 → cancers overexpress MDM2
- Goal: Blocking MDM2-p53 → reactivates p53 → cancer therapy
- PDB reference: 1YCR (experimental structure)
Sequences:
- Chain A (MDM2): Residues 17-125 (109 residues)
- Chain B (p53): Residues 15-29 (15 residues)
protein-interaction-workflow/
├── README.md                          # This file
├── examples/
│   └── mdm2_p53/
│       ├── sequences/
│       │   ├── mdm2.fasta             # MDM2 sequence
│       │   └── p53.fasta              # p53 sequence
│       └── results/                   # Analysis results
│
├── scripts/
│   ├── 01_run_alphafold.sh            # AlphaFold prediction
│   ├── 02_visualize_pymol.py          # PyMOL visualization
│   ├── 03_run_arpeggio.sh             # Arpeggio analysis
│   ├── 04_analyze_interactions.py     # Parse Arpeggio results
│   ├── 05_download_pdb.py             # Get experimental structure
│   ├── 06_compare_structures.py       # Validation
│   └── 07_protein_ligand_example.sh   # Bonus: Ligand analysis
│
├── utils/
│   ├── arpeggio_parser.py             # Arpeggio result parser
│   ├── structure_tools.py             # Structure manipulation
│   └── visualization.py               # Plotting functions
│
└── requirements.txt                   # Python dependencies
# 1. AlphaFold3 (via AlphaFold Server or local installation)
# 2. PyMOL
conda install -c conda-forge pymol-open-source
# 3. Arpeggio
git clone https://github.com/PDBeurope/arpeggio.git
cd arpeggio && python setup.py install
# 4. Python packages
pip install -r requirements.txt
# Navigate to examples
cd examples/mdm2_p53
# Run all steps
bash ../../scripts/run_complete_workflow.sh
Script: scripts/01_run_alphafold.sh
# Prepare input sequences
# Run AlphaFold3 prediction
# Output: predicted structure PDB file
Manual alternative: Use AlphaFold Server (https://alphafoldserver.com/)
Script: scripts/02_visualize_pymol.py
Generates:
- Interface visualization
- Interaction highlights
- Quality assessment figures
- Session file for interactive exploration
Script: scripts/03_run_arpeggio.sh
Analyzes protein-protein interactions:
- Hydrogen bonds
- Van der Waals contacts
- Aromatic interactions
- Ionic interactions
- And more...
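To make the idea of an interface contact concrete, here is a minimal sketch that lists atom pairs across two chains within a distance cutoff. It is not Arpeggio's algorithm: Arpeggio assigns interaction types using atom typing and geometric rules, whereas this only measures distances. The function names and the 4.0 Å default cutoff are illustrative assumptions, and the parser assumes standard fixed-column PDB ATOM records.

```python
import math


def parse_atoms(pdb_text):
    """Return (chain, resname, resseq, atom_name, (x, y, z)) per ATOM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            atoms.append((
                line[21],                # chain ID (column 22)
                line[17:20].strip(),     # residue name (columns 18-20)
                int(line[22:26]),        # residue number (columns 23-26)
                line[12:16].strip(),     # atom name (columns 13-16)
                (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            ))
    return atoms


def interface_contacts(atoms, chain_a, chain_b, cutoff=4.0):
    """List atom pairs between two chains no farther apart than `cutoff` (Å).

    The cutoff is an illustrative assumption, not an Arpeggio parameter.
    """
    side_a = [a for a in atoms if a[0] == chain_a]
    side_b = [a for a in atoms if a[0] == chain_b]
    contacts = []
    for a in side_a:
        for b in side_b:
            distance = math.dist(a[4], b[4])
            if distance <= cutoff:
                # Keep (resname, resseq, atom_name) for each side plus distance.
                contacts.append((a[1:4], b[1:4], round(distance, 2)))
    return contacts
```

Calling `interface_contacts(parse_atoms(text), "A", "B")` on the text of a PDB file returns one tuple per close atom pair. The all-against-all loop is fine for a small complex like MDM2-p53 but would need a spatial index for large assemblies.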
Script: scripts/04_analyze_interactions.py
Parses Arpeggio output:
- Interaction counts by type
- Hot spot residues
- Interface characteristics
- CSV export
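The summaries listed above can be sketched as follows. This is not the contents of `04_analyze_interactions.py`, and the record layout (dicts with `type`, `res_a`, `res_b` keys) is a simplified, hypothetical stand-in rather than Arpeggio's actual output schema, which should be checked against a real result file.

```python
import csv
import io
from collections import Counter

# Hypothetical field names used throughout this sketch.
FIELDS = ["type", "res_a", "res_b"]


def count_by_type(contacts):
    """Count interactions per interaction type."""
    return Counter(contact["type"] for contact in contacts)


def hot_spots(contacts, key="res_b", top_n=3):
    """Rank residues by the number of interactions they take part in."""
    return Counter(contact[key] for contact in contacts).most_common(top_n)


def to_csv(contacts):
    """Serialize the contact records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(contacts)
    return buffer.getvalue()
```

Ranking residues by raw interaction count is only one possible definition of a hot spot; weighting by interaction type would be an equally reasonable choice.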
Script: scripts/05_download_pdb.py
Downloads experimental structure (1YCR) for validation
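A minimal sketch of the download step, using only the standard library. It assumes the RCSB file-download URL pattern `https://files.rcsb.org/download/<ID>.pdb` and classic four-character PDB IDs; the function names are illustrative and not taken from `05_download_pdb.py`.

```python
import urllib.request
from pathlib import Path

# Assumed RCSB download URL pattern for PDB-format files.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.pdb"


def rcsb_url(pdb_id):
    """Build the download URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id)


def download_pdb(pdb_id, out_dir="."):
    """Fetch the structure and return the path of the saved file."""
    target = Path(out_dir) / f"{pdb_id.strip().upper()}.pdb"
    urllib.request.urlretrieve(rcsb_url(pdb_id), target)
    return target
```

For the example system, `download_pdb("1ycr", "results")` would save `results/1YCR.pdb`, provided the directory exists and the network is reachable.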
Run Arpeggio on experimental structure using same parameters
Script: scripts/06_compare_structures.py
Validation metrics:
- Structural: RMSD, TM-score
- Interactions: Precision, Recall, F1-score
- Per-type analysis: Compare each interaction type
Script: scripts/07_protein_ligand_example.sh
Analyze protein-ligand interactions using Arpeggio
- C-alpha RMSD: ~0.85 Å (excellent)
- Interface RMSD: ~0.62 Å (outstanding)
- TM-score: ~0.91 (high similarity)
- Total predicted: ~28 interactions
- Total experimental: ~26 interactions
- Overlap: ~23 (88.5% of experimental interactions)
- F1-score: ~0.85 (excellent)
- Phe19 (p53): 4 interactions, hydrophobic anchor
- Trp23 (p53): 6 interactions, π-π stacking
- Leu26 (p53): 3 interactions, hydrophobic core
Structural metrics:
- RMSD < 2.0 Å → Good
- RMSD < 1.0 Å → Excellent
- TM-score > 0.5 → Same fold
- TM-score > 0.9 → Nearly identical
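As a rough illustration of the RMSD thresholds, the sketch below computes RMSD over matched atom pairs and applies the two cutoffs listed above. It assumes the two coordinate lists are already superposed and in the same atom order. A real comparison would first superpose the structures (for example with the Kabsch algorithm), which this sketch omits. The function names are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (Å) between paired, pre-superposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def grade_rmsd(value):
    """Apply the README's thresholds: < 1.0 Å excellent, < 2.0 Å good."""
    if value < 1.0:
        return "Excellent"
    if value < 2.0:
        return "Good"
    return "Above 2.0 Å"
```

With these thresholds, a C-alpha RMSD of about 0.85 Å is graded "Excellent".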
Interaction metrics:
- F1-score > 0.6 → Good agreement
- F1-score > 0.8 → Excellent agreement
- Precision → Are predicted interactions real?
- Recall → Did we find all real interactions?
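The interaction metrics can be computed from two sets of interactions, as in the sketch below. It assumes each interaction is reduced to a hashable key (for example a residue pair plus interaction type); how `06_compare_structures.py` actually matches interactions is not specified here, so the matching rule is an assumption.

```python
def interaction_agreement(predicted, experimental):
    """Return (precision, recall, f1) for two collections of interaction keys."""
    predicted, experimental = set(predicted), set(experimental)
    overlap = len(predicted & experimental)
    # Precision: fraction of predicted interactions seen experimentally.
    precision = overlap / len(predicted) if predicted else 0.0
    # Recall: fraction of experimental interactions that were predicted.
    recall = overlap / len(experimental) if experimental else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Using the approximate counts quoted in this README (28 predicted, 26 experimental, 23 shared), precision is 23/28 ≈ 0.82, recall is 23/26 ≈ 0.885, and F1 is 46/54 ≈ 0.85.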
If you use this workflow, please cite:
AlphaFold3:
Abramson et al. (2024). Accurate structure prediction of biomolecular
interactions with AlphaFold 3. Nature, 630:493-500.
Arpeggio:
Jubb et al. (2017). Arpeggio: A Web Server for Calculating and
Visualising Interatomic Interactions in Protein Structures.
J Mol Biol, 429(3):365-371.
PyMOL:
The PyMOL Molecular Graphics System, Version 2.0, Schrödinger, LLC.
1. AlphaFold fails
- Check sequence length (not too long)
- Verify FASTA format
- Check internet connection (for MSA)
2. Arpeggio not found
- Install: python setup.py install
- Add to PATH
3. PyMOL import error
- Install: conda install -c conda-forge pymol-open-source
- Or use PyMOL GUI
4. Missing dependencies
pip install biopython numpy pandas matplotlib seaborn
- Prepare sequences:
# Create FASTA files
echo ">ProteinA" > proteinA.fasta
echo "SEQUENCE..." >> proteinA.fasta
- Run workflow:
bash scripts/run_complete_workflow.sh \
  --seq1 proteinA.fasta \
  --seq2 proteinB.fasta \
  --output my_results
- Find PDB reference (if available):
  - Search PDB (https://www.rcsb.org)
  - Use for validation
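For preparing sequences programmatically, the following sketch writes a single-record FASTA file and checks it before submission. The helper names are illustrative, and the validity check assumes only the 20 standard amino-acid letters are acceptable, which is stricter than the FASTA format itself.

```python
# The 20 standard amino-acid one-letter codes (assumed to be the allowed set).
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def write_fasta(path, name, sequence, width=60):
    """Write one FASTA record, wrapping the sequence at `width` characters."""
    sequence = "".join(sequence.split()).upper()
    lines = [f">{name}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")


def read_fasta(path):
    """Read a single-record FASTA file and return (name, sequence)."""
    with open(path) as handle:
        lines = [line.strip() for line in handle if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA file must start with a '>' header line")
    return lines[0][1:], "".join(lines[1:])


def invalid_residues(sequence):
    """Return the sorted list of characters outside the standard 20 letters."""
    return sorted(set(sequence.upper()) - VALID_RESIDUES)
```

An empty list from `invalid_residues` means the sequence uses only standard residues, which rules out one common cause of prediction failures.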
If no PDB structure exists:
- Skip comparison steps
- Focus on AlphaFold confidence metrics
- Use multiple predictions (models 1-5)
- Compare with known homologs
- Hydrogen bonds: Stabilize structure
- Van der Waals: Packing interactions
- Aromatic: π-π stacking
- Ionic: Salt bridges
- Hydrophobic: Core interactions
- pLDDT: Per-residue confidence (0-100)
  - >90: Very high confidence
  - 70-90: Good confidence
  - <70: Low confidence
- PAE (Predicted Aligned Error): Inter-domain/chain confidence
  - <5 Å: Excellent
  - 5-10 Å: Good
  - >10 Å: Uncertain
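The pLDDT bands (above 90, 70 to 90, below 70) can be applied per residue as in this sketch. It assumes a PDB-format file in which pLDDT is stored in the B-factor column, as is common for AlphaFold outputs; if the prediction is delivered as mmCIF, it would need converting or a different parser. The function names are illustrative.

```python
def plddt_band(score):
    """Map a pLDDT value to the README's confidence bands."""
    if score > 90:
        return "very high"
    if score >= 70:
        return "good"
    return "low"


def ca_plddt(pdb_text):
    """Return {(chain, resseq): pLDDT} read from the B-factor of each CA atom."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            # B-factor occupies columns 61-66 of a PDB ATOM record.
            scores[(line[21], int(line[22:26]))] = float(line[60:66])
    return scores


def band_counts(scores):
    """Count residues falling into each confidence band."""
    counts = {"very high": 0, "good": 0, "low": 0}
    for value in scores.values():
        counts[plddt_band(value)] += 1
    return counts
```

Restricting `ca_plddt` to interface residues would give a confidence summary focused on the binding site rather than on the whole complex.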
Edit scripts/03_run_arpeggio.sh:
# Change distance cutoffs
# Modify selection criteria
# Add custom atom types
Process multiple structures:
for pdb in structures/*.pdb; do
    bash scripts/03_run_arpeggio.sh "$pdb"
done
Modify scripts/02_visualize_pymol.py:
- Change colors
- Add labels
- Highlight specific residues
- Create movies
results/
├── predicted_structure.pdb            # AlphaFold prediction
├── experimental_structure.pdb         # PDB reference
├── arpeggio_predicted/
│   ├── predicted.json                 # Arpeggio results
│   ├── predicted.contacts             # Contact list
│   └── predicted.rings                # Aromatic rings
├── arpeggio_experimental/
│   ├── experimental.json
│   └── experimental.contacts
├── analysis/
│   ├── interactions_summary.csv       # All interactions
│   ├── hot_spots.txt                  # Key residues
│   └── statistics.json                # Summary stats
├── comparison/
│   ├── validation_report.txt          # Detailed comparison
│   ├── rmsd_analysis.txt              # Structural metrics
│   └── interaction_comparison.csv     # Interaction overlap
└── figures/
    ├── structure_overlay.png          # Structural alignment
    ├── interaction_comparison.png     # Bar charts
    ├── contact_map.png                # Contact maps
    └── validation_metrics.png         # Summary metrics
After completing this workflow:
- Analyze your results
  - Identify key interactions
  - Find hot spot residues
  - Assess prediction quality
- Iterate if needed
  - Try different AlphaFold models
  - Adjust parameters
  - Test alternative structures
- Biological interpretation
  - Literature search
  - Functional implications
  - Design experiments
- Share your findings
  - Publication
  - GitHub repository
  - Collaborate
Questions? Check:
- Scripts documentation (comments in code)
- Arpeggio docs: https://github.com/PDBeurope/arpeggio
- AlphaFold: https://alphafold.ebi.ac.uk/
- PyMOL wiki: https://pymolwiki.org/
Ready to analyze protein interactions! 🧬🔬