
szczurek-lab/protein-gifs


protein-gifs

Generate GIF and PNG visualizations of protein structures across steering interventions and evolutionary trajectories.

Setup

1. Create the conda environment

conda env create -f environment.yml
conda activate protein-gifs

Or manually:

conda create -n protein-gifs python=3.10 -c conda-forge -y
conda activate protein-gifs
conda install -c conda-forge pymol-open-source pillow numpy font-ttf-dejavu-sans-mono -y

2. Install the package

pip install -e .

3. Install xvfb (for headless/server rendering; Debian/Ubuntu command shown)

sudo apt-get install xvfb

4. Verify

protein-gifs info
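If `protein-gifs info` reports a problem, it can help to check each dependency on its own. The sketch below is a standalone check, not part of the package. The module names (`pymol`, `PIL`, `numpy`) and the `Xvfb` executable are assumptions about what the environment above installs, and `check_environment` is a hypothetical helper.

```python
import importlib.util
import shutil

# Assumed import names: pymol-open-source provides `pymol`, pillow provides `PIL`.
REQUIRED_MODULES = ("pymol", "PIL", "numpy")


def check_environment(modules=REQUIRED_MODULES, executables=("Xvfb",)):
    """Return a mapping from each dependency name to True if it is available."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        status[name] = importlib.util.find_spec(name) is not None
    for exe in executables:
        # shutil.which searches PATH the same way the shell does
        status[exe] = shutil.which(exe) is not None
    return status


if __name__ == "__main__":
    for dep, ok in check_environment().items():
        print(f"{dep:8s} {'ok' if ok else 'MISSING'}")
```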

Quick start

Option A: Python API

from protein_gifs import GifTask, render_gif

task = GifTask(
    structure_files=["step0.pdb", "step1.pdb", "step2.pdb"],
    output_path="animation.gif",
    titles=["Start", "Middle", "End"],
    render_style="cartoon",
)
render_gif(task)
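The example lists its frames explicitly, and presumably they are rendered in the order given. When frames come from a directory instead, a plain `sorted()` puts `step10.pdb` before `step2.pdb`. A minimal natural-sort sketch, assuming frame order is encoded as numbers inside file names (`natural_key` and `ordered_frames` are hypothetical names, not package functions):

```python
import re
from pathlib import Path


def natural_key(path):
    """Split a file name into text and integer chunks so that
    'step10.pdb' sorts after 'step9.pdb' instead of after 'step1.pdb'."""
    parts = re.split(r"(\d+)", Path(path).name)
    # re.split with a capturing group alternates text/digits, so ints and
    # strings always sit at matching positions and compare cleanly
    return [int(p) if p.isdigit() else p.lower() for p in parts]


def ordered_frames(directory, pattern="*.pdb"):
    """Return matching structure files in natural (numeric-aware) order."""
    return sorted((str(p) for p in Path(directory).glob(pattern)), key=natural_key)
```

The result can be passed as `structure_files`; the quick-start example pairs one title with each file, so a title list would be built to match.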

Option B: Command line

# Lambda-sweep experiments
protein-gifs sweep \
    --base-dir ./sasa_big_sweep \
    --output-dir ./gifs \
    --styles cartoon,cartoon_sasa \
    --designs 1-50

# Evolutionary trajectories
protein-gifs evolution \
    --base-dir ./PeptideEvolution \
    --output-dir ./gifs \
    --comparisons
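`--designs 1-50` appears to be an inclusive range, matching `design_nums=range(1, 51)` in the Python sweep example under FoldSAE. A hypothetical parser for such specs is sketched below; the comma-separated form (`1-3,10`) is an illustration only and may not be accepted by the real CLI.

```python
def parse_design_spec(spec):
    """Expand a spec like '1-50' or '1-3,10' into a sorted list of ints.

    Ranges are inclusive on both ends, so '1-50' matches range(1, 51).
    Comma-separated parts are a hypothetical extension for illustration.
    """
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            numbers.update(range(start, end + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)
```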

Render styles

Style         | Representation    | Coloring
------------- | ----------------- | ----------------------------------------
cartoon       | Ribbon            | Secondary structure (helix/sheet/loop)
cartoon_sasa  | Ribbon            | SASA (blue=buried → white → red=exposed)
surface       | Molecular surface | Secondary structure
surface_sasa  | Molecular surface | SASA
surface_hydro | Molecular surface | Amino acid hydrophobicity
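The `*_sasa` styles color residues on a blue → white → red ramp. How the package normalizes SASA values is not documented here; the sketch below assumes a linear ramp between a chosen `vmin` and `vmax`, with out-of-range values clamped. `sasa_to_rgb` is a hypothetical helper.

```python
def sasa_to_rgb(value, vmin=0.0, vmax=1.0):
    """Map a SASA value onto a blue -> white -> red ramp.

    Values at or below vmin are pure blue (buried), the midpoint is white,
    and values at or above vmax are pure red (exposed). Returns floats in [0, 1].
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp out-of-range values
    if t < 0.5:
        # first half: blue (0, 0, 1) -> white (1, 1, 1)
        s = t / 0.5
        return (s, s, 1.0)
    # second half: white (1, 1, 1) -> red (1, 0, 0)
    s = (t - 0.5) / 0.5
    return (1.0, 1.0 - s, 1.0 - s)
```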

Color schemes for secondary-structure (SS) coloring

  • default: helix=red, sheet=yellow, loop=green
  • pastel: helix=hotpink, sheet=cyan, loop=gray
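In PyMOL, secondary-structure classes are addressed with the `ss` selector (`ss h` for helix, `ss s` for sheet, `ss l+''` for loop or unassigned). A sketch of how the two schemes above could be turned into PyMOL `color` commands; `SS_SCHEMES` and `ss_color_commands` are hypothetical, and the resulting strings could be passed to PyMOL's `cmd.do()`.

```python
# Hypothetical mirror of the two schemes listed above.
SS_SCHEMES = {
    "default": {"helix": "red", "sheet": "yellow", "loop": "green"},
    "pastel": {"helix": "hotpink", "sheet": "cyan", "loop": "gray"},
}

# PyMOL selection syntax for each secondary-structure class;
# "ss l+''" also catches residues with no assigned secondary structure.
SS_SELECTIONS = {"helix": "ss h", "sheet": "ss s", "loop": "ss l+''"}


def ss_color_commands(scheme="default", obj="all"):
    """Build PyMOL 'color' commands for one scheme as plain strings."""
    colors = SS_SCHEMES[scheme]
    return [
        f"color {colors[ss]}, ({SS_SELECTIONS[ss]}) and {obj}"
        for ss in ("helix", "sheet", "loop")
    ]
```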

Project-specific workflows

FoldSAE (lambda sweeps)

Expected layout:

base_dir/
├── lambda_-1.0_thr_0.5_high/
│   ├── design_1.pdb
│   ├── design_2.pdb
│   └── frame_wise_sasa_scores/
│       ├── design_1.pdb.csv
│       └── design_2.pdb.csv
├── lambda_-0.95_thr_0.5_high/
│   └── ...
└── lambda_1.0_thr_0.5_high/
    └── ...

from protein_gifs.collectors import SweepCollector

collector = SweepCollector(
    base_dir="./sasa_big_sweep",
    sasa_subdir="frame_wise_sasa_scores",
)
tasks = collector.collect_tasks(
    design_nums=range(1, 51),
    output_dir="./gifs",
    render_styles=["cartoon", "cartoon_sasa"],
)
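Assuming each lambda folder contributes one frame to a design's animation, frames need to be ordered by the numeric lambda value: a plain string sort places `lambda_-0.95...` before `lambda_-1.0...`. A minimal sketch of that ordering, assuming folder names follow the layout above exactly (`parse_lambda_dir` and `sweep_frames` are hypothetical; `SweepCollector`'s actual logic may differ):

```python
import re
from pathlib import Path

# Matches names like 'lambda_-0.95_thr_0.5_high' from the layout above.
LAMBDA_DIR = re.compile(r"^lambda_(-?\d+(?:\.\d+)?)_thr_(-?\d+(?:\.\d+)?)_(\w+)$")


def parse_lambda_dir(name):
    """Return (lambda, threshold, tag), or None if the name does not match."""
    m = LAMBDA_DIR.match(name)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2)), m.group(3)


def sweep_frames(base_dir, design_num):
    """List one design's PDB files ordered by lambda (one frame per lambda)."""
    frames = []
    for d in Path(base_dir).iterdir():
        parsed = parse_lambda_dir(d.name)
        pdb = d / f"design_{design_num}.pdb"
        if parsed is not None and d.is_dir() and pdb.exists():
            frames.append((parsed[0], str(pdb)))
    # sort numerically by lambda, so -1.0 < -0.95 < ... < 1.0
    return [path for _, path in sorted(frames)]
```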

PeptideEvolution (generation trajectories)

Expected layout:

base_dir/
├── structures_apex/
│   ├── KFWKLLKKALRLWAKVL/
│   │   ├── KFWKLLKKALRLWAKVL_gen_init.cif
│   │   └── KKTRLVIKGLRIWIAKL_gen_end.cif
│   └── .../
├── structures_deep_amp/
│   └── .../
└── boltz_structures_custom_md_cmaes/
    ├── KFWKLLKKALRLWAKVL/
    │   ├── KFWKLLKKALRLWAKVL_gen_0.cif
    │   ├── KFWKLLKKALRLWAKVL_gen_1.cif
    │   └── ...gen_51.cif
    └── .../

from protein_gifs.collectors import EvolutionCollector

collector = EvolutionCollector(
    base_dir="./PeptideEvolution",
    datasets=["structures_apex", "structures_deep_amp"],
)
tasks = collector.collect_tasks(output_dir="./gifs")
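Generation files need numeric ordering too (`gen_10` after `gen_9`), with `gen_init` first and `gen_end` last. In the layout above the `gen_end` file carries the evolved sequence (`KKTRLVIKGLRIWIAKL`) rather than the folder name, so matching on the suffix is safer than matching on the prefix. A hypothetical sort key, assuming file names follow the `<SEQUENCE>_gen_<label>.cif` pattern shown:

```python
import re

# Matches the '<SEQUENCE>_gen_<label>.cif' naming shown in the layout above.
GEN_FILE = re.compile(r"^(?P<seq>[A-Z]+)_gen_(?P<label>init|end|\d+)\.cif$")


def generation_key(filename):
    """Sort key placing gen_init first, numbered generations in numeric
    order, and gen_end last. Raises ValueError for unrecognized names."""
    m = GEN_FILE.match(filename)
    if m is None:
        raise ValueError(f"not a generation file: {filename!r}")
    label = m.group("label")
    if label == "init":
        return (0, 0)
    if label == "end":
        return (2, 0)
    return (1, int(label))
```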

NaN validation

Structure prediction sometimes produces files with NaN coordinates. Validate and filter them:

protein-gifs validate --base-dir ./structures --output validation.json
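To illustrate what NaN detection involves, here is a minimal check for PDB files that reads the fixed-width coordinate columns of `ATOM`/`HETATM` records. This is a sketch, not how `protein-gifs validate` is implemented; it does not handle mmCIF (`.cif`) files, which need a proper CIF parser, and `pdb_has_nan_coords` is a hypothetical name.

```python
import math


def pdb_has_nan_coords(pdb_text):
    """Return True if any ATOM/HETATM record has a non-finite coordinate.

    Uses the fixed PDB columns for x, y, z (1-based columns 31-38, 39-46,
    47-54). Only covers PDB; mmCIF (.cif) files would need a CIF parser.
    """
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        for start, end in ((30, 38), (38, 46), (46, 54)):
            field = line[start:end].strip()
            try:
                value = float(field)  # float() accepts 'nan' and 'NaN'
            except ValueError:
                return True  # blank or garbled coordinate: treat as invalid
            if not math.isfinite(value):
                return True
    return False
```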

Then load the validation log and pass the resulting set to any collector:

from protein_gifs import load_nan_set

nan_set = load_nan_set("validation.json")
tasks = collector.collect_tasks(..., nan_set=nan_set)
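When building a `GifTask` by hand rather than through a collector, invalid frames have to be dropped together with their titles. A sketch, assuming the loaded NaN set holds file paths in the same form as `structure_files` (what `load_nan_set` returns is not documented here; `drop_invalid_frames` is hypothetical):

```python
def drop_invalid_frames(structure_files, titles, invalid):
    """Remove files found in `invalid` (e.g. a set of NaN paths) and keep
    the matching titles, so each remaining frame keeps its own label."""
    if len(structure_files) != len(titles):
        raise ValueError("structure_files and titles must have equal length")
    kept = [(f, t) for f, t in zip(structure_files, titles) if f not in invalid]
    return [f for f, _ in kept], [t for _, t in kept]
```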

Architecture

protein_gifs/
├── core/
│   ├── task.py          # GifTask and ComparisonTask dataclasses
│   ├── runner.py        # render_gif() and render_comparison()
│   ├── subprocess.py    # PyMOL subprocess management
│   ├── validation.py    # NaN detection and filtering
│   └── workers/         # Scripts that run inside PyMOL subprocesses
│       ├── gif_worker.py
│       └── comparison_worker.py
├── collectors/
│   ├── sweep.py         # Lambda-sweep file discovery
│   └── evolution.py     # Generation-trajectory file discovery
└── cli.py               # Command-line interface

Why subprocesses? PyMOL can segfault on malformed structures and leaks memory across many rendering calls. Each GIF is rendered in an isolated subprocess so failures don't crash the batch and memory is reclaimed between tasks.
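The isolation pattern described above can be sketched with the standard library alone: each job runs in a fresh interpreter, a crash or timeout is reported as a failed result, and the child's memory is released when it exits. In the package the child is a PyMOL process running one of the worker scripts; here `sys.executable -c` stands in for that, and `run_isolated` is a hypothetical name.

```python
import subprocess
import sys


def run_isolated(code, timeout=600):
    """Run Python source in a fresh interpreter and report (ok, detail).

    A crash (a segfault shows up as a negative return code on POSIX) or a
    timeout only affects the child; the caller records it and moves on.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed if it runs too long
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    if result.returncode != 0:
        # keep only the tail of stderr to avoid flooding batch logs
        return False, f"exit code {result.returncode}: {result.stderr.strip()[-200:]}"
    return True, result.stdout
```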

Adapting for new projects

To support a new directory layout, you have two options:

  1. Use GifTask directly — just list your files and call render_gif().
  2. Write a custom collector — implement file discovery logic that produces GifTask objects. See collectors/sweep.py for a reference.
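A skeleton for option 2 that produces one GIF per subfolder. Because `GifTask`'s full signature is not shown here, `TaskSpec` is a hypothetical stand-in using only the keyword arguments from the quick-start example; in real use you would construct `protein_gifs.GifTask` instead and pass each task to `render_gif()`. `PerFolderCollector` is also hypothetical, and its plain lexical file sort is only correct if frame numbers are zero-padded.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class TaskSpec:
    """Hypothetical stand-in for GifTask, using the keyword arguments shown
    in the quick-start example. Swap in protein_gifs.GifTask in real use."""
    structure_files: list
    output_path: str
    titles: list = field(default_factory=list)
    render_style: str = "cartoon"


class PerFolderCollector:
    """Hypothetical collector: one GIF per subfolder of base_dir, with the
    folder's structure files (sorted by name) as frames."""

    def __init__(self, base_dir, pattern="*.pdb"):
        self.base_dir = Path(base_dir)
        self.pattern = pattern

    def collect_tasks(self, output_dir, render_styles=("cartoon",)):
        tasks = []
        for folder in sorted(p for p in self.base_dir.iterdir() if p.is_dir()):
            files = sorted(str(f) for f in folder.glob(self.pattern))
            if not files:
                continue  # skip folders with no structures
            for style in render_styles:
                tasks.append(TaskSpec(
                    structure_files=files,
                    output_path=str(Path(output_dir) / f"{folder.name}_{style}.gif"),
                    titles=[Path(f).stem for f in files],
                    render_style=style,
                ))
        return tasks
```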

Running on a SLURM cluster

#!/bin/bash
#SBATCH --job-name=protein-gifs
#SBATCH --output=protein-gifs_%j.out
#SBATCH --time=4:00:00
#SBATCH --mem=16G
#SBATCH --cpus-per-task=4

source ~/miniconda3/etc/profile.d/conda.sh
conda activate protein-gifs
protein-gifs sweep --base-dir ./structures --output-dir ./gifs --designs 1-50
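For larger sweeps, the design range could be split across a SLURM job array so each array task renders one slice. The script above does not do this; the sketch assumes submission with a 0-based array (e.g. `#SBATCH --array=0-3`) and reads the `SLURM_ARRAY_TASK_ID` and `SLURM_ARRAY_TASK_COUNT` variables that SLURM sets for array jobs. `design_chunk` is hypothetical.

```python
import os


def design_chunk(first, last, task_id, num_tasks):
    """Split the inclusive design range [first, last] into num_tasks
    near-equal contiguous chunks and return chunk task_id as 'a-b',
    or None if this task gets no designs."""
    total = last - first + 1
    base, extra = divmod(total, num_tasks)
    # the first `extra` chunks each get one additional design
    start = first + task_id * base + min(task_id, extra)
    size = base + (1 if task_id < extra else 0)
    if size == 0:
        return None
    return f"{start}-{start + size - 1}"


if __name__ == "__main__":
    # Set by SLURM for array jobs; defaults allow running outside SLURM.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    num_tasks = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))
    print(design_chunk(1, 50, task_id, num_tasks) or "")
```

Saved as, say, `chunk.py` (a hypothetical file name), the batch script could call `protein-gifs sweep --base-dir ./structures --output-dir ./gifs --designs "$(python chunk.py)"`. With `--array=0-3`, the four tasks would receive `1-13`, `14-26`, `27-38` and `39-50`.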

About

Pipeline for generating animated GIF visualizations of protein structures using PyMOL
