AlphaMask

AlphaMask is a tool for analyzing protein sequences through three masking strategies: iterative masking, a priori masking, and frustration-based (Frustra) masking.

Masking Strategies

1. Iterative Masking

Systematically explores sequence positions by masking each position independently, with optional mutation analysis.

graph TD
    A[WT Sequence] --> B[Generate MSA]
    B --> C[Single Position Masking]
    C --> D[Mask Position 1]
    C --> E[Mask Position 2]
    C --> F[Mask Position ...]
    C --> G[Mask Position N]
    
    A --> H[Apply Mutations]
    H --> I[Mutation Set 1<br/>T150A]
    H --> J[Mutation Set 2<br/>L157R]
    H --> K[Mutation Set 3<br/>T150A+L157R]
    
    I --> L[Use WT MSA + Mutate Target]
    J --> L
    K --> L
    
    L --> M[Single Position Masking<br/>with Mutations]
    M --> N[Mask Position 1]
    M --> O[Mask Position 2]
    M --> P[Mask Position ...]
    M --> Q[Mask Position N]
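
The sketch below illustrates this enumeration in plain Python. It is a conceptual outline, not AlphaMask's implementation; the "X" mask token, the mutation string format (e.g. T150A), and the helper names are assumptions made for this example.

def apply_mutation(seq, mut):
    """Apply a point mutation written as e.g. 'T150A' (1-based position)."""
    orig, pos, new = mut[0], int(mut[1:-1]), mut[-1]
    assert seq[pos - 1] == orig, f"expected {orig} at position {pos}"
    return seq[:pos - 1] + new + seq[pos:]

def iterative_masking(wt_seq, mutation_sets=((),)):
    """Yield (variant label, masked position, masked sequence) for every combination."""
    for mutations in mutation_sets:
        variant = wt_seq
        for mut in mutations:
            variant = apply_mutation(variant, mut)
        label = "+".join(mutations) or "WT"
        for i in range(len(variant)):
            yield label, i + 1, variant[:i] + "X" + variant[i + 1:]

For the diagram above, mutation_sets would be ((), ("T150A",), ("L157R",), ("T150A", "L157R")): each position is masked once for the wild type and once per mutation set, reusing the wild-type MSA with the target sequence mutated.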

2. A Priori Masking

Focused experiments on known positions of interest, run under controlled conditions (with and without masking, with and without the mutation).

graph TD
    subgraph "A Priori Experiment: Known_position_F21A"
        A[WT Sequence] --> B[Generate MSA]
        
        subgraph "Control Conditions"
            B --> C1[No Mask, No Mutation<br/>Control]
            B --> C2[Mask Position 21<br/>No Mutation]
            B --> C3[No Mask<br/>Mutate F21A]
            B --> C4[Mask Position 21<br/>Mutate F21A]
        end
    end

    subgraph "A Priori Experiment: Double_mutation_study"
        A2[WT Sequence] --> B2[Generate MSA]
        
        subgraph "Control Conditions"
            B2 --> D1[No Mask, No Mutation<br/>Control]
            B2 --> D2[Mask Positions 21,24<br/>No Mutation]
            B2 --> D3[No Mask<br/>Mutate F21A+Y24A]
            B2 --> D4[Mask Positions 21,24<br/>Mutate F21A+Y24A]
        end
    end
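
In plain Python, the condition grid for one a priori experiment can be written as below. This is a conceptual sketch with assumed field names, not AlphaMask's configuration format.

from itertools import product

def a_priori_conditions(positions, mutations):
    """Enumerate the four control conditions: (un)masked x (un)mutated."""
    return [
        {"masked_positions": tuple(positions) if mask else (),
         "mutations": tuple(mutations) if mutate else ()}
        for mask, mutate in product((False, True), repeat=2)
    ]

# The 'Known_position_F21A' experiment above corresponds to
# a_priori_conditions(positions=[21], mutations=["F21A"]); the double-mutation
# study to a_priori_conditions(positions=[21, 24], mutations=["F21A", "Y24A"]).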

3. Frustra Masking

Analysis-driven approach using protein frustration patterns to identify positions of interest.

graph TD
    A[WT Sequence] --> B[Generate MSA]
    B --> C[Run Frustra Analysis]
    C --> D[Calculate Frustration Scores]
    D --> E[Sort Positions by Score]
    E --> F[Select Top N Positions]
    
    subgraph "Masking Experiments"
        F --> G1[No Mask, No Mutation<br/>Control]
        F --> G2[Mask Top Positions<br/>No Mutation]
        
        G2 --> H1[Position 1 from Top N]
        G2 --> H2[Position 2 from Top N]
        G2 --> H3[Position ... from Top N]
        G2 --> H4[Position N from Top N]
    end
    
    subgraph "Analysis"
        H1 --> I[Compare with Control]
        H2 --> I
        H3 --> I
        H4 --> I
        I --> J[Identify Critical Positions]
    end
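
The position-selection step can be sketched as follows. This is an illustration only; whether positions of interest carry the highest or lowest frustration scores depends on the frustration metric, so the descending sort here is an assumption.

def top_frustrated_positions(scores, n):
    """Return the n positions ranked highest by frustration score.

    `scores` maps a 1-based sequence position to its frustration score;
    descending order is assumed here.
    """
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Each selected position is then masked individually and the resulting
# prediction is compared against the unmasked, unmutated control.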

Installation

1. Environment Setup

First, ensure you're on a compute node with GPU access:

# Request an interactive GPU session (adjust parameters according to your cluster).
# --gres:      GPU requirements for your cluster
# --partition: e.g., gpus, gpu, accelerated, etc.
srun --job-name "alphamask_setup" \
     --gres=gpu:1 \
     --time 24:00:00 \
     --partition=YOUR_GPU_PARTITION \
     --pty bash

2. Load Required Modules

# Load CUDA module (version may vary by cluster)
module load cuda  # e.g., cuda/12.6, cuda/11.8, etc.

# Load any additional required modules
module load gcc   # If needed
module load python  # If needed

3. Create Conda Environment

# Using micromamba (recommended)
micromamba create -f environment.yml

# Or using conda
conda env create -f environment.yml

# Activate the environment
micromamba activate alphamask  # or conda activate alphamask

4. Verify Installation

# Check CUDA availability
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"

# Check GPU visibility
nvidia-smi

5. Setup Experiment Directory

# Remove existing experiment folder if needed
rm -rf /path/to/workspace/my_experiments/ 

# Setup experiment folder
python -m alphamask setup --path /path/to/workspace/my_experiments 

6. Running Experiments

# Basic experiment run with all options
alphamask run \
    --path /path/to/workspace/my_experiments \
    --container /path/to/container/vsc-frustra_masking.sif \
    --schema /path/to/workspace/my_experiments/schema/schema_validation.json \
    --config /path/to/workspace/my_experiments/config/proteins.yaml \
    --partitions YOUR_GPU_PARTITION \
    --gpu-types YOUR_GPU_TYPE \
    --time "04:00:00" \
    --memory "20000" \
    --cpus-per-task 1 \
    --alphamask-bin-path ~/.micromamba/envs/alphamask/bin/alphamask \
    --alphamask-mount-path /path/to/alphamask:/opt/alphamask \
    --compress both \
    --compression-level 9 \
    --debug

# Environment configuration options (append to the alphamask run command as needed):
#   --env-manager micromamba       Options: conda, mamba, micromamba
#   --env-name alphamask           Environment name
#   --env-base-path ~/.micromamba  Base path for environments

# Check available partitions and GPU types on your cluster
sinfo -o "%10P %10G %10O %10l %10c"  # For SLURM-based clusters

# Monitor job status
alphamask status \
    --path /path/to/workspace/my_experiments \
    --config /path/to/workspace/my_experiments/config/proteins.yaml \
    --refresh 30  # Updates every 30 seconds

7. Extracting Results

# Extract all PDBs
alphamask extract-pdbs --config config.yaml

# Extract only best predictions
alphamask extract-pdbs --config config.yaml --best-only

# Extract specific models/seeds/recycles
alphamask extract-pdbs --config config.yaml \
    --models model_1 model_2 \
    --seeds 1 2 \
    --recycles 0 1

# Extract for specific proteins
alphamask extract-pdbs --config config.yaml \
    --proteins protein1 protein2 \
    --best-only

Common Cluster-Specific Adjustments

  1. GPU Selection: Different clusters use different GPU naming conventions:

    • Some use specific models (e.g., a100, v100, quadro_rtx_8000)
    • Others use generic names (e.g., gpu:1, gpu:k80:1)
  2. Partition Names: Common variations include:

    • gpu, gpus, accelerated
    • cuda, tesla, nvidia
    • Check your cluster documentation for specific names
  3. Module Names: Module naming conventions vary:

    • CUDA: cuda/12.6, cuda/11.8, nvidia/cuda-12.6
    • Python: python/3.10, python3, anaconda3

Always consult your cluster's documentation or system administrators for specific configuration details.
