Aims: Model interactions between Lassa virus L protein and Z matrix protein; Determine the oligomer state of E.coli Single-stranded DNA-binding protein (SSB)
Firstly, download sequences of L(Uniprot: O09705) and Z(uniprot:O73557) proteins. The result is example_data/example_2_sequences.fasta
Now run:
create_individual_features.py \
--fasta_paths=example_2_sequences.fasta \
--data_dir=<path to alphafold databases> \
--save_msa_files=False \
--output_dir=<dir to save the output objects> \
--use_precomputed_msas=False \
--max_template_date=<any date you want> \
--skip_existing=False --seq_index=<any number you want>create_individual_features.py will compute necessary features for O73557 and O09705 then store them as individual pickle files in the output_dir. Please be aware that in the fasta files, everything after > will be
taken as the description of the protein and please be aware that any special symbol, such as | : ; #, after > will be replaced with _.
The name of the pickles will be the same as the descriptions of the sequences in fasta files (e.g. ">protein_A" in the fasta file will yield "protein_A.pkl")
See Example 1
We want to predict the structure of full-length L protein together with Z protein. However, as the L protein is very long, many users would not have a GPU card with sufficient memory. Moreover, when attempting modeling the full L-Z, the resulting model does not match the known cryo-EM structure. In Example 1, we showed how to use AlphaPulldown to find the interaction site by screening fragments using the pullldown mode. Here, to demonstrate the custom mode, we will assume the we know the interaction site and model the fragment using this mode, as demonstrated in the figure below
:
Different proteins are seperated by ;. If a particular region is wanted from one protein, simply add , after that protein and followed by the region. Region comes in the format of number1-number2. An example input file is: example_data/cutom_mode.txt
The command line interface for using custom mode will then become:
run_multimer_jobs.py \
--mode=custom \
--num_cycle=3 \
--num_predictions_per_model=1 \
--output_path=<path to output directory> \
--data_dir=<path to AlphaFold data directory> \
--protein_lists=custom_mode.txt \
--monomer_objects_dir=/path/to/monomer_objects_directory \
--job_index=<any number you want>
On a compute cluster, you may want to run all jobs in parallel as a job array. For example, on SLURM queuing system at EMBL we could use the following run_multimer_jobs_SLURM.sh sbatch script:
#!/bin/bash
#A typical run takes couple of hours but may be much longer
#SBATCH --job-name=array
#SBATCH --time=2-00:00:00
#log files:
#SBATCH -e logs/run_multimer_jobs_%A_%a_err.txt
#SBATCH -o logs/run_multimer_jobs_%A_%a_out.txt
#qos sets priority
#SBATCH --qos=low
#SBATCH -p gpu-el8
#You might want to use a higher-end card in case higher oligomeric state get big:
#SBATCH -C "gpu=A40|gpu=A100"
#Reserve the entire GPU so no-one else slows you down
#SBATCH --gres=gpu:1
#Limit the run to a single node
#SBATCH -N 1
#Adjust this depending on the node
#SBATCH --ntasks=8
#SBATCH --mem=128000
module load Anaconda3
module load CUDA/11.3.1
module load cuDNN/8.2.1.32-CUDA-11.3.1
source activate AlphaPulldown
MAXRAM=$(echo `ulimit -m` '/ 1024.0'|bc)
GPUMEM=`nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits|tail -1`
export XLA_PYTHON_CLIENT_MEM_FRACTION=`echo "scale=3;$MAXRAM / $GPUMEM"|bc`
export TF_FORCE_UNIFIED_MEMORY='1'
run_multimer_jobs.py \
--mode=custom \
--num_cycle=3 \
--num_predictions_per_model=1 \
--output_path=<path to output directory> \
--data_dir=<path to AlphaFold data directory> \
--protein_lists=custom_mode.txt \
--monomer_objects_dir=/path/to/monomer_objects_directory \
--job_index=$SLURM_ARRAY_TASK_ID and then run using:
mkdir -p logs
count=`grep -c "" custom_mode.txt` #count lines even if the last one has no end of line
sbatch --array=1-$count run_multimer_jobs_SLURM.sh
This taks is to determine the oligomer state of SSB protein (Uniprot:P0AGE0) by modelling its monomeric, homodimeric, homotrimeric, and homoquatrameric structures. Thus, homo-oligomer mode is needed. An oligomer state file will tell the programme the number of units. An example is: example_data/example_oligomer_state_file.txt
In the file, oligomeric states of the corresponding proteins should be separated by , e.g. protein_A,3means a homotrimer for protein_A

Instead of homo-oligomers, this mode can also be used to predict monomeric structure by simply adding 1 or nothing after the protein.
The command for homo-oligomer mode is:
run_multimer_jobs.py \
--mode=homo-oligomer \
--output_path=<path to output directory> \
--num_cycle=3 \
--oligomer_state_file=example_oligomer_state_file.txt \
--monomer_objects_dir=<directory that stores monomer pickle files> \
--data_dir=/path-to-Alphafold-data-dir \
--job_index=<any number you want>
Having screened the oligomeric states of SSB protein, we found our tetramer model agrees with the experimental structure (PDB:4MZ9).
It should be the same directory as output_dir specified in Step 1. It can be one directory or contain multiple directories if you stored pre-calculated objects in different locations. In the case of
multiple monomer_objects_dir, remember to put a , between each e.g. --monomer_objects_dir=<dir_1>,<dir_2>
Default is None and the programme will run predictions one by one in the given files. However, you can set job_index to
different number if you wish to run an array of jobs in parallel then the programme will only run the corresponding job specified by the job_index
❗ job_index starts from 1
On a compute cluster, you may want to run all jobs in parallel as a job array. For example, on SLURM queuing system at EMBL we could use the following run_multimer_jobs_SLURM.sh sbatch script:
#!/bin/bash
#A typical run takes couple of hours but may be much longer
#SBATCH --job-name=array
#SBATCH --time=2-00:00:00
#log files:
#SBATCH -e logs/run_multimer_jobs_%A_%a_err.txt
#SBATCH -o logs/run_multimer_jobs_%A_%a_out.txt
#qos sets priority
#SBATCH --qos=low
#SBATCH -p gpu-el8
#You might want to use a higher-end card in case higher oligomeric state get big:
#SBATCH -C "gpu=A40|gpu=A100"
#Reserve the entire GPU so no-one else slows you down
#SBATCH --gres=gpu:1
#Limit the run to a single node
#SBATCH -N 1
#Adjust this depending on the node
#SBATCH --ntasks=8
#SBATCH --mem=128000
module load Anaconda3
module load CUDA/11.3.1
module load cuDNN/8.2.1.32-CUDA-11.3.1
source activate AlphaPulldown
MAXRAM=$(echo `ulimit -m` '/ 1024.0'|bc)
GPUMEM=`nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits|tail -1`
export XLA_PYTHON_CLIENT_MEM_FRACTION=`echo "scale=3;$MAXRAM / $GPUMEM"|bc`
export TF_FORCE_UNIFIED_MEMORY='1'
run_multimer_jobs.py \
--mode=homo-oligomer \
--output_path=<path to output directory> \
--num_cycle=3 \
--oligomer_state_file=example_oligomer_state_file.txt \
--monomer_objects_dir=<directory that stores monomer pickle files> \
--data_dir=/path-to-Alphafold-data-dir \
--job_index=$SLURM_ARRAY_TASK_ID and then run using:
mkdir -p logs
count=`grep -c "" example_oligomer_state_file.txt` #count lines even if the last one has no end of line
sbatch --array=1-$count run_multimer_jobs_SLURM.sh
Feature 1
When a batch of jobs is finished, AlphaPulldown can create a Jupyter notebook that presents a neat overview of the models, as seen in the example screenshot 
On the left side, there is a bookmark listing all the jobs and when clicking a bookmark, and executing the corresponding cells, the notebook will show: 1) PAE plots 2) predicted model coloured by pLDDT scores 3) predicted models coloured by chains.
In order to create the notebook, within the same conda environment, run:
source activate AlphaPulldown
cd <models_output_dir>
create_notebook.py --cutoff=5.0<output_dir>!
This command will yield an output.ipynb, which you can open it via Jupyterlab. Jupyterlab is already installed when installing AlphaPulldown with pip. Thus, to view the notebook:
source activate AlphaPulldown
cd <models_output_dir>
jupyter-lab output.ipynb📝 If you run AlphaPulldown on a remote computer cluster, you will need a graphical connection to open the notebook in a browser, mount the remote directory to your local computer as a network directory, or copy the entire <models_output_dir> to the local computer.
About the parameters
cutoff is to check the value of PAE between chains. In the case of multimers, the analysis programme will create the notebook only from models with inter-chain PAE values smaller than the cutoff.
Feature 2
We have also provided a singularity image called alpha-analysis.sifto generate a CSV table with structural properties and scores.
Firstly, download the singularity image from here. Chrome user may not be able to download it after clicking the link. If so, please right click and select "Save link as".
Then execute the singularity image (i.e. the sif file) by:
singularity exec \
--no-home \
--bind /path/to/your/output/dir:/mnt \
<path to your downloaded image>/alpha-analysis.sif \
run_get_good_pae.sh \
--output_dir=/mnt \
--cutoff=10
About the outputs
By default, you will have a csv file named predictions_with_good_interpae.csv created in the directory /path/to/your/output/dir as you have given in the command above. predictions_with_good_interpae.csv reports: 1. iptm, iptm+ptm scores provided by AlphaFold 2. mpDockQ score developed by Bryant et al., 2022 3. PI_score developed by Malhotra et al., 2021. The detailed explainations on these scores can be found in our paper and an example screenshot of the table is below. 
As the name suggest, all_vs_all means predict all possible combinations within a single input file. The input can be either full-length proteins or regions of a protein, as illustrated in the example_all_vs_all_list.txt and the figure below:

The corresponding command is:
run_multimer_jobs.py \
--mode=all_vs_all \
--num_cycle=3 \
--num_predictions_per_model=1 \
--output_path=<path to output directory> \
--data_dir=/path-to-Alphafold-data-dir \
--protein_lists=example_all_vs_all_list.txt \
--monomer_objects_dir=/path/to/monomer_objects_directory \
--job_index=<any number you want>