Implementation of AF2χ (Cagiada M., Thomasen F.E., et al., bioRxiv 2025) using localColabFold as base code for AF2 (Jumper J., et al., Nature 2021).
AF2χ is a tool to predict side-chain heterogeneity using AlphaFold2 and its internal side-chain representations. AF2χ outputs side-chain χ-angle distributions and structural ensembles around the predicted AF2 structure.
The code in this repository allows you to run AF2χ by downloading localColabFold, patching its code, and adding the AF2χ functionality to the original localColabFold implementation.
AF2χ is currently available for the Linux distribution of localColabFold, using a stable forked repository of ColabFold v1.5.5 (commit: fdf3b235b88746681c46ea12bcded76ecf8e1f76 - July 2024) and AlphaFold 2.3.7.
To start, clone the repository to your local machine and navigate into the repository directory:
git clone https://github.com/matteo-cagiada/AF2chi_localcolabfold.git
cd AF2chi_localcolabfold
Next, you need to install a localColabFold version compatible with AF2χ. We provide an installation script, install_colabbatch_linux.sh, in the repository, which installs our tested version of localColabFold. The script is a modified version of the original installation script from the localColabFold repository, with adjustments to dependencies to maximise compatibility.
N.B.: LocalColabFold works with CUDA >= 12.0. If you encounter dependency issues, refer to the localColabFold documentation for troubleshooting.
# Use install_colabbatch_linux.sh to install localColabFold
./install_colabbatch_linux.sh
By default, install_colabbatch_linux.sh installs localColabFold in the directory where the script is executed. If you prefer a different location, move the script to your desired directory before running it.
# Apply the patch to the default installed localColabFold version
./patcher_colabfold_linux.sh
If localColabFold is installed in a custom location, run the patcher script with the path to the localColabFold folder (localcolabfold) as an argument:
# Apply the patch to localColabFold in a custom location
./patcher_colabfold_linux.sh <path-to-colab-conda>
For example, if localColabFold is installed in /users/your_username/home/bin/, the command would be:
./patcher_colabfold_linux.sh /users/your_username/home/bin/localcolabfold/
The patcher replaces the relevant code in the localcolabfold installation and adds the AF2χ data dependencies and parameters.
AF2χ inference is similar to the original localColabFold implementation.
➡️ Activate conda environment:
First, make the inference command colabfold_batch available. You can either:
- add localColabFold to your PATH environment variable:
# For bash or zsh
# e.g. export PATH="/<path_to_folder>/localcolabfold/colabfold-conda/bin:$PATH"
export PATH="/<path_to_folder>/localcolabfold/colabfold-conda/bin:$PATH"
- or activate the localColabFold environment directly with conda:
conda activate /<path_to_folder>/localcolabfold/colabfold-conda
You can now run the main localColabFold inference script, colabfold_batch.
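Before launching a long run, it can help to confirm that colabfold_batch is actually reachable. The sketch below is only an illustration: the function name `find_colabfold_batch` and its `install_dir` argument are hypothetical, and the `colabfold-conda/bin` layout is taken from the paths shown above.

```python
import os
import shutil
from typing import Optional


def find_colabfold_batch(install_dir: Optional[str] = None) -> Optional[str]:
    """Return the full path to colabfold_batch, or None if it is not found.

    install_dir is a hypothetical argument pointing at the localcolabfold
    folder. When given, only <install_dir>/colabfold-conda/bin is searched;
    otherwise the current PATH is searched.
    """
    search_path = None
    if install_dir is not None:
        search_path = os.path.join(install_dir, "colabfold-conda", "bin")
    # shutil.which only returns files that exist and are executable.
    return shutil.which("colabfold_batch", path=search_path)
```

A return value of `None` suggests that neither the PATH export nor the conda activation has taken effect in the current shell.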
➡️ Running inference
colabfold_batch provides many options. To see all the options available use the help command:
colabfold_batch --help
AF2χ options are displayed in the AF2chi section, reproduced here:
AF2chi:
--af2chi run af2chi to predict sidechain populations and generate a structural ensemble with sidechain predictions (default: False)
--no-reweight run af2chis production on prior library, don't apply re-weighting (default: False)
--no-ensemble do not create ensemble of pdb with sidechain predictions, only save the sidechain chi distributions (default: False)
--no-save-distributions
do not save the sidechain chi distributions (default: False)
--struct-weight STRUCT_WEIGHT
run af2sidechains with specified struct-weight (0.85 is default) (default: 0.85)
--n-struct-ensemble N_STRUCT_ENSEMBLE
number of structures to generate in the af2chi ensemble (default: 100)
The different options allow you to run the AF2χ pipeline either partially or fully. You can also adjust several parameters, including the number of output structures in the final ensemble.
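When AF2χ is driven from a script, the options listed above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the function `build_af2chi_command` and its keyword arguments are hypothetical names, and only flags that appear in the help text above are used.

```python
from typing import List, Optional


def build_af2chi_command(
    input_fasta: str,
    output_folder: str,
    reweight: bool = True,
    ensemble: bool = True,
    save_distributions: bool = True,
    struct_weight: Optional[float] = None,
    n_struct_ensemble: Optional[int] = None,
) -> List[str]:
    """Assemble a colabfold_batch argument list with the AF2chi options."""
    cmd = ["colabfold_batch", "--af2chi"]
    if not reweight:
        cmd.append("--no-reweight")
    if not ensemble:
        cmd.append("--no-ensemble")
    if not save_distributions:
        cmd.append("--no-save-distributions")
    # Leave the tool's own defaults in place unless a value is given.
    if struct_weight is not None:
        cmd += ["--struct-weight", str(struct_weight)]
    if n_struct_ensemble is not None:
        cmd += ["--n-struct-ensemble", str(n_struct_ensemble)]
    cmd += [input_fasta, output_folder]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a single string avoids shell quoting problems with paths that contain spaces.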
We tested AF2χ in two different configurations; the commands for each are given below.
This setup uses the full MSA and no structural templates as input to the model. It is recommended when the native structure of your protein is unknown.
You can run AF2χ with standard inference by adding the --af2chi flag to your usual localColabFold command:
colabfold_batch --af2chi <input_fasta> <output_folder>
Based on Roney & Ovchinnikov, 2022, the decoy strategy uses the query sequence as input along with a custom structure template and no MSA. We implemented this as the default configuration for AF2χ to enable sampling of dihedral distributions and generation of structural ensembles around any input template.
To run AF2χ with the decoy strategy, use the --af2chi flag along with these additional options:
--templates: enables template usage
--custom-template-path: path to your template structure folder
--msa-mode single_sequence: disables MSA generation and uses only the query sequence and template
--model-order: runs the two AF2 models trained with templates
colabfold_batch --af2chi --templates --custom-template-path ../templates/ \
--msa-mode single_sequence --model-order 1,2 <input_fasta> <output_folder>
- AF2 accepts only mmCIF (.cif) files as input templates. You can download .cif files directly from the RCSB PDB, or convert your .pdb files using:
  - pdb-extract (official)
- Folder & Naming: The template file must be placed inside its own folder and named using 4 lowercase letters/numbers, following classic PDB naming conventions.
- Multiple Templates Support & Complexes: There are no restrictions on the number of input templates. AF2 will automatically use any compatible structure that aligns with the query sequence. You can also use complex structures as templates; see the ColabFold documentation for further details.
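The naming rule above is easy to get wrong, so a quick check of the template folder before running can save a failed job. This is a sketch under stated assumptions: the helper names are hypothetical, and the regular expression simply encodes "4 lowercase letters/numbers plus a .cif extension" as described above; it is not taken from the ColabFold source.

```python
import re
from pathlib import Path
from typing import List

# Four lowercase letters or digits followed by ".cif", e.g. "1ubq.cif".
_TEMPLATE_NAME = re.compile(r"[a-z0-9]{4}\.cif")


def is_valid_template_name(filename: str) -> bool:
    """Return True if filename follows the 4-character lowercase .cif rule."""
    return _TEMPLATE_NAME.fullmatch(filename) is not None


def invalid_templates(folder: str) -> List[str]:
    """Return the sorted names of files in folder that break the naming rule."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and not is_valid_template_name(p.name)
    )
```

An empty list from `invalid_templates("../templates/")` means every file in the folder matches the expected pattern.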
AF2χ generates χ-angle distributions and then samples from these distributions to generate a structural ensemble.
Along with the localColabFold output, the standard output of AF2χ includes:
- The final χ-angle distributions for χ1 and χ2, for the highest ranked model (using AF2 ranking), saved as a dictionary in a JSON file:
{fasta_name}_rank_001_sc_distributions_fitted.json. For each residue, the χ-angle population is reported as a discrete probability distribution with 36 bins, ranging from 0 to 360 degrees (10-degree binning).
Example of {fasta_name}_rank_001_sc_distributions_fitted.json
{
"chi1": { ### distribution for χ1
"MET1": [ ### target residues
2.2316944008858687e-05, ### prob for the 10-degree bin (from 0 to 10 degrees)
0.00031300707341209516,
0.003013823938645255,
0.018280426832940892,
0.06929143379141825
....
....
0.16224556084048378,
0.2354311056946399,
0.2115453774166867 ### prob last 10-degree bin (from 350 to 360 degrees)
]
....
....
"THR102": [ ## second target residue
0.11728960066637872,
0.04026974898543387,
....
....
0.008523468473728073,
0.0011102230062327329,
9.19117342736029e-0]
}
"chi2": { ### distribution for χ2
"MET1":[....]
.....
}
}
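The distributions file can be post-processed with a few lines of Python. The sketch below assumes the layout described above (top-level keys "chi1" and "chi2", each mapping a residue label to 36 probabilities) and assumes the real file is plain JSON without the `###` annotations and `....` elisions shown in the example. The function names are hypothetical.

```python
import json
import math
from typing import Dict, List, Tuple

N_BINS = 36
BIN_WIDTH = 360.0 / N_BINS  # 10-degree bins covering 0 to 360 degrees


def load_distributions(path: str) -> Dict[str, Dict[str, List[float]]]:
    """Load the chi-angle distributions dictionary from a JSON file."""
    with open(path) as handle:
        return json.load(handle)


def _check_bins(probs: List[float]) -> None:
    if len(probs) != N_BINS:
        raise ValueError(f"expected {N_BINS} bins, got {len(probs)}")


def most_populated_bin(probs: List[float]) -> Tuple[float, float]:
    """Return the (start, end) angles in degrees of the most probable bin."""
    _check_bins(probs)
    index = max(range(N_BINS), key=lambda k: probs[k])
    return (index * BIN_WIDTH, (index + 1) * BIN_WIDTH)


def circular_mean(probs: List[float]) -> float:
    """Return the probability-weighted circular mean angle in [0, 360)."""
    _check_bins(probs)
    centers = [BIN_WIDTH * (k + 0.5) for k in range(N_BINS)]
    sin_sum = sum(p * math.sin(math.radians(c)) for p, c in zip(probs, centers))
    cos_sum = sum(p * math.cos(math.radians(c)) for p, c in zip(probs, centers))
    angle = math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0
    # Float rounding can push a tiny negative angle up to exactly 360.0.
    return 0.0 if angle >= 360.0 else angle
```

A circular mean is used rather than an arithmetic mean because dihedral angles wrap around: populations near 5 and 355 degrees average to roughly 0, not 180. For multimodal rotamer distributions the most populated bin is usually the more informative summary.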
- The structural ensemble, saved as PDB files in the subfolder
sidechain_ensemble. The standard ensemble size is 100 structures.
Additional AF2χ options available during inference may modify or remove some of these outputs, in particular:
- --no-ensemble and --no-save-distributions remove the final ensemble and the JSON file from the output, respectively.
- --no-reweight generates χ-angle populations using the prior distribution, returns them in the dictionary {fasta_name}_rank_001_sc_distributions_prior.json, and generates the structural ensemble from the prior distributions.
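To make the effect of these flags concrete, the sketch below lists which AF2χ-specific outputs to expect for a given combination of options. It is an illustration only: the function name is hypothetical, and it assumes that each flag affects only the output named in its help text and that the file and folder names are exactly those given above.

```python
from typing import List


def expected_af2chi_outputs(
    fasta_name: str,
    reweight: bool = True,
    ensemble: bool = True,
    save_distributions: bool = True,
) -> List[str]:
    """List the AF2chi-specific outputs expected for a set of options."""
    outputs = []
    if save_distributions:
        # --no-reweight switches the saved distributions from fitted to prior.
        kind = "fitted" if reweight else "prior"
        outputs.append(f"{fasta_name}_rank_001_sc_distributions_{kind}.json")
    if ensemble:
        outputs.append("sidechain_ensemble")
    return outputs
```

For example, `expected_af2chi_outputs("query", reweight=False)` returns the prior JSON file name plus the ensemble subfolder.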
- Make sure that Docker is installed, following the instructions for your operating system.
- Run docker build:
docker build -t af2chi_localcolabfold:latest .
The following command runs AF2χ on an input file input.fasta or directory $INPUT_DIR and stores the results in $OUTPUT_DIR.
Note that Docker requires that volumes are specified as absolute paths.
The AlphaFold2 parameters should be downloaded and mounted into /cache in the container; this example command uses the directory /path/to/colabfold-cache/cache.
For details, please refer to https://github.com/sokrypton/ColabFold/wiki/Running-ColabFold-in-Docker .
docker run --rm \
--runtime=nvidia --gpus 1 \
--env PYTHONUNBUFFERED=TRUE \
-v /path/to/colabfold-cache/cache:/cache \
-v "${INPUT_DIR}":/input:ro \
-v "${OUTPUT_DIR}":/output \
af2chi_localcolabfold:latest \
colabfold_batch \
--af2chi \
/input/input.fasta \
/output
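Because Docker volumes must be given as absolute paths, a small wrapper that resolves them first can prevent mount errors when the command is generated from a script. The sketch below mirrors the docker run command above; the function `docker_af2chi_command` and its parameters are hypothetical names introduced for illustration.

```python
from pathlib import Path
from typing import List


def docker_af2chi_command(
    cache_dir: str,
    input_dir: str,
    output_dir: str,
    fasta_name: str = "input.fasta",
    image: str = "af2chi_localcolabfold:latest",
) -> List[str]:
    """Build the docker run argument list with absolute host paths."""

    def absolute(path: str) -> str:
        # Expand "~" and resolve relative paths against the current directory.
        return str(Path(path).expanduser().resolve())

    return [
        "docker", "run", "--rm",
        "--runtime=nvidia", "--gpus", "1",
        "--env", "PYTHONUNBUFFERED=TRUE",
        "-v", f"{absolute(cache_dir)}:/cache",
        "-v", f"{absolute(input_dir)}:/input:ro",
        "-v", f"{absolute(output_dir)}:/output",
        image,
        "colabfold_batch", "--af2chi",
        f"/input/{fasta_name}", "/output",
    ]
```

The list can be handed to `subprocess.run(..., check=True)`. The host directories are only resolved here, not created, so they should exist before the container starts.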
An Apptainer image can be built from an existing Docker image:
apptainer build af2chi_localcolabfold_latest.sif docker-daemon://af2chi_localcolabfold:latest
The following command runs AF2χ on an input file input.fasta or directory $INPUT_DIR and stores the results in $OUTPUT_DIR.
As with Docker, specify the bind paths as absolute paths.
apptainer run \
--nv \
--env PYTHONUNBUFFERED=TRUE \
-B /path/to/colabfold-cache/cache:/cache \
-B "${INPUT_DIR}":/input:ro \
-B "${OUTPUT_DIR}":/output \
af2chi_localcolabfold_latest.sif \
colabfold_batch \
--af2chi \
/input/input.fasta \
/output
Fix: Ensure that the colabfold-conda library path is included in your LD_LIBRARY_PATH environment variable. To check, print its current value:
echo $LD_LIBRARY_PATH
If the path is missing, prepend the library location with:
export LD_LIBRARY_PATH="/<path_to_your_installation>/localcolabfold/colabfold-conda/lib:$LD_LIBRARY_PATH"
If issues persist, you may need to install the correct version of GCC (the missing library is usually named in the error message). For more information, refer to the GCC installation guide.
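The check-then-prepend step for LD_LIBRARY_PATH can also be done from a launcher script. This is a minimal sketch with a hypothetical helper name, `prepend_library_path`; it only computes the new value and leaves it to the caller to export it.

```python
import os
from typing import Optional


def prepend_library_path(lib_dir: str, current: Optional[str] = None) -> str:
    """Return an LD_LIBRARY_PATH value with lib_dir first and not duplicated.

    If current is None, the value is read from the environment.
    """
    if current is None:
        current = os.environ.get("LD_LIBRARY_PATH", "")
    # Compare without trailing slashes so "/x/lib/" and "/x/lib" match.
    wanted = lib_dir.rstrip("/") or "/"
    entries = [
        entry
        for entry in current.split(os.pathsep)
        if entry and (entry.rstrip("/") or "/") != wanted
    ]
    return os.pathsep.join([wanted] + entries)
```

The result should be placed in the environment of the process that starts colabfold_batch, for example via the `env` argument of `subprocess.run`; changing `os.environ` inside an already running Python process does not affect libraries that process has already loaded.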
Fix: By default, AF2χ (via localColabFold) attempts to use all available GPUs, which can cause issues on certain systems. To run AF2χ on a specific GPU, set the following variables before execution:
export CUDA_DEVICE_ORDER="PCI_BUS_ID"
export CUDA_VISIBLE_DEVICES=N # Replace N with the GPU index (e.g., 0, 1, etc.)
- Refined complex prediction (formatted output)
If you use our model, please cite:
Cagiada, M., Thomasen, F.E., Ovchinnikov, S., Deane, C.M. & Lindorff-Larsen, K. (2025). AF2χ: Predicting protein side-chain rotamer distributions with AlphaFold2. In bioRxiv (p. 2024.05.21.595203). https://doi.org/10.1101/2024.05.21.595203
@ARTICLE{Cagiada2025-ax,
title = "AF2χ: Predicting protein side-chain rotamer distributions with AlphaFold2",
author = "Cagiada, Matteo and Thomasen, F. Emil and Ovchinnikov, Sergey and Deane, Charlotte M. and Lindorff-Larsen, Kresten",
journal = "bioRxiv",
pages = "2024.05.21.595203",
year = 2025,
doi = "10.1101/2024.05.21.595203",
language = "en"
}
Also, if you use this localColabFold implementation, remember to cite:
- Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. ColabFold - Making protein folding accessible to all. Nature Methods (2022) doi: 10.1038/s41592-022-01488-1
- If you're using AlphaFold, please also cite: Jumper et al. "Highly accurate protein structure prediction with AlphaFold." Nature (2021) doi: 10.1038/s41586-021-03819-2
- If you're using AlphaFold-multimer, please also cite: Evans et al. "Protein complex prediction with AlphaFold-Multimer." BioRxiv (2022) doi: 10.1101/2021.10.04.463034v2
The research was supported by the PRISM (Protein Interactions and Stability in Medicine and Genomics) centre funded by the Novo Nordisk Foundation (NNF18OC0033950; to K.L.-L.) and by a Novo Nordisk Foundation Postdoctoral Fellowship (NNF23OC0082912; to M.C.). We acknowledge access to computational resources via a grant from the Carlsberg Foundation (CF21-0392; to K.L.-L.).
This project is licensed under the MIT License. See LICENSE for details.
For questions or support with this repository, please use the GitHub issue tab or reach out to us via email:
📧 Matteo Cagiada: matteo.cagiada@bio.ku.dk
📧 Emil Thomasen: fe.thomasen@bio.ku.dk
📧 Kresten Lindorff-Larsen: lindorff@bio.ku.dk