AntiFold predicts sequences that fit antibody variable domain structures. The tool outputs per-residue log-likelihoods in CSV format and can sample sequences directly to FASTA format. Sampled sequences show high structural agreement with experimental structures.
AntiFold is based on the ESM-IF1 model and is fine-tuned on solved and predicted antibody structures from SAbDab and OAS.
- Paper: Bioinformatics Advances
- Webserver: OPIG webserver
- Colab:
- Model: model.pt
- License: BSD 3-Clause
To try AntiFold without installing it, please see our OPIG webserver: https://opig.stats.ox.ac.uk/webapps/antifold/
- Antibody (+ antigen) probabilities and sequence sampling
- Nanobody (+ antigen) probabilities and sequence sampling
- Sampling of residues from specified IMGT regions. Nb: assumes antibody is IMGT numbered! (See --num_seq_per_target and --regions)
- Supports use of AntiFold fine-tuned weights and ESM-IF1 pre-trained weights (See --esm_if1_mode)
- Extraction of per-residue inverse-folding embeddings (See --extract_embeddings)
- GPU and MacBook GPU (MPS) accelerated predictions
- Input should be either a paired antibody variable domain structure (VH/VL) or a nanobody (VHH) (--nanobody_mode)
- AntiFold assumes the first PDB chain is the heavy chain, and second the light chain, unless manually specified by the user (See --pdbs_csv, --heavy_chain, --light_chain options)
- Antigen chains can optionally be specified. We recommend only including a single, ideally small, antigen chain. (See --pdbs_csv or --antigen_chain options)
- Sequence sampling assumes PDBs have been IMGT numbered. You can find IMGT numbered PDBs on SAbDab, or re-number PDBs with ANARCI
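Because AntiFold falls back on chain order when no chains are specified, it can help to check which chain appears first in a file before running it. The following is a minimal sketch, not part of AntiFold: `chains_in_order` is a hypothetical helper that reads the chain ID from column 22 of ATOM records in a fixed-column PDB file (it does not handle CIF files), and the PDB fragment is made up for illustration.

```python
def chains_in_order(pdb_text: str) -> list[str]:
    """Return chain IDs in order of first appearance in ATOM records."""
    seen: list[str] = []
    for line in pdb_text.splitlines():
        # In the fixed-column PDB format the chain ID is column 22 (index 21).
        if line.startswith("ATOM") and len(line) > 21:
            chain = line[21]
            if chain not in seen:
                seen.append(chain)
    return seen


# Tiny made-up PDB fragment, for illustration only.
example_pdb = "\n".join([
    "ATOM      1  N   VAL H   2      10.000  10.000  10.000  1.00 20.00           N",
    "ATOM      2  CA  VAL H   2      11.000  10.000  10.000  1.00 20.00           C",
    "ATOM      3  N   VAL L   2      20.000  10.000  10.000  1.00 20.00           N",
    "ATOM      4  N   LYS Y   1      30.000  10.000  10.000  1.00 20.00           N",
])

order = chains_in_order(example_pdb)
print(order)  # first entry would be treated as heavy, second as light
```

If the order does not match the heavy/light assignment you expect, pass the chains explicitly with --heavy_chain and --light_chain or via --pdbs_csv.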
conda create --name antifold python=3.10 -y && conda activate antifold
conda install -c conda-forge pytorch==2.2.0
git clone https://github.com/oxpig/AntiFold && cd AntiFold
pip install .
GPU only: install using environment.yml
conda env create -f environment.yml
python -m pip install .
Depending on your CUDA version you may need to change the dependency pytorch-cuda=12.1 in the environment.yml file.
Detailed instructions on how to correctly install PyTorch for your system can be found in the official PyTorch installation guide.
# Run AntiFold on single PDB/CIF file
# Nb: Assumes first chain heavy, second chain light
python antifold/main.py \
--pdb_file data/pdbs/6y1l_imgt.pdb
# Antibody-antigen complex
python antifold/main.py \
--pdb_file data/antibody_antigen/3hfm.pdb \
--heavy_chain H \
--light_chain L \
--antigen_chain Y
# Nanobody or single-chain
python antifold/main.py \
--pdb_file data/nanobody/8oi2_imgt.pdb \
--nanobody_chain B
# Folder of PDB/CIFs
# Nb: Assumes first chain heavy, second light
python antifold/main.py \
--pdb_dir data/pdbs
# Specify chains to run in a CSV file (e.g. antibody-antigen complex)
python antifold/main.py \
--pdb_dir data/antibody_antigen \
--pdbs_csv data/antibody_antigen.csv
# Sample sequences 10x (paired VH/VL only)
# Note: Requires IMGT numbered PDBs (e.g. from SAbDab or numbered with ANARCI)
python antifold/main.py \
--pdb_file data/pdbs/6y1l_imgt.pdb \
--heavy_chain H \
--light_chain L \
--num_seq_per_target 10 \
--sampling_temp "0.2" \
--regions "CDR1 CDR2 CDR3"
# Run all chains with ESM-IF1 model weights
python antifold/main.py \
--pdb_dir data/pdbs \
--esm_if1_mode
Notebook: notebook.ipynb
import antifold
import antifold.main
# Load model
model = antifold.main.load_model()
# PDB directory
pdb_dir = "data/pdbs"
# Assumes first chain heavy, second chain light
pdbs_csv = antifold.main.generate_pdbs_csv(pdb_dir, max_chains=2)
# Sample from PDBs
df_logits_list = antifold.main.get_pdbs_logits(
model=model,
pdbs_csv_or_dataframe=pdbs_csv,
pdb_dir=pdb_dir,
)
# Output log probabilities
df_logits_list[0]
Required parameters:
Input PDBs should be antibody variable domain structures (IMGT positions 1-128).
If no chains are specified, the first two chains are assumed to be the heavy and light chain, respectively.
If custom_chain_mode is set, all (10) chains will be run.
- Option 1: PDB file (--pdb_file). We recommend specifying heavy and light chain (--heavy_chain and --light_chain)
- Option 2: PDB folder (--pdb_dir) + CSV file specifying chains (--pdbs_csv)
- Option 3: PDB folder, infer 1st chain heavy, 2nd chain light
Parameters for generating new sequences:
PDBs should be IMGT annotated for the sequence sampling regions to be valid.
- Number of sequences to generate (--num_seq_per_target)
- Regions to mutate (--regions) based on inverse folding probabilities. Select from the list in IMGT_dict (e.g. 'CDRH1 CDRH2 CDRH3')
- Sampling temperature (--sampling_temp) controls generated sequence diversity, by scaling the inverse folding probabilities before sampling. Temperature = 1 means no change, while temperature ~ 0 only samples the most likely amino-acid at each position (acts as argmax).
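The interaction between --regions and --sampling_temp can be illustrated with a small sketch. This is not AntiFold's implementation: it assumes the common formulation in which log-probabilities are divided by the temperature and renormalised before sampling, and `sample_regions` is a hypothetical helper. The CDR ranges are copied from the IMGT_dict in this README; the input scores are toy values.

```python
import numpy as np

# CDR position ranges copied from the IMGT_dict in this README.
CDR_POSITIONS = {
    "CDR1": range(27, 38 + 1),
    "CDR2": range(56, 65 + 1),
    "CDR3": range(105, 117 + 1),
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sample_regions(log_probs, positions, original, regions, temperature, seed=42):
    """Resample residues at IMGT positions inside `regions`; keep the rest."""
    rng = np.random.default_rng(seed)
    allowed = {p for name in regions.split() for p in CDR_POSITIONS[name]}
    sampled = list(original)
    for i, pos in enumerate(positions):
        if pos not in allowed:
            continue  # positions outside the selected regions stay unchanged
        # Temperature scaling: T = 1 leaves the distribution unchanged,
        # T close to 0 concentrates all mass on the most likely residue.
        scaled = np.asarray(log_probs[i], dtype=float) / temperature
        scaled -= scaled.max()  # subtract the max for numerical stability
        probs = np.exp(scaled)
        probs /= probs.sum()
        sampled[i] = AMINO_ACIDS[rng.choice(len(AMINO_ACIDS), p=probs)]
    return "".join(sampled)


# Toy scores for IMGT positions 26 (FW1), 27 and 28 (CDR1).
toy = np.full((3, 20), -5.0)
toy[0, AMINO_ACIDS.index("W")] = 0.0
toy[1, AMINO_ACIDS.index("G")] = -0.1
toy[2, AMINO_ACIDS.index("Y")] = -0.1

near_argmax = sample_regions(toy, [26, 27, 28], "AAA", "CDR1", temperature=0.001)
print(near_argmax)
```

With a temperature this close to zero the two CDR1 positions take their top-scoring residue, while position 26 keeps its original residue because it lies outside the selected region.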
Optional parameters:
- Multi-chain mode for including antigen or other chains (--custom_chain_mode)
- Extract latent representations of PDB within model (--extract_embeddings)
- Use ESM-IF1 instead of AntiFold model weights (--esm_if1_mode), enables custom_chain_mode
For example webserver output, see: https://opig.stats.ox.ac.uk/webapps/antifold/results/example_job/
Output CSV with residue log-probabilities: 6y1l_imgt.csv
- pdb_pos - PDB residue number
- pdb_chain - PDB chain
- aa_orig - Original PDB residue (e.g. V)
- aa_pred - Top predicted residue by AntiFold (i.e. argmax) for this position
- pdb_posins - PDB residue number with insertion code (e.g. 112A)
- perplexity - Inverse folding tolerance (higher is more tolerant) to mutations. See paper for more details.
- Amino-acids - Inverse folding scores (log-likelihood) for the given position
pdb_pos,pdb_chain,aa_orig,aa_pred,pdb_posins,perplexity,A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y
2,H,V,M,2,1.6488,-4.9963,-6.6117,-6.3181,-6.3243,-6.7570,-4.2518,-6.7514,-5.2540,-6.8067,-5.8619,-0.0904,-6.5493,-4.8639,-6.6316,-6.3084,-5.1900,-5.0988,-3.7295,-8.0480,-7.3236
3,H,Q,Q,3,1.3889,-10.5258,-12.8463,-8.4800,-4.7630,-12.9094,-11.0924,-5.6136,-10.9870,-3.1119,-8.1113,-9.4382,-6.2246,-13.3660,-0.0701,-4.9957,-10.0301,-6.8618,-7.5810,-13.6721,-11.4157
4,H,L,L,4,1.0021,-13.3581,-12.6206,-17.5484,-12.4801,-9.8792,-13.6382,-14.8609,-13.9344,-16.4080,-0.0002,-9.2727,-16.6532,-14.0476,-12.5943,-15.4559,-16.9103,-17.0809,-10.5670,-13.5334,-13.4324
...
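As a rough illustration of how the columns relate, the sketch below re-derives aa_pred and perplexity from the amino-acid columns of two of the example rows above. It assumes the scores are normalised log-probabilities and that perplexity is the exponential of the Shannon entropy at each position; that assumption appears consistent with the example rows but is not confirmed here, so see the paper for the exact definition. `summarise` is a hypothetical helper.

```python
import csv
import io
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Header and two rows copied from the example output above.
csv_text = """pdb_pos,pdb_chain,aa_orig,aa_pred,pdb_posins,perplexity,A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y
2,H,V,M,2,1.6488,-4.9963,-6.6117,-6.3181,-6.3243,-6.7570,-4.2518,-6.7514,-5.2540,-6.8067,-5.8619,-0.0904,-6.5493,-4.8639,-6.6316,-6.3084,-5.1900,-5.0988,-3.7295,-8.0480,-7.3236
4,H,L,L,4,1.0021,-13.3581,-12.6206,-17.5484,-12.4801,-9.8792,-13.6382,-14.8609,-13.9344,-16.4080,-0.0002,-9.2727,-16.6532,-14.0476,-12.5943,-15.4559,-16.9103,-17.0809,-10.5670,-13.5334,-13.4324
"""


def summarise(row):
    """Return (top residue, exp(entropy)) from one CSV row."""
    log_probs = {aa: float(row[aa]) for aa in AMINO_ACIDS}
    top = max(log_probs, key=log_probs.get)
    # Shannon entropy in nats: -sum(p * log p), with p = exp(log p).
    entropy = -sum(math.exp(lp) * lp for lp in log_probs.values())
    return top, math.exp(entropy)


rows = list(csv.DictReader(io.StringIO(csv_text)))
results = [summarise(row) for row in rows]
for row, (top, ppl) in zip(rows, results):
    print(row["pdb_posins"], row["aa_pred"], top, row["perplexity"], round(ppl, 4))
```

A perplexity near 1 (position 4) means the model strongly prefers a single residue, whereas larger values (position 2) indicate that several residues are tolerated.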
Output FASTA file with sampled sequences: 6y1l_imgt.fasta
- T: Temperature used for design
- score: average log-odds of residues in the sampled region
- global_score: average log-odds of all residues (IMGT positions 1-128)
- regions: regions selected for design
- seq_recovery: fraction of residues unchanged from the original PDB sequence (1 - # mutations / sequence length)
- mutations: # mutations from original PDB sequence
>6y1l_imgt , score=0.2934, global_score=0.2934, regions=['CDR1', 'CDR2', 'CDRH3'], model_name=AntiFold, seed=42
VQLQESGPGLVKPSETLSLTCAVSGYSISSGYYWGWIRQPPGKGLEWIGSIYHSGSTYYN
PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLTQSSHNDANWGQGTLVTVSS/V
LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF
SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR
> T=0.20, sample=1, score=0.3930, global_score=0.1869, seq_recovery=0.8983, mutations=12
VQLQESGPGLVKPSETLSLTCAVSGASITSSYYWGWIRQPPGKGLEWIGSIYYSGSTYYN
PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLYGSPWSNPYWGQGTLVTVSS/V
LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF
SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR
...
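The header fields can be checked against the sequences themselves. The sketch below counts per-chain differences between the original and the first sampled sequence from the example above, with heavy and light chains separated by "/". `count_mutations` is a hypothetical helper. In this example the reported seq_recovery=0.8983 equals 1 - 12/118, where 118 is the heavy-chain length; whether AntiFold always uses that denominator is an assumption, not something stated in this README.

```python
def count_mutations(original: str, sample: str) -> int:
    """Count positions that differ between two equal-length sequences."""
    if len(original) != len(sample):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(original, sample))


# Sequences copied from the example FASTA above (line breaks removed).
original = (
    "VQLQESGPGLVKPSETLSLTCAVSGYSISSGYYWGWIRQPPGKGLEWIGSIYHSGSTYYN"
    "PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLTQSSHNDANWGQGTLVTVSS/V"
    "LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF"
    "SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR"
)
sample = (
    "VQLQESGPGLVKPSETLSLTCAVSGASITSSYYWGWIRQPPGKGLEWIGSIYYSGSTYYN"
    "PSLKSRVTISVDTSKNQFSLKLSSVTAADTAVYYCAGLYGSPWSNPYWGQGTLVTVSS/V"
    "LTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKRLIYDNNKRPSGIPDRF"
    "SGSKSGTSATLGITGLQTGDEADYYCGTWDSSLNPVFGGGTKLEIKR"
)

heavy_orig, light_orig = original.split("/")
heavy_samp, light_samp = sample.split("/")

heavy_mutations = count_mutations(heavy_orig, heavy_samp)
light_mutations = count_mutations(light_orig, light_samp)
# Recovery relative to the heavy chain only (assumed denominator, see above).
heavy_recovery = round(1 - heavy_mutations / len(heavy_orig), 4)

print(heavy_mutations, light_mutations, heavy_recovery)
```

All differences fall in the heavy chain here, and the light chain is identical to the original.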
usage:
# Predict on example PDBs in folder
python antifold/main.py \
--pdb_file data/antibody_antigen/3hfm.pdb \
--heavy_chain H \
--light_chain L \
--antigen_chain Y # Optional
Predict inverse folding probabilities for antibody variable domain, and sample sequences with maintained fold.
PDB structures should be IMGT-numbered, paired heavy and light chain variable domains (positions 1-128).
For IMGT numbering PDBs use SAbDab or https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabpred/anarci/
options:
-h, --help show this help message and exit
--pdb_file PDB_FILE Input PDB file (for single PDB predictions)
--heavy_chain HEAVY_CHAIN
Ab heavy chain (for single PDB predictions)
--light_chain LIGHT_CHAIN
Ab light chain (for single PDB predictions)
--antigen_chain ANTIGEN_CHAIN
Antigen chain (optional)
--pdbs_csv PDBS_CSV Input CSV file with PDB names and H/L chains (multi-PDB predictions)
--pdb_dir PDB_DIR Directory with input PDB files (multi-PDB predictions)
--out_dir OUT_DIR Output directory
--regions REGIONS Space-separated regions to mutate. Default 'CDR1 CDR2 CDR3H'
--num_seq_per_target NUM_SEQ_PER_TARGET
Number of sequences to sample from each antibody PDB (default 0)
--sampling_temp SAMPLING_TEMP
A string of temperatures e.g. '0.20 0.25 0.50' (default 0.20). Sampling temperature for amino acids. Suggested values 0.10, 0.15, 0.20, 0.25, 0.30. Higher values will lead to more diversity.
--limit_variation Limit variation to as many mutations as expected from temperature sampling
--extract_embeddings Extract per-residue embeddings from AntiFold / ESM-IF1
--custom_chain_mode Run all specified chains (for antibody-antigen complexes or any combination of chains)
--exclude_heavy Exclude heavy chain from sampling
--exclude_light Exclude light chain from sampling
--batch_size BATCH_SIZE
Batch-size to use
--num_threads NUM_THREADS
Number of CPU threads to use for parallel processing (0 = all available)
--seed SEED Seed for reproducibility
--model_path MODEL_PATH
Alternative model weights (default models/model.pt). See --use_esm_if1_weights flag to use ESM-IF1 weights instead of AntiFold
--esm_if1_mode Use ESM-IF1 weights instead of AntiFold
--verbose VERBOSE Verbose printing
Used to specify which regions to mutate in an IMGT numbered PDB
- IMGT numbered PDBs: https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab
- Renumber existing PDBs with ANARCI: https://github.com/oxpig/ANARCI
- Read more: https://www.imgt.org/IMGTScientificChart/Numbering/IMGTIGVLsuperfamily.html
IMGT_dict = {
"all": range(1, 128 + 1),
"allH": range(1, 128 + 1),
"allL": range(1, 128 + 1),
"FWH": list(range(1, 26 + 1)) + list(range(39, 55 + 1)) + list(range(66, 104 + 1)) + list(range(118, 128 + 1)),
"FWL": list(range(1, 26 + 1)) + list(range(39, 55 + 1)) + list(range(66, 104 + 1)) + list(range(118, 128 + 1)),
"CDRH": list(range(27, 38 + 1)) + list(range(56, 65 + 1)) + list(range(105, 117 + 1)),
"CDRL": list(range(27, 38 + 1)) + list(range(56, 65 + 1)) + list(range(105, 117 + 1)),
"FW1": range(1, 26 + 1),
"FWH1": range(1, 26 + 1),
"FWL1": range(1, 26 + 1),
"CDR1": range(27, 38 + 1),
"CDRH1": range(27, 38 + 1),
"CDRL1": range(27, 38 + 1),
"FW2": range(39, 55 + 1),
"FWH2": range(39, 55 + 1),
"FWL2": range(39, 55 + 1),
"CDR2": range(56, 65 + 1),
"CDRH2": range(56, 65 + 1),
"CDRL2": range(56, 65 + 1),
"FW3": range(66, 104 + 1),
"FWH3": range(66, 104 + 1),
"FWL3": range(66, 104 + 1),
"CDR3": range(105, 117 + 1),
"CDRH3": range(105, 117 + 1),
"CDRL3": range(105, 117 + 1),
"FW4": range(118, 128 + 1),
"FWH4": range(118, 128 + 1),
"FWL4": range(118, 128 + 1),
}
- Li, Y., et al. Benchmarking Inverse Folding Models for Antibody CDR Sequence Design, PLOS ONE, 2025
- Clifford J.N. et al. (2025). AbEpiTope-1.0: Improved antibody target prediction by use of AlphaFold and inverse folding.
The code and data in this package are based on the AntiFold paper. If you use them, please cite:
@misc{antifold,
title={AntiFold: Improved antibody structure-based design using inverse folding},
author={Magnus Haraldson Høie and Alissa Hummer and Tobias H. Olsen and Broncio Aguilar-Sanjuan and Morten Nielsen and Charlotte M. Deane},
year={2024},
eprint={2405.03370},
archivePrefix={arXiv},
primaryClass={q-bio.BM}
}