t2pmhc: A Structure-Informed Graph Neural Network for Predicting TCR–pMHC Binding
Pull image from DockerHub:
docker pull mvp9/t2pmhc:1.0.2
- Clone the repository
git clone https://github.com/qbic-pipelines/t2pmhc/
- cd into the repository
- Create a fresh conda env
conda create -n t2pmhc python=3.11
- Install the requirements.txt
pip install -r requirements.txt
- Install t2pmhc locally
pip install -e .
Now you can use t2pmhc anywhere on your machine.
t2pmhc currently supports PDB files created with TCRdock.
To predict TCR-pMHC structures with TCRdock, you can use our branch of the nf-core/proteinfold pipeline.
Clone the repository and check out the tcrdock branch:
git clone https://github.com/mapo9/nf-core_proteinfold
cd nf-core_proteinfold
git checkout tcrdock
See the documentation to create the docker container and run the pipeline.
Minimal samplesheet:
organism,mhc_class,mhc,peptide,va,ja,cdr3a,vb,jb,cdr3b,identifier
human,1,A*02:01:48,RLQSLQTYV,TRAV16*01,TRAJ39*01,CALSGFNNAGNMLTF,TRBV11-2*01,TRBJ2-3*01,CASSLGGAGGADTQYF,a2341ad
human,1,A*02:01:48,YLQPRTFLL,TRAV12-2*01,TRAJ30*01,CAVNRDDKIIF,TRBV7-9*01,TRBJ2-7*01,CASSPDIEQYF,223dse2

| Column | Description |
|---|---|
| organism | 'human'. |
| mhc_class | 1 |
| mhc | The MHC allele, e.g. 'A*02:01' |
| peptide | The peptide sequence. |
| va | V-alpha gene. |
| ja | J-alpha gene. |
| cdr3a | CDR3-alpha sequence, starts with C, ends with the F/W/etc right before the GXG sequence in the J gene. |
| vb | V-beta gene. |
| jb | J-beta gene. |
| cdr3b | CDR3-beta sequence, starts with C, ends with the F/W/etc right before the GXG sequence in the J gene. |
| identifier | Unique sample identifier. |
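Before launching the pipeline, it can help to sanity-check the samplesheet. The following is a minimal sketch and not part of t2pmhc or nf-core/proteinfold: the helper name `check_tcrdock_samplesheet` is hypothetical, and it only checks the column names, the "starts with C" CDR3 convention from the table above, and identifier uniqueness.

```python
import csv
import io

# Columns listed in the table above, in samplesheet order.
TCRDOCK_COLUMNS = [
    "organism", "mhc_class", "mhc", "peptide", "va", "ja",
    "cdr3a", "vb", "jb", "cdr3b", "identifier",
]


def check_tcrdock_samplesheet(handle):
    """Return a list of problems found in a CSV samplesheet.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(handle)
    missing = [c for c in TCRDOCK_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    seen_ids = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in ("cdr3a", "cdr3b"):
            if not row[col].startswith("C"):
                problems.append(f"line {line_no}: {col} does not start with C")
        if row["identifier"] in seen_ids:
            problems.append(f"line {line_no}: duplicate identifier {row['identifier']}")
        seen_ids.add(row["identifier"])
    return problems


# First data row of the minimal samplesheet shown above.
example = io.StringIO(
    "organism,mhc_class,mhc,peptide,va,ja,cdr3a,vb,jb,cdr3b,identifier\n"
    "human,1,A*02:01:48,RLQSLQTYV,TRAV16*01,TRAJ39*01,CALSGFNNAGNMLTF,"
    "TRBV11-2*01,TRBJ2-3*01,CASSLGGAGGADTQYF,a2341ad\n"
)
print(check_tcrdock_samplesheet(example))  # an empty list means no problems were found
```

For a file on disk, the same function can be called as `check_tcrdock_samplesheet(open("samplesheet.csv", newline=""))`; the file name is only an example.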
t2pmhc expects TCRdock output as input for the graph generation step. Minimal samplesheet:
organism mhc_class mhc peptide va ja cdr3a vb jb cdr3b identifier model_2_ptm_pae pmhc_tcr_pae target_chainseq pdb_file_path
human 1 A*02:01 RLQSLQTYV TRAV16*01 TRAJ39*01 CALSGFNNAGNMLTF TRBV11-2*01 TRBJ2-3*01 CASSLGGAGGADTQYF 1sr34 2.43 6.24 CALSGFNNAGNMLTF/RLQSLQTYV/CASSLGGAGGADTQYF path/to/tcrdock/pdb
human 1 A*02:01 YLQPRTFLL TRAV12-2*01 TRAJ30*01 CAVNRDDKIIF TRBV7-9*01 TRBJ2-7*01 CASSPDIEQYF 223dse2 4.5 7.2 YLQPRTFLL/CAVNRDDKIIF/CASSPDIEQYF path/to/tcrdock/pdb

| Column | Description |
|---|---|
| organism | 'human'. |
| mhc_class | 1 |
| mhc | The MHC allele, e.g. 'A*02:01' |
| peptide | The peptide sequence. |
| va | V-alpha gene. |
| ja | J-alpha gene. |
| cdr3a | CDR3-alpha sequence, starts with C, ends with the F/W/etc right before the GXG sequence in the J gene. |
| vb | V-beta gene. |
| jb | J-beta gene. |
| cdr3b | CDR3-beta sequence, starts with C, ends with the F/W/etc right before the GXG sequence in the J gene. |
| identifier | Unique sample identifier. |
| model_2_ptm_pae | PAE of the complex (provided by TCRdock). |
| pmhc_tcr_pae | TCR-pMHC specific PAE value (provided by TCRdock). |
| target_chainseq | Full sequence of the complex (MHC/peptide/TCRA/TCRB) (provided by TCRdock). |
| pdb_file_path | Path to the PDB file created by TCRdock. Must have the suffix `_<LABEL>.pdb` (LABEL = 0 or 1) if used for training. |
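A quick check of this samplesheet before graph generation might look like the sketch below. It is not part of t2pmhc: `read_t2pmhc_samplesheet` is a hypothetical helper, and it assumes the file is tab-separated (suggested by the `samplesheet.tsv` file name) and that both PAE columns hold plain numbers.

```python
import csv
import io

# Columns t2pmhc expects in addition to the TCRdock input columns (see table above).
T2PMHC_EXTRA_COLUMNS = [
    "model_2_ptm_pae", "pmhc_tcr_pae", "target_chainseq", "pdb_file_path",
]


def read_t2pmhc_samplesheet(handle):
    """Parse a tab-separated samplesheet and convert the PAE columns to float.

    Hypothetical helper; raises ValueError on missing columns or non-numeric PAEs.
    """
    reader = csv.DictReader(handle, delimiter="\t")  # assumption: TSV input
    missing = [c for c in T2PMHC_EXTRA_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")

    rows = []
    for row in reader:
        for col in ("model_2_ptm_pae", "pmhc_tcr_pae"):
            row[col] = float(row[col])  # float() raises ValueError on bad input
        rows.append(row)
    return rows


# Shortened example with only a few of the columns, for brevity.
header = "\t".join(["identifier"] + T2PMHC_EXTRA_COLUMNS)
line = "\t".join(["1sr34", "2.43", "6.24", "SEQ", "path/to/tcrdock/pdb"])
rows = read_t2pmhc_samplesheet(io.StringIO(header + "\n" + line + "\n"))
print(rows[0]["identifier"], rows[0]["pmhc_tcr_pae"])
```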
The TCRdock pipeline produces `.npy` files containing the PAEs, named after their respective PDB files with the suffix `_predicted_aligned_error.npy`. These files must reside in the same directory as the PDB files.
If the graphs are created for training (--training-mode), the PDB files must carry the binder status (LABEL) as a suffix (e.g. sample01_0.pdb), and the label must likewise precede the PAE suffix in the respective PAE files (e.g. sample01_0_predicted_aligned_error.npy).
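The naming convention above can be expressed in a few lines of Python. This is only an illustrative sketch, not code from t2pmhc: `pae_path_for` and `training_label` are hypothetical helper names, and the paths are examples.

```python
from pathlib import Path

# Suffix of the PAE files, as described above.
PAE_SUFFIX = "_predicted_aligned_error.npy"


def pae_path_for(pdb_path):
    """Return the PAE file expected next to a PDB file (same directory, same stem)."""
    pdb_path = Path(pdb_path)
    return pdb_path.with_name(pdb_path.stem + PAE_SUFFIX)


def training_label(pdb_path):
    """Extract the binder label (0 or 1) from a PDB name such as sample01_0.pdb."""
    stem = Path(pdb_path).stem
    _, sep, label = stem.rpartition("_")
    if not sep or label not in ("0", "1"):
        raise ValueError(f"{pdb_path}: expected a _0.pdb or _1.pdb suffix")
    return int(label)


print(pae_path_for("runs/sample01_0.pdb"))
print(training_label("runs/sample01_0.pdb"))
```

Calling `pae_path_for(path).exists()` for every PDB file in the samplesheet is one way to catch missing PAE files before graph generation.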
To create the graphs expected by the models from the PDB files, run the following command:
t2pmhc create-t2pmhc-graphs \
--mode <t2pmhc-gcn,t2pmhc-gat> \
--samplesheet samplesheet.tsv \
--training-mode / --prediction-mode \
--out <path/to/graphs.pt>
To train a t2pmhc-gcn model on the generated graphs, run:
t2pmhc train-t2pmhc-gcn \
--run_name <name to save model under> \
--hyperparameters path/to/t2pmhc/t2pmhc/data/hyperparams/t2pmhc_gcn.json \
--samplesheet samplesheet.tsv \
--saved_graphs <path/to/graphs.pt> \
--save_model <path/to/model_dir>
To train a t2pmhc-gat model, run:
t2pmhc train-t2pmhc-gat \
--run_name <name to save model under> \
--hyperparameters path/to/t2pmhc/t2pmhc/data/hyperparams/t2pmhc_gat.json \
--samplesheet samplesheet.tsv \
--saved_graphs <path/to/graphs.pt> \
--save_model <path/to/model_dir>
You can either use a model you trained or use the published default models to predict the binder status for your TCR-pMHC complexes.
The resulting TSV file will contain the column binder_prob, which holds the binding probability that t2pmhc assigns to each complex.
To predict with the published default models:
t2pmhc t2pmhc-predict-binding \
--mode <t2pmhc-gcn, t2pmhc-gat> \
--samplesheet samplesheet.tsv \
--saved_graphs <path/to/graphs.pt> \
--out samplesheet_predicted.tsv
To predict with a model you trained, additionally pass the model and its scalers:
t2pmhc t2pmhc-predict-binding \
--mode <t2pmhc-gcn, t2pmhc-gat> \
--samplesheet samplesheet.tsv \
--saved_graphs <path/to/graphs.pt> \
--out samplesheet_predicted.tsv \
--model <model.pt> \
--pae_scaler_structure <pae_node_FULL.pkl> \
--pae_scaler_tcrpmhc <pae_node_TCRPMHC.pkl> \
--hydro_scaler <hydro_scaler.pkl> \
--distance_scaler <distance_scaler.pkl> \
--pae_scaler_edge <pae_edge_FULL.pkl>
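Once the prediction has finished, the binder_prob column can be inspected with a few lines of Python. The sketch below is not part of t2pmhc. It assumes the output is tab-separated and still carries the identifier column from the samplesheet; the helper name `top_binders`, the 0.5 threshold, and the probabilities in the example are made up for illustration.

```python
import csv
import io


def top_binders(handle, threshold=0.5):
    """Return (identifier, binder_prob) pairs at or above threshold, highest first.

    Hypothetical helper; the threshold is an arbitrary illustrative cut-off.
    """
    reader = csv.DictReader(handle, delimiter="\t")  # assumption: TSV output
    hits = [
        (row["identifier"], float(row["binder_prob"]))
        for row in reader
        if float(row["binder_prob"]) >= threshold
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up probabilities, for illustration only (not real t2pmhc output).
predicted = io.StringIO("identifier\tbinder_prob\n1sr34\t0.91\n223dse2\t0.12\n")
print(top_binders(predicted))
```

For a real run, pass `open("samplesheet_predicted.tsv", newline="")` instead of the in-memory example.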
If you use t2pmhc, please cite the article as follows:
t2pmhc: A Structure-Informed Graph Neural Network to Predict TCR-pMHC Binding
Mark Polster, Josua Stadelmaier, Elias Ball, Jonas Scheid, Jens Bauer, Annika Nelde, Manfred Claassen, Marissa Dubbelaar, Juliane S. Walz, Sven Nahnsen. bioRxiv (2026): 2026-02. doi: https://doi.org/10.64898/2026.02.27.708137.