Ensemble Prediction Pipeline to Identify Neoantigens

Overview

HLA molecules bind peptides found within and outside the cell. Once a peptide binds, the HLA-peptide complex is presented on the cell surface to provide information to T-cells. These complexes can serve as antigens that T-cells recognize and target. Cancer cells possess unique peptides that can be presented by HLA; these are known as neoantigens. T-cells can selectively target neoantigens, but cancers often suppress the immune system. Nevertheless, there are ways to make a neoantigen elicit an immune response, so identifying neoantigens is key to cancer immunotherapies.

Identifying neoantigens solely through experimental methods is powerful but demanding. Thus, machine learning models have been developed to help predict whether a peptide sequence will bind to HLA or elicit an immune response. However, these models can suffer from bias and are only modestly accurate. To reduce single-model bias and improve accuracy, I present an ensemble prediction pipeline that runs wild type (WT) and mutant (MUT) sequences through two binding affinity models (netMHCpan, abbreviated NMP, and Pick Pocket, abbreviated PP) and two immunogenicity models (PRIME and DeepImmuno) to obtain a final score that signals the mutant sequence's potential to be a neoantigen.

Required File:
sequences.xlsx

Required Columns:

wt_peptide: WT sequences you want predictions on
mut_peptide: The MUT counterparts for the WT sequences. Note that only missense mutations are supported
id: use numeric values to pair up WT and MUT sequences
allele: If validating the model on experimentally determined neoantigens/non-neoantigens, include the allele the neoantigen/non-neoantigen was tested on. If not validating, leave this column blank
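
Before running the pipeline, it can help to sanity-check the WT/MUT pairs. The following is a minimal Python sketch, not part of the repository's R scripts. It assumes a missense pair means two equal-length peptides, made of the 20 standard residues, that differ in at least one position; the function names and the example peptides are hypothetical.

```python
# Hypothetical helper for checking rows shaped like sequences.xlsx.
# Assumption: a missense pair is two equal-length peptides that differ
# in at least one residue and use only the 20 standard amino acids.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def missense_positions(wt: str, mut: str) -> list[int]:
    """Return the 1-based positions where two equal-length peptides differ."""
    if len(wt) != len(mut):
        raise ValueError("WT and MUT peptides must have the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(wt, mut)) if a != b]


def check_rows(rows: list[dict]) -> list[str]:
    """Collect one human-readable problem per invalid row."""
    problems = []
    for row in rows:
        wt = row["wt_peptide"].strip().upper()
        mut = row["mut_peptide"].strip().upper()
        label = f"id {row['id']}"
        if not (set(wt) <= VALID_RESIDUES and set(mut) <= VALID_RESIDUES):
            problems.append(f"{label}: non-standard residue")
        elif len(wt) != len(mut):
            problems.append(f"{label}: length mismatch")
        elif not missense_positions(wt, mut):
            problems.append(f"{label}: WT and MUT are identical")
    return problems


# Made-up peptides, for illustration only.
example_rows = [
    {"id": 1, "wt_peptide": "ACDEFGHIK", "mut_peptide": "ACDEFGHIR", "allele": ""},
    {"id": 2, "wt_peptide": "ACDEFGHIK", "mut_peptide": "ACDEFGHI", "allele": ""},
]
example_problems = check_rows(example_rows)
```

Here the second row would be flagged because the MUT peptide is shorter than its WT counterpart, which suggests a deletion rather than a missense mutation.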

Output

| File Name | Description |
| --- | --- |
| `sequences.fasta` | A FASTA file containing all of the sequences in sequences.xlsx. Used as input for the netMHCpan (NMP), Pick Pocket (PP), and PRIME predictions |
| `DeepImmunoInput.csv` | All of the sequences in sequences.xlsx formatted for DeepImmuno |
| `predictionsProcessedDI.xlsx` | Processed DeepImmuno predictions |
| `predictionsProcessedPrime.xlsx` | Processed PRIME predictions |
| `predictionsProcessedPP.xlsx` | Processed Pick Pocket predictions |
| `predictionsProcessedNMP.xlsx` | Processed netMHCpan predictions |
| `combinedScores.xlsx` | The ensemble score for every WT, MUT, and HLA trio |
| `results.xlsx` | For every WT and MUT pair, the allele that yielded the highest ensemble score. When validating the model, also contains the ensemble score for the allele most similar to the validated allele |
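
The selection described for `results.xlsx` (keep the allele with the highest ensemble score for each WT/MUT pair) can be sketched in Python as follows. This is an illustration under assumptions, not the logic of results.R: the keys `id`, `allele`, and `final_score` are hypothetical column names, and the scores are made up.

```python
# Hypothetical sketch: pick the best-scoring allele for each WT/MUT pair.
# The keys "id", "allele", and "final_score" are assumed names.
def best_allele_per_pair(rows: list[dict]) -> dict:
    """Map each pair id to the row with the highest final_score.

    On ties, the first row encountered is kept.
    """
    best = {}
    for row in rows:
        current = best.get(row["id"])
        if current is None or row["final_score"] > current["final_score"]:
            best[row["id"]] = row
    return best


# Made-up scores, for illustration only.
combined = [
    {"id": 1, "allele": "HLA-A*02:01", "final_score": 0.40},
    {"id": 1, "allele": "HLA-B*07:02", "final_score": 0.70},
    {"id": 2, "allele": "HLA-A*02:01", "final_score": 0.20},
]
best = best_allele_per_pair(combined)
```

With this input, pair 1 would be reported with `HLA-B*07:02` and pair 2 with `HLA-A*02:01`.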

How to Run the Pipeline

Links to all models

1. Obtaining Predictions

Begin by creating a sequences.xlsx file with the columns mentioned above. Run this file through sequencesToFasta.R.
Take the output (sequences.fasta) and run it through netMHCpan, PRIME, and Pick Pocket. For general predictions, I suggest running the predictions across all of the HLA supertype representatives.
Copy and paste the output of each of those models into a separate txt file.
For DeepImmuno, run sequences.xlsx through DeepImmunoInput.R to get the csv needed for DeepImmuno predictions. Make sure the alleles in DeepImmunoInput.R match the ones you made predictions with for the other models.
Run DeepImmunoInput.csv through DeepImmuno, and copy and paste the output into a txt file.
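
To illustrate the kind of conversion the first step performs, here is a minimal Python sketch that turns WT/MUT rows into FASTA text. It is not the code in sequencesToFasta.R. The header scheme (`>{id}_WT` and `>{id}_MUT`) is an assumption chosen for this example, and the real script may name records differently.

```python
# Hypothetical sketch of a sequences.xlsx -> FASTA conversion.
# Assumption: each pair becomes two records named "<id>_WT" and "<id>_MUT".
def to_fasta(rows: list[dict]) -> str:
    """Build FASTA text with one WT and one MUT record per row."""
    lines = []
    for row in rows:
        lines.append(f">{row['id']}_WT")
        lines.append(row["wt_peptide"])
        lines.append(f">{row['id']}_MUT")
        lines.append(row["mut_peptide"])
    return "\n".join(lines) + "\n"


# Made-up peptides, for illustration only.
fasta_text = to_fasta(
    [{"id": 1, "wt_peptide": "ACDEFGHIK", "mut_peptide": "ACDEFGHIR"}]
)
```

Keeping the pair id in each header makes it straightforward to match WT and MUT predictions again once the model outputs are processed.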

2. Processing Predictions

Run each of your prediction.txt files through its associated modelNameProcessing.R file (e.g. predictionsNMP.txt goes through NMPProcessing.R)
Currently, you will have to manually add some columns to these processed prediction files before you move on
- For netMHCpan and PRIME, normalize the predicted ranks for the WT and MUT. I suggest the following min-max normalization formula, `=(MAX(column of the rank you're normalizing) - [@WTRank])/(MAX(column of the rank you're normalizing) - MIN(column of the rank you're normalizing))`. Use [@MTRank] in place of [@WTRank] when normalizing the mutant ranks. With this formula, the lowest rank in a column maps to 1 and the highest maps to 0.
- For all models, add a Model Score column. The model score formula is `=(absolute reference to cell containing the alpha weight)*([@NormalMTRank])+(absolute reference to cell containing the beta weight)*([@NormalMTRank]-[@NormalWTRank])`. For Pick Pocket and DeepImmuno, use @MTRank and @WTRank directly instead of the normalized columns. @NormalMTRank and @NormalWTRank should be the names of the columns containing your normalized ranks. The weights let you decide whether to place more importance on mutants whose predicted scores are high (alpha) or on mutants whose predicted scores are much higher than their WT counterparts (beta).
Run combinedScores.R on the processed prediction files. You only need to run the script once for all four files
- Open the combined scores file and add a Final Score column that contains the average of all four model scores for each WT, MUT, and HLA trio
Run the output of combinedScores.R through results.R to obtain results.xlsx
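
The spreadsheet formulas in this section can be sketched in Python as follows. This is an illustration of the arithmetic only, not a replacement for the manual steps or the R scripts. The function names, the example ranks, and the weights of 0.5 are all hypothetical.

```python
# Hypothetical sketch of the normalization and scoring arithmetic.
def min_max_inverted(ranks: list[float]) -> list[float]:
    """(max - x) / (max - min): the lowest rank maps to 1, the highest to 0."""
    hi, lo = max(ranks), min(ranks)
    if hi == lo:
        raise ValueError("cannot normalize a column whose values are all equal")
    return [(hi - r) / (hi - lo) for r in ranks]


def model_score(mut: float, wt: float, alpha: float, beta: float) -> float:
    """alpha weights the mutant score; beta weights its gain over the WT."""
    return alpha * mut + beta * (mut - wt)


def final_score(model_scores: list[float]) -> float:
    """Average the per-model scores for one WT, MUT, and HLA trio."""
    return sum(model_scores) / len(model_scores)


# Made-up ranks for three peptides (one column each for WT and MUT).
wt_norm = min_max_inverted([2.0, 10.0, 6.0])   # [1.0, 0.0, 0.5]
mt_norm = min_max_inverted([0.5, 4.5, 8.5])    # [1.0, 0.5, 0.0]

# Illustrative weights; choose your own alpha and beta.
scores = [model_score(m, w, alpha=0.5, beta=0.5) for m, w in zip(mt_norm, wt_norm)]
# scores == [0.5, 0.5, -0.25]
```

In this example the second peptide ties with the first even though its normalized mutant score is lower, because beta rewards its larger improvement over the WT. The third peptide scores below zero because its mutant ranks worse than its WT.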
