A comprehensive bioinformatics pipeline for detecting neoantigens from paired tumor-normal samples using whole exome sequencing (WES) and RNA-seq data.
This pipeline processes raw NGS data to identify potential neoantigens in cancer samples through the following steps:
- Quality control and preprocessing of raw fastq files
- Alignment to reference genome
- HLA typing
- Somatic variant calling
- Variant annotation
- Neoantigen prediction using multiple prediction algorithms
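Documentation is available in the docs/ directory:
- Installation Guide (docs/installation.md): how to set up the pipeline and its dependencies
- Usage Guide (docs/usage.md): how to run the pipeline with your data
- Troubleshooting Guide (docs/troubleshooting.md): solutions for common issues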
# Clone the repository
git clone https://github.com/iichelhadi/Neoantigen_detection_pipeline.git
cd Neoantigen_detection_pipeline
# Set up conda environments
conda env create -f envs/alignment.yaml
conda env create -f envs/variant_calling.yaml
conda env create -f envs/vep.yaml
conda env create -f envs/pvactools.yaml
# Configure the pipeline
cp config/config.sh.template config/config.sh
# Edit config.sh with your specific paths and settings
# Run the pipeline
bash scripts/run_pipeline.sh -c config/config.sh -s data/sample_info.tsv
The pipeline requires the following software (installed via conda environments):
- TrimGalore and FastQC (quality control)
- BWA and HISAT2 (alignment)
- GATK and Samtools (variant calling)
- OptiType (HLA typing)
- VEP (Variant Effect Predictor)
- pVACtools (neoantigen prediction)
See the Installation Guide for detailed requirements and setup instructions.
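As a quick sanity check before a run, the tools listed above can be looked up on PATH. The sketch below is not the repository's scripts/utils/check_dependencies.sh. The function name `missing_tools` is hypothetical, and the executable names are the ones these tools commonly install, which is an assumption. The relevant conda environment would need to be active first, and OptiType is left out because its entry point depends on how it was installed.

```shell
#!/usr/bin/env bash
# Print each command name from the argument list that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names are assumptions based on how these tools are usually installed.
missing_tools trim_galore fastqc bwa hisat2 gatk samtools vep pvacseq
```

An empty result means every name was found. Because the tools are split across several conda environments, a name may be reported as missing simply because a different environment is active.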
Neoantigen_detection_pipeline/
├── config/
│ └── config.sh.template
├── data/
│ └── sample_info.tsv.template
├── scripts/
│ ├── 01_preprocess.sh
│ ├── 02_dna_alignment.sh
│ ├── 03_rna_alignment.sh
│ ├── 04_hla_typing.sh
│ ├── 05_variant_calling.sh
│ ├── 06_variant_annotation.sh
│ ├── 07_neoantigen_prediction.sh
│ ├── run_pipeline.sh
│ ├── submit_pipeline.sh
│ └── utils/
│     └── check_dependencies.sh
├── envs/
│ ├── alignment.yaml
│ ├── variant_calling.yaml
│ ├── vep.yaml
│ └── pvactools.yaml
├── docs/
│ ├── installation.md
│ ├── usage.md
│ └── troubleshooting.md
└── README.md
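The numbered prefixes on the step scripts (01_ through 07_) encode the order of the stages. How run_pipeline.sh actually invokes them is not shown here, so the following is only a sketch of how the stages could be listed in order. `list_steps` is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# Print the numbered step scripts found in a directory, in run order.
# Relies on the two-digit prefixes (01_ ... 07_) sorting lexically.
list_steps() {
    local dir="$1" script
    for script in "$dir"/[0-9][0-9]_*.sh; do
        [ -e "$script" ] || continue   # the glob matched nothing
        basename "$script"
    done
}

list_steps scripts
```

Scripts without a numeric prefix, such as run_pipeline.sh and submit_pipeline.sh, are skipped by the glob.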
The pipeline generates the following output directories:
- trim_fq/: Quality-trimmed fastq files
- BAM_files/: Aligned BAM files
- optitype/: HLA typing results
- vcf_files/: Variant calls in VCF format
- vcf_files_annotated/: Annotated variants
- pvacseq/: Neoantigen predictions
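After a run, one simple way to see which stages produced output is to check for the directories named above. This is a minimal sketch: `missing_outputs` is a hypothetical helper, and `results` is a placeholder for whatever output location is set in config.sh.

```shell
#!/usr/bin/env bash
# Print the expected output directories that are missing under a base directory.
missing_outputs() {
    local base="$1" dir
    for dir in trim_fq BAM_files optitype vcf_files vcf_files_annotated pvacseq; do
        [ -d "$base/$dir" ] || printf '%s\n' "$dir"
    done
}

# "results" is a placeholder for wherever the pipeline was configured to write.
missing_outputs results
```

The directories are checked in the order the stages run, so the first name printed points at the earliest stage that left no output directory. The check only tests that a directory exists, not that its contents are complete.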
For detailed information about running the pipeline and interpreting results, see the Usage Guide.
If you use this pipeline in your research, please cite the following tools:
- TrimGalore
- BWA-MEM: Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM
- GATK: McKenna A, et al. (2010) The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data
- OptiType: Szolek A, et al. (2014) OptiType: precision HLA typing from next-generation sequencing data
- VEP: McLaren W, et al. (2016) The Ensembl Variant Effect Predictor
- pVACtools: Hundal J, et al. (2020) pVACtools: a computational toolkit to identify and visualize cancer neoantigens
Please also cite the pipeline itself:
Elhadi Iich. (2022). Neoantigen Detection Pipeline [Computer software]. https://github.com/iichelhadi/Neoantigen_detection_pipeline
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.