sanjaysgk/ipg



Introduction

sanjaysgk/ipg is a bioinformatics pipeline for immunopeptidogenomics: it builds a personalised cryptic peptide search database from RNA-seq, then searches it against immunopeptidomics MS/MS data to identify non-canonical (cryptic) peptides. It implements the method of Scull et al. (2021) as a reproducible nf-core-style Nextflow pipeline.

[Figure: ipg pipeline overview]

The pipeline runs in independent steps selected with --step:

--step db_construct — RNA-seq → cryptic peptide FASTA

  1. Align reads with two-pass STAR and infer strandedness with RSeQC.
  2. Assemble transcripts with StringTie and reconcile with the reference annotation via gffcompare.
  3. Prepare the BAM files following the GATK4 RNA-seq best practices (MarkDuplicates → SplitNCigarReads → two-pass BQSR).
  4. Call somatic variants with Mutect2 in tumour-only mode.
  5. Build the cryptic peptide database with the IPG custom C tools (curate_vcf, alt_liftover, triple_translate, squish).
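The translation stage in step 5 can be pictured with a small sketch. The README does not describe the internals of triple_translate, so the following assumes, from the name alone, that it translates each transcript in the three forward reading frames using the standard genetic code. The function name translate_three_frames is hypothetical, and this is not the IPG C implementation.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_three_frames(nucleotides):
    """Translate a DNA sequence in forward frames 0, 1 and 2.

    Hypothetical helper: stop codons are emitted as '*', codons with
    ambiguous bases as 'X', and trailing partial codons are dropped.
    """
    seq = nucleotides.upper()
    frames = []
    for offset in range(3):
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        frames.append("".join(CODON_TABLE.get(codon, "X") for codon in codons))
    return frames


if __name__ == "__main__":
    for frame, protein in enumerate(translate_three_frames("ATGGCCTAA")):
        print(frame, protein)
```

Splitting each frame at the stop symbol would then give candidate open reading frames. Whether and how the real tools do this is not stated here.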

--step ms_search — MS/MS → identified cryptic peptides

  1. Search each sample's spectra against its cryptic database with MSFragger, Comet and Sage.
  2. Rescore PSMs with MS2Rescore, estimate FDR with mokapot, and integrate the search engines' results at a configurable peptide-level FDR (default 1%).
  3. Optional de novo discovery lane (--run_denovo, InstaNovo) — predicts peptides directly from spectra and classifies them canonical / cryptic / novel.
  4. Optional immunoinformatics (HLA binding, motif clustering, quantification) and a cryptic-discovery report.
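To make the peptide-level FDR threshold in step 2 concrete, here is a minimal target-decoy sketch. It is a textbook-style estimate and not mokapot's actual procedure. The input layout (sequence, score, is_decoy) and the function names are assumptions made for illustration.

```python
def peptide_q_values(scored_peptides):
    """Return {target peptide: q-value} from (sequence, score, is_decoy) tuples.

    Illustrative only: FDR at each rank is estimated as decoys / targets
    among peptides scoring at least as well.
    """
    # Peptide level: keep only the best-scoring match per sequence.
    best = {}
    for sequence, score, is_decoy in scored_peptides:
        if sequence not in best or score > best[sequence][0]:
            best[sequence] = (score, is_decoy)

    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)

    # Running FDR estimate going down the ranked list.
    targets = decoys = 0
    fdrs = []
    for _, (_, is_decoy) in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value = smallest FDR at this rank or any lower-scoring rank.
    q_values = {}
    running_min = 1.0
    for (sequence, (_, is_decoy)), fdr in zip(reversed(ranked), reversed(fdrs)):
        running_min = min(running_min, fdr)
        if not is_decoy:
            q_values[sequence] = running_min
    return q_values


def accept_peptides(scored_peptides, threshold=0.01):
    """Target peptides whose q-value is at or below the threshold."""
    q_values = peptide_q_values(scored_peptides)
    return sorted(seq for seq, q in q_values.items() if q <= threshold)
```

With the default threshold of 0.01 this keeps only target peptides whose estimated q-value is at most 1%, matching the default stated above.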

Usage

Note

New to Nextflow? See the nf-core installation docs. The repository ships a pixi environment that pins every tool; install it with pixi install. If you don't have pixi yet, install it first with curl -fsSL https://pixi.sh/install.sh | bash.

Prepare a samplesheet:

samplesheet.csv

sample,fastq_1,fastq_2,strandedness
SAMPLE,/path/to/R1.fastq.gz,/path/to/R2.fastq.gz,reverse
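Before launching a run, a quick sanity check of the samplesheet can catch typos early. The sketch below checks only the four columns shown above. The set of accepted strandedness values is an assumption borrowed from common nf-core conventions (the README only shows reverse), and check_samplesheet is a hypothetical helper, not part of the pipeline.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample", "fastq_1", "fastq_2", "strandedness")
# Assumption: the README only shows "reverse"; the other values follow
# the usual nf-core convention and may not all be accepted by ipg.
ASSUMED_STRANDEDNESS = {"forward", "reverse", "unstranded", "auto"}


def check_samplesheet(text):
    """Parse samplesheet CSV text and return its rows as dictionaries."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")

    rows = []
    for line_number, row in enumerate(reader, start=2):
        if not row["sample"] or not row["fastq_1"]:
            raise ValueError(f"line {line_number}: sample and fastq_1 are required")
        if row["strandedness"] not in ASSUMED_STRANDEDNESS:
            raise ValueError(
                f"line {line_number}: unexpected strandedness {row['strandedness']!r}"
            )
        rows.append(row)
    return rows


if __name__ == "__main__":
    example = (
        "sample,fastq_1,fastq_2,strandedness\n"
        "SAMPLE,/path/to/R1.fastq.gz,/path/to/R2.fastq.gz,reverse\n"
    )
    print(check_samplesheet(example))
```

The pipeline performs its own validation at start-up; this check is only a convenience for catching obvious mistakes beforehand.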

Build the cryptic peptide database:

pixi run nextflow run . \
    -profile singularity \
    --step db_construct \
    --input samplesheet.csv \
    --outdir results \
    -params-file reference.yaml

Warning

Provide parameters via the CLI or a -params-file, not via a custom -c config file.

To try the pipeline on the bundled chr22 test data, run with -profile test,pixi. For the full reference-genome parameters, the MS-search samplesheet, the --step ms_search and --step post_ms workflows, and all options, see docs/usage.md.

Pipeline output

  • Database construction: results/db_construct/<sample>/<sample>_cryptic.fasta
  • MS search: the integrated peptide table under results/ms_search/<sample>/
  • A MultiQC report and Nextflow execution reports under results/pipeline_info/
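As a quick check of the database construction output, the cryptic FASTA can be summarised with a few lines of Python. This is a minimal sketch that assumes a plain FASTA file with ">" header lines; the header contents are not described in this README, and read_fasta and summarise are hypothetical helpers.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarise(handle):
    """Return (number of entries, total residues) for a FASTA handle."""
    count = total = 0
    for _, sequence in read_fasta(handle):
        count += 1
        total += len(sequence)
    return count, total
```

For a sample named SAMPLE, usage would look like `with open("results/db_construct/SAMPLE/SAMPLE_cryptic.fasta") as handle: print(summarise(handle))`, following the path pattern listed above.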

See docs/output.md for the full output description.

Profiles

Profile                Purpose
pixi                   Run every tool from the local pixi env (no containers)
singularity / docker   Pull biocontainers (HPC / cloud)
monash                 SLURM on the Monash M3 comp partition (xy86 account)
test                   Use the bundled chr22 test data

Credits

sanjaysgk/ipg was written by Sanjay SG Krishna (@sanjaysgk), Li Lab, Monash University, porting the immunopeptidogenomics method and custom C tools developed by Kate Scull (Purcell Lab; kescull/immunopeptidogenomics). Supervised by Chen Li (Li Lab) and Anthony W. Purcell (Purcell Lab), Monash University.

Contributions and support

Contributions and bug reports are welcome — please open a GitHub issue or a pull request.

Citations

If you use sanjaysgk/ipg, please cite the method paper:

Scull KE, Pandey K, Ramarathinam SH, Purcell AW. Immunopeptidogenomics: harnessing RNA-seq to illuminate the dark immunopeptidome. Mol Cell Proteomics. 2021;20:100143. doi:10.1016/j.mcpro.2021.100143

A reference list for every tool in the pipeline is in CITATIONS.md. This pipeline is built with Nextflow and the nf-core framework (Ewels et al., Nat Biotechnol. 2020, doi:10.1038/s41587-020-0439-x).

License

MIT — see LICENSE.