geneIS-nf

Introduction

geneIS-nf is a bioinformatics analysis pipeline that assigns initiation sites mapped from NS-seq data (see iniseq-nf and classifyIS-nf) to their nearest gene in a reproducible way.

The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner.

Pipeline summary

Quick Start

i. Install nextflow

ii. Install the pandas, numpy, scipy, matplotlib and BioPython Python packages and the argparser, BiocManager, GenomicFeatures and ChIPseeker R packages

iii. Clone the repository with

nextflow pull pavrilab/geneIS-nf

iv. Start running your own analysis!

nextflow run pavrilab/geneIS-nf --masterTable IS.master.tsv --txDb annotation.sql --email e@mail.com --xCol WT_col --yCol KD_col

Main arguments

-profile

Use this parameter to choose a configuration profile. Profiles provide configuration presets for different compute environments. For example, -profile cbe runs the processes through the Slurm workload manager. If no profile is given, the pipeline is executed locally.

--masterTable

Master table containing at least the quantification results (see classifyIS-nf)

--txDb

SQL file containing a gene annotation generated with the GenomicFeatures R package

--email

Email address used when fetching Entrez IDs with BioPython

--xCol

Column in the master table holding the SNS-seq read quantification results for WT

--yCol

Column in the master table holding the SNS-seq read quantification results for KD
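Since --xCol and --yCol must name columns that exist in the master table, it can help to check the header before launching a run. The following is a minimal sketch using only the Python standard library. It assumes the master table is tab-separated with a single header row; the helper name missing_columns and the sample rows are hypothetical and not part of the pipeline.

```python
import csv
import io


def missing_columns(table_text, required):
    """Return the required column names that are absent from the header row."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader, [])  # empty list if the table has no rows at all
    return [name for name in required if name not in header]


# Hypothetical master table content. Only the column names WT_col and KD_col
# are taken from the Quick Start command; everything else is made up.
sample = (
    "chr\tstart\tend\tWT_col\tKD_col\n"
    "chr1\t100\t200\t3.2\t1.7\n"
)

print(missing_columns(sample, ["WT_col", "KD_col"]))   # both columns present
print(missing_columns(sample, ["WT_col", "KD_typo"]))  # second name misspelled
```

For a real file, the same function can be fed the result of open("IS.master.tsv").read().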

Generic arguments

--foldChange

Adds diagonal lines to the plot at a distance of foldChange

--axMin

Lower bound for axis values of x- and y-axis (Default: 0)

--axMax

Upper bound for axis values of x- and y-axis (Default: 8)
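To illustrate how --foldChange, --axMin and --axMax relate geometrically, the sketch below computes the end points of the main diagonal and of two parallel lines shifted by a fixed offset, clipped to the axis range. It is not the pipeline's plotting code. It assumes the offset is applied additively on the plotted scale; the source does not state whether foldChange is interpreted on a log scale. The function name offset_diagonals is hypothetical.

```python
def offset_diagonals(offset, ax_min=0.0, ax_max=8.0):
    """Return end points of y = x and y = x +/- offset inside the axis box.

    Assumption: the offset is additive on the axis scale. The defaults for
    ax_min and ax_max mirror the documented defaults of --axMin and --axMax.
    """
    if not 0 <= offset < ax_max - ax_min:
        raise ValueError("offset must be non-negative and smaller than the axis range")
    return {
        # main diagonal from the lower-left to the upper-right corner
        "diagonal": ((ax_min, ax_min), (ax_max, ax_max)),
        # y = x + offset enters at the left edge and leaves at the top edge
        "upper": ((ax_min, ax_min + offset), (ax_max - offset, ax_max)),
        # y = x - offset enters at the bottom edge and leaves at the right edge
        "lower": ((ax_min + offset, ax_min), (ax_max, ax_max - offset)),
    }


lines = offset_diagonals(1.0)
for name, (start, end) in lines.items():
    print(name, start, end)
```

Points above the upper line or below the lower line differ between the two conditions by more than the chosen offset, which is what such guide lines are typically used to highlight.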

--filePrefix

Prefix for the result file names

--outputDir

Folder to which results will be written (created if it does not exist)
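When the pipeline is launched from a script, for example once per master table, the command line can be assembled programmatically. The sketch below builds the argument list from the parameters documented above and prints it as a shell-safe string; it does not execute anything. The helper build_command is hypothetical, and the value results for --outputDir is a placeholder.

```python
import shlex


def build_command(master_table, tx_db, email, x_col, y_col, profile=None, **options):
    """Assemble a geneIS-nf command line as a list of arguments."""
    cmd = ["nextflow", "run", "pavrilab/geneIS-nf"]
    if profile:
        # Nextflow's own options use a single dash
        cmd += ["-profile", profile]
    cmd += [
        "--masterTable", master_table,
        "--txDb", tx_db,
        "--email", email,
        "--xCol", x_col,
        "--yCol", y_col,
    ]
    # Generic arguments such as axMin, axMax or outputDir use a double dash
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_command(
    "IS.master.tsv", "annotation.sql", "e@mail.com", "WT_col", "KD_col",
    profile="cbe", axMin=0, axMax=8, outputDir="results",  # "results" is a placeholder
)
print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run(cmd, check=True) if Nextflow is installed; keeping the arguments as a list avoids quoting problems with file names that contain spaces.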

Results

The pipeline generates the following result files:

  1. An extended *.master.tsv file with annotated gene features
  2. A tab-separated *.chipseeker.mapped.tsv file containing the annotation results
  3. A set of density plots showing the distribution of gene features over the mapped initiation sites
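As a quick sanity check on the annotation results, one might tabulate how many initiation sites fall into each feature class. The sketch below does this with the Python standard library. It assumes the *.chipseeker.mapped.tsv file has a header row and a column named annotation whose values may carry a parenthesised detail, as ChIPseeker annotations commonly do; the exact columns written by the pipeline are not documented here, so the column name, the helper count_features and the sample rows are all hypothetical.

```python
import csv
import io
from collections import Counter


def count_features(table_text, column="annotation"):
    """Count rows per feature class in a tab-separated annotation table.

    Assumption: values look like "Promoter (<=1kb)" or "Distal Intergenic";
    anything from the first " (" onwards is treated as detail and dropped.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        label = row[column].split(" (", 1)[0]
        counts[label] += 1
    return counts


# Hypothetical file content for illustration only.
sample = (
    "seqnames\tstart\tend\tannotation\n"
    "chr1\t100\t200\tPromoter (<=1kb)\n"
    "chr1\t900\t950\tIntron (tx1/gene1, intron 1 of 3)\n"
    "chr2\t50\t80\tPromoter (1-2kb)\n"
    "chr2\t500\t600\tDistal Intergenic\n"
)

counts = count_features(sample)
for label, n in counts.most_common():
    print(label, n)
```

If the file uses a different column name, pass it through the column argument.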

Credits

The pipeline was developed by Daniel Malzl for use at the IMP, Vienna.

Many thanks to others who have helped out along the way too, including (but not limited to): @t-neumann, @pditommaso.

Citations

Pipeline tools

  • Nextflow

    Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

R Packages

  • BiocManager

  • GenomicFeatures

    Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, Morgan M, Carey V (2013). “Software for Computing and Annotating Genomic Ranges.” PLoS Computational Biology, 9. doi: 10.1371/journal.pcbi.1003118

  • ChIPseeker

    Yu G, Wang L, He Q (2015). “ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization.” Bioinformatics, 31(14), 2382-2383. doi: 10.1093/bioinformatics/btv145

Python Packages

  • BioPython

  • pandas

    Wes McKinney. Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science Conference, 51-56 (2010)

  • numpy

    Stéfan van der Walt, S. Chris Colbert and Gaël Varoquaux. The NumPy Array: A Structure for Efficient Numerical Computation, Computing in Science & Engineering, 13, 22-30 (2011). doi: 10.1109/MCSE.2011.37

  • scipy

    Virtanen, P., Gommers, R., Oliphant, T.E. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17, 261–272 (2020). doi: 10.1038/s41592-019-0686-2

  • matplotlib

    John D. Hunter. Matplotlib: A 2D Graphics Environment, Computing in Science & Engineering, 9, 90-95 (2007). doi: 10.1109/MCSE.2007.55
