geneIS-nf

Introduction

geneIS-nf is a bioinformatics analysis pipeline for reproducibly assigning initiation sites, mapped from NS-seq data (see iniseq-nf and classifyIS-nf), to their nearest gene.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner.
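To illustrate what assigning an initiation site to its "nearest gene" means, here is a minimal Python sketch. It is a simplified stand-in, not the pipeline's method: geneIS-nf delegates annotation to ChIPseeker, which also takes strand and gene features into account. The TSS index layout and the helper name `nearest_gene` are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Hypothetical TSS index: chromosome -> list of (tss_position, gene_id), sorted by position.
TssIndex = Dict[str, List[Tuple[int, str]]]


def nearest_gene(index: TssIndex, chrom: str, site: int) -> Optional[Tuple[str, int]]:
    """Return (gene_id, site - tss) for the closest TSS on the same chromosome, or None.

    Distances are unstranded here; a real annotator would orient them by gene strand.
    """
    entries = index.get(chrom)
    if not entries:
        return None
    positions = [pos for pos, _ in entries]
    i = bisect.bisect_left(positions, site)
    # The nearest TSS is either directly left or directly right of the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(entries)]
    best = min(candidates, key=lambda j: abs(site - positions[j]))
    tss, gene = entries[best]
    return gene, site - tss
```

For example, with TSSs at 1000, 5000 and 9000 on chr1, a site at 4200 is assigned to the gene at 5000 with a distance of -800. Binary search keeps each lookup logarithmic in the number of genes per chromosome.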

Pipeline summary

Quick Start

i. Install Nextflow

ii. Install the Python packages pandas, numpy, scipy, matplotlib and BioPython, and the R packages argparser, BiocManager, GenomicFeatures and ChIPseeker

iii. Clone the repository with

nextflow pull pavrilab/geneIS-nf

iv. Start running your own analysis!

nextflow run pavrilab/geneIS-nf --masterTable IS.master.tsv --txDb annotation.sql --email e@mail.com --xCol WT_col --yCol KD_col
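Before starting a run, it can help to confirm that the Python packages from step ii are importable in the environment Nextflow will use. The following is a minimal sketch under that assumption; it only covers the Python side (the R packages would need a separate check in R), and the helper name `missing_modules` is hypothetical.

```python
import importlib.util
from typing import Iterable, List

# Import names of the Python packages from step ii; BioPython is imported as "Bio".
REQUIRED_MODULES = ["pandas", "numpy", "scipy", "matplotlib", "Bio"]


def missing_modules(modules: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level modules that cannot be found in the current Python environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    else:
        print("All required Python packages are installed.")
```

`importlib.util.find_spec` locates a module without importing it, so the check is cheap even for large packages such as matplotlib.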

Main arguments

`-profile`

Use this parameter to choose a configuration profile. Profiles provide configuration presets for different compute environments. For example, `-profile cbe` executes the processes using the Slurm workload manager. If no profile is given, the pipeline is executed locally.

`--masterTable`

Master table containing at least the quantification results (see classifyIS-nf)

`--txDb`

SQL file containing a gene annotation generated with the GenomicFeatures R package

`--email`

Email address passed to BioPython when fetching Entrez IDs from NCBI (NCBI asks Entrez users to identify themselves by email)
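As an illustration of why an email address is needed, the sketch below looks up Entrez Gene IDs for a gene symbol with BioPython's `Bio.Entrez` module, which sends the address along with each request. It is a hypothetical example, not the pipeline's code: the helper names, the search direction (symbol to ID) and the query fields are assumptions, and `fetch_entrez_ids` requires network access.

```python
from typing import List


def build_gene_query(symbol: str, organism: str) -> str:
    """Build an NCBI Gene search term restricted to a gene symbol and an organism."""
    return f"{symbol}[sym] AND {organism}[orgn]"


def fetch_entrez_ids(symbol: str, organism: str, email: str, retmax: int = 5) -> List[str]:
    """Query NCBI Gene through BioPython and return the matching Entrez Gene IDs."""
    from Bio import Entrez  # imported lazily so build_gene_query works without BioPython

    Entrez.email = email  # identifies the client to NCBI, as --email does for the pipeline
    handle = Entrez.esearch(db="gene", term=build_gene_query(symbol, organism), retmax=retmax)
    try:
        record = Entrez.read(handle)
    finally:
        handle.close()
    return list(record["IdList"])
```

A call such as `fetch_entrez_ids("MYC", "human", "e@mail.com")` would return a list of ID strings; results depend on the live NCBI database.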

`--xCol`

Column in the master table holding the SNS-seq read quantification results for the WT (wild-type) sample

`--yCol`

Column in the master table holding the SNS-seq read quantification results for the KD (knockdown) sample
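Since `--xCol` and `--yCol` must name existing columns of the master table, a quick pre-flight check can catch typos before a run. This is a minimal pandas sketch assuming the master table is tab-separated with a header row (as the `.tsv` extension suggests); the helper names are hypothetical.

```python
from typing import List

import pandas as pd


def check_columns(table: pd.DataFrame, x_col: str, y_col: str) -> List[str]:
    """Return the requested quantification columns that are absent from the master table."""
    return [col for col in (x_col, y_col) if col not in table.columns]


def load_master_table(path: str, x_col: str, y_col: str) -> pd.DataFrame:
    """Read a tab-separated master table and fail early if --xCol/--yCol are missing."""
    table = pd.read_csv(path, sep="\t")
    missing = check_columns(table, x_col, y_col)
    if missing:
        raise ValueError(f"columns not found in master table: {', '.join(missing)}")
    return table
```

For the Quick Start example, `load_master_table("IS.master.tsv", "WT_col", "KD_col")` would raise a `ValueError` naming any column that is not present.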

Generic arguments

`--foldChange`

Adds diagonal lines to the plot at a distance from the identity line corresponding to the given fold change

`--axMin`

Lower bound for axis values of x- and y-axis (Default: 0)

`--axMax`

Upper bound for axis values of x- and y-axis (Default: 8)
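The README does not state the axis scale. Assuming both axes show log2-transformed values (an assumption), a fold change fc corresponds to the two lines y = x + log2(fc) and y = x - log2(fc). The hypothetical helper below computes their endpoints clipped to the square spanned by `--axMin` and `--axMax`.

```python
import math
from typing import List, Tuple

Segment = Tuple[Tuple[float, float], Tuple[float, float]]


def fold_change_lines(fold_change: float, ax_min: float = 0.0, ax_max: float = 8.0) -> List[Segment]:
    """Endpoints of y = x + log2(fc) and y = x - log2(fc), clipped to [ax_min, ax_max]^2.

    Assumes log2-scaled axes, so a constant fold change is a constant offset from y = x.
    """
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    offset = abs(math.log2(fold_change))
    segments: List[Segment] = []
    # The set collapses +0.0 and -0.0, so fold change 1 yields only the identity line.
    for shift in sorted({offset, -offset}, reverse=True):
        # Restrict x so that both x and y = x + shift stay inside the axis range.
        x_start = max(ax_min, ax_min - shift)
        x_end = min(ax_max, ax_max - shift)
        if x_start < x_end:
            segments.append(((x_start, x_start + shift), (x_end, x_end + shift)))
    return segments
```

With the defaults and a fold change of 2, this yields the segments from (0, 1) to (7, 8) and from (1, 0) to (8, 7). Each segment can be drawn with matplotlib via `ax.plot([x0, x1], [y0, y1], linestyle="--")`.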

`--filePrefix`

Prefix for the names of the result files

`--outputDir`

Folder to which results are written (created if it does not exist)
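A sketch of how `--outputDir` and `--filePrefix` might combine into result file paths, assuming the prefix fills the `*` in the file names listed under Results. The naming and the helper names are hypothetical, not taken from the pipeline's code.

```python
from pathlib import Path
from typing import Dict


def ensure_output_dir(output_dir: str) -> Path:
    """Create the output folder (and any missing parents) if it does not exist yet."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    return out


def result_paths(output_dir: str, file_prefix: str) -> Dict[str, Path]:
    """Return the paths of the main result tables for a prefix, without touching the disk."""
    out = Path(output_dir)
    return {
        "master": out / f"{file_prefix}.master.tsv",
        "chipseeker": out / f"{file_prefix}.chipseeker.mapped.tsv",
    }
```

Keeping path construction separate from directory creation makes the naming logic easy to test without side effects.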

Results

The pipeline generates the following result files:

An extended *.master.tsv file with annotated gene features
A *.chipseeker.mapped.tsv file, a tab-separated file containing the annotation results
Several density plots showing the distribution of gene features over the mapped initiation sites
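To get a quick overview of the annotation results, the counts per feature class can be tabulated with pandas. This sketch assumes the *.chipseeker.mapped.tsv file keeps ChIPseeker's `annotation` column, whose values look like "Promoter (<=1kb)" or "Distal Intergenic"; the column name and the helper `annotation_summary` are assumptions.

```python
import pandas as pd


def annotation_summary(mapped: pd.DataFrame, column: str = "annotation") -> pd.Series:
    """Count initiation sites per broad feature class, dropping details in parentheses."""
    # Plain str.split avoids treating " (" as a regular expression.
    broad = mapped[column].astype(str).map(lambda s: s.split(" (", 1)[0])
    return broad.value_counts()
```

Usage would be along the lines of `annotation_summary(pd.read_csv("prefix.chipseeker.mapped.tsv", sep="\t"))`, where the file name is a placeholder.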

Credits

The pipeline was developed by Daniel Malzl for use at the IMP, Vienna.

Many thanks to others who have helped out along the way too, including (but not limited to): @t-neumann, @pditommaso.

Citations

Pipeline tools

Nextflow

Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

R and Python Packages

BiocManager
GenomicFeatures

Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, Morgan M, Carey V (2013). “Software for Computing and Annotating Genomic Ranges.” PLoS Computational Biology, 9. doi: 10.1371/journal.pcbi.1003118
ChIPseeker

Yu G, Wang L, He Q (2015). “ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization.” Bioinformatics, 31(14), 2382-2383. doi: 10.1093/bioinformatics/btv145
BioPython
pandas

Wes McKinney. Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science Conference, 51-56 (2010)
numpy

Stéfan van der Walt, S. Chris Colbert and Gaël Varoquaux. The NumPy Array: A Structure for Efficient Numerical Computation, Computing in Science & Engineering, 13, 22-30 (2011). doi: 10.1109/MCSE.2011.37
scipy

Virtanen, P., Gommers, R., Oliphant, T.E. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17, 261–272 (2020). doi: 10.1038/s41592-019-0686-2
matplotlib

John D. Hunter. Matplotlib: A 2D Graphics Environment, Computing in Science & Engineering, 9, 90-95 (2007). doi: 10.1109/MCSE.2007.55
