geneIS-nf is a bioinformatics analysis pipeline that assigns initiation sites mapped from NS-seq data (see iniseq-nf and classifyIS-nf) to their nearest gene in a reproducible way.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner.
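To illustrate what "nearest gene" means here, the following is a minimal sketch and not the pipeline's implementation (the pipeline relies on ChIPseeker for the annotation). It assumes a single chromosome, ignores strand, and measures distance to a transcription start site (TSS); the function name `nearest_gene` and the toy coordinates are hypothetical.

```python
from bisect import bisect_left


def nearest_gene(site_pos, tss_positions, gene_names):
    """Return (gene, signed distance) for the TSS closest to site_pos.

    tss_positions must be sorted ascending and aligned with gene_names.
    Ties are resolved in favour of the upstream (left) neighbour.
    """
    i = bisect_left(tss_positions, site_pos)
    # Only the neighbours directly left and right of the insertion
    # point can be the nearest TSS.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(tss_positions)]
    best = min(candidates, key=lambda j: abs(tss_positions[j] - site_pos))
    return gene_names[best], site_pos - tss_positions[best]


# Hypothetical toy data: TSS coordinates on one chromosome.
tss = [1_000, 5_000, 12_000]
genes = ["geneA", "geneB", "geneC"]

for pos in (900, 3_200, 11_000):
    print(pos, nearest_gene(pos, tss, genes))
```

A real annotation also has to handle multiple chromosomes, strand, and gene bodies, which is why the pipeline delegates this step to ChIPseeker.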
i. Install Nextflow
ii. Install the pandas, numpy, scipy, matplotlib and BioPython Python packages and the argparser, BiocManager, GenomicFeatures and ChIPseeker R packages
iii. Clone repository with
nextflow pull pavrilab/geneIS-nf
iv. Start running your own analysis!
nextflow run pavrilab/geneIS-nf --masterTable IS.master.tsv --txDb annotation.sql --email e@mail.com --xCol WT_col --yCol KD_col
Use the -profile parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. For example, -profile cbe executes processes using the Slurm workload manager. If no profile is given, the pipeline is executed locally.
--masterTable: Master table containing at least quantification results (see classifyIS-nf)
--txDb: SQL file containing a gene annotation generated with the GenomicFeatures R package
--email: Email address used to fetch EntrezIDs with BioPython
--xCol: Column in the master table holding SNS-seq read quantification results for WT
--yCol: Column in the master table holding SNS-seq read quantification results for KD
Adds diagonal lines at a distance of foldChange to the plot
Lower bound for axis values of x- and y-axis (Default: 0)
Upper bound for axis values of x- and y-axis (Default: 8)
Prefix for the names of the result files
Folder to which results will be written (created if it does not exist)
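The plotting parameters above can be pictured with a short sketch. It is not the pipeline's code: it assumes the master table is tab-separated, that the fold-change value is used directly as the offset of the two diagonals y = x ± offset, and that the axis bounds only limit the drawn range. The `name` column, the helper functions and the toy values are hypothetical.

```python
import csv
import io

# Hypothetical master table excerpt; a real file has more columns.
MASTER_TSV = (
    "name\tWT_col\tKD_col\n"
    "IS1\t3.0\t5.5\n"
    "IS2\t6.0\t4.0\n"
    "IS3\t2.0\t2.5\n"
)


def load_xy(handle, x_col, y_col):
    """Read the two quantification columns (as named by --xCol and --yCol)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["name"], float(row[x_col]), float(row[y_col])) for row in reader]


def diagonal_segment(offset, lower, upper):
    """Endpoints of the line y = x + offset inside the square [lower, upper]^2."""
    x0 = max(lower, lower - offset)
    x1 = min(upper, upper - offset)
    return (x0, x0 + offset), (x1, x1 + offset)


def classify(x, y, offset):
    """Report on which side of the diagonals y = x +/- offset a point lies."""
    if y - x > offset:
        return "above"
    if x - y > offset:
        return "below"
    return "between"


sites = load_xy(io.StringIO(MASTER_TSV), "WT_col", "KD_col")
offset = 1.0  # assumption: used directly as the distance from the diagonal
for name, x, y in sites:
    print(name, classify(x, y, offset))

# Line segments for the default axis bounds 0 and 8.
print(diagonal_segment(offset, 0, 8), diagonal_segment(-offset, 0, 8))
```

Sites classified as "above" or "below" are the ones that fall outside the band between the two diagonals in the WT versus KD scatter.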
The pipeline generates several result files:
- An extended *.master.tsv file with annotated gene features
- A *.chipseeker.mapped.tsv file, a tab-separated file containing the annotation results
- Several density plots showing the distribution of gene features over the mapped initiation sites
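As a rough idea of how the annotation results could be inspected, the sketch below tallies gene features from a tab-separated file. The column name `annotation` and the example values follow the usual ChIPseeker `annotatePeak` output, but the exact columns written by the pipeline are an assumption here, and the excerpt is made up.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a *.chipseeker.mapped.tsv file.
MAPPED_TSV = (
    "seqnames\tstart\tend\tannotation\n"
    "chr1\t100\t200\tPromoter (<=1kb)\n"
    "chr1\t900\t950\tIntron (ENST0001/1234, intron 1 of 3)\n"
    "chr2\t50\t80\tDistal Intergenic\n"
    "chr2\t500\t560\tPromoter (1-2kb)\n"
)


def feature_counts(handle, column="annotation"):
    """Count sites per gene feature, dropping the details in parentheses."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[column].split(" (")[0] for row in reader)


counts = feature_counts(io.StringIO(MAPPED_TSV))
for feature, n in counts.most_common():
    print(feature, n)
```

For a real result file, replace the `io.StringIO` object with an opened file handle.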
The pipeline was developed by Daniel Malzl for use at the IMP, Vienna.
Many thanks to others who have helped out along the way too, including (but not limited to): @t-neumann, @pditommaso.
- Nextflow
Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
- GenomicFeatures
Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, Morgan M, Carey V (2013). “Software for Computing and Annotating Genomic Ranges.” PLoS Computational Biology, 9. doi: 10.1371/journal.pcbi.1003118
- ChIPseeker
Yu G, Wang L, He Q (2015). “ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization.” Bioinformatics, 31(14), 2382-2383. doi: 10.1093/bioinformatics/btv145
- pandas
Wes McKinney. Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science Conference, 51-56 (2010)
- numpy
Stéfan van der Walt, S. Chris Colbert and Gaël Varoquaux. The NumPy Array: A Structure for Efficient Numerical Computation, Computing in Science & Engineering, 13, 22-30 (2011). doi: 10.1109/MCSE.2011.37
- scipy
Virtanen, P., Gommers, R., Oliphant, T.E. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17, 261–272 (2020). doi: 10.1038/s41592-019-0686-2
- matplotlib
John D. Hunter. Matplotlib: A 2D Graphics Environment, Computing in Science & Engineering, 9, 90-95 (2007). doi: 10.1109/MCSE.2007.55