Pipeline_Content
This workflow takes VCF files, genome sequences, and annotations as input, and returns annotated VCF files ready for downstream analysis. It uses SnpEff and SnpSift, alongside the GWAS Catalog, dbNSFP, and MSigDB gene sets.
If you use this pipeline, please cite all of them!
FastQC stands for FastQ Quality Control. There is no publication associated with this tool; however, it remains an inescapable classic in bioinformatics.
This tool compiles a lot of information about the raw reads produced by sequencers. It does not perform any step required by the rest of the analysis, but running it is good practice.
Citation:
* Andrews, Simon. "FastQC: a quality control tool for high throughput sequence data." (2010).
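To give an idea of what FastQC reports, here is a minimal sketch of one of its classic metrics, the mean quality at each read position. It assumes Phred+33 quality encoding (Sanger / Illumina 1.8+). This is not FastQC's implementation, and `read_fastq` and `per_base_mean_quality` are hypothetical helper names.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line is ignored
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def per_base_mean_quality(
    records: Iterable[Tuple[str, str, str]], offset: int = 33
) -> List[float]:
    """Mean Phred score at each position, over the reads long enough to reach it."""
    totals: List[int] = []
    counts: List[int] = []
    for _header, _seq, qual in records:
        for pos, char in enumerate(qual):
            if pos == len(totals):  # first read this long: open a new column
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset  # Phred+33 decoding
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


if __name__ == "__main__":
    toy = ["@r1", "ACGT", "+", "IIII", "@r2", "ACG", "+", "!!5"]
    print(per_base_mean_quality(read_fastq(toy)))  # [20.0, 20.0, 30.0, 40.0]
```

Reads of different lengths are handled by only averaging over the reads that actually cover a position, which is why the last column above comes from a single read.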
MultiQC, just like FastQC, has no purpose other than reporting quality metrics. It gathers all individual Flagstat and FastQC metrics into one single report.
Citation:
* Ewels, Philip, et al. "MultiQC: summarize analysis results for multiple tools and samples in a single report." Bioinformatics 32.19 (2016): 3047-3048.
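The core idea behind MultiQC (many per-sample reports merged into one table) can be sketched in a few lines. This sketch assumes FastQC's `summary.txt` layout, one `STATUS<TAB>MODULE<TAB>FILENAME` line per module; it is not MultiQC's code, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable

Report = Dict[str, Dict[str, str]]  # {sample file: {module: PASS/WARN/FAIL}}


def parse_fastqc_summary(lines: Iterable[str]) -> Report:
    """Parse one FastQC summary.txt into {filename: {module: status}}."""
    table: Dict[str, Dict[str, str]] = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        status, module, filename = line.split("\t")
        table[filename][module] = status
    return dict(table)


def merge_reports(*reports: Report) -> Report:
    """Combine per-sample reports into a single mapping."""
    merged: Report = {}
    for report in reports:
        merged.update(report)
    return merged


def to_tsv(merged: Report) -> str:
    """Render one row per sample and one column per QC module."""
    modules = sorted({m for mods in merged.values() for m in mods})
    rows = ["Sample\t" + "\t".join(modules)]
    for sample in sorted(merged):
        cells = [merged[sample].get(m, "NA") for m in modules]
        rows.append(sample + "\t" + "\t".join(cells))
    return "\n".join(rows)


if __name__ == "__main__":
    s1 = ["PASS\tBasic Statistics\tA.fastq.gz",
          "WARN\tPer base sequence content\tA.fastq.gz"]
    s2 = ["PASS\tBasic Statistics\tB.fastq.gz",
          "FAIL\tPer base sequence content\tB.fastq.gz"]
    print(to_tsv(merge_reports(parse_fastqc_summary(s1), parse_fastqc_summary(s2))))
```

Missing modules are filled with `NA`, so samples processed with different tool versions still line up in one report.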
SnpEff and SnpSift are two tools designed to annotate, filter, and clean VCF files.
Citation:
* Cingolani, Pablo, et al. "A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3." Fly 6.2 (2012): 80-92.
* Ruden, Douglas Mark, et al. "Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift." Frontiers in Genetics 3 (2012): 35.
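SnpEff writes its predictions into the `ANN` INFO field of each VCF record: one comma-separated entry per affected feature, each made of `|`-separated subfields. Below is a minimal sketch that reads those entries back and lists the genes hit with a given impact, assuming the 16-subfield `ANN` layout described in the SnpEff documentation. The helper names are hypothetical, and escaped characters inside subfields are not handled.

```python
from typing import Dict, List

# Subfield order of one ANN entry, as described in the SnpEff documentation.
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank", "HGVS.c",
    "HGVS.p", "cDNA.pos/cDNA.length", "CDS.pos/CDS.length",
    "AA.pos/AA.length", "Distance", "ERRORS/WARNINGS/INFO",
]


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into {key: value}; flags map to 'true'."""
    out: Dict[str, str] = {}
    if info in ("", "."):
        return out
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        out[key] = value if sep else "true"
    return out


def parse_ann(ann: str) -> List[Dict[str, str]]:
    """One dict per comma-separated ANN entry, keyed by subfield name."""
    return [dict(zip(ANN_FIELDS, block.split("|"))) for block in ann.split(",")]


def genes_with_impact(vcf_line: str, impact: str = "HIGH") -> List[str]:
    """Gene names whose ANN entries carry the requested impact, in order."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the 8th VCF column
    genes: List[str] = []
    for ann in parse_ann(info["ANN"]) if "ANN" in info else []:
        gene = ann.get("Gene_Name", "")
        if ann.get("Annotation_Impact") == impact and gene not in genes:
            genes.append(gene)
    return genes
```

The impact categories SnpEff uses (`HIGH`, `MODERATE`, `LOW`, `MODIFIER`) make this kind of first-pass filter a common step before looking at SnpSift annotations.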
dbNSFP is a database of human non-synonymous SNPs and their functional predictions. It also bundles a very large set of annotations from other databases.
Citation:
* Liu, Xiaoming, Xueqiu Jian, and Eric Boerwinkle. "dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions." Human Mutation 32.8 (2011): 894-899.
* Liu, Xiaoming, Xueqiu Jian, and Eric Boerwinkle. "dbNSFP v2.0: a database of human non‐synonymous SNVs and their functional predictions and annotations." Human Mutation 34.9 (2013): E2393-E2402.
* Liu, Xiaoming, et al. "dbNSFP v3.0: A one‐stop database of functional predictions and annotations for human nonsynonymous and splice‐site SNVs." Human Mutation 37.3 (2016): 235-241.
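Once dbNSFP annotations are in the VCF, a typical use is to keep variants whose prediction score passes a threshold. Here is a minimal sketch, assuming the annotations are INFO fields with a `dbNSFP_` prefix, that multiple values are comma-separated, and that `.` marks a missing value; the field name `dbNSFP_CADD_phred` in the usage is a hypothetical example. It also assumes "higher score = more deleterious", which holds for CADD-style scores but not, for example, for SIFT.

```python
from typing import Dict, Optional


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into {key: value}; flags map to 'true'."""
    out: Dict[str, str] = {}
    if info in ("", "."):
        return out
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        out[key] = value if sep else "true"
    return out


def dbnsfp_fields(info: str) -> Dict[str, str]:
    """Keep only INFO fields carrying the (assumed) dbNSFP_ prefix."""
    return {k: v for k, v in parse_info(info).items() if k.startswith("dbNSFP_")}


def max_score(value: str) -> Optional[float]:
    """Largest numeric value in a comma-separated field; '.' means missing."""
    scores = [float(v) for v in value.split(",") if v not in ("", ".")]
    return max(scores) if scores else None


def passes_threshold(info: str, field: str, threshold: float) -> bool:
    """True if the field exists and its best score is >= threshold."""
    value = dbnsfp_fields(info).get(field)
    if value is None:
        return False
    score = max_score(value)
    return score is not None and score >= threshold


if __name__ == "__main__":
    info = "DP=10;dbNSFP_CADD_phred=25.3,.;dbNSFP_SIFT_score=0.01"
    print(passes_threshold(info, "dbNSFP_CADD_phred", 20))  # True
```

Taking the maximum over transcripts is one reasonable choice among several; a stricter filter could require every transcript to pass.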
The GWAS Catalog is a curated resource of SNP-trait associations published in genome-wide association studies. It makes it easy to link variants to the publications and studies that report them.
Citation:
* Welter, Danielle, et al. "The NHGRI GWAS Catalog, a curated resource of SNP-trait associations." Nucleic Acids Research 42.D1 (2014): D1001-D1006.
* MacArthur, Jacqueline, et al. "The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog)." Nucleic Acids Research 45.D1 (2017): D896-D901.
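Linking variants to publications boils down to a lookup from variant identifier to (trait, study) pairs. Here is a minimal sketch of that lookup, assuming a tab-separated catalog export with `SNPS`, `DISEASE/TRAIT` and `PUBMEDID` columns and one rsID per row (real rows can list several). It is not how SnpSift implements its GWAS Catalog annotation, and the toy rows in the usage are made up.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, TextIO, Tuple

Hit = Tuple[str, str]  # (trait, PubMed ID)


def load_catalog(handle: TextIO) -> Dict[str, List[Hit]]:
    """Index catalog rows by rsID (assumes one rsID per SNPS cell)."""
    index: Dict[str, List[Hit]] = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        index[row["SNPS"]].append((row["DISEASE/TRAIT"], row["PUBMEDID"]))
    return dict(index)


def annotate(variant_ids: Iterable[str], catalog: Dict[str, List[Hit]]) -> Dict[str, List[Hit]]:
    """Attach every known (trait, study) pair to each variant ID."""
    return {rsid: catalog.get(rsid, []) for rsid in variant_ids}


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    toy = ("PUBMEDID\tDISEASE/TRAIT\tSNPS\n"
           "1\ttrait_A\trs1\n"
           "2\ttrait_B\trs1\n"
           "3\ttrait_C\trs2\n")
    print(annotate(["rs1", "rs3"], load_catalog(io.StringIO(toy))))
```

Variants absent from the catalog get an empty list rather than being dropped, so the output keeps one entry per input variant.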
MSigDB (Molecular Signatures Database) is a collection of annotated gene sets.
Citation:
* Liberzon, Arthur, et al. "Molecular signatures database (MSigDB) 3.0." Bioinformatics 27.12 (2011): 1739-1740.
* Liberzon, Arthur, et al. "The molecular signatures database hallmark gene set collection." Cell Systems 1.6 (2015): 417-425.
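MSigDB gene sets are commonly distributed as GMT files: one set per line, with the set name, a description, then the member genes, all tab-separated. Annotating a variant with gene sets then means looking up its gene in a gene-to-sets index. Here is a minimal sketch of that index; the set names in the usage are hypothetical, and this is not SnpSift's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def parse_gmt(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse GMT lines (name, description, genes...) into {set name: genes}."""
    gene_sets: Dict[str, Set[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], fields[2:]  # fields[1] is the description
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def gene_to_sets(gene_sets: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Invert the mapping: for each gene, the sorted names of sets containing it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(gene_sets):
        for gene in gene_sets[name]:
            index[gene].append(name)
    return dict(index)


if __name__ == "__main__":
    gmt = ["SET_A\tna\tTP53\tBRCA1", "SET_B\tna\tTP53\tEGFR"]
    print(gene_to_sets(parse_gmt(gmt))["TP53"])  # ['SET_A', 'SET_B']
```

Building the inverted index once keeps each per-variant lookup cheap, which matters when a VCF has millions of records.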
Snakemake is a pipeline/workflow manager written in Python. It handles tool interactions, dependencies, command lines, and cluster resource reservation. It is the skeleton of this pipeline, which is powered by the Snakemake-Wrappers, the Snakemake Workflows, and the conda project.
Citation:
* Köster, Johannes, and Sven Rahmann. "Snakemake—a scalable bioinformatics workflow engine." Bioinformatics 28.19 (2012): 2520-2522.
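The dependency handling mentioned above comes from matching each rule's inputs to the outputs of other rules. Here is a simplified sketch of that idea using Python's standard `graphlib` (3.9+): it builds a rule graph from hypothetical input/output file names and derives a valid execution order. Snakemake itself resolves jobs backward from the requested target files and does much more (wildcards, re-runs, cluster submission); the rule names below are illustrative, not this pipeline's actual rules.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical rules: name -> (input files, output files).
RULES: Dict[str, Tuple[Set[str], Set[str]]] = {
    "snpeff": ({"calls.vcf"}, {"calls.ann.vcf"}),
    "dbnsfp": ({"calls.ann.vcf"}, {"calls.dbnsfp.vcf"}),
    "gwascat": ({"calls.dbnsfp.vcf"}, {"calls.gwas.vcf"}),
    "fastqc": ({"reads.fastq.gz"}, {"reads_fastqc.zip"}),
    "multiqc": ({"reads_fastqc.zip"}, {"multiqc_report.html"}),
}


def rule_graph(rules: Dict[str, Tuple[Set[str], Set[str]]]) -> Dict[str, Set[str]]:
    """Map each rule to the rules producing its inputs (its predecessors)."""
    producer = {path: name for name, (_, outs) in rules.items() for path in outs}
    return {
        name: {producer[path] for path in ins if path in producer}
        for name, (ins, _) in rules.items()
    }


def execution_order(rules: Dict[str, Tuple[Set[str], Set[str]]]) -> List[str]:
    """A valid run order; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(rule_graph(rules)).static_order())


if __name__ == "__main__":
    print(execution_order(RULES))
```

Inputs that no rule produces (here `calls.vcf` and `reads.fastq.gz`) are treated as existing source files, which is also how raw data enters a Snakemake workflow.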
Typo corrections and issues are welcome.