This repository was archived by the owner on Nov 29, 2021. It is now read-only.

Pipeline_Content

tdayris-perso edited this page Mar 6, 2020 · 1 revision

Pipeline content

Global workflow

This workflow takes VCF files, genome sequences and annotations as input, and returns annotated VCF files ready for downstream analysis. It uses SnpEff and SnpSift, along with the GWAS Catalog, dbNSFP, and GeneSets.

If you use this pipeline, please cite all of them!

FastQC

FastQC stands for FastQ Quality Control. There is no publication associated with this tool; nevertheless, it remains a classic in bioinformatics.

This tool compiles a lot of information about the raw reads obtained from sequencers. It does not perform any step required by the analysis itself; however, running it is good practice.

Citation:

* Andrews, Simon. "FastQC: a quality control tool for high throughput sequence data." (2010).

MultiQC

MultiQC, just like FastQC, has no purpose other than quality metrics. It gathers all the individual Flagstat and FastQC metrics into a single report.

Citation:

* Ewels, Philip, et al. "MultiQC: summarize analysis results for multiple tools and samples in a single report." Bioinformatics 32.19 (2016): 3047-3048.

SnpEff/SnpSift

SnpEff and SnpSift are two tools designed to annotate, filter and clean VCF files.
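
To give an idea of what an annotated VCF looks like downstream, here is a minimal sketch that reads the `ANN` entry SnpEff adds to the INFO column. It assumes the pipe-separated, 16-field `ANN` layout described in the SnpEff documentation; the INFO string, gene name and identifiers below are made up for illustration and do not come from this pipeline.

```python
# Sub-field names follow the ANN layout documented by SnpEff
# (the Python key names themselves are our own choice).
ANN_FIELDS = [
    "allele", "annotation", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p", "cdna_pos", "cds_pos", "aa_pos",
    "distance", "messages",
]


def parse_ann(info):
    """Return one dict per SnpEff annotation found in a VCF INFO string."""
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "ANN":
            # Several annotations for one variant are comma-separated.
            return [dict(zip(ANN_FIELDS, ann.split("|")))
                    for ann in value.split(",")]
    return []


# Hypothetical INFO column, for illustration only.
info = ("DP=42;ANN=A|missense_variant|MODERATE|GENE1|ID1|transcript|TX1|"
        "protein_coding|2/5|c.100G>A|p.Val34Ile|100/900|100/600|34/199||")

for ann in parse_ann(info):
    print(ann["gene_name"], ann["annotation"], ann["impact"])
```

For real files, a dedicated VCF parser is a safer choice than splitting strings by hand; this sketch only shows the shape of the data.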

Citation:

* Cingolani, Pablo, et al. "A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3." Fly 6.2 (2012): 80-92.
* Ruden, Douglas Mark, et al. "Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift." Frontiers in genetics 3 (2012): 35.

dbNSFP

dbNSFP is a database that provides human non-synonymous SNPs and their functional predictions. It comes with a very large set of annotations gathered from other databases.

Citation:

* Liu, Xiaoming, Xueqiu Jian, and Eric Boerwinkle. "dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions." Human mutation 32.8 (2011): 894-899.
* Liu, Xiaoming, Xueqiu Jian, and Eric Boerwinkle. "dbNSFP v2.0: a database of human non‐synonymous SNVs and their functional predictions and annotations." Human mutation 34.9 (2013): E2393-E2402.
* Liu, Xiaoming, et al. "dbNSFP v3.0: A one‐stop database of functional predictions and annotations for human nonsynonymous and splice‐site SNVs." Human mutation 37.3 (2016): 235-241.

GWAS Catalog

The GWAS Catalog is a curated resource of SNP-trait associations published in genome-wide association studies. It makes it easy to link variants with related publications and studies.

Citation:

* Welter, Danielle, et al. "The NHGRI GWAS Catalog, a curated resource of SNP-trait associations." Nucleic acids research 42.D1 (2014): D1001-D1006.
* MacArthur, Jacqueline, et al. "The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog)." Nucleic acids research 45.D1 (2017): D896-D901.

MSigDB

MSigDB (the Molecular Signatures Database) is a collection of annotated gene sets. It is the source of the GeneSets annotations used in this pipeline.
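
MSigDB gene sets are commonly distributed as GMT files: one set per line, tab-separated, with the set name, a description, then the gene symbols. Below is a minimal sketch, assuming that layout, of how one could look up the gene sets containing a given gene. The set names and gene symbols are invented for the example, and `gene_to_sets` is a hypothetical helper, not part of this pipeline.

```python
import io
from collections import defaultdict


def gene_to_sets(handle):
    """Map each gene symbol to the names of the gene sets that contain it."""
    index = defaultdict(set)
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        for gene in genes:
            index[gene].add(name)
    return index


# Made-up GMT content, for illustration only.
gmt = io.StringIO(
    "SET_A\tdescription A\tGENE1\tGENE2\n"
    "SET_B\tdescription B\tGENE2\tGENE3\n"
)
index = gene_to_sets(gmt)
print(sorted(index["GENE2"]))
```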

Citation:

* Liberzon, Arthur, et al. "Molecular signatures database (MSigDB) 3.0." Bioinformatics 27.12 (2011): 1739-1740.
* Liberzon, Arthur, et al. "The molecular signatures database hallmark gene set collection." Cell systems 1.6 (2015): 417-425.

Snakemake

Snakemake is a pipeline/workflow manager written in Python. It is used to handle tool interactions, dependencies, command lines and cluster reservation. It is the skeleton of this pipeline. This pipeline is powered by the Snakemake-Wrappers, the Snakemake Workflows, and the conda project.
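
To illustrate what "handling dependencies" means, here is a small sketch in plain Python. It is not Snakemake code and does not use the Snakemake API. Snakemake infers which rule must run before which from the input and output files each rule declares; the sketch mimics that idea with the standard `graphlib` module (Python 3.9+). The rule names, file names and the order of the annotation steps are assumptions made for the example, not taken from this pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: each one declares the files it reads and writes.
rules = {
    "genesets": {"input": ["gwascat.vcf"], "output": ["annotated.vcf"]},
    "snpeff": {"input": ["calls.vcf"], "output": ["snpeff.vcf"]},
    "gwascat": {"input": ["dbnsfp.vcf"], "output": ["gwascat.vcf"]},
    "dbnsfp": {"input": ["snpeff.vcf"], "output": ["dbnsfp.vcf"]},
}

# Which rule produces each file.
producers = {out: name for name, rule in rules.items() for out in rule["output"]}

# A rule depends on every rule that produces one of its inputs.
# Files nobody produces (here calls.vcf) are treated as existing inputs.
graph = {
    name: {producers[f] for f in rule["input"] if f in producers}
    for name, rule in rules.items()
}

# Order the rules so that each one runs after its dependencies.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

The rules are deliberately listed out of order to show that the execution order comes from the declared files, not from the order in which the rules are written.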

Citation:

* Köster, Johannes, and Sven Rahmann. "Snakemake—a scalable bioinformatics workflow engine." Bioinformatics 28.19 (2012): 2520-2522.