
Recommended_Workflow

Skylar Wyant edited this page Aug 16, 2016 · 15 revisions

Recommended Workflow

Workflow

Methods and workflow. sequence_handling handles sequence cleaning and quality control for both paired-end and GBS NGS data. To accommodate the differences between these data types, sequence_handling splits each step of the cleaning and quality control process into a distinct 'handler'. Each handler is designed to work with GNU Parallel and PBS Torque, and all options are defined in a single configuration file. A handler can be swapped out for another tool without breaking sequence_handling, as long as the intermediate data stays in the same format.

GBS data must be demultiplexed before use, which is done with the FASTX-Toolkit. Quality assessment of FastQ files is done with FastQC, and basic coverage statistics are calculated with bioawk. Adapter sequences are trimmed with Scythe. Quality trimming with Sickle is optional, with quality statistics and plots generated by Seqqs and custom R code. Read mapping is done with BWA-MEM.

Processing a SAM file consists of several steps: converting to BAM, sorting the reads, deduplicating reads (paired-end only), and adding or removing read groups. These steps are performed by either SAMtools or Picard, depending on the user's preference. Final coverage maps are generated over genome, gene, and exon space using BEDTools and custom R code.
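The order of handlers described above can be summarized in a short Python sketch. This is only an illustration of the workflow as this page describes it, not code from sequence_handling itself: the names `Step` and `plan_workflow` and the parameters `data_type`, `quality_trim`, and `sam_tool` are hypothetical, and the sketch plans the steps without running any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One stage of the workflow and the tool this page names for it."""
    name: str
    tool: str


def plan_workflow(data_type, quality_trim=True, sam_tool="SAMtools"):
    """Return the ordered steps for a data type (hypothetical helper).

    data_type: "paired-end" or "GBS"
    quality_trim: whether to include the optional Sickle step
    sam_tool: "SAMtools" or "Picard" for the SAM processing steps
    """
    if data_type not in ("paired-end", "GBS"):
        raise ValueError(f"unknown data type: {data_type!r}")
    if sam_tool not in ("SAMtools", "Picard"):
        raise ValueError(f"unknown SAM processing tool: {sam_tool!r}")

    steps = []
    # GBS data must be demultiplexed before anything else.
    if data_type == "GBS":
        steps.append(Step("demultiplex", "FASTX-Toolkit"))
    steps.append(Step("quality assessment", "FastQC"))
    steps.append(Step("adapter trimming", "Scythe"))
    # Quality trimming is optional.
    if quality_trim:
        steps.append(Step("quality trimming", "Sickle"))
    steps.append(Step("read mapping", "BWA-MEM"))
    # SAM processing uses whichever tool the user prefers.
    steps.append(Step("convert SAM to BAM", sam_tool))
    steps.append(Step("sort reads", sam_tool))
    # Deduplication applies to paired-end data only.
    if data_type == "paired-end":
        steps.append(Step("deduplicate reads", sam_tool))
    steps.append(Step("add or remove read groups", sam_tool))
    steps.append(Step("coverage mapping", "BEDTools"))
    return steps


if __name__ == "__main__":
    for step in plan_workflow("GBS", sam_tool="Picard"):
        print(f"{step.name}: {step.tool}")
```

Because each step is just a name and a tool, replacing one tool with another changes a single entry, which mirrors the point that handlers are interchangeable when the intermediate format is unchanged.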
