Josh Loecker edited this page Sep 21, 2021 · 15 revisions

Welcome to the FastqToGeneCounts wiki!

This section is only partially complete; please check back later.

This is a Snakemake workflow that performs the following steps, using as much parallelization as possible:

  1. Read a CSV file that lists, for each sample, an SRR code, a target output name, and whether the reads are paired-end or single-end
  2. Generate genome files using STAR
  3. Download each SRR code in parallel using prefetch
  4. Unpack the .sra files using parallel-fastq-dump, generating .fastq.gz files
  5. Optionally trim the resulting .fastq.gz files (using Trim Galore)
  6. Perform FastQC on the parallel-fastq-dump files, and optionally on the resulting trimmed files
  7. Align the files from parallel-fastq-dump (or the trimmed files) to the generated genome files using STAR
  8. Perform MultiQC, using the files from parallel-fastq-dump, FastQC, and the STAR aligner
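
As a rough illustration of the steps above, the sketch below parses a control CSV and lists the per-sample steps in order. It is not taken from the workflow's source. The three-column layout (`srr,name,layout`), the `PE`/`SE` labels, the absence of a header row, and the names `Sample`, `read_samples`, and `planned_steps` are all assumptions made for the example. See the Getting started page for the format the workflow actually expects.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    srr: str      # SRA run accession (the values used below are made up)
    name: str     # target output name
    paired: bool  # True for paired-end reads, False for single-end


def read_samples(handle):
    """Parse rows of the ASSUMED form `srr,name,layout` with no header row."""
    samples = []
    for row in csv.reader(handle):
        # Skip blank lines so a trailing newline does not break parsing.
        if not row or all(not cell.strip() for cell in row):
            continue
        srr, name, layout = (cell.strip() for cell in row)
        layout = layout.upper()
        if layout not in {"PE", "SE"}:
            raise ValueError(f"unknown layout {layout!r} for {srr}")
        samples.append(Sample(srr, name, layout == "PE"))
    return samples


def planned_steps(trim=False):
    """Return the per-sample steps in the order the list above describes."""
    steps = ["prefetch", "parallel-fastq-dump"]
    if trim:
        steps.append("Trim Galore")
    steps.append("FastQC")
    if trim:
        steps.append("FastQC (trimmed)")
    steps.append("STAR align")
    return steps


if __name__ == "__main__":
    # Hypothetical control file contents, inlined for the demonstration.
    control = io.StringIO("SRR0000001,control_1,PE\nSRR0000002,treated_1,SE\n")
    for sample in read_samples(control):
        kind = "paired-end" if sample.paired else "single-end"
        print(sample.srr, sample.name, kind, planned_steps(trim=True))
```

Genome generation and MultiQC are left out of `planned_steps` because they run once per workflow rather than once per sample.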

Ultimately, this project produces a series of .fastq.gz files, along with reports from FastQC and MultiQC. The .fastq.gz files may be used in further analysis.

Sections

  1. [Getting started](Getting-Started)
  2. Downloading
  3. Installing
  4. Running
