harvardinformatics/scRNAseq-preprocessing

scRNAseq-preprocessing

Documentation of and code for analyses performed to assess the effect of different methods for cleanup and pre-processing of raw scRNAseq data on downtream analyses.

Goals

The primary objective of our methods assessment is to evaluate how sensitive downstream analyses of scRNA-seq data, such as clustering, marker gene discovery, and differential expression analysis across samples, are to the preprocessing and data cleanup steps used to generate a filtered expression matrix. A broad overview of the options we evaluate is summarized in this schematic:

workflow schematic

Data

To assess different tools and options for pre-processing scRNA-seq data, we downloaded publicly available datasets that were generated with 10x Chromium chemistry and sequenced on various contemporary Illumina sequencing instruments. We focused on datasets generated for mouse (Mus musculus) because the genome assembly and annotation are of exceptionally high quality, and because there are no patient anonymity issues that restrict data access, as is the case with human data. Below are the datasets we analyzed.

| Species | Strain | Sample ID | Tissue | Droplet Input | Sequencing Chemistry | Platform | Estimated Cells | Mean Reads Per Cell | Median Genes Per Cell | Data Source | Fastq Link(s) | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mouse | NA | neuron_10k_v3 | cortex, hippocampus and sub ventricular zone | cells | Chromium 10x 3' Gene Expression v3 | NovaSeq | 11,831 | 30,184 | 3,684 | 10x | same as data source | E18 developmental stage |
| Mouse | C57BL/6 | L8TX_181211_01_G12 | primary motor cortex | cells | Chromium 10x 3' Gene Expression v3 | NovaSeq6000 | 8,913 | 114,812 | 6,691 | nemo | run1;run2 | 2 runs on same library |
| Mouse | C57BL/6 | L8TX_190327_01_E04 | caudal nucleus, pallidum | cells | Chromium 10x 3' Gene Expression v3 | NovaSeq6000 | 18,173 | 77,393 | 2,734 | nemo | run1;run2 | 2 runs on same library |
| Mouse | C57BL/6 | L8TX_190509_01_E09 | striatum, striatal amygdala | cells | Chromium 10x 3' Gene Expression v3 | NovaSeq6000 | 13,475 | 82,801 | 3,400 | nemo | run1;run2 | 2 runs on same library |
| Mouse | C57BL/6 | L8TX_210204_01_H05 | olfactory region: main and accessory olfactory bulbs | cells | Chromium 10x 3' Gene Expression v3 | NovaSeq6000 | 10,895 | 136,593 | 3,971 | nemo | fastq | |
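
The datasets differ substantially in sequencing depth, which may matter when comparing preprocessing methods across them. As a rough illustration only, the sketch below ranks the samples by mean reads per cell and approximates each library's total read count. It assumes that "Mean Reads Per Cell" equals total reads divided by estimated cells; the table does not state this. The names `DATASETS`, `approx_total_reads`, and `rank_by_depth` are hypothetical and not part of this repository.

```python
# Values copied from the dataset table above.
DATASETS = [
    {"sample_id": "neuron_10k_v3", "estimated_cells": 11_831,
     "mean_reads_per_cell": 30_184, "median_genes_per_cell": 3_684},
    {"sample_id": "L8TX_181211_01_G12", "estimated_cells": 8_913,
     "mean_reads_per_cell": 114_812, "median_genes_per_cell": 6_691},
    {"sample_id": "L8TX_190327_01_E04", "estimated_cells": 18_173,
     "mean_reads_per_cell": 77_393, "median_genes_per_cell": 2_734},
    {"sample_id": "L8TX_190509_01_E09", "estimated_cells": 13_475,
     "mean_reads_per_cell": 82_801, "median_genes_per_cell": 3_400},
    {"sample_id": "L8TX_210204_01_H05", "estimated_cells": 10_895,
     "mean_reads_per_cell": 136_593, "median_genes_per_cell": 3_971},
]


def approx_total_reads(dataset):
    """Approximate total reads as estimated cells x mean reads per cell.

    Assumption: mean reads per cell = total reads / estimated cells.
    """
    return dataset["estimated_cells"] * dataset["mean_reads_per_cell"]


def rank_by_depth(datasets):
    """Return the datasets sorted from deepest to shallowest per-cell depth."""
    return sorted(datasets, key=lambda d: d["mean_reads_per_cell"], reverse=True)


if __name__ == "__main__":
    for d in rank_by_depth(DATASETS):
        total_millions = approx_total_reads(d) / 1e6
        print(f"{d['sample_id']}: ~{total_millions:.0f} M reads, "
              f"{d['median_genes_per_cell']} median genes per cell")
```

These totals are back-of-the-envelope estimates derived from the summary columns, not counts taken from the FASTQ files themselves.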
