Documentation and code for analyses that assess how different methods for cleanup and pre-processing of raw scRNA-seq data affect downstream analyses.
The primary objective of our methods assessment is to evaluate how sensitive downstream analyses of scRNA-seq data (clustering, marker gene discovery, and differential expression analysis across samples) are to the pre-processing and data cleanup steps used to generate a filtered expression matrix. A broad overview of the options we evaluate is summarized in this schematic:
To assess different tools and options for pre-processing scRNA-seq data, we downloaded publicly available datasets that were generated with 10x Chromium chemistry and sequenced on various contemporary Illumina sequencing instruments. We focused on datasets generated for mouse (Mus musculus) because the genome assembly and annotation are of exceptionally high quality, and because there are no patient anonymity issues that restrict data access, as is the case for human data. Below are the datasets we analyzed.
| Species | Strain | Sample ID | Tissue | Droplet Input | Sequencing Chemistry | Platform | Estimated Cells | Mean Reads Per Cell | Median Genes Per Cell | Data Source | Fastq Link(s) | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mouse | NA | neuron_10k_v3 | cortex, hippocampus and subventricular zone | cells | Chromium 10x 3' Gene Expression v3 | NovaSeq | 11,831 | 30,184 | 3,684 | 10x | same as data source | E18 developmental stage |
| Mouse | C57BL/6 | L8TX_181211_01_G12 | primary motor cortex | cells | Chromium 10x 3' Gene Expression v3 | NovaSeq6000 | 8,913 | 114,812 | 6,691 | nemo | run1;run2 | 2 runs on same library |
| Mouse | C57BL/6 | L8TX_190327_01_E04 | caudal nucleus, pallidum | cells | Chromium 10x 3' Gene Expression v3 | NovaSeq6000 | 18,173 | 77,393 | 2,734 | nemo | run1;run2 | 2 runs on same library |
| Mouse | C57BL/6 | L8TX_190509_01_E09 | striatum, striatal amygdala | cells | Chromium 10x 3' Gene Expression v3 | NovaSeq6000 | 13,475 | 82,801 | 3,400 | nemo | run1;run2 | 2 runs on same library |
| Mouse | C57BL/6 | L8TX_210204_01_H05 | olfactory region: main and accessory olfactory bulbs | cells | Chromium 10x 3' Gene Expression v3 | NovaSeq6000 | 10,895 | 136,593 | 3,971 | nemo | fastq | |

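The samples in the table differ substantially in sequencing depth, which is worth keeping in mind when comparing pre-processing results across them. As a rough illustration, the sketch below multiplies estimated cells by mean reads per cell to approximate the total number of reads per library. It assumes that mean reads per cell is defined as total reads divided by estimated cells (as in Cell Ranger summary metrics); the function names are hypothetical and are not part of this repository.

```python
# Values copied from the dataset table: (sample ID, estimated cells, mean reads per cell).
SAMPLES = [
    ("neuron_10k_v3", 11_831, 30_184),
    ("L8TX_181211_01_G12", 8_913, 114_812),
    ("L8TX_190327_01_E04", 18_173, 77_393),
    ("L8TX_190509_01_E09", 13_475, 82_801),
    ("L8TX_210204_01_H05", 10_895, 136_593),
]


def approx_total_reads(estimated_cells: int, mean_reads_per_cell: int) -> int:
    """Approximate library size, assuming mean reads per cell = total reads / cells."""
    return estimated_cells * mean_reads_per_cell


def summarize(samples):
    """Map each sample ID to its approximate total read count."""
    return {sid: approx_total_reads(cells, reads) for sid, cells, reads in samples}


if __name__ == "__main__":
    for sample_id, total in summarize(SAMPLES).items():
        print(f"{sample_id}: ~{total / 1e6:.0f} M reads")
```

These totals are approximations only, because the reported means are rounded and the two-run libraries pool reads from both runs.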