Please cite using the zenodo DOI: 10.5281/zenodo.12726713
A package to perform single cell analysis using various tools. The package centers around using Seurat objects and has functions for preprocessing, plotting, running differential expression and gene ontology, and naming popultions.
This package works with Seurat version 3 objects.
If you use this package for your own analysis, please acknowledge this repository. You can also consider citing my eLife paper. The package I wrote for that paper was updated to work with Seurat version 3 objects here.
Wells, K. L., Miller, C. N., Gschwind, A. R., Wei, W., Phipps, J. D., Anderson, M. S., & Steinmetz, L. M. (2020). Combined transient ablation and single cell rna sequencing reveals the development of medullary thymic epithelial cells. ELife, 9, 1–80. https://doi.org/10.7554/eLife.60188
Install clustifyr
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("clustifyr")Install devtools
install.packages("devtools")Install scAnalysisR
library(devtools)
install_github("CUAnschutzBDC/scAnalysisR")This package is meant as a wrapper for many functions associated with single cell Analysis. It performs functions from loading in the data, processing the data, running differential expression, running gene ontology, and plotting.
There are a few steps that you will need to run outside of this package, for example filtering data based on quality, adding mitochondrial percents, and demultiplexing hashtags. If you want any functionality added here, feel free to submit an issue.
Data that has been processed with cellranger can be read in using the create_seurat_object function. This has a few important arguments
sample- The name of the sample. This should match the sample name that you provided to cellranger as it is used to identify the path of the cellranger output. This name will also be added as theorig.identin the seurat object.count_path- If you rancellranger, the path to the output directory you gave cellranger. Otherwise, this path can be more specific. See below for details on the pathADT- Were ADTs included in this run? If so, set this toTRUE, if not set this toFALSE. If ADTs were included you must set this toTRUEbecause the input data from cellranger looks different if multiple assays were included.hashtag- Were HTOs included? If so, set this toTRUE.min_features- What minimum features should be passed toCreateSeuratObject. Default is 200min_cells- What minimum cells should be passed toCreateSeuratObject. Default is 3.hashtag_ident- What are the identities of the hashtags that you passed in your antibody file to cellranger? This should be a list of hashtags iec("HTO1", "HTO2", "HTO3"). This is used to pull out the hashtags and put them into a separate assay calledHTO.tenx_structure- What cellranger was run? Options arecount,multi,multi7, ornone. This tells how to find the input data.
Path to data
The path to the data is defined by:
samplecount_pathtenx_structure
If you ran cellranger count on your data and gave it results as your output directory and sample1 as your sample name, the files we need are in results/sample1/outs/filtered_feature_bc_matrix. To tell this funtion how to find this data, you just need:
so <- create_seurat_object(sample = "sample1", count_path = "results", tenx_structure = "count")If you ran cellranger multi on your data and gave it results as your output directory and sample1 as your sample name, the files we need are in results/sample1/outs/count/filtered_feature_bc_matrix. To tell this funtion how to find this data, you just need:
so <- create_seurat_object(sample = "sample1", count_path = "results", tenx_structure = "multi")If you ran cellranger multi with cellranger 7 or above on your data and gave it results as your output directory and sample1 as your sample name, the files we need are in results/sample1/outs/per_sample_outs/sample1/count/filtered_feature_bc_matrix. To tell this funtion how to find this data, you just need:
so <- create_seurat_object(sample = "sample1", count_path = "results", tenx_structure = "multi7")The goal of this function is to very easily find your files with minimal input from you. If there is a problem finding the files, the error message will print the path of the directory where it is looking for the files so you can more easily debug. In the case that your output files are not in the above configuration, you can provide the path to the directory that contains the matrix.mtx, features.tsv, and barcodes.tsv files and have tenx_structure = "none"
so <- create_seurat_object(sample = "sample1", count_path = "full/path/to/filtered_feature_bc_matrix",
tenx_structure = "none")HTOs and ADTs
If you included any antibody capture elements in your cellranger run, you will need to set either HTO or ADT or both to TRUE. This is because the data will be read in as a list and the function will fail if it does not know to look for this.
- If
ADT = TRUEandHTO = FALSE, your Seurat object will be returned with anADTassay with all of the features from your antibody file as the features. Your seurat object will also have anRNAassay that contains the gene expression data. - If
ADT = TRUEandHTO = TRUE, your Seurat object will be returned with anADTassay with all of the features from your antibody file except for those listed inhashtag_identsas the features. It will also have aHTOassay will all of the features from thehashtag_identsas your features. If you don't sethashtag_idents, it will search for any features that havehashtagin the name. Your seurat object will also have anRNAassay that contains the gene expression data. - If
ADT = FALSEandHTO = TRUE, your Seurat object will be returned withHTOassay will all of the features from thehashtag_identsas your features. If you don't sethashtag_idents, it will search for any features that havehashtagin the name. Your seurat object will also have anRNAassay that contains the gene expression data.
To run PCA, use the function PCA_dimRed.
so <- PCA_dimRed(so, assay = "RNA", reduction_name = NULL, vars_to_regress = NULLThis has only a handful of arguments
sample_object- The seurat object loaded in bycreate_seurat_objectassay- The assay to runPCAon. This can be any assay in your object, for exampleRNA,ADT, orSCT. If you are running on theADTassay, if you have 30 or fewer ADTs you likely should just runUMAPdirectly on the ADT expression data and not perform PCA first.reduction_name- What the name of the reduction should be. These will be the names of the dimensional redution saved in the seurat object, so you can run this function on all assays without overriding the existing dimensional reductions. Many are provided but can be overridden by providing a avlue toreduction_name. The default values are:- If
assay = "RNA":reduction_name = pca - If
assay = "ADT":reduction_name = apca - If
assay = "SCT":reduction_name = sctpca - If
assay = integrated:reduction_name = pca
- If
vars_to_regress- Only used if running on theADTassay. These are used when scaling the ADT data before running PCA.
The return of this is a seurat object with a new dimensional reduction named based on the reduction_name provided
Helpful plots
To help evaluating the PCA, a plotting function is provided that returns a handful of qc plots.
qc_plots <- plot_PCA(so, HTO = FALSE, ADT = FALSE,
assay = "RNA", jackstraw = TRUE,
reduction = NULL, data_type = "RNA",
ffpe = TRUE)This function takes your seurat object and returns helpful qc plots as a named list
- The top genes assocaited with PC1 and PC2
- PC plot colored by the sample (orig.ident)
- PC plot colored by the percent mitochondrial reads
- PC plot colored by the number of RNA features
- PC plot colored by the number of RNA counts
- PC plot colored by the number of ADT counts (if
ADT = TRUE) - PC plot colored by the number of ADT feature (if
ADT = TRUE) - The HTO classification (if
HTO = TRUE) - A Jackstraw plot (if
jackstraw = TRUE) to show how many PCs to use for downstream analysis. - An Elbow plot to show how many PCs to use for downstream analysis
This function will assume your pca reduction is named based on the default values above. If your pca reduction is not based on the reductions above, you can supply your own to reduction.
The ffpe is only used if data_type = spatial. It is ignored if data_type = "RNA"
Note this function requires that you have identified the percent mitochondiral reads and saved them as a column "percent.mt" in your seurat object metadata. If you have HTOs, this also requires that you have a column callsed "HTO_classification" that identifies the different hashtagged samples.
The arguments of this function are:
sample_objectA seurat objectHTOOPTIONAL if HTOs were included in the seurat object. Default is FALSEADTOPTIONAL If ADTs were included in the experiment. Default is FALSEassayOPTIONAL What assay to use. This is just used to locate the PCA reduction (the reduction name provided assumes that no reduction name was supplied to PCA_dimRed, if you did provide a different reduction name, you must use "reduction" below). Default is "RNA".jackstrawOPTIONAL if a jackstraw plot should be made. This can help in determining number of PCs to use, but it can be very slow. If TRUE, a jackstraw plot will only be made if the assay is also "RNA". Default is TRUE.reductionOPTIONAL the name of the PCA reduction. Not required if you used the default reduction names in PCA_dimRed. Default is NULLdata_typeOPTIONAL If the data is "RNA" or "spatial". Default is "RNA"ffpeOPTIONAL If the data type is "spatial" is it frozen (FALSE) or ffpe (TRUE), This is important because ffpe samples are from a probe based approach and don't include mitochondiral genes.
One function exists to perform UMAP dimensonality reduction and clustering. This is because it is best to keep the number of PCs consistent when running both of these functions.
To generate a UMAP and clusters
umap_res <- group_cells(so, sample_name = NULL, save_dir = NULL,
nPCs = 10, resolution = 0.8, assay = "RNA",
HTO = FALSE, reduction = NULL)
so <- umap_res$object
umap_plots <- umap_res$plotsThis function both generates a UMAP and clustering and generates helpful plots to visualize the clustering and the samples. Because it does both, this function returns a list that includes the plots and your object. Be sure to follow the code above to make sure that you are able to get both results.
In this function, setting a save_dir will also save the output plots as pdfs. Leaving as NULL will not save any plots.
- ggrastr
- ggrastr
Example scripts using this package are available at https://github.com/CUAnschutzBDC/snakemake_pipelines/tree/main/scRNA_seq/src/scripts/indiviual_analysis. Follow the scripts in order for a complete analysis.
For examples of integrating data, scripts are available https://github.com/CUAnschutzBDC/snakemake_pipelines/tree/main/scRNA_seq/src/scripts/integrated_analysis