MiniMarS finds the protein markers that best define biological clusters, using several existing marker selection methods.
Please install the following packages first.
# CiteFuse
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("CiteFuse")
# devtools is needed for all GitHub installs below, so install it before first use
if (!require("devtools", quietly = TRUE))
  install.packages("devtools")
devtools::install_github("tpq/propr") # propr package is required for CiteFuse to run
# sc2marker
devtools::install_github("https://github.com/CostaLab/sc2marker", build_vignettes = TRUE)
# geneBasisR
devtools::install_github("MarioniLab/geneBasisR")
# Seurat
install.packages('Seurat')
# xgboost
install.packages("xgboost")
# SingleCellExperiment
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("SingleCellExperiment")
# dplyr
install.packages("dplyr")
Please run the following to install the MiniMarS package (typical install time: 5-10 minutes):
devtools::install_github("https://github.com/raymondlouie/MiniMarS", force = TRUE)
Alternatively, download the package here and install it with the following command:
install.packages("~/Downloads/MiniMarS_0.3.1.tar.gz", type = "source", repos = NULL)
Expected runtime: 1 minute
# Check that all required packages are installed.
packages_required = c("CiteFuse", "sc2marker", "geneBasisR", "Seurat", "xgboost",
                      "SingleCellExperiment", "dplyr", "MiniMarS")
packages_required_not_installed = setdiff(packages_required, rownames(installed.packages()))
if (length(packages_required_not_installed) > 0) {
  # Collapse the names so that all missing packages appear in one readable message
  stop(paste0("Please install packages: ",
              paste(packages_required_not_installed, collapse = ", "), "\n"))
}
library(MiniMarS)
library(dplyr)
library(SingleCellExperiment)
The input data can be i) an SCE object, ii) a matrix of features and a vector of cell type annotations, or iii) a Seurat object.
i) If you have a SCE object sce:
sc_in = processInputFormat(sc_object = sce,
                           sce_cluster = "cell_type", # pre-defined in the SCE object
                           verbose = TRUE)
ii) If you have a feature matrix (feature x cell) input_matrix and a vector clusters holding one cell type annotation per cell (the length of this vector must equal the number of columns of input_matrix):
sc_in = processInputFormat(sc_object = input_matrix,
                           clusters_all = clusters,
                           verbose = TRUE)
iii) If you have a Seurat object:
library(Seurat)
sc_in = processInputFormat(sc_object = seurat_object,
                           verbose = TRUE)
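For option ii), the expected shapes can be illustrated with a small simulated example. This is only a sketch: the protein names, cluster labels and counts below are made up for illustration, and only base R is used. It shows a feature x cell matrix together with a cluster vector whose length matches the number of columns.

```r
set.seed(1)
n_cells  <- 60
proteins <- c("CD3", "CD4", "CD8", "CD19", "CD14")  # hypothetical feature names

# Feature x cell matrix: rows are proteins, columns are cells (simulated counts)
input_matrix <- matrix(rpois(length(proteins) * n_cells, lambda = 5),
                       nrow = length(proteins),
                       dimnames = list(proteins, paste0("cell_", seq_len(n_cells))))

# One cluster label per cell, in the same order as the matrix columns
clusters <- rep(c("T_cell", "B_cell", "Monocyte"), each = n_cells / 3)

# The cluster vector must have exactly one entry per column (cell)
stopifnot(length(clusters) == ncol(input_matrix))
table(clusters)
```

Objects built this way could then be passed to processInputFormat as in option ii) above.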
The wrapper function minMarker finds the minimum number of markers needed to satisfy a given threshold on a chosen macro metric (i.e., a single metric across all clusters). By default, it selects the top-performing method out of the following: "citeFUSE", "sc2marker", "geneBasis", "xgBoost", "fstat", "seurat_wilcox", "seurat_bimod", "seurat_roc", "seurat_t", "seurat_LR", "consensus_weighted", "consensus_naive", "consensus_fstat", and "consensus_xgboost". The user can also select the clusters for which markers are identified; all clusters are selected by default.
minMarker_result <- minMarker(sc_in,
                              list_markersNumber = c(5, 10, 15, 20, 25, 30, 40),
                              chosen_measure = "F1_macro",
                              clusters_sel = "all_clusters",
                              threshold = 0.7)
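To make the idea of a macro metric concrete, the sketch below computes a macro-averaged F1 score in base R: an F1 score is calculated for each cluster separately and the scores are then averaged with equal weight. This follows the standard definition; whether MiniMarS computes "F1_macro" in exactly this way is an assumption, and the function name macro_f1 and the labels are hypothetical.

```r
# Macro F1: compute F1 per cluster, then take the unweighted mean
macro_f1 <- function(truth, predicted) {
  classes <- sort(unique(truth))
  f1_per_class <- vapply(classes, function(cl) {
    tp <- sum(predicted == cl & truth == cl)  # true positives
    fp <- sum(predicted == cl & truth != cl)  # false positives
    fn <- sum(predicted != cl & truth == cl)  # false negatives
    if (tp == 0) return(0)                    # no correct calls: F1 is 0 by convention
    precision <- tp / (tp + fp)
    recall    <- tp / (tp + fn)
    2 * precision * recall / (precision + recall)
  }, numeric(1))
  mean(f1_per_class)
}

# Made-up labels for eight cells in three clusters
truth     <- c("A", "A", "A", "A", "B", "B", "C", "C")
predicted <- c("A", "A", "A", "B", "B", "B", "C", "A")
macro_f1(truth, predicted)
```

Because every cluster contributes equally, a small cluster that is poorly classified lowers the macro score as much as a large one would.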
library(ggplot2)
list_markers_all = minMarker_result[[length(minMarker_result)]]$markersAll
list_performance_all = minMarker_result[[length(minMarker_result)]]$performanceAll
plotMarkers(list_markers_all)
plotPerformance(list_performance_all)
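The threshold logic described above can be sketched as follows. This is an illustration only, not the internal implementation of minMarker: the performance values are made up, and the variable names are hypothetical. Given one score per candidate panel size, the smallest panel whose score reaches the threshold is selected.

```r
# Candidate panel sizes and made-up macro F1 scores, one score per size
panel_sizes <- c(5, 10, 15, 20, 25, 30, 40)
f1_macro    <- c(0.52, 0.66, 0.71, 0.74, 0.78, 0.79, 0.80)  # illustrative numbers only
threshold   <- 0.7

# Keep the sizes that meet the threshold and take the smallest one;
# return NA if no candidate panel is good enough
meets     <- f1_macro >= threshold
min_panel <- if (any(meets)) min(panel_sizes[meets]) else NA_real_
min_panel
```

If no panel size reaches the threshold, a natural next step would be to lower the threshold or add larger panel sizes to the candidate list.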
## Visualisation
library(RColorBrewer)
plotExpression(list_markers_all,
               sc_in,
               plot_type = "violin")
plotExpression(list_markers_all,
               sc_in,
               plot_type = "umap")
sessionInfo()
[1] sp_1.4-6 SeuratObject_4.1.0 Seurat_4.1.1.9002 SingleCellExperiment_1.16.0
[5] SummarizedExperiment_1.24.0 Biobase_2.54.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.0
[9] IRanges_2.28.0 S4Vectors_0.32.3 BiocGenerics_0.40.0 MatrixGenerics_1.6.0
[13] matrixStats_0.61.0 dplyr_1.0.7 MiniMarS_0.1.0
In some scenarios, users may want to test their own customised marker panel instead of the predicted one. We recommend using the following code to evaluate the performance of a user-defined marker panel.
user_markers = c("CD80", "CD86", "CD274", "CD273", "CD275", "CD11b", "CD137L", "CD70", "unidentified_marker1","unidentified_marker2")
own_list_performance = performanceOwnMarkers(user_markers,
                                             final_out = final_out,
                                             method = "all",
                                             nrounds = 1500,
                                             nthread = 6,
                                             verbose = TRUE)
print(own_list_performance)
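A custom panel may contain names that are not present in the dataset (such as the "unidentified_marker" entries above). The source does not state how performanceOwnMarkers treats such names, so as a precaution the panel can be checked against the available features beforehand. The sketch below uses base R only; available_features is a hypothetical stand-in for the feature names of your data (for example, the row names of the feature matrix).

```r
user_markers <- c("CD80", "CD86", "CD274", "unidentified_marker1")

# Hypothetical feature names; in practice, take these from your own data
available_features <- c("CD80", "CD86", "CD274", "CD273", "CD11b")

# Split the panel into markers that are present and markers that are not
missing_markers <- setdiff(user_markers, available_features)
valid_markers   <- intersect(user_markers, available_features)

if (length(missing_markers) > 0) {
  message("Markers not found in the data: ",
          paste(missing_markers, collapse = ", "))
}
valid_markers
```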
We provide below public datasets for users to try the MiniMarS package. The datasets all contain protein features as well as either cell annotation or cluster labels from the corresponding papers or websites.
Human bone marrow samples from the BD Rhapsody assay.
Number of cells in this dataset: around 100,000 (after QC: 49,100 from healthy controls and 31,600 from leukemia patients)
Number of protein features: 97
43 cell types/clusters from the data provider
Click here to access the data
Human mucosa-associated lymphoid tissue samples from the 10X technology.
Number of cells in this dataset: around 10,000 (after QC: 8,500)
Number of protein features: 17
11 cell types/clusters from the data provider
Click here to access the data
Mouse spleen and lymph nodes samples from the 10X technology.
Number of cells in this dataset: around 40,000 (after QC: 16,800 with 111 protein features, 15,800 with 206 protein features)
Number of protein features: 111 and 206 (two sets of data; the names of the protein features still need to be identified, as of 16 Mar 2023)
35 cell types/clusters from the data provider (both sets)
Click here to access the raw data
Click here to access the processed data
Human PBMC (peripheral blood mononuclear cells) samples from the 10X technology.
Number of cells in this dataset: around 10,000 (after QC: 7,800)
Number of protein features: 17
11 cell types/clusters from the data provider
Click here to access the data
Please see our paper for more details, including runtime.