Step 1: Getting Started
BEANIE is a powerful tool for comparing groups that share a subpopulation of cells. It is best used after fine-grained cell-type annotation has been completed with Scanpy, Seurat, or an equivalent single-cell data analysis pipeline.
Before running BEANIE, subset the data to the cell subpopulation of interest using the subset() function in Seurat or the [ ] indexing operator in Scanpy. For example:
# subset seurat object (sobj) to particular cell subpopulation
sobj_subset = subset(sobj, idents=c("tumor_subpopulation1"))
# subset scanpy object (adata) to particular cell subpopulation
adata_subset = adata[adata.obs.cell_type=="tumor_subpopulation1",:]
BEANIE takes a (genes x cells) counts matrix as input. The .csv, .tsv, and .h5ad file formats are accepted. When creating a BEANIE object, use the normalised parameter to state whether the input counts matrix is already normalised.
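For a Seurat object sobj_subset, export the counts matrix as a .csv file as follows:
# export as a .csv file
# the counts slot is stored as a sparse matrix, so convert it to a dense matrix before writing
write.csv(as.matrix(sobj_subset@assays$RNA@counts), "counts.csv", quote=FALSE)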
Note that in a standard Seurat workflow the counts slot holds the raw counts, while the data slot holds the normalised values. Set normalised = True only if the exported matrix is already normalised; otherwise set normalised = False.
For a scanpy object adata_subset, it is recommended to pass the AnnData object directly to BEANIE for a faster run time. In this case, BEANIE uses the data stored in .raw if present, and otherwise the default layer (.X). The data in this layer must be normalised only and NOT scaled.
# export as .h5ad file
adata_subset.write_h5ad("adata_subset.h5ad")
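Because the input must be normalised but not scaled, a quick sanity check before export can help. The following is a minimal heuristic sketch, not part of BEANIE: the function name guess_matrix_state is hypothetical, and the rule it applies (negative values suggest scaling, all-integer values suggest raw counts) is an assumption that can misclassify unusual data.

```python
def guess_matrix_state(values):
    """Heuristically classify a flat iterable of expression values.

    Returns "scaled", "raw_counts", or "normalised".
    Hypothetical helper for illustration; not a BEANIE function.
    """
    values = list(values)
    if not values:
        raise ValueError("no values supplied")
    # Scaling (z-scoring) centres each gene, so negative entries appear.
    if any(v < 0 for v in values):
        return "scaled"
    # Unnormalised counts are non-negative whole numbers.
    if all(float(v).is_integer() for v in values):
        return "raw_counts"
    # Non-negative with fractional entries: consistent with normalised data.
    return "normalised"


print(guess_matrix_state([0, 1, 5, 12]))        # whole numbers only
print(guess_matrix_state([0.0, 1.386, 2.303]))  # fractional, non-negative
print(guess_matrix_state([-0.5, 0.2, 1.1]))     # contains a negative value
```

For an AnnData object, the values could be taken from a sample of the matrix, for example the stored non-zero entries of a sparse .X or a flattened dense array.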
Alternatively, a .csv or .tsv input file may be extracted as follows (though this slows the run time):
# export as .csv format
# .raw holds a (cells x genes) matrix; to_df() labels rows with cell names and columns with gene names
counts_df = adata_subset.raw.to_adata().to_df()
# transpose to (genes x cells) before writing
counts_df.T.to_csv("counts.csv")
Counts matrix prepared in this way is usually already normalised if the scanpy workflow is followed. Therefore, normalised = True must be set.
The metadata file should contain two columns, sample_id and group_id, with one row for each cell present in the counts matrix. The method currently supports comparisons between two groups. The .csv and .tsv formats are accepted.
For a Seurat object sobj_subset, export the metadata file as a .csv file as follows:
write.csv(sobj_subset@meta.data, "metad.csv", quote=FALSE)
For a scanpy object adata_subset, export the metadata file as a .csv file as follows:
adata_subset.obs.to_csv("metad.csv")
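The metadata requirements above (sample_id and group_id columns, exactly two groups, every cell of the counts matrix covered) can be checked before running BEANIE. This is a minimal sketch using only the Python standard library; check_metadata is a hypothetical helper, not a BEANIE function, and it assumes the first column of the file holds the cell names, as is the case for files written by write.csv or to_csv.

```python
import csv
import io


def check_metadata(metadata_csv, counts_cells=None):
    """Validate metadata supplied as CSV text.

    Returns a dict mapping each group_id to the set of its sample_ids.
    Hypothetical helper for illustration; not a BEANIE function.
    """
    reader = csv.DictReader(io.StringIO(metadata_csv))
    fields = reader.fieldnames or []
    missing = {"sample_id", "group_id"} - set(fields)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    cell_col = fields[0]  # assumption: first column holds the cell names
    groups, cells = {}, set()
    for row in reader:
        cells.add(row[cell_col])
        groups.setdefault(row["group_id"], set()).add(row["sample_id"])
    if len(groups) != 2:
        raise ValueError(f"expected exactly 2 groups, found {len(groups)}")
    if counts_cells is not None:
        absent = set(counts_cells) - cells
        if absent:
            raise ValueError(f"{len(absent)} cells lack metadata")
    return groups


example = ",sample_id,group_id\ncell1,s1,A\ncell2,s1,A\ncell3,s2,B\n"
groups = check_metadata(example, counts_cells=["cell1", "cell2", "cell3"])
print(groups)
```

In practice the text would come from reading metad.csv, and counts_cells from the column names of the counts matrix.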
The test signatures file lists the genes belonging to each gene signature to be tested. Example files can be found in the test_data folder. The test signatures file can be in one of the following formats:
- .gmt: each row contains the gene names of one signature, with the row name giving the signature's name.
- .csv/.tsv: each column contains the gene names of one signature, with the column name giving the signature's name.
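To illustrate how the two layouts relate, the sketch below reads each into the same dictionary of signature name to gene list. The function names are hypothetical and this is not BEANIE's own parser. It assumes the conventional GMT layout (tab-separated: name, description, then genes); files without a description field would need the unpacking adjusted.

```python
import csv
import io


def read_gmt(text):
    """Parse GMT text: one signature per row (name, description, genes...)."""
    signatures = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or incomplete rows
        name, _description, *genes = fields
        signatures[name] = [g for g in genes if g]
    return signatures


def read_signature_table(text, delimiter=","):
    """Parse column-wise text: one signature per column, header = name."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return {}
    header, body = rows[0], rows[1:]
    signatures = {name: [] for name in header}
    for row in body:
        for name, gene in zip(header, row):
            if gene.strip():  # shorter signatures leave empty cells
                signatures[name].append(gene.strip())
    return signatures


gmt_text = "sigA\tna\tCD8A\tGZMB\nsigB\tna\tMKI67\n"
csv_text = "sigA,sigB\nCD8A,MKI67\nGZMB,\n"
from_gmt = read_gmt(gmt_text)
from_csv = read_signature_table(csv_text)
print(from_gmt == from_csv)
```

Passing delimiter="\t" to read_signature_table handles the .tsv variant.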