Unexpected number of genes identified with FindVariableFeatures() #9808

Open
@EliseCoopman

Hi,

I am using the FindVariableFeatures() function from Seurat to identify the most variable genes in our dataset. I'm using the default parameters (e.g., nfeatures = 2000), but I end up with only 656 genes, far fewer than expected.

Is it correct that FindVariableFeatures() identifies variable genes separately for each counts layer and then aggregates them into one overall list? If so, how does Seurat arrive at just 656 genes? Is this behavior expected, or could there be an issue with how variability is calculated or selected? Here is the code I used:

library(BPCells)
library(Seurat)

seuratobj_splitted <- SplitObject(seuratobj, split.by = "sample_id")

# Remove lowly expressed genes
seuratobj_splitted <- lapply(seuratobj_splitted, function(x) {
    expressed_genes <- rowSums(x[["RNA"]]$counts > 0) >= 10 
    expressed_genes <- names(expressed_genes)[expressed_genes] 
    subset(x, features = expressed_genes)
})

for (lib in names(seuratobj_splitted)) {
    seuratobj_splitted[[lib]]@project.name <- lib
}

# Merge into a single object
seuratobj <- merge(x = seuratobj_splitted[[1]], y = seuratobj_splitted[-1])

seuratobj <- NormalizeData(seuratobj)
seuratobj <- FindVariableFeatures(seuratobj, selection.method = "vst", nfeatures = 2000)
#Finding variable features for layer counts.individualA
#Finding variable features for layer counts.individualB
#Finding variable features for layer counts.individualC
#Finding variable features for layer counts.individualD
#...

top2000 <- head(VariableFeatures(seuratobj), 2000)
length(top2000)
[1] 656
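
One thing I noticed, in case it matters: because the filter above runs per sample, each sample can keep a different set of genes, so the layers in the merged object may not all contain the same features. Below is a minimal base R sketch with made-up counts that shows what I mean. The toy matrices and the keep_expressed() helper are hypothetical, not part of Seurat or my data, and I do not know whether this is what limits the result to 656 genes.

```r
# Toy illustration with made-up counts (hypothetical, not the real dataset).
genes <- c("geneA", "geneB", "geneC", "geneD")

sample_a <- matrix(
  c(1, 2, 0,
    0, 0, 3,
    4, 1, 1,
    0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(genes, paste0("cellA", 1:3))
)

sample_b <- matrix(
  c(0, 0, 0,
    2, 5, 0,
    1, 1, 2,
    3, 0, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(genes, paste0("cellB", 1:3))
)

# Hypothetical helper mirroring the per-sample filter above:
# keep genes detected (count > 0) in at least `min_cells` cells.
keep_expressed <- function(counts, min_cells) {
  rownames(counts)[rowSums(counts > 0) >= min_cells]
}

kept <- list(
  individualA = keep_expressed(sample_a, min_cells = 2),
  individualB = keep_expressed(sample_b, min_cells = 2)
)

# Genes retained in every sample versus genes retained in at least one sample.
shared_genes <- Reduce(intersect, kept)
all_genes <- Reduce(union, kept)

print(kept)
print(shared_genes)
print(length(all_genes))
```

In this toy case the two samples retain different genes, and only one gene is kept in both, even though four genes are kept in at least one sample.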

Thanks!

Elise
