Different p value estimates between bulk and single gene set testing #6

Open
@teng-gao

Thanks for creating this tool; I'm finding it very useful for my analysis. I am trying to first screen a number of gene sets using the bulk option and then visualize the specific ones that are significant. However, I noticed that for the same gene set, the p-value shown on the plot differs from the one reported by the bulk testing. I understand that these are empirical p-values, so the exact values may not match, but they differ by an order of magnitude (see below). Do you have any insight into why this is the case?

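# Screen all hallmark gene sets at once
gsea_out = bulk.gsea(
    # Named vector of logFC values for expressed genes, infinite values dropped
    values = fold_changes %>%
        filter(gene %in% expressed_genes) %>%
        filter(!is.infinite(logFC)) %>%
        {setNames(.$logFC, .$gene)},
    set.list = h_gene_sets,
    mc.cores = 10
)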

[Screenshot: Screen Shot 2022-01-09 at 9 14 39 AM]

# Test and plot the two interferon gene sets individually, using the same input values
for (gs in c('INTERFERON_GAMMA_RESPONSE', 'INTERFERON_ALPHA_RESPONSE')) {
    gsea(
        fold_changes %>%
            filter(gene %in% expressed_genes) %>%
            filter(!is.infinite(logFC)) %>%
            {setNames(.$logFC, .$gene)},
        h_gene_sets[[gs]],
        main = str_remove(gs, 'HALLMARK_')
    )
}

[Screenshot: Screen Shot 2022-01-09 at 9 15 36 AM]
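
For reference, here is a minimal base-R sketch of why empirical p-values can differ between runs. The smallest value such an estimate can take is set by the number of random draws. The helper `empirical_p`, its `n_rand` argument, and the simulated scores are illustrative assumptions only; this is not the package's own permutation scheme and may not explain the gap above.

```r
# Hypothetical helper (not part of the package discussed here): empirical
# p-value for "this set has a higher mean score than random sets of equal size".
empirical_p <- function(values, in_set, n_rand = 1000) {
  observed <- mean(values[in_set])
  k <- sum(in_set)
  # Null distribution: mean score of randomly drawn sets of the same size
  null <- replicate(n_rand, mean(sample(values, k)))
  # Add-one correction, so the estimate can never be exactly zero
  (sum(null >= observed) + 1) / (n_rand + 1)
}

set.seed(1)
values <- rnorm(2000)                       # simulated per-gene scores
in_set <- rep(c(TRUE, FALSE), c(50, 1950))  # first 50 genes form the set
values[in_set] <- values[in_set] + 0.5      # shift the set so it is enriched

# Repeat the estimate several times at two different numbers of random draws
p_small <- replicate(20, empirical_p(values, in_set, n_rand = 100))
p_large <- replicate(20, empirical_p(values, in_set, n_rand = 2000))

range(p_small)  # cannot fall below 1 / 101
range(p_large)  # cannot fall below 1 / 2001
```

If the two calls use different numbers of random draws or different random seeds, a strongly enriched set can report p-values that differ by roughly the ratio of those floors.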
