nmf: implement marker gene finding in NMFOutput #339

@sjfleming

The section "Identification of marker genes" from the paper here

Looks like they do something a bit involved. They z-score the TPM matrix (per gene) to get z_tpm_ng. Then they compute the (unnormalized consensus) usage matrix alpha_nk and fit the model

z_tpm_ng ~ sum_k alpha_nk * beta_kg

using OLS regression and interpret beta_kg as the association between gene g and program k. By using z-scored TPMs z_tpm_ng, they say that beta_kg "can then be interpreted as by how many standard deviations the expression of gene [g] should increase for an additional count of usage being attributed to GEP k. We regress against z-scored expression values rather than the un-normalized expression values so that the coefficients will be comparable between genes expressed on different scales."

Otherwise I've noticed that highly expressed genes (MALAT1, ribosomal genes, etc.) get priority in pretty much all the programs...
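
For concreteness, here is a rough NumPy sketch of the procedure as I read it. This is not the paper's code and not an existing NMFOutput API: the function names `marker_gene_scores` and `top_marker_genes`, the `eps` guard for constant genes, and the choice to fit without an intercept (matching the model as written above) are all my assumptions.

```python
import numpy as np


def marker_gene_scores(tpm_ng, alpha_nk, eps=1e-12):
    """Hypothetical helper: OLS fit of per-gene z-scored TPM on program usages.

    tpm_ng:   (n cells, g genes) TPM matrix
    alpha_nk: (n cells, k programs) unnormalized consensus usage matrix
    Returns beta_kg with shape (k, g).
    """
    tpm_ng = np.asarray(tpm_ng, dtype=float)
    alpha_nk = np.asarray(alpha_nk, dtype=float)

    # z-score each gene (column) across cells; eps avoids dividing by zero
    # for genes with constant expression (assumption, not from the paper)
    mean_g = tpm_ng.mean(axis=0, keepdims=True)
    std_g = tpm_ng.std(axis=0, keepdims=True)
    z_tpm_ng = (tpm_ng - mean_g) / np.maximum(std_g, eps)

    # least squares for all genes at once:
    # minimize ||alpha_nk @ beta_kg - z_tpm_ng||^2 (no intercept term)
    beta_kg, *_ = np.linalg.lstsq(alpha_nk, z_tpm_ng, rcond=None)
    return beta_kg


def top_marker_genes(beta_kg, gene_names, n_top=10):
    """Hypothetical helper: for each program, genes with the largest beta_kg."""
    order = np.argsort(-beta_kg, axis=1)[:, :n_top]
    return [[gene_names[g] for g in row] for row in order]
```

Because every gene is z-scored before the fit, a gene's overall expression scale does not affect its coefficient, which is the property that should keep MALAT1 and friends from dominating every program.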

Labels: enhancement
