nmf: implement marker gene finding in NMFOutput

The section

> Identification of marker genes

from [the paper here](https://elifesciences.org/articles/43803#s4)

Looks like they do something a bit involved. They z-score the TPM matrix per gene (across cells) to get `z_tpm_ng`, then compute the (unnormalized, consensus) usage matrix `alpha_nk` and fit the model
```
z_tpm_ng ~ alpha_nk @ beta_kg    (i.e. z_tpm_ng ≈ sum over k of alpha_nk * beta_kg)
```
using OLS regression and interpret `beta_kg` as the association between gene `g` and program `k`. By using z-scored TPMs `z_tpm_ng`, they say that `beta_kg`
> can then be interpreted as by how many standard deviations the expression of gene [g] should increase for an additional count of usage being attributed to GEP k. We regress against z-scored expression values rather than the un-normalized expression values so that the coefficients will be comparable between genes expressed on different scales
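A minimal NumPy sketch of the procedure as described above, assuming `tpm_ng` is a cells × genes array and `alpha_nk` is a cells × programs usage array. The function names (`zscore_per_gene`, `marker_gene_coefficients`, `top_marker_genes`) are hypothetical and not an existing `NMFOutput` API, and this is not the paper's own code. The fit has no intercept term, matching the model as written; zero-variance genes are left at 0 instead of being divided by zero.

```python
import numpy as np


def zscore_per_gene(tpm_ng: np.ndarray) -> np.ndarray:
    """Z-score each gene (column) across cells (rows)."""
    mean_g = tpm_ng.mean(axis=0, keepdims=True)
    std_g = tpm_ng.std(axis=0, keepdims=True)
    # Zero-variance genes would divide by zero; use 1 so they come out as all zeros.
    std_g = np.where(std_g > 0, std_g, 1.0)
    return (tpm_ng - mean_g) / std_g


def marker_gene_coefficients(tpm_ng: np.ndarray, alpha_nk: np.ndarray) -> np.ndarray:
    """Fit z_tpm_ng ~ alpha_nk @ beta_kg by OLS (no intercept); return beta_kg."""
    z_tpm_ng = zscore_per_gene(tpm_ng)
    # lstsq treats each column of z_tpm_ng as a separate regression on alpha_nk.
    beta_kg, *_ = np.linalg.lstsq(alpha_nk, z_tpm_ng, rcond=None)
    return beta_kg


def top_marker_genes(beta_kg: np.ndarray, gene_names, n_top: int = 5):
    """For each program k, return the n_top genes with the largest beta_kg."""
    order_kg = np.argsort(-beta_kg, axis=1)[:, :n_top]
    return [[gene_names[g] for g in row] for row in order_kg]


# Toy usage with synthetic data (3 programs, 200 cells, 50 genes).
rng = np.random.default_rng(0)
alpha_nk = rng.gamma(shape=1.0, scale=1.0, size=(200, 3))
true_kg = rng.gamma(shape=1.0, scale=1.0, size=(3, 50))
tpm_ng = alpha_nk @ true_kg + rng.normal(scale=0.1, size=(200, 50))

beta_kg = marker_gene_coefficients(tpm_ng, alpha_nk)
gene_names = [f"gene{g}" for g in range(50)]
markers = top_marker_genes(beta_kg, gene_names, n_top=5)
print(beta_kg.shape)  # (3, 50)
```

Solving all genes in a single `lstsq` call works because every gene shares the same design matrix `alpha_nk`; only the response column changes.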

Without something like this, I've noticed that highly expressed genes (MALAT1, ribosomal genes, etc.) get priority in pretty much all the programs...
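A toy illustration, on made-up synthetic data, of the scale effect the quote describes. It assumes the high-expression dominance comes from ranking on coefficients fit to unscaled values. Two genes share the same usage pattern, but one is expressed 100× higher. On raw TPM its coefficients are 100× larger in every program. After z-scoring, the two genes get identical coefficients. This does not reproduce the MALAT1 observation itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_programs = 300, 3
alpha_nk = rng.gamma(shape=1.0, scale=1.0, size=(n_cells, n_programs))

# Two toy genes with the same usage pattern; gene 1 is expressed 100x higher.
pattern_n = alpha_nk @ np.array([2.0, 0.5, 0.1])
tpm_ng = np.column_stack([pattern_n, 100.0 * pattern_n])


def zscore_per_gene(x_ng):
    # Per-gene (column) z-score across cells.
    return (x_ng - x_ng.mean(axis=0)) / x_ng.std(axis=0)


# OLS on raw TPM: coefficients scale with the gene's expression level.
beta_raw_kg, *_ = np.linalg.lstsq(alpha_nk, tpm_ng, rcond=None)
# OLS on z-scored TPM: coefficients no longer depend on the gene's scale.
beta_z_kg, *_ = np.linalg.lstsq(alpha_nk, zscore_per_gene(tpm_ng), rcond=None)

print(np.allclose(beta_raw_kg[:, 1], 100 * beta_raw_kg[:, 0]))  # True
print(np.allclose(beta_z_kg[:, 1], beta_z_kg[:, 0]))            # True
```

Ranked on `beta_raw_kg`, the high-expression gene wins every program. Ranked on `beta_z_kg`, the two genes tie, because only their pattern across cells matters.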

nmf: implement marker gene finding in NMFOutput #339

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

nmf: implement marker gene finding in NMFOutput #339

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions