This is an R implementation of the CENTIPEDE model, first proposed by R. Pique-Regi et al. (2011), "Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data". Although the model was originally developed for DNase-seq data, we extend it to single-nucleus ATAC-seq (snATAC-seq) data.
Model notation – we have matrix
Marginalizing out
Where
Where
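The per-site marginal likelihood can be sketched in Python as below. This is a minimal sketch under assumptions taken from the original CENTIPEDE paper, which may differ in detail from the notation used here. Each candidate site l has a vector of per-position counts x_l with total y_l. The total follows a negative binomial with mean mu and size theta (the parameterization used by glm.nb in MASS). The counts, given the total, follow a multinomial with position probabilities lam. The prior probability that the site is bound is pi_l. All function names are hypothetical.

```python
import math
from typing import Sequence


def nb_logpmf(y: int, mu: float, theta: float) -> float:
    """Negative binomial log-pmf with mean mu and size theta (NB2 form, as in MASS)."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))


def multinom_logpmf(x: Sequence[int], lam: Sequence[float]) -> float:
    """Multinomial log-pmf of per-position counts x given position probabilities lam."""
    out = math.lgamma(sum(x) + 1)
    for xj, pj in zip(x, lam):
        out -= math.lgamma(xj + 1)
        if xj > 0:  # a zero count contributes nothing, even when pj == 0
            out += xj * math.log(pj)
    return out


def component_loglik(x: Sequence[int], lam: Sequence[float],
                     mu: float, theta: float) -> float:
    """log f_z(D_l): NB term for the total count plus multinomial term for its profile."""
    return nb_logpmf(sum(x), mu, theta) + multinom_logpmf(x, lam)


def log_marginal(x: Sequence[int], prior: float, bound: tuple, unbound: tuple) -> float:
    """log[pi * f_1(D_l) + (1 - pi) * f_0(D_l)], summed stably in log space.

    bound and unbound are (lam, mu, theta) tuples; prior is pi_l = P(Z_l = 1).
    """
    a = math.log(prior) + component_loglik(x, *bound)
    b = math.log1p(-prior) + component_loglik(x, *unbound)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))
```

Working in log space matters in practice: with long count vectors, f_1 and f_0 are individually tiny and would underflow if multiplied out directly.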
The complete likelihood (if
And the corresponding log-likelihood:
As
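The complete-data log-likelihood for a two-component mixture of this kind can be sketched as follows. This is a sketch assuming the standard mixture form: the function takes each site's prior pi_l and its bound and unbound component log-likelihoods (log f1_l, log f0_l) as precomputed inputs, however those are defined in this model. The name complete_loglik is hypothetical.

```python
import math
from typing import Sequence


def complete_loglik(z: Sequence[float], prior: Sequence[float],
                    logf1: Sequence[float], logf0: Sequence[float]) -> float:
    """sum_l z_l [log pi_l + log f1_l] + (1 - z_l) [log(1 - pi_l) + log f0_l].

    With z_l in {0, 1} this is the log-likelihood when binding states are known.
    With z_l = E[Z_l | D_l] in [0, 1] it is the expected complete-data
    log-likelihood that EM maximizes.
    """
    total = 0.0
    for zl, pl, a, b in zip(z, prior, logf1, logf0):
        total += zl * (math.log(pl) + a) + (1.0 - zl) * (math.log1p(-pl) + b)
    return total
```

When z_l is set to the exact posterior, adding the entropy of the posteriors to this value recovers the marginal log-likelihood, which makes a convenient sanity check.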
E-step — First, we calculate the expectation of
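For a two-component mixture, the expectation of the binding indicator is the posterior probability that the site is bound. It can be computed stably from the log posterior odds. The sketch below takes the prior pi_l and the component log-likelihoods log f1_l and log f0_l as inputs (their exact form in this model is assumed, not shown). The function name is hypothetical.

```python
import math
from typing import List, Sequence


def posterior_bound(prior: Sequence[float], logf1: Sequence[float],
                    logf0: Sequence[float]) -> List[float]:
    """E-step: r_l = pi_l f1_l / (pi_l f1_l + (1 - pi_l) f0_l) for each site.

    Evaluated as a logistic function of the log posterior odds, so that very
    negative log-likelihoods do not underflow to 0 / 0.
    """
    out = []
    for p, a, b in zip(prior, logf1, logf0):
        s = math.log(p) - math.log1p(-p) + a - b  # log posterior odds
        if s >= 0:
            out.append(1.0 / (1.0 + math.exp(-s)))
        else:
            e = math.exp(s)
            out.append(e / (1.0 + e))
    return out
```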
M-step — Replacing
Where
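If the model keeps the multinomial spatial profile of the original CENTIPEDE paper (an assumption for this snATAC-seq version), replacing Z_l by its posterior mean r_l gives a closed-form M-step update for that profile. Each position's probability becomes a weighted share of all reads. The sketch below uses weights r_l for the bound profile and 1 - r_l for the unbound one. The name update_profile is hypothetical.

```python
from typing import List, Sequence


def update_profile(counts: Sequence[Sequence[int]],
                   weights: Sequence[float]) -> List[float]:
    """Weighted MLE of multinomial position probabilities.

    lam_j = sum_l w_l * x_lj / sum_l w_l * y_l, where y_l = sum_j x_lj.
    """
    n_pos = len(counts[0])
    num = [0.0] * n_pos
    den = 0.0
    for row, w in zip(counts, weights):
        for j, xj in enumerate(row):
            num[j] += w * xj
        den += w * sum(row)
    return [v / den for v in num]
```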
Unfortunately, the parameters of the bound and unbound negative binomial distributions have no closed-form update, so we fit them numerically with glm.nb from the R package MASS, with the weights set to
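A weighted negative binomial fit can be mimicked outside R. The sketch below is a Python stand-in, not MASS itself. It assumes each component is intercept-only, with one mean and one size per component, as in glm.nb(y ~ 1, weights = w). It also assumes the weights are the posterior binding probabilities for the bound component and their complements for the unbound one, which is the usual EM choice. For a fixed size, the weighted score equation for the mean reduces to a weighted average. The size is then found by golden-section search, a swapped-in optimizer (MASS uses its own theta.ml routine). Function names are hypothetical, and the results have not been checked against MASS output.

```python
import math
from typing import Sequence, Tuple


def weighted_nb_loglik(y: Sequence[int], w: Sequence[float],
                       mu: float, theta: float) -> float:
    """sum_l w_l * log NB(y_l; mean mu, size theta)."""
    c1 = theta * math.log(theta / (theta + mu))
    c2 = math.log(mu / (theta + mu))
    return sum(wl * (math.lgamma(yl + theta) - math.lgamma(theta)
                     - math.lgamma(yl + 1) + c1 + yl * c2)
               for yl, wl in zip(y, w))


def fit_weighted_nb(y: Sequence[int], w: Sequence[float], lo: float = 1e-3,
                    hi: float = 1e4, iters: int = 100) -> Tuple[float, float]:
    """Intercept-only weighted NB fit; returns (mu, theta). Requires mu > 0.

    mu is the weighted mean; theta maximizes the weighted log-likelihood via
    golden-section search on log(theta) over [lo, hi].
    """
    mu = sum(wl * yl for yl, wl in zip(y, w)) / sum(w)

    def f(t: float) -> float:
        return weighted_nb_loglik(y, w, mu, math.exp(t))

    a, b = math.log(lo), math.log(hi)
    g = (math.sqrt(5) - 1) / 2  # inverse golden ratio
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return mu, math.exp((a + b) / 2)
```

If a component's counts are not overdispersed (weighted variance at or below the mean), the size estimate runs to the upper bound hi, mirroring the Poisson limit.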
The quantity
Where W is a
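The sketch below shows one possible prior update. It assumes, as in the original CENTIPEDE paper, that the prior pi_l is a logistic function of site-level covariates. It also assumes that this update is a Newton-Raphson (IRLS) step with W taken to be the diagonal matrix with entries pi_l(1 - pi_l). The quantity and W defined in this document may differ. For brevity it uses a single hypothetical covariate s_l (for example a motif match score) plus an intercept, with the posterior probabilities r_l as soft responses. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def newton_step(beta: Tuple[float, float], s: Sequence[float],
                r: Sequence[float]) -> Tuple[float, float]:
    """One Newton-Raphson step for logit(pi_l) = b0 + b1 * s_l with soft labels r_l.

    beta_new = beta + (X^T W X)^{-1} X^T (r - pi), X = [1, s], W = diag(pi (1 - pi)).
    """
    b0, b1 = beta
    g0 = g1 = h00 = h01 = h11 = 0.0
    for sl, rl in zip(s, r):
        p = sigmoid(b0 + b1 * sl)
        w = p * (1.0 - p)  # diagonal entry of W
        g0 += rl - p
        g1 += (rl - p) * sl
        h00 += w
        h01 += w * sl
        h11 += w * sl * sl
    det = h00 * h11 - h01 * h01
    # Solve the 2x2 system (X^T W X) delta = X^T (r - pi) explicitly.
    d0 = (h11 * g0 - h01 * g1) / det
    d1 = (h00 * g1 - h01 * g0) / det
    return b0 + d0, b1 + d1


def fit_prior(s: Sequence[float], r: Sequence[float],
              iters: int = 25) -> Tuple[float, float]:
    """Iterate Newton steps from beta = (0, 0)."""
    beta = (0.0, 0.0)
    for _ in range(iters):
        beta = newton_step(beta, s, r)
    return beta
```

At convergence the score X^T (r - pi) is zero. With more covariates, the explicit 2x2 solve would be replaced by a general linear solve.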