Hierarchical GAM demo #272
Conversation
swo
left a comment
I finally started to dig into the math and the code and have a lot of questions!
| k = m + p - 1 | ||
| $$ | ||
|
|
||
| The element in $X$ is the value of the basis function evaluated at the predictor elapsed, with rows are data point $x_i$ , and columns are basis function $B_k$. |
I'm still unclear on what the basis functions are. Presumably there are many that you can pick from. What are their functional forms? Do you optimize over this selection?
It's also confusing that
After some more reading, it seems like these are https://en.wikipedia.org/wiki/B-spline ?
Yes, basis splines are the way to go when doing something like this. Bases are pre-computable, limiting the work you need to do in NumPyro
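To make the "pre-computable" point concrete, here is a minimal sketch of building the design matrix $X$ once, outside any model code. It uses plain NumPy and the Cox-de Boor recursion rather than whatever this PR actually calls; the function name `bspline_design_matrix`, the clamped knot vector, and the specific knots are assumptions made for illustration only.

```python
import numpy as np


def bspline_design_matrix(x, knots, degree):
    """Evaluate every B-spline basis function at every x (Cox-de Boor recursion).

    Returns an array of shape (len(x), len(knots) - degree - 1).
    Hypothetical helper for illustration, not the function used in this PR.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)

    # Degree 0: indicator functions of the half-open knot intervals [t_i, t_{i+1}).
    B = ((t[:-1] <= x[:, None]) & (x[:, None] < t[1:])).astype(float)
    # Close the last non-empty interval on the right so x == t[-1] is covered.
    last = np.nonzero(t[:-1] < t[1:])[0][-1]
    B[x == t[-1], last] = 1.0

    for d in range(1, degree + 1):
        n = len(t) - d - 1  # number of degree-d basis functions
        left_den = t[d:d + n] - t[:n]
        right_den = t[d + 1:d + 1 + n] - t[1:n + 1]
        # Terms with a zero denominator (repeated knots) are defined as zero.
        left_w = np.divide(x[:, None] - t[:n], left_den,
                           out=np.zeros((len(x), n)), where=left_den > 0)
        right_w = np.divide(t[d + 1:d + 1 + n] - x[:, None], right_den,
                            out=np.zeros((len(x), n)), where=right_den > 0)
        B = left_w * B[:, :n] + right_w * B[:, 1:n + 1]
    return B


degree = 3
interior = np.array([0.25, 0.5, 0.75])  # assumed interior knots
# Clamped knot vector: each boundary knot repeated degree + 1 times.
knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
x = np.linspace(0.0, 1.0, 50)           # stand-in for the "elapsed" predictor

X = bspline_design_matrix(x, knots, degree)  # rows: data points, columns: basis functions
```

With these knots the matrix has 7 columns (`len(knots) - degree - 1`). That agrees with the relationship quoted in the diff only if $m$ counts the two boundary knots as well, which is an assumption about the convention rather than something the PR states. Each row sums to 1, so the basis functions are fixed, non-negative weights and only $\beta$ has to be sampled.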
| ```math | ||
| \begin{align*} | ||
| p(\beta,\lambda,\sigma |y) & ∝ p(y |\beta,\sigma)p(\beta|\lambda, \sigma)p(\lambda)p(\sigma) \\ | ||
| p(y|\beta, \sigma) & \sim MultiNormal(X\beta, \sigma I) \\ |
I think this is more easily written as
| \begin{align*} | ||
| p(\beta,\lambda,\sigma |y) & ∝ p(y |\beta,\sigma)p(\beta|\lambda, \sigma)p(\lambda)p(\sigma) \\ | ||
| p(y|\beta, \sigma) & \sim MultiNormal(X\beta, \sigma I) \\ | ||
| p(\beta|\lambda, \sigma) & \sim MultiNormal(0, (\sigma/\lambda)S^{-}) \\ |
Yes, it should be
Yes, pseudoinverse matrix is used when
| #### Deriving prior of $\beta$ | ||
|
|
||
| As we assume the link function is identity, that indicates the data $y$ follows normal distribution with covariance matrix $\sigma^2I$. | ||
| As we assume the link function is identity, that indicates the data $y$ follows normal distribution with covariance matrix $\sigma I$. |
I'm not sure I follow; you could have a different link function and still have the data be normally distributed.
You are right. The identity function is the canonical link function for a normally distributed response variable. People typically use it to indicate a linear regression, where the error is normally distributed:
but other link functions can be used while the error is still normally distributed (like when
Let me rephrase!
|
|
||
| The group factors are season ($s$) and geography ($g$). Their effects are introduced in $\beta$, to allow varying shape of the spline function adjusted by each $\beta_k$ to control the corresponding basis function $B_k$. | ||
|
|
||
| Given the population mean of $\beta$, denoted as $\bar \beta$, the $\beta$ specific to a certain season ($s=i$) and certain geography ($g=j$) is $\bar \beta$ plus vector $\delta_{s=i}$ and $\delta_{g=j}$. $\delta_{s=i}$ defines the deviation of the certain season from $\bar \beta$ and the certain geography from $\bar \beta$. |
Usually when people use a single symbol like
So
If you want to have different kinds of
I know what you mean, but it's very non-standard notation
I'm realizing that part of this is because people use lowercase letters to mean indices. It wouldn't be crazy to use uppercase, so that
This part has been deleted because it doesn't follow a hierarchical structure, per discussion.
| X: ArrayLike, | ||
| estimate: ArrayLike, | ||
| data: pl.DataFrame, | ||
| p: int = 2, |
I'm realizing that it's confusing that "p" is probability, likelihood, and also degree of the spline
|
|
||
| # Penalized precision matrix, add 1e-6 to make sure stability | ||
| precision = (lam * S) + 1e-6 * jnp.eye(p) | ||
| if groups is None: |
I think you can drop this. We'll always have groups, even if it's just season.
The script has been deleted, per discussion
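For anyone reading the `precision = (lam * S) + 1e-6 * ...` line later, a small sketch of why the jitter term is there. It assumes a second-order difference penalty for $S$ and made-up values for `K` and `lam`; the deleted script may have built $S$ differently. The identity added to $S$ must have the same shape as $S$.

```python
import numpy as np

K = 7  # number of basis functions (hypothetical value)

# Second-order difference operator: each row computes b[i] - 2*b[i+1] + b[i+2].
D = np.diff(np.eye(K), n=2, axis=0)  # shape (K - 2, K)
S = D.T @ D                          # penalty matrix, shape (K, K)

lam = 10.0     # hypothetical smoothing parameter
jitter = 1e-6  # same constant as in the quoted line

# S is rank deficient: constant and linear coefficient patterns are not penalized.
rank_S = np.linalg.matrix_rank(S)

# Adding a small multiple of the identity lifts the zero eigenvalues.
precision = lam * S + jitter * np.eye(K)
min_eig = np.linalg.eigvalsh(precision).min()

# A positive definite precision admits a Cholesky factor; S alone would not.
chol = np.linalg.cholesky(precision)
```

The rank deficiency is also why the prior covariance in the doc is written with a pseudoinverse $S^{-}$ instead of an ordinary inverse.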
|
|
||
| z = numpyro.sample( | ||
| f"z_{idx}", | ||
| dist.MultivariateNormal(0, jnp.eye(k)), |
You don't need multivariate normal here. This is a vector of values, so you can just use dist.Normal
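A quick numerical check of the point above, written in plain NumPy so it runs without NumPyro: a multivariate normal with identity covariance has the same log-density as `k` independent standard normals. The value of `k` and the random draw are placeholders. In the model itself, something along the lines of `dist.Normal(0, 1).expand([k])` would be the analogous replacement, though that is an assumption about how the deleted script would be rewritten.

```python
import numpy as np

k = 5  # hypothetical number of basis functions
rng = np.random.default_rng(0)
z = rng.normal(size=k)

# Log-density of MultivariateNormal(0, I_k), written with the general formula.
cov = np.eye(k)
_, logdet = np.linalg.slogdet(cov)
mvn_logpdf = -0.5 * (k * np.log(2 * np.pi) + logdet + z @ np.linalg.solve(cov, z))

# Sum of k independent Normal(0, 1) log-densities.
iid_logpdf = np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * z**2)
```

The two values agree, and the independent form avoids factorizing a covariance matrix that is known to be the identity.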
|
|
||
| if __name__ == "__main__": | ||
| ## model fitting ## | ||
| for idx in data["group_combo_idx"].unique(): |
I'm trying to follow the logic here:
- Pick common $\lambda$, $\sigma_\mathrm{season}$, $\sigma_\mathrm{geo}$, and $\sigma_Z$ to be used across all states/seasons
- For each state/season, fit a spline
- Adjust that spline based on the common values
This means that the only way the fit for one state/season "sees" the data from other seasons is via the shared parameters (
So I'm confused first about the merit of having a prior on
Second, I'm confused how this is a hierarchical model, since we're not sharing information about the states or the seasons. I expect this means that the forecasts don't look very good?
The script has been deleted, per discussion
Co-authored-by: Scott Olesen <ulp7@cdc.gov>
afmagee42
left a comment
It could just be my relative inexperience with splines, or being out of date on the project at hand, but I am not sure I see how this model does what I thought we wanted it to do?
| ```math | ||
|
|
||
| g^{-1}(E(y)) = X\beta + \beta_0 | ||
| g^{-1}(E(y)) = X\beta |
Link function framing feels to me like an easy way for us to either get trapped in frequentist thinking or backed into corners we don't want to be in.
| ``` | ||
|
|
||
| $y$ is a vector of observed vaccination coverage, $X$ is the design matrix of basis function with $N \times k$ dimension, where $N$ is the number of data points and $k$ is the number of basis functions used. The element in $X$ is the value of the basis function evaluated at the predictor elapsed, with rows are data point $x_i$ , and columns are basis function $B_k$. | ||
| $y$ is a vector of observed vaccination coverage, $X$ is the design matrix of basis function with $N \times k$ dimension, where $N$ is the number of data points and $k$ is the number of basis functions used. $k$ is defined by the order degree of spline function $p$ and the number of internal knots $m$ by: |
| $y$ is a vector of observed vaccination coverage, $X$ is the design matrix of basis function with $N \times k$ dimension, where $N$ is the number of data points and $k$ is the number of basis functions used. $k$ is defined by the order degree of spline function $p$ and the number of internal knots $m$ by: | |
| $y$ is a vector of observed vaccination coverage, $X$ is the design matrix of basis function with $N \times k$ dimension, where $N$ is the number of data points and $k$ is the number of basis functions used. $K$ is defined by the order degree of spline function $p$ and the number of internal knots $m$ by: |
I think this was just a typo and big K was intended, rather than little k?
I would also not call this a definition, it's a relationship. The definition comes when you choose which of
| $y$ is a vector of observed vaccination coverage, $X$ is the design matrix of basis function with $N \times k$ dimension, where $N$ is the number of data points and $k$ is the number of basis functions used. $k$ is defined by the order degree of spline function $p$ and the number of internal knots $m$ by: | ||
|
|
||
| $$ | ||
| k = m + p - 1 |
| k = m + p - 1 | |
| K = m + p - 1 |
Same typo, I think?
| L(y|\beta)\cdot exp(-\lambda\beta^TS\beta/(2\sigma)) | ||
| ``` | ||
|
|
||
| Using empirical Bayes approach, we can derive: |
It seems to me that the prior of
|
|
||
| ```math | ||
| L(y|\beta)\cdot exp(-\lambda\beta^TS\beta/(2\sigma^2)) | ||
| L(y|\beta)\cdot exp(-\lambda\beta^TS\beta/(2\sigma)) |
FWIW, in my estimation, insofar as MCMV has coalesced on a style, we don't use L(params) for the likelihood; we write p(data | params)
| ``` | ||
|
|
||
|
|
||
| #### Deriving prior of $\beta$ |
Why are we doing this? This seems like an attempt to make a prior out of a frequentist penalty function, which does not work well. When doing maximum likelihood, all you need to penalize is a point. We need to penalize a distribution. If you could just apply a Bayesian prior matching a frequentist penalty function, "the Bayesian lasso" (aka regression with exponential priors) would provide good sparsity, but it doesn't.
This is what Wood et al. did in section 2.4 to derive the prior from the penalty function. I'm new to this topic. Can you explain why it doesn't work well? Is it that the frequentist penalty function penalizes a point while the Bayesian way is to penalize a distribution, so the connection between the two is questionable?
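For reference, the step under discussion can be written out as follows. This only restates the formulas quoted in this diff, treating $\sigma$ as the variance scale as in the updated lines; it is not a claim that the resulting prior behaves well.

```math
\begin{align*}
p(\beta|\lambda,\sigma) & ∝ \exp\left(-\frac{\lambda\beta^TS\beta}{2\sigma}\right)
= \exp\left(-\frac{1}{2}\beta^T\left[\frac{\sigma}{\lambda}S^{-}\right]^{-}\beta\right) \\
\Rightarrow \beta|\lambda,\sigma & \sim MultiNormal(0, (\sigma/\lambda)S^{-})
\end{align*}
```

Because $S$ is rank deficient, this density is improper along the null space of $S$, that is, along the directions the penalty leaves unpenalized.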
|
|
||
| ```math | ||
| \begin{align*} | ||
| \beta_{total} &= \bar \beta + \delta_{s=i} + \delta_{g=j} \\ |
What does this partial pooling structure imply about the functional forms per state-season? It is not clear to me that
- There is any guarantee they have to look remotely similar
- This looks anything like our general idea that functional forms are consistent give or take shifts to the start of uptake and peak uptake
Per discussion, the functional forms are not shared, so this is effectively no pooling. This part has been deleted.
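For the record, a minimal sketch of what the $\beta_{total} = \bar \beta + \delta_{s=i} + \delta_{g=j}$ line computes for each data point. All sizes, index arrays, and values below are hypothetical placeholders drawn at random, not estimates, and `X` is a stand-in for the spline design matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
K, n_season, n_geo, N = 4, 3, 2, 6  # hypothetical sizes

beta_bar = rng.normal(size=K)                  # population-mean coefficients
delta_season = rng.normal(size=(n_season, K))  # one deviation vector per season
delta_geo = rng.normal(size=(n_geo, K))        # one deviation vector per geography

season_idx = np.array([0, 0, 1, 1, 2, 2])      # season of each data point
geo_idx = np.array([0, 1, 0, 1, 0, 1])         # geography of each data point
X = rng.uniform(size=(N, K))                   # stand-in for the spline design matrix

# Row n holds the coefficients for data point n: beta_bar + delta_s + delta_g.
beta_total = beta_bar + delta_season[season_idx] + delta_geo[geo_idx]  # shape (N, K)

# Linear predictor: each row of X is paired with its own coefficient vector.
mu = np.sum(X * beta_total, axis=1)            # shape (N,)
```

Nothing in this construction ties the curves of different state-seasons together. Any similarity would have to come from the priors placed on the deviation vectors, which is the concern raised in this thread.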
|
If |
|
Per discussion, the scripts are not truly hierarchical, and it would take considerable effort to build an HGAM in NumPyro. The scripts have been deleted. The doc has been rephrased per the comments, and the part about hierarchy has been removed from the document. |
|
Close per discussion |
For vaccine uptake in a single season, the coefficients controlling each basis function are estimated given the design matrix and penalty matrix. The final outcome is a vector of estimated coefficients $\beta$, along with the other estimates $\lambda$ and $\sigma$.
When vaccine uptake data come from multiple seasons and states, instead of directly estimating $\beta$, the model estimates the deviation vector $\delta$ of a given state and season from the population mean of $\beta$. Each element of $\delta$ is the deviation from the population coefficient controlling the corresponding basis function.
`scipy.interpolate` only accepts the data from a single season. Thus, when adding group factors, the design matrix and penalty matrix for a single season, for each level in season (There exists an error about the incompatibility between the `np.array` from `make_lsq_spline` and the `jnp.array` used in NumPyro. This requires refactoring the code so that the function that computes the design matrix and penalty matrix sits outside the NumPyro model. I want to make sure we are on the same page before that!