Hierarchical GAM demo by Fuhan-Yang · Pull Request #272 · CDCgov/cfa-vaccination-coverage-forecasting

Fuhan-Yang · 2026-02-12T19:33:22Z

For vaccine uptake in a single season, the coefficients controlling each basis function are estimated given the design matrix and the penalty matrix. The result is a coefficient vector $\beta$, along with estimates of $\lambda$ and $\sigma$.

When vaccine uptake data come from multiple seasons and states, instead of directly estimating $\beta$, the model estimates the deviation vector $\delta$ of a given state and season from the population mean of $\beta$. Each element of $\delta$ is the deviation from the population coefficient for the corresponding basis function.

scipy.interpolate only accepts data from a single season. Thus, when adding group factors, the design matrix and penalty matrix are calculated separately for each combination of season ($i$) and state ($j$), and are then used to estimate $\delta_{season=i, state=j}$.

There is an error caused by an incompatibility between the np.array from make_lsq_spline and the jnp.array used in numpyro. Fixing it requires refactoring the code so that the function that builds the design matrix and penalty matrix sits outside the numpyro model. I want to make sure we are on the same page before doing that!

swo

I finally started to dig into the math and the code and have a lot of questions!

swo · 2026-02-17T15:53:58Z

+k = m + p - 1
+$$
+
+The element in $X$ is the value of the basis function evaluated at the predictor elapsed, with rows are data point $x_i$ , and columns are basis function $B_k$.


I'm still unclear on what the basis functions are. Presumably there are many that you can pick from. What are their functional forms? Do you optimize over this selection?

It's also confusing that $k$ is used as a fixed constant (the length of $\beta$) but also as an index.

After some more reading, it seems like these are https://en.wikipedia.org/wiki/B-spline ?

Yes, basis splines are the way to go when doing something like this. Bases are pre-computable, limiting the work you need to do in NumPyro
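On the question of functional form: a B-spline basis of a given degree is fixed by its knot vector through the Cox-de Boor recursion, so the design matrix can be built once, outside the model. Below is a minimal NumPy sketch of that idea. The function name `bspline_design_matrix`, the clamped knot layout, and the example knots are assumptions for illustration and are not taken from the PR.

```python
import numpy as np


def bspline_design_matrix(x, interior_knots, degree, lower, upper):
    """Evaluate each B-spline basis function at each point of x.

    Returns an array of shape (len(x), n_basis) where, for this clamped
    knot vector, n_basis = len(interior_knots) + degree + 1.
    """
    x = np.asarray(x, dtype=float)
    # Clamped knot vector: each boundary knot is repeated (degree + 1) times.
    knots = np.concatenate([
        np.repeat(float(lower), degree + 1),
        np.asarray(interior_knots, dtype=float),
        np.repeat(float(upper), degree + 1),
    ])
    n_basis = len(knots) - degree - 1

    # Degree 0: indicator of each half-open interval [t_j, t_{j+1}).
    basis = np.zeros((len(x), len(knots) - 1))
    for j in range(len(knots) - 1):
        basis[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    # Assign x == upper to the last non-empty interval.
    at_upper = x == upper
    basis[at_upper, :] = 0.0
    basis[at_upper, n_basis - 1] = 1.0

    # Cox-de Boor recursion: raise the degree one step at a time.
    for d in range(1, degree + 1):
        new = np.zeros((len(x), len(knots) - 1 - d))
        for j in range(len(knots) - 1 - d):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            if left_den > 0:
                new[:, j] += (x - knots[j]) / left_den * basis[:, j]
            if right_den > 0:
                new[:, j] += (knots[j + d + 1] - x) / right_den * basis[:, j + 1]
        basis = new
    return basis


# Example: cubic basis on [0, 1] with three interior knots (made-up values).
x = np.linspace(0.0, 1.0, 50)
X = bspline_design_matrix(x, [0.25, 0.5, 0.75], degree=3, lower=0.0, upper=1.0)
```

Here `X` has one row per data point and one column per basis function, and each row sums to one. How the column count in this sketch lines up with the relationship $k = m + p - 1$ in the doc depends on whether $m$ counts the boundary knots, which this thread does not settle.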

swo · 2026-02-17T15:55:36Z

+```math
+\begin{align*}
+p(\beta,\lambda,\sigma |y) & ∝ p(y |\beta,\sigma)p(\beta|\lambda, \sigma)p(\lambda)p(\sigma) \\
+p(y|\beta, \sigma) & \sim MultiNormal(X\beta, \sigma I) \\


I think this is more easily written as

$$ p(y_i | \beta, \sigma) \sim \mathrm{Norm}((X \beta)_i, \sigma) $$
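The two ways of writing the likelihood give the same density when the covariance is diagonal. A small NumPy check of that equivalence follows, assuming $\sigma$ is a standard deviation so that the matching covariance is $\sigma^2 I$. The numbers are made up, and `mu` stands in for $X\beta$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
mu = rng.normal(size=n)   # stands in for X @ beta
y = rng.normal(size=n)    # stands in for the observations
sigma = 0.7               # standard deviation (assumed convention)

# Sum of independent Normal(mu_i, sigma) log densities.
logp_indep = np.sum(
    -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2
)

# Multivariate normal log density with covariance sigma^2 * I.
cov = sigma**2 * np.eye(n)
resid = y - mu
_, logdet = np.linalg.slogdet(cov)
quad = resid @ np.linalg.solve(cov, resid)
logp_mvn = -0.5 * (n * np.log(2 * np.pi) + logdet + quad)
```

The two values agree up to floating-point error, so the elementwise form loses nothing and avoids building an $N \times N$ covariance matrix.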

swo · 2026-02-17T15:56:15Z

+\begin{align*}
+p(\beta,\lambda,\sigma |y) & ∝ p(y |\beta,\sigma)p(\beta|\lambda, \sigma)p(\lambda)p(\sigma) \\
+p(y|\beta, \sigma) & \sim MultiNormal(X\beta, \sigma I) \\
+p(\beta|\lambda, \sigma) & \sim MultiNormal(0, (\sigma/\lambda)S^{-}) \\


Should this be $S^{-1}$?

Yes, it should be $S^{-1}$. There is a paper that uses $S^-$ for the pseudoinverse when $S$ is singular (?), but generally it should be $S^{-1}$.

Yes, the pseudoinverse is used when $S$ is not full rank or not square, which may not be an issue here, so we can use the inverse directly.
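Whether the inverse exists depends on how $S$ is built, and the thread does not show that. As a hedged illustration, assume $S$ is a second-order difference penalty, $S = D^T D$, a common choice for penalized splines. That matrix is rank deficient, because constant and linear coefficient vectors are not penalized, so only the pseudoinverse is available:

```python
import numpy as np

k = 6  # number of basis functions (made-up value)

# Assumed penalty: second-order differences of adjacent coefficients.
D = np.diff(np.eye(k), n=2, axis=0)   # shape (k - 2, k)
S = D.T @ D                           # shape (k, k)

rank = np.linalg.matrix_rank(S)       # k - 2 for this penalty
S_pinv = np.linalg.pinv(S)            # Moore-Penrose pseudoinverse

# Constant coefficient vectors lie in the null space of S.
null_check = S @ np.ones(k)
```

If the penalty in the PR is full rank, for example because a ridge term is added to it, the ordinary inverse is fine. Checking `np.linalg.matrix_rank(S)` on the real matrix would answer the question directly.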

swo · 2026-02-17T15:57:08Z

 #### Deriving prior of $\beta$

-As we assume the link function is identity, that indicates the data $y$ follows normal distribution with covariance matrix $\sigma^2I$.
+As we assume the link function is identity, that indicates the data $y$ follows normal distribution with covariance matrix $\sigma I$.


I'm not sure I follow; you could have a different link function and still have the data be normally distributed.

You are right. The identity function is the canonical link function for a normally distributed response variable. People typically use it to indicate a linear regression, where the error is normally distributed:
$$\begin{align*} y_i &\sim N( \beta_1x_i+ \beta_0, \sigma) \\\ g^{-1}(E(y_i))& = \beta_1x_i+ \beta_0 \\\ \epsilon_i& \sim N(0,\sigma) \end{align*}$$
but another link function can be used while the error is still normally distributed (for example, when $g^{-1} = \log(\cdot)$):
$$\begin{align*} \log(y_i) &\sim N( \beta_1x_i+ \beta_0, \sigma) \\\ g^{-1}(E(y_i))& = \beta_1x_i+ \beta_0 \\\ \log(E(y_i)) &= \beta_1x_i + \beta_0 \\\ \epsilon_i& \sim N(0,\sigma) \end{align*}$$
Let me rephrase!

swo · 2026-02-17T16:16:56Z

+
+The group factors are season ($s$) and geography ($g$). Their effects are introduced in $\beta$, to allow varying shape of the spline function adjusted by each $\beta_k$ to control the corresponding basis function $B_k$.
+
+Given the population mean of $\beta$, denoted as $\bar \beta$, the $\beta$ specific to a certain season ($s=i$) and certain geography ($g=j$) is $\bar \beta$ plus vector $\delta_{s=i}$ and $\delta_{g=j}$. $\delta_{s=i}$ defines the deviation of the certain season from $\bar \beta$ and the certain geography from $\bar \beta$.


Usually when people use a single symbol like $\delta$ with a single subscript, it means that $\delta$ is a vector, and the value of the subscript determines which index of $\delta$ to look at.

So $\delta_s$ means "give me the $s$-th value of the vector $\delta$" and $\delta_g$ means "give me the $g$-th value of the vector $\delta$".

If you want to have different kinds of $\delta$'s, then you'll need a different kind of indexing. The simplest thing is to have two indices, and say $\delta_{0s}$ refers to the $s$-th seasonal deviation and $\delta_{1g}$ is the $g$-th geographical deviation.

I know what you mean, but it's very non-standard notation

I'm realizing that part of this is because people use lowercase letters to mean indices. It wouldn't be crazy to use uppercase, so that $\delta_{Ss} \sim \mathcal{N}(0, \sigma_S)$ and $\sigma_S \sim \mathrm{Exp}(40)$. It's not great, but neither is $\delta_{0s}$.

This part has been deleted, per discussion, because it doesn't follow a hierarchical structure.

swo · 2026-02-17T16:21:57Z

-    X: ArrayLike,
-    estimate: ArrayLike,
+    data: pl.DataFrame,
+    p: int = 2,


I'm realizing that it's confusing that "p" is probability, likelihood, and also degree of the spline

swo · 2026-02-17T16:22:19Z


-    # Penalized precision matrix, add 1e-6 to make sure stability
-    precision = (lam * S) + 1e-6 * jnp.eye(p)
+    if groups is None:


I think you can drop this. We'll always have groups, even if it's just season.

The script has been deleted, per discussion

swo · 2026-02-17T16:23:37Z

+
+            z = numpyro.sample(
+                f"z_{idx}",
+                dist.MultivariateNormal(0, jnp.eye(k)),


You don't need multivariate normal here. This is a vector of values, so you can just use dist.Normal
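A vector of $k$ independent standard normal draws has the same distribution as a draw from a multivariate normal with identity covariance, so the simpler distribution is enough. The NumPy sketch below draws `z` that way and maps it to coefficients with a chosen precision matrix. It assumes, without confirmation from the diff, that `z` is later transformed through a Cholesky factor of the penalized precision. The penalty and the value of `lam` are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 5
lam = 2.0  # smoothing parameter (made-up value)

# Hypothetical penalty (second-order differences) plus jitter for stability.
D = np.diff(np.eye(k), n=2, axis=0)
precision = lam * (D.T @ D) + 1e-6 * np.eye(k)

# Independent standard normals: same distribution as MVN(0, I_k).
z = rng.standard_normal(k)

# With precision = L @ L.T, beta = solve(L.T, z) has covariance inv(precision).
L = np.linalg.cholesky(precision)
beta = np.linalg.solve(L.T, z)
```

In the model itself the same draw would come from `dist.Normal` with an event or batch shape of `k`, rather than from `dist.MultivariateNormal`.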

swo · 2026-02-17T17:16:39Z


-if __name__ == "__main__":
-    ## model fitting ##
+        for idx in data["group_combo_idx"].unique():


I'm trying to follow the logic here:

1. Pick common $\lambda$, $\sigma_\mathrm{season}$, $\sigma_\mathrm{geo}$, and $\sigma_Z$ to be used across all states/seasons
2. For each state/season, fit a spline
3. Adjust that spline based on the common values

This means that the only way the fit for one state/season "sees" the data from other seasons is via the shared parameters ($\lambda$, $\sigma_\mathrm{season}$, $\sigma_\mathrm{geo}$, and $\sigma_Z$). The deviations, design matrix, etc. are all within each state/season.

So I'm confused first about the merit of having a prior on $\lambda$. Are we asking the data to tell us what kind of penalty we should put on wiggliness?

Second, I'm confused how this is a hierarchical model, since we're not sharing information about the states or the seasons. I expect this means that the forecasts don't look very good?

The script has been deleted, per discussion

Co-authored-by: Scott Olesen <ulp7@cdc.gov>

afmagee42

It could just be my relative inexperience with splines, or being out of date on the project at hand, but I am not sure I see how this model does what I thought we wanted it to do?

afmagee42 · 2026-02-17T19:04:53Z

 ```math

-g^{-1}(E(y)) = X\beta + \beta_0
+g^{-1}(E(y)) = X\beta


Link function framing feels to me like an easy way for us to either get trapped in frequentist thinking or backed into corners we don't want to be in.

afmagee42 · 2026-02-17T19:08:00Z

+k = m + p - 1
+$$
+
+The element in $X$ is the value of the basis function evaluated at the predictor elapsed, with rows are data point $x_i$ , and columns are basis function $B_k$.


Yes, basis splines are the way to go when doing something like this. Bases are pre-computable, limiting the work you need to do in NumPyro

afmagee42 · 2026-02-17T19:09:28Z

 ```

-$y$ is a vector of observed vaccination coverage, $X$ is the design matrix of basis function with $N \times k$ dimension, where $N$ is the number of data points and $k$ is the number of basis functions used. The element in $X$ is the value of the basis function evaluated at the predictor elapsed, with rows are data point $x_i$ , and columns are basis function $B_k$.
+$y$ is a vector of observed vaccination coverage, $X$ is the design matrix of basis function with $N \times k$ dimension, where $N$ is the number of data points and $k$ is the number of basis functions used. $k$ is defined by the order degree of spline function $p$ and the number of internal knots $m$ by:


Suggested change

$y$ is a vector of observed vaccination coverage, $X$ is the design matrix of basis function with $N \times k$ dimension, where $N$ is the number of data points and $k$ is the number of basis functions used. $k$ is defined by the order degree of spline function $p$ and the number of internal knots $m$ by:

$y$ is a vector of observed vaccination coverage, $X$ is the design matrix of basis function with $N \times k$ dimension, where $N$ is the number of data points and $k$ is the number of basis functions used. $K$ is defined by the order degree of spline function $p$ and the number of internal knots $m$ by:

I think this was just a typo and big K was intended, rather than little k?

I would also not call this a definition; it's a relationship. The definition comes when you choose which of $K$, $m$, and $p$ you fix and which is free.

See updated!

afmagee42 · 2026-02-17T19:12:07Z

+$y$ is a vector of observed vaccination coverage, $X$ is the design matrix of basis function with $N \times k$ dimension, where $N$ is the number of data points and $k$ is the number of basis functions used. $k$ is defined by the order degree of spline function $p$ and the number of internal knots $m$ by:
+
+$$
+k = m + p - 1


Suggested change

k = m + p - 1

K = m + p - 1

Same typo, I think?

afmagee42 · 2026-02-17T20:35:09Z

+L(y|\beta)\cdot exp(-\lambda\beta^TS\beta/(2\sigma))
 ```

 Using empirical Bayes approach, we can derive:


Empirical Bayes gets my hackles up

It seems to me that the prior of $\beta$ is derived because the term looks like a prior in the equation, i.e., it sits in the position where a prior should be. That's my understanding of "empirical".

afmagee42 · 2026-02-17T20:38:51Z


 ```math
-L(y|\beta)\cdot exp(-\lambda\beta^TS\beta/(2\sigma^2))
+L(y|\beta)\cdot exp(-\lambda\beta^TS\beta/(2\sigma))


FWIW, in my estimation, insofar as MCMV has coalesced on a style, we don't use L(params) for the likelihood; we write p(data | params)

afmagee42 · 2026-02-17T20:43:37Z

+```
+

 #### Deriving prior of $\beta$


Why are we doing this? This seems like an attempt to make a prior out of a frequentist penalty function, which does not work well. When doing maximum likelihood, all you need to penalize is a point. We need to penalize a distribution. If you could just apply a Bayesian prior matching a frequentist penalty function, "the Bayesian lasso" (aka regression with exponential priors) would provide good sparsity, but it doesn't.

This is what Wood et al. did to derive the prior from the penalty function in section 2.4. I'm new to this topic. Can you explain why it doesn't work well? Is it that a frequentist penalty function penalizes a point while the Bayesian approach penalizes a distribution, so the connection between the two is questionable?
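For reference, the algebraic step under discussion can be written out in the notation of the current diff, where the penalty is scaled by $2\sigma$. Whether that scale should be $\sigma$ or $\sigma^2$ is not settled in this thread, so the scaling below is an assumption:

```math
\begin{align*}
p(y \mid \beta)\,\exp\left(-\frac{\lambda}{2\sigma}\beta^T S \beta\right)
&\propto p(y \mid \beta)\,p(\beta \mid \lambda, \sigma) \\
p(\beta \mid \lambda, \sigma)
&\propto \exp\left(-\frac{1}{2}\beta^T \left(\frac{\lambda}{\sigma} S\right) \beta\right)
\end{align*}
```

The second line is the kernel of a zero-mean Gaussian with precision $(\lambda/\sigma) S$, which matches $MultiNormal(0, (\sigma/\lambda)S^{-})$ in the doc. The prior is improper along any directions in the null space of $S$. This only shows that the forms match; it does not address whether the resulting prior behaves well.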

afmagee42 · 2026-02-17T20:53:46Z

+
+```math
+\begin{align*}
+\beta_{total} &= \bar \beta + \delta_{s=i} + \delta_{g=j} \\


Are these scalars? Vectors?

What does this partial pooling structure imply about the functional forms per state-season? It is not clear to me that:

- There is any guarantee they have to look remotely similar
- This looks anything like our general idea that functional forms are consistent give or take shifts to the start of uptake and peak uptake

Per discussion, the functional forms are not shared, so this is effectively no pooling. This part has been deleted.

afmagee42 · 2026-02-17T21:01:27Z

If $f(t)$ is a spline function, and $u(t, s, y)$ is the uptake in state $s$ in year $y$, I would have expected a partially-pooled spline model to look, downstream of the spline-y bits and the priors on their coefficients, more like

$$u(t, s, y) := \theta_{s, y} \times f((t - \xi_{s, y}) / \zeta_{s, y})$$
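A minimal sketch of how that form behaves, assuming a fixed piecewise-linear curve as a stand-in for the shared spline $f$. The knots, values, and parameter settings are made up, and `uptake` is a hypothetical helper, not code from the PR.

```python
import numpy as np

# Hypothetical shared curve f: a piecewise-linear (degree-1) spline on [0, 1],
# standing in for whatever spline the model would learn.
F_KNOTS = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
F_VALUES = np.array([0.0, 0.1, 0.5, 0.9, 1.0])


def f(t):
    """Shared uptake shape, held flat outside the knot range."""
    return np.interp(t, F_KNOTS, F_VALUES)


def uptake(t, theta, xi, zeta):
    """u(t, s, y) = theta_{s,y} * f((t - xi_{s,y}) / zeta_{s,y})."""
    t = np.asarray(t, dtype=float)
    return theta * f((t - xi) / zeta)


t = np.linspace(0.0, 1.5, 7)
baseline = uptake(t, theta=1.0, xi=0.0, zeta=1.0)   # the shared curve itself
late_low = uptake(t, theta=0.6, xi=0.2, zeta=1.2)   # later, slower, lower
```

In this form every state-season shares the shape of $f$: $\theta$ scales the final level, $\xi$ shifts the start of uptake, and $\zeta$ stretches time. Partial pooling would then presumably come from hierarchical priors on those three parameters, which this sketch does not include.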

Fuhan-Yang · 2026-02-20T04:13:15Z

Per discussion, the scripts are not truly hierarchical, and it would take a lot of effort to build an HGAM in numpyro. The scripts have been deleted. The doc has been rephrased per the comments, and the part of the document about hierarchy has been deleted.

Fuhan-Yang · 2026-02-20T19:25:13Z

Close per discussion

Fuhan-Yang added 5 commits February 11, 2026 13:35

model draft

d53f197

add group effect

7db5ec1

add implementation of group effect

c731f88

update doc

1f35129

hierarchical gam draft

c9def62

Fuhan-Yang requested a review from swo February 16, 2026 19:38

swo reviewed Feb 17, 2026


Update docs/gam.md

d99e977

Co-authored-by: Scott Olesen <ulp7@cdc.gov>

afmagee42 reviewed Feb 17, 2026


Fuhan-Yang added 3 commits February 18, 2026 09:54

edit

9a1bf94

fix

00ff7f6

delete code

1be8168

Fuhan-Yang requested review from afmagee42 and swo February 20, 2026 04:14

Fuhan-Yang mentioned this pull request Feb 20, 2026

HGAMs #275

Closed

Fuhan-Yang closed this Feb 20, 2026

swo deleted the fy_gam_hi2 branch March 23, 2026 15:38

