52 changes: 52 additions & 0 deletions .github/workflows/mkdocs.yaml
@@ -0,0 +1,52 @@
name: mkdocs
on:
  pull_request:
  push:
    branches:
      - main

env:
  GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: write
  pages: write
  id-token: write

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      page_artifact_id: ${{ steps.upload.outputs.artifact_id }}
    steps:
      - uses: actions/checkout@v5
      - uses: astral-sh/setup-uv@v6
        with:
          enable-cache: true
      - uses: actions/setup-python@v6
        with:
          python-version-file: ".python-version"
      - run: uv sync --locked --only-group mkdocs
      - run: uv run mkdocs build --strict
      - uses: actions/upload-pages-artifact@v4
        with:
          name: github-pages
          path: site
          retention-days: "3"

  deploy:
    if: ${{ github.event_name == 'push' && github.ref_name == 'main' }}

    runs-on: ubuntu-latest
    needs: build

    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}

    steps:
      - uses: actions/deploy-pages@v4
        with:
          artifact_name: github-pages
          preview: false
9 changes: 8 additions & 1 deletion docs/gam.md
@@ -1,7 +1,10 @@
# Overview
# GAMs

## Overview

A GAM (generalized additive model) relates vaccine uptake ($y_i$) to a smooth function ($f(.)$) of the elapsed time ($x_i$, the number of days since vaccine roll-out) and a random effect of season ($u_j$), through the link function $g^{-1}(.)$.

```math
g^{-1}(E(y_i)) = f(x_i) + u_j + \beta_0

@@ -25,17 +28,20 @@ $y$ is a vector of observed vaccine uptake, $X$ is the design matrix of basis fu
Because the main effect and the random effect are additive, we consider them separately for now.

### Main effect

The loglikelihood function is:

```math
Loglik(\beta, \lambda, u |y) = Loglik(y| \beta) - \lambda \beta^TS\beta
```

$S$ is the penalty matrix, which penalizes the wiggliness of the smooth function. In our case, we use cubic splines as the basis functions, and the wiggliness of a cubic spline is measured by the integral of the squared second derivative, expressed in terms of the basis functions $B_k(x)$ as:

```math
S_{ij} = \int{B''_i(x)B''_j(x)dx}

```

In this way, $S$ penalizes the curvature of the basis functions. $\lambda$ is a smoothing parameter that controls the balance between smoothness and fidelity to the data; it is estimated along with $\beta$.

Exponentiating the loglikelihood function, we have:
@@ -83,6 +89,7 @@ For each smooth term, it is possible to have identifiability issue between $f(x_
\sum_i^N{f(x_i)} = 0

```

In matrix form, it is:

```math
10 changes: 10 additions & 0 deletions docs/javascript/katex.js
@@ -0,0 +1,10 @@
document$.subscribe(({ body }) => {
  renderMathInElement(body, {
    delimiters: [
      { left: "$$", right: "$$", display: true },
      { left: "$", right: "$", display: false },
      { left: "\\(", right: "\\)", display: false },
      { left: "\\[", right: "\\]", display: true },
    ],
  });
});
19 changes: 10 additions & 9 deletions docs/model_details.md
@@ -1,19 +1,20 @@
# Overview
# Model details

These are the mathematical details of the models used to capture and forecast vaccine uptake. There is currently just one model: a mixture of a logistic and linear function. This model proposes a latent true uptake curve, which is subject to observation error. A hierarchy accounts for the unique effects of grouping factors (e.g. season, geography, age) on model parameters.

# Logistic Plus Linear (LPL) Model
## Logistic Plus Linear (LPL) Model

## Notation
### Notation

The following notation will be used for the LPL model:

- $t$ = time since the start of the season, expressed as the fraction of a year elapsed
- $V_t^{obs}$ = number of people surveyed at time $t$ who are vaccinated
- $N_t^{obs}$ = total number of people surveyed at time $t$
- $c_t$ = latent true cumulative uptake at time $t$
- $G$ = grouping factors (e.g. season, geographic area, age group, race/ethnicity), indexed by $i$ with $I$ total factors

## Summary
### Summary

At a high level, the LPL model is structured as follows:

@@ -35,7 +36,7 @@ Here, $t$ is rescaled by dividing by 365, so that $t$ represents the proportion
\end{align*}
```

## Observation Layer
### Observation Layer

The observed uptake is considered a draw from the beta-binomial distribution, governed in part by the true latent uptake in the population.

@@ -47,7 +48,7 @@ The observed uptake is considered a draw from the beta-binomial distribution, go

Note that the shape parameters $\alpha$ and $\beta$ are not declared explicitly. Rather they are implied by an alternate mean and concentration parametrization, described below.
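
As a rough illustration of how a mean and a concentration can imply the two shape parameters, the sketch below assumes the common convention $\alpha = c \cdot d$ and $\beta = (1 - c) \cdot d$. Both that convention and the helper name `beta_shapes` are assumptions made for illustration; the model's own definitions are the ones given under Functional Structure.

```python
def beta_shapes(mean: float, concentration: float) -> tuple[float, float]:
    """Convert a mean/concentration pair into Beta shape parameters.

    Assumes alpha = mean * concentration and beta = (1 - mean) * concentration,
    a common convention that the source does not confirm.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if concentration <= 0.0:
        raise ValueError("concentration must be positive")
    alpha = mean * concentration
    beta = (1.0 - mean) * concentration
    return alpha, beta


# Illustrative values only, not fitted estimates.
alpha, beta = beta_shapes(mean=0.4, concentration=100.0)
# Under this convention, alpha / (alpha + beta) recovers the mean
# and alpha + beta recovers the concentration.
```

Under this convention a larger concentration tightens the beta distribution around the latent uptake, so observations are expected to sit closer to the curve.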

## Functional Structure
### Functional Structure

The model's functional structure describes the latent true uptake curve:

@@ -57,7 +58,7 @@ The model's functional structure describes the latent true uptake curve:
\end{align*}
```

$c_{t,G_1,...,G_I}$ serves as the mean of the beta distribution in the beta-binomial likelihood in the observation layer. A fixed concentration parameter $d$ is also required. From the mean and concentration, the two shape parameters of the beta distribution are as follows:

```math
\begin{align*}
@@ -66,7 +67,7 @@ The model's functional structure describes the latent true uptake curve:
\end{align*}
```

## Hierarchical Structure
### Hierarchical Structure

Certain parameters of the latent true uptake curve have group-specific deviations, determined as follows:

@@ -79,7 +80,7 @@ Certain parameters of the latent true uptake curve have group-specific deviation

and similarly for $M$.

## Priors
### Priors

```math
\begin{align*}
15 changes: 8 additions & 7 deletions docs/model_journal.md
@@ -1,36 +1,37 @@
# Overview
# Model "journal"

Choosing a model structure has been trickier than expected. This is an informal record of the attempts that have been made, their outcomes, and consequent directions.

# Cumulative S Curves
## Cumulative S Curves

The Scenarios team would like to use vaccine uptake forecasts as input for their ODE models. For this purpose, they need uptake curves that are continuously differentiable, not forecasts that are a series of point estimates. Autoregressive and stochastic models are not suitable for this purpose. Consequently, families of S-curves that directly model cumulative uptake will be prioritized, starting with the Hill function.

# Hill Function
## Hill Function

In brief, the original Hill function model considers latent true uptake to follow a Hill curve shape. The final uptake and midpoint parameters ("A" and "H", respectively) can deviate additively from their overall averages based on grouping factors (e.g. season, state), while the steepness parameter ("n") only has one overall value. Observed uptake is centered on the latent true uptake but may have error in either direction. The magnitude of this error is derived from the 95% confidence interval reported along with each point estimate in the NIS data.

## Trouble Pt. 1: Truncated Normal Observation
### Trouble Pt. 1: Truncated Normal Observation

The Hill model would not fit: the MCMC chains got stuck and returned hundreds of identical draws with ESS = 1.0. This happened whether grouping factors were included or not (i.e. one curve fit across all seasons at the national scale). The stuck chains were resolved by ignoring the empirical estimates of observation error: when observation error was fixed at a generous value (0.03) or was fit as a free parameter, the MCMC chains were no longer stuck.

Why did empirical observation error break MCMC? The original Hill model used a truncated Normal draw to describe the observation process. The reported 95% confidence intervals were assumed to be Wald intervals, such that an interval's half-width divided by 1.96 approximates the standard deviation of the truncated Normal. These standard deviations were often on the order of 0.001, implying that the observed uptake curves are very close to the latent true uptake. But the Hill function does not fit the data that well: especially in the latter half of seasons, true uptake continues creeping upward while the Hill function asymptotes. Thus, no parameter set exists that can get the Hill function close enough to all data points, and MCMC gets stuck in flat portions of the likelihood landscape.

## Solution Pt. 1: Beta-Binomial Observation
### Solution Pt. 1: Beta-Binomial Observation

MCMC chains were unstuck by reinterpreting the empirical confidence intervals in terms of the actual data collection process. Cumulative uptake is estimated by the proportion p of N phone survey participants who report being vaccinated. By considering an interval's half-width divided by 1.96 to be the standard error of the mean (SEM) for the reported uptake proportion p, N was estimated at each data point. Sensibly, the estimated N is on the order of 1,000 for individual states and 50,000 at the national scale.
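
The back-calculation of N can be sketched as follows. This is a minimal illustration that assumes the usual binomial standard error, SEM = sqrt(p(1 - p) / N), so that N = p(1 - p) / SEM^2; the function name `estimate_sample_size` and the input values are hypothetical and are not taken from the NIS data.

```python
def estimate_sample_size(p: float, ci_lower: float, ci_upper: float) -> float:
    """Back out an effective sample size from a 95% Wald interval.

    Assumes the interval half-width is 1.96 * SEM and that
    SEM = sqrt(p * (1 - p) / N), the binomial standard error.
    """
    half_width = (ci_upper - ci_lower) / 2.0
    sem = half_width / 1.96
    return p * (1.0 - p) / sem**2


# Made-up example: 50% uptake with a 95% interval of roughly +/- 3.1 points.
n_hat = estimate_sample_size(p=0.5, ci_lower=0.469, ci_upper=0.531)
# n_hat is on the order of 1,000, and p * n_hat approximates the vaccinated count.
```

The product of p and the estimated N then stands in for the number of vaccinated respondents in a count-based likelihood.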

With estimates of pN and N in hand, the observation process was replaced with a beta-binomial likelihood, which inherently permits observations to vary farther from the latent true uptake, compared to the truncated Normal likelihood. Consequently, the MCMC chains began sampling parameter space more freely.

## Trouble Pt. 2: Hill Function Shape
### Trouble Pt. 2: Hill Function Shape

Even with MCMC proceeding, other warning signs arose:

- When grouping on season alone, season-specific deviations in both A and H from their overall averages have very wide 95% credible intervals, straddling 0. And yet, it is clear that uptake curves do differ from one another across seasons.
- When grouping on season and state, the fitting proceeds very slowly (1-2 it/s). A, A-deviations-by-season, H, and H-deviations-by-season all had very low ESS (40-60, despite 500 samples after warmup). A-deviations-by-state had even lower ESS (10-15). H-deviations-by-state had higher ESS, but the magnitude of variation in H among states was estimated very close to 0.

Together, these warning signs suggest some non-identifiability among the parameters that vary by grouping factor, perhaps again driven by the poor fit of the Hill function to uptake curves.

## Solution Pt. 2: Logistic + Linear Functions
### Solution Pt. 2: Logistic + Linear Functions

Many warning signs were alleviated by changing the structure of the latent true uptake from a pure Hill function to a logistic function plus a slope-only linear function (intercept = 0). In this model, the linear slope "M" and the logistic asymptote "A" can deviate additively from their overall averages by group, while the logistic midpoint "H" and steepness "n" are fixed across groups. In particular, this mixed function allows uptake to continue creeping upward late in a season.
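
To make the shape concrete, here is a minimal sketch of a logistic-plus-linear curve. It assumes the standard logistic form A / (1 + exp(-n (t - H))) plus a zero-intercept line M t; the exact parametrization, the function name `lpl_uptake`, and the parameter values are assumptions for illustration rather than the model's definition.

```python
import math


def lpl_uptake(t: float, A: float, H: float, n: float, M: float) -> float:
    """Latent uptake at time t (fraction of a year since season start).

    A: logistic asymptote, H: logistic midpoint, n: logistic steepness,
    M: slope of the linear term (intercept fixed at 0).
    """
    logistic = A / (1.0 + math.exp(-n * (t - H)))
    linear = M * t
    return logistic + linear


# Illustrative parameter values only, not fitted estimates.
params = {"A": 0.4, "H": 0.3, "n": 25.0, "M": 0.1}
mid_season = lpl_uptake(0.3, **params)   # logistic term equals A / 2 at t = H
late_season = lpl_uptake(0.9, **params)
end_season = lpl_uptake(1.0, **params)
```

Late in the season the logistic term is essentially flat, so the difference between `end_season` and `late_season` comes almost entirely from the linear term, which is the continued upward creep that the pure Hill function could not reproduce.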

59 changes: 59 additions & 0 deletions mkdocs.yaml
@@ -0,0 +1,59 @@
site_name: IUP

nav:
  - index.md
  - analytical_plan.md
  - gam.md
  - model_details.md
  - model_journal.md

repo_url: https://github.com/CDCgov/cfa-immunization-uptake-projection
repo_name: repo

# advanced configuration ------------------------------------------------------
theme:
  name: "material"
  icon:
    repo: fontawesome/brands/github

plugins:
  - mkdocstrings:
      handlers:
        python:
          options:
            # see <https://mkdocstrings.github.io/python/usage/> for options
            show_root_heading: true
            show_object_full_path: true
  - search

markdown_extensions:
  - pymdownx.highlight:
      anchor_linenums: true
      line_spans: __span
      pygments_lang_class: true
  - pymdownx.inlinehilite
  - pymdownx.snippets
  - pymdownx.arithmatex:
      generic: true
  - pymdownx.superfences:
      custom_fences:
        # enables rendering of ```mermaid and ```math blocks
        # the !! parts of this section will trip yaml checkers as "unsafe"
        - name: mermaid
          class: mermaid
          format: !!python/name:pymdownx.superfences.fence_code_format
        - name: math
          class: arithmatex
          format:
            !!python/object/apply:pymdownx.arithmatex.arithmatex_fenced_format {
              kwds: { mode: generic, tag: div },
            }

# math rendering
extra_javascript:
  - javascript/katex.js
  - https://unpkg.com/katex@0/dist/katex.min.js
  - https://unpkg.com/katex@0/dist/contrib/auto-render.min.js

extra_css:
  - https://unpkg.com/katex@0/dist/katex.min.css
10 changes: 7 additions & 3 deletions pyproject.toml
@@ -27,11 +27,9 @@ dependencies = [
]

[project.optional-dependencies]
gam = ["scikit-fda"]

gam = ["scikit-fda>=0.10.1"]

[tool.uv]

[tool.uv.sources]
nisapi = { git = "https://github.com/CDCgov/nis-py-api" }

@@ -46,3 +44,9 @@ dev = [
    "pre-commit>=4.2.0",
    "pytest>=8.4.0",
]
mkdocs = [
    "mkdocs>=1.6.1",
    "mkdocs-material>=9.7.1",
    "mkdocstrings>=1.0.0",
    "mkdocstrings-python>=2.0.1",
]