Commit 371de09

Cleanup (#325)
- Remove RF demo script
- Update docstrings
- Update README
- Resolves #323
- Resolves #324
1 parent a27bc2e commit 371de09

7 files changed

Lines changed: 178 additions & 279 deletions

README.md

Lines changed: 9 additions & 8 deletions
@@ -2,8 +2,6 @@
 
 _Formerly known as Immunization Uptake Projections, or `vcf`._
 
-This repo represents an experimental prototype for forecasting the coverage of vaccinations.
-
 ## Getting started
 
 1. Read the docs at <https://cdcgov.github.io/cfa-vaccination-coverage-forecasting>, or build them locally with `mkdocs serve`
@@ -12,21 +10,21 @@ This repo represents an experimental prototype for forecasting the coverage of v
 
 ## Vignette
 
-The vignette demonstrates a workflow using this package:
+The vignette demonstrates an analytical pipeline:
 
-1. Fit a model to coverage data from past seasons
-1. Use it to forecast future coverage data in the latest season
+1. Fit models to coverage data from past seasons
+1. Use those trained models to forecast future coverage data in the latest season
 1. Evaluate forecasts against observed values
 
 ### Data source
 
-For convenience, the raw data are tracked in this repo under `data/`, which includes the script `get_nis.py`, used to collect that data with [`nis-py-api`](https://github.com/CDCgov/nis-py-api). These are estimates of season flu vaccination coverage, tracked monthly from the 2009/2010 to 2022/2023 seasons, from the [National Immunization Survey](https://www.cdc.gov/nis/about/index.html) and [Behavioral Risk Factor Surveillance System](https://www.cdc.gov/brfss/index.html).
+The vignette uses monthly estimates of season flu vaccination coverage, from the 2009/2010 season through the 2022/2023 season, as reported by the [National Immunization Survey](https://www.cdc.gov/nis/about/index.html) and [Behavioral Risk Factor Surveillance System](https://www.cdc.gov/brfss/index.html) and cleaned using [`nis-py-api`](https://github.com/CDCgov/nis-py-api) in December 2025.
 
 ### Running the vignette
 
 1. Run the pipeline with `make`. (You can run steps in parallel with, e.g., `make -j4`.)
 - By default, `make` will use `scripts/config_vignette.yaml` for its configuration.
-- You can use different configs by running `make CONFIG=/path/to/config.yaml`
+- You can use different configs by running `make CONFIG=/path/to/config.yaml`.
 2. Inspect `output/vignette/`:
 - `config.yaml`: a copy of the input config
 - `data.parquet`: the preprocessed, observed data
@@ -35,7 +33,7 @@ For convenience, the raw data are tracked in this repo under `data/`, which incl
 - `plots/`: visualizations
 - `pred/`: model predictions, in Hive-partitioned parquet files
 
-### Vignette workflow
+### Analysis pipeline
 
 ```mermaid
 flowchart TB;
@@ -47,6 +45,8 @@ preprocess[/scripts/preprocess.py/];
 fit[/scripts/fit.py/];
 predict[/scripts/predict.py/];
 eval[/scripts/eval.py/];
+viz[/scripts/plot_*.py/];
+plots[/output/RUN_ID/plots/*.svg/]
 
 data/raw.parquet --> preprocess --> data --> fit --> output/RUN_ID/fits/fit_DATE.pkl --> predict --> pred;
 
@@ -56,6 +56,7 @@ pred--> eval -->scores;
 data --> viz;
 pred --> viz;
 scores --> viz;
+viz --> plots;
 ```
 
 ## Disclaimers

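The flowchart edges visible in the hunks above imply an execution order for the pipeline steps. As a rough illustration (not part of the commit), the sketch below encodes only those visible edges and derives one valid order with the standard-library `graphlib`. The short node names (`raw`, `fits`, and so on) are labels chosen for this sketch, and the real flowchart may contain edges on lines the diff does not show.

```python
from graphlib import TopologicalSorter

# Each key depends on the nodes in its set. Only edges visible in the
# README diff are listed; the node names are shortened, hypothetical labels.
PREREQUISITES = {
    "preprocess": {"raw"},          # data/raw.parquet --> preprocess
    "data": {"preprocess"},
    "fit": {"data"},
    "fits": {"fit"},                # output/RUN_ID/fits/fit_DATE.pkl
    "predict": {"fits"},
    "pred": {"predict"},
    "eval": {"pred"},
    "scores": {"eval"},
    "viz": {"data", "pred", "scores"},
    "plots": {"viz"},               # edge added in this commit
}


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order for the dependency graph."""
    return list(TopologicalSorter(graph).static_order())


order = run_order(PREREQUISITES)
print(" -> ".join(order))
```

Because `viz` depends on `data`, `pred`, and `scores`, any valid order places the plotting step after evaluation, which matches the new `viz --> plots` edge being the last step of the pipeline.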
pyproject.toml

Lines changed: 1 addition & 10 deletions
@@ -1,6 +1,6 @@
 [project]
 name = "vcf"
-version = "0.2.0"
+version = "0.2.1"
 description = ""
 authors = [
 { name = "Scott Olesen", email = "ulp7@cdc.gov" },
@@ -16,20 +16,11 @@ dependencies = [
 "vl-convert-python>=1.7.0,<2",
 "pyyaml>=6.0.2,<7",
 "pytest>=9.0.3,<10",
-"nisapi>=1.0.3",
 "numpyro>=0.16.1,<0.17",
 "jax>=0.5.0,<0.6",
-"typing-extensions>=4.12.2,<5",
 "scipy>=1.15.3",
 ]
 
-[project.optional-dependencies]
-gam = []
-
-[tool.uv]
-[tool.uv.sources]
-nisapi = { git = "https://github.com/CDCgov/nis-py-api" }
-
 [build-system]
 requires = ["hatchling"]
 build-backend = "hatchling.build"

scripts/demo_rf.py

Lines changed: 0 additions & 231 deletions
This file was deleted.

scripts/plot_data.py

Lines changed: 31 additions & 1 deletion
@@ -45,6 +45,17 @@ def add_medians(
     value_col: str = "estimate",
     type_col: str = "type",
 ) -> pl.DataFrame:
+    """Append group medians to raw values for median-overlay plots.
+
+    Args:
+        df: Input long-format data.
+        group_by: Column used to compute medians.
+        value_col: Numeric value column. Defaults to "estimate".
+        type_col: Marker column identifying datum vs median rows.
+
+    Returns:
+        Data frame containing original points and one median per group.
+    """
     return pl.concat(
         [
             df.with_columns(pl.lit("datum").alias(type_col)).select(
@@ -58,6 +69,14 @@ def add_medians(
 
 
 def month_order(season_start_month: int) -> list[str]:
+    """Return month abbreviations ordered by the configured season start.
+
+    Args:
+        season_start_month: First month of the season as an integer from 1 to 12.
+
+    Returns:
+        List of 12 month abbreviations in seasonal order.
+    """
     return [
         calendar.month_abbr[i]
         for i in list(range(season_start_month, 12 + 1))
@@ -93,7 +112,18 @@ def gather_n(df: pl.DataFrame, n: int, col_name="_idx") -> pl.DataFrame:
 
 
 def hightlight_state(df, month, order_n, value="score_value", state_var="geography"):
-    """Return a dic with a list of states (n = order_n) that has lowest score(best) and highest score(worst) at a given month"""
+    """Select best and worst states for a month based on score values.
+
+    Args:
+        df: Score data with month and geography columns.
+        month: Month to filter on.
+        order_n: Number of states to include per group.
+        value: Score column used for ranking (lower is better).
+        state_var: Column containing state or geography labels.
+
+    Returns:
+        Dictionary with keys "best" and "worst" containing state lists.
+    """
     sorted_state = (
         df.filter(pl.col("month") == month)
         .sort(pl.col(value))
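
The `month_order` hunk is cut off after the first `range`, so the rest of the function body is not visible in this diff. The sketch below is one way to satisfy the new docstring: it assumes the months before the season start are appended after December. The name `month_order_sketch` and the input validation are additions for illustration, not code from the repository.

```python
import calendar


def month_order_sketch(season_start_month: int) -> list[str]:
    """Return the 12 month abbreviations, starting at the season's first month."""
    if not 1 <= season_start_month <= 12:
        raise ValueError("season_start_month must be between 1 and 12")
    # Months from the season start through December, then wrap around to
    # the months that precede the season start (assumed behavior).
    months = list(range(season_start_month, 12 + 1)) + list(
        range(1, season_start_month)
    )
    return [calendar.month_abbr[i] for i in months]


season = month_order_sketch(9)
print(season)
```

Note that `calendar.month_abbr` is locale-dependent, so the exact strings depend on the active locale.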

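The new `hightlight_state` docstring describes a contract: filter to one month, rank by score with lower being better, and return `order_n` states under each of the keys "best" and "worst". The repository implements this with Polars, and the diff shows only the first lines of that implementation. The dependency-free sketch below illustrates the same contract on a list of dictionaries. The function name, the row format, the ordering of the "worst" list, and the sample scores are all assumptions made for this example.

```python
def highlight_state_sketch(
    rows, month, order_n, value="score_value", state_var="geography"
):
    """Pick the order_n lowest-scoring and highest-scoring states for a month."""
    # Keep only the requested month, then rank ascending (lower is better).
    ranked = sorted(
        (row for row in rows if row["month"] == month),
        key=lambda row: row[value],
    )
    states = [row[state_var] for row in ranked]
    # Guard order_n == 0, because states[-0:] would return every state.
    worst = states[-order_n:] if order_n > 0 else []
    return {"best": states[:order_n], "worst": worst}


# Hypothetical scores, invented for this example.
example_rows = [
    {"month": "Jan", "geography": "AL", "score_value": 0.30},
    {"month": "Jan", "geography": "AK", "score_value": 0.10},
    {"month": "Jan", "geography": "AZ", "score_value": 0.20},
    {"month": "Jan", "geography": "AR", "score_value": 0.40},
    {"month": "Feb", "geography": "AL", "score_value": 0.05},
]

result = highlight_state_sketch(example_rows, month="Jan", order_n=2)
print(result)  # {'best': ['AK', 'AZ'], 'worst': ['AL', 'AR']}
```

When `order_n` exceeds half the number of states in the month, the two lists overlap; whether the repository's function guards against that is not visible in this diff.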