diff --git a/README.md b/README.md
Lines changed: 9 additions & 8 deletions
diff --git a/pyproject.toml b/pyproject.toml
Lines changed: 1 addition & 10 deletions
diff --git a/scripts/demo_rf.py b/scripts/demo_rf.py
Lines changed: 0 additions & 231 deletions
diff --git a/scripts/plot_data.py b/scripts/plot_data.py
Lines changed: 31 additions & 1 deletion
@@ -2,8 +2,6 @@
 
 _Formerly known as Immunization Uptake Projections, or `vcf`._
 
-This repo represents an experimental prototype for forecasting the coverage of vaccinations.
-
 ## Getting started
 
 1. Read the docs at <https://cdcgov.github.io/cfa-vaccination-coverage-forecasting>, or build them locally with `mkdocs serve`
@@ -12,21 +10,21 @@ This repo represents an experimental prototype for forecasting the coverage of v
 
 ## Vignette
 
-The vignette demonstrates a workflow using this package:
+The vignette demonstrates an analytical pipeline:
 
-1. Fit a model to coverage data from past seasons
-1. Use it to forecast future coverage data in the latest season
+1. Fit models to coverage data from past seasons
+1. Use those trained models to forecast future coverage data in the latest season
 1. Evaluate forecasts against observed values
 
 ### Data source
 
-For convenience, the raw data are tracked in this repo under `data/`, which includes the script `get_nis.py`, used to collect that data with [`nis-py-api`](https://github.com/CDCgov/nis-py-api). These are estimates of season flu vaccination coverage, tracked monthly from the 2009/2010 to 2022/2023 seasons, from the [National Immunization Survey](https://www.cdc.gov/nis/about/index.html) and [Behavioral Risk Factor Surveillance System](https://www.cdc.gov/brfss/index.html).
+The vignette uses monthly estimates of season flu vaccination coverage, from the 2009/2010 season through the 2022/2023 season, as reported by the [National Immunization Survey](https://www.cdc.gov/nis/about/index.html) and [Behavioral Risk Factor Surveillance System](https://www.cdc.gov/brfss/index.html) and cleaned using [`nis-py-api`](https://github.com/CDCgov/nis-py-api) in December 2025.
 
 ### Running the vignette
 
 1. Run the pipeline with `make`. (You can run steps in parallel with, e.g., `make -j4`.)
    - By default, `make` will use `scripts/config_vignette.yaml` for its configuration.
-   - You can use different configs by running `make CONFIG=/path/to/config.yaml`
+   - You can use different configs by running `make CONFIG=/path/to/config.yaml`.
 2. Inspect `output/vignette/`:
    - `config.yaml`: a copy of the input config
    - `data.parquet`: the preprocessed, observed data
@@ -35,7 +33,7 @@ For convenience, the raw data are tracked in this repo under `data/`, which incl
    - `plots/`: visualizations
    - `pred/`: model predictions, in Hive-partitioned parquet files
 
-### Vignette workflow
+### Analysis pipeline
 
 ```mermaid
 flowchart TB;
@@ -47,6 +45,8 @@ preprocess[/scripts/preprocess.py/];
 fit[/scripts/fit.py/];
 predict[/scripts/predict.py/];
 eval[/scripts/eval.py/];
+viz[/scripts/plot_*.py/];
+plots[/output/RUN_ID/plots/*.svg/]
 
 data/raw.parquet --> preprocess --> data --> fit --> output/RUN_ID/fits/fit_DATE.pkl --> predict --> pred;
 
@@ -56,6 +56,7 @@ pred--> eval -->scores;
 data --> viz;
 pred --> viz;
 scores --> viz;
+viz --> plots;
 ```
 
 ## Disclaimers
 
@@ -1,6 +1,6 @@
 [project]
 name = "vcf"
-version = "0.2.0"
+version = "0.2.1"
 description = ""
 authors = [
     { name = "Scott Olesen", email = "ulp7@cdc.gov" },
@@ -16,20 +16,11 @@ dependencies = [
     "vl-convert-python>=1.7.0,<2",
     "pyyaml>=6.0.2,<7",
     "pytest>=9.0.3,<10",
-    "nisapi>=1.0.3",
     "numpyro>=0.16.1,<0.17",
     "jax>=0.5.0,<0.6",
-    "typing-extensions>=4.12.2,<5",
     "scipy>=1.15.3",
 ]
 
-[project.optional-dependencies]
-gam = []
-
-[tool.uv]
-[tool.uv.sources]
-nisapi = { git = "https://github.com/CDCgov/nis-py-api" }
-
 [build-system]
 requires = ["hatchling"]
 build-backend = "hatchling.build"
 
@@ -45,6 +45,17 @@ def add_medians(
     value_col: str = "estimate",
     type_col: str = "type",
 ) -> pl.DataFrame:
+    """Append group medians to raw values for median-overlay plots.
+
+    Args:
+        df: Input long-format data.
+        group_by: Column used to compute medians.
+        value_col: Numeric value column. Defaults to "estimate".
+        type_col: Marker column identifying datum vs median rows.
+
+    Returns:
+        Data frame containing original points and one median per group.
+    """
     return pl.concat(
         [
             df.with_columns(pl.lit("datum").alias(type_col)).select(
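
Note on `add_medians`: the new docstring describes appending one median row per group to the raw points. The hunk cuts off before the body is complete, so the following is only a plain-Python sketch of that idea using lists of dicts instead of a polars data frame. The name `add_medians_sketch` and the `"median"` label are assumptions, not taken from the repo.

```python
from statistics import median


def add_medians_sketch(rows, group_by, value_col="estimate", type_col="type"):
    """Plain-Python analogue: raw points plus one median row per group."""
    # Tag every original row as a raw datum.
    data = [
        {group_by: r[group_by], value_col: r[value_col], type_col: "datum"}
        for r in rows
    ]
    # Collect values per group, preserving first-seen group order.
    groups = {}
    for r in rows:
        groups.setdefault(r[group_by], []).append(r[value_col])
    # One summary row per group; the "median" label is an assumption.
    medians = [
        {group_by: g, value_col: median(values), type_col: "median"}
        for g, values in groups.items()
    ]
    return data + medians
```

A plotting layer can then filter on the `type_col` value to draw raw points and the median overlay separately.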
@@ -58,6 +69,14 @@ def add_medians(
 
 
 def month_order(season_start_month: int) -> list[str]:
+    """Return month abbreviations ordered by the configured season start.
+
+    Args:
+        season_start_month: First month of the season as an integer from 1 to 12.
+
+    Returns:
+        List of 12 month abbreviations in seasonal order.
+    """
     return [
         calendar.month_abbr[i]
         for i in list(range(season_start_month, 12 + 1))
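
Note on `month_order`: the hunk ends partway through the list comprehension. A minimal self-contained sketch of the behavior the docstring describes follows. It assumes the truncated remainder wraps around from January to the month before the season start; the range check and the name `month_order_sketch` are additions for illustration and do not appear in the diff.

```python
import calendar


def month_order_sketch(season_start_month: int) -> list[str]:
    """Return the 12 month abbreviations, beginning at the season's first month."""
    # Input validation is an assumption; the diff does not show any.
    if not 1 <= season_start_month <= 12:
        raise ValueError("season_start_month must be between 1 and 12")
    # Season start through December, then January up to the month before the start.
    months = list(range(season_start_month, 12 + 1)) + list(
        range(1, season_start_month)
    )
    return [calendar.month_abbr[i] for i in months]
```

For example, `month_order_sketch(7)` begins with July and ends with June. `calendar.month_abbr` is locale-dependent, so the abbreviations are English only under an English or default C locale.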
@@ -93,7 +112,18 @@ def gather_n(df: pl.DataFrame, n: int, col_name="_idx") -> pl.DataFrame:
 
 
 def hightlight_state(df, month, order_n, value="score_value", state_var="geography"):
-    """Return a dic with a list of states (n = order_n) that has lowest score(best) and highest score(worst) at a given month"""
+    """Select best and worst states for a month based on score values.
+
+    Args:
+        df: Score data with month and geography columns.
+        month: Month to filter on.
+        order_n: Number of states to include per group.
+        value: Score column used for ranking (lower is better).
+        state_var: Column containing state or geography labels.
+
+    Returns:
+        Dictionary with keys "best" and "worst" containing state lists.
+    """
     sorted_state = (
         df.filter(pl.col("month") == month)
         .sort(pl.col(value))
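
Note on `hightlight_state`: the diff shows only the start of the polars expression. The ranking logic the docstring describes (filter to one month, sort by score with lower being better, take `order_n` from each end) can be sketched in plain Python as below. The function name `best_and_worst_sketch`, the list-of-dicts input, and the ordering of the `"worst"` list are assumptions; the real function operates on a polars data frame.

```python
def best_and_worst_sketch(
    rows, month, order_n, value="score_value", state_var="geography"
):
    """Plain-Python analogue: pick the order_n best and worst states for a month."""
    # Keep only the requested month, then sort ascending (lower score is better).
    in_month = sorted(
        (row for row in rows if row["month"] == month),
        key=lambda row: row[value],
    )
    states = [row[state_var] for row in in_month]
    # Guard order_n == 0: states[-0:] would return the whole list.
    worst = states[-order_n:] if order_n > 0 else []
    return {"best": states[:order_n], "worst": worst}
```

In this sketch the `"worst"` list stays in ascending score order, so the single worst state comes last. If fewer than `2 * order_n` states are present for the month, the two lists overlap.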