Make the vignette run "out of the box":

- Update documentation
- Update and use the repo-tracked NIS data
    - Add `data/get_nis.py`
    - Remove old NIS data in `data/`
    - Add new raw data there
- Simplify the config
- Add a `scripts/describe_data.py` (which will later supersede the `figs_*.py` scripts)
- Update Makefile to ensure the dependencies get propagated (i.e., don't use *directories* as targets)
- Small updates to `iup/` files for better type hinting and bug tracking

```diff
-This repo contains statistical tools to predict the uptake of immunizations (primarily vaccines and boosters). The three primary steps are:
+This repo contains statistical tools to predict the uptake of immunizations (primarily vaccines and boosters).
 
-1. Import data sets on past uptake and cast them into a standardize format
-2. Fit a variety of models that both capture past uptake as well as project future uptake, and
-3. Evaluate model projections against realized uptake.
+## Getting started
 
-All three steps are currently under development.
+1. Read the docs at <https://cdcgov.github.io/cfa-immunization-uptake-projection>, or build them locally with `mkdocs serve`.
+1. This project uses [`uv`](https://docs.astral.sh/uv/) for environment and dependency management. Ensure you can `uv sync`. Use the uv-managed virtual environment (e.g., by prepending `uv run`).
+1. Run the [vignette](#vignette).
 
```

```diff
-This approach is applicable to seasonal adult immunizations. Each year, the uptake process starts afresh, and individuals' transitions across age groups are not relevant.
+## Vignette
 
-## Data sources
+The vignette demonstrates a workflow using this package:
 
-Use <https://github.com/CDCgov/nis-py-api> for access to the NIS data.
+1. Fit a model to uptake data from past seasons
+1. Use it to forecast future uptake data in the latest season
+1. Evaluate forecasts against observed values
 
-## Getting started
+### Data source
+
+For convenience, the raw data are tracked in this repo under `data/`, which includes the script `get_nis.py`, used to collect those data with [`nis-py-api`](https://github.com/CDCgov/nis-py-api). These are estimates of seasonal flu vaccine coverage, tracked monthly from the 2009/2010 to 2022/2023 seasons, from the [National Immunization Survey](https://www.cdc.gov/nis/about/index.html).
+
+### Running the vignette
+
+1. Copy `scripts/config_template.yaml` to `scripts/config.yaml`. This config can be modified; see the [file structure](#config-file-structure) below.
+1. Run `make` to run the model fitting and forecasting pipeline. Each run of the pipeline is assigned a `RUN_ID`. When a new `RUN_ID` is given, a new subfolder is created inside each of the six `output/` subfolders listed below to store the corresponding outputs. When an existing `RUN_ID` is given, the contents of that `RUN_ID`'s existing subfolders are overwritten, provided the pipeline inputs have changed since the last run. `RUN_ID` can be set on line 1 of the Makefile or directly on the command line with `make RUN_ID=name_of_run`.
+1. Inspect the `output/` subfolders:
+    - `settings`: a copy of the config.
+    - `data`: the pre-processed data.
+    - `fits`: the fit model object(s).
+    - `diagnostics`: diagnostic plots and tables for the desired model(s) and forecast date(s).
+    - `forecasts`: posterior predictions and forecasts.
+    - `scores`: evaluation scores comparing model structures and/or forecast dates.
+1. Run `make viz` to open a Streamlit app in a web browser that shows the individual forecast trajectories, credible intervals, and evaluation scores, with dimension and filter options for customizing the visualization.
+1. Optionally, run `make clean` to remove all outputs for a particular `RUN_ID`.
+
```
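The per-run output layout described in the steps above (one `RUN_ID` subfolder inside each of the six `output/` subfolders) can be sketched as a small Python helper. This is a minimal illustration, assuming the layout is `output/<subfolder>/<RUN_ID>`; the names `run_output_dirs`, `missing_outputs`, and `OUTPUT_SUBFOLDERS` are hypothetical and are not part of `iup` or the Makefile.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical helpers, not part of `iup`.
# Assumes each run writes to output/<subfolder>/<RUN_ID>.
OUTPUT_SUBFOLDERS = ("settings", "data", "fits", "diagnostics", "forecasts", "scores")


def run_output_dirs(run_id: str, root: str | Path = "output") -> dict[str, Path]:
    """Map each output subfolder name to the directory holding this run's outputs."""
    base = Path(root)
    return {name: base / name / run_id for name in OUTPUT_SUBFOLDERS}


def missing_outputs(run_id: str, root: str | Path = "output") -> list[str]:
    """Return the subfolder names that have no directory yet for this RUN_ID."""
    return [name for name, path in run_output_dirs(run_id, root).items() if not path.is_dir()]
```

For example, after `make RUN_ID=name_of_run`, calling `missing_outputs("name_of_run")` from the repo root would list any subfolder for which that run has not produced a directory.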

```diff
+### Config file structure
 
-1. Either set up a virtual environment and install all dependencies with `uv sync` and then enter the virtual environment (with `.venv/Scripts/activate`, `.venv/bin/activate`, or similar), or else remember to prepend each of your command-line entries with `uv run` (e.g. `uv run make nis`).
-2. Get a [Socrata app token](https://github.com/CDCgov/nis-py-api?tab=readme-ov-file#getting-started) and save it in `scripts/socrata_app_token.txt`.
-3. Cache NIS data with `make nis`.
-4. Copy the config template in `scripts/config_template.yaml` to `scripts/config.yaml` and fill in the necessary fields.
-    - data: specify the vaccination uptake data to use, including a de facto annual start of the disease season, filters for rows and columns to keep, and grouping factors by which to partition forecasts.
-    - forecast_timeframe: specify the start and the end of the forecast period and the interval between reference dates in the forecast (using the [polars string language](https://docs.pola.rs/api/python/dev/reference/expressions/api/polars.date_range.html), e.g., `7d`).
-    - evaluation_timeframe: specify the interval between forecast dates if multiple forecasts are desired (sharing the same end of the forecast period). This will create different forecast horizons, which can be compared with evaluation scores. If blank, no evaluation score will not be computed.
-    - models: specify the name of the model (refer to `iup.models`), random seed, initial values of parameters, and parameters to use NUTS kernel in MCMC run.
-    - scores: specify the quantile of the posterior forecasts to use for evaluation, the date(s) on which to compute absolute difference, and any additional evaluation metrics (e.g. mean squared prediction error as `mspe`).
-    - forecast_plots: specify the credible interval (in fractional terms) and number of randomly chosen trajectories to show on forecast plots.
-    - diagnostics: specify the model (refer to `iup.models`) and the range of forecast dates (i.e. a list of earliest and latest) on which to perform diagnostics, as well as the types of plots and tables to create (refer to `iup.diagnostics`).
-5. Run `make all` to run the model fitting and forecasting pipeline. This will create six `output/` subfolders:
-    - `settings`: a copy of the config.
-    - `data`: the pre-processed data.
-    - `fits`: the fit model object(s).
-    - `diagnostics`: diagnostic plots and tables for the desired model(s) and forecast date(s).
-    - `forecasts`: posterior predictions and forecasts.
-    - `scores`: evaluation scores comparing model structures and/or forecast dates.
-    Each run of the pipeline is assigned a `RUN_ID`. When a new `RUN_ID` is given, a new subfolder will be created inside each of the above six folders to store the corresponding outputs. When an existing `RUN_ID` is given, the contents of that `RUN_ID`'s existing subfolders will be overwritten, assuming the pipeline inputs have changed since the last run. `RUN_ID` can be assigned in line 1 of the Makefile or directly in the command line `make all RUN_ID=name_of_run`.
-6. Run `make viz` to open a streamlit app in web browser, which shows the individual forecast trajectories, credible intervals, and evaluation scores, with options of dimensions and filters to customize the visualization.
-7. Run `make clean` to remove all outputs for a particular `RUN_ID` and `make delete_nis` to delete the NIS data from the cache.
-
-#### Package workflow:
+- data: specify the vaccination uptake data to use, including a de facto annual start of the disease season, filters for rows and columns to keep, and grouping factors by which to partition forecasts.
+- forecast_timeframe: specify the start and the end of the forecast period and the interval between reference dates in the forecast (using the [polars string language](https://docs.pola.rs/api/python/dev/reference/expressions/api/polars.date_range.html), e.g., `7d`).
+- evaluation_timeframe: specify the interval between forecast dates if multiple forecasts are desired (sharing the same end of the forecast period). This creates different forecast horizons, which can be compared with evaluation scores. If blank, no evaluation scores will be computed.
+- models: specify the name of the model (refer to `iup.models`), the random seed, initial values of parameters, and parameters for the NUTS kernel used in the MCMC run.
+- scores: specify the quantile of the posterior forecasts to use for evaluation, the date(s) on which to compute the absolute difference, and any additional evaluation metrics (e.g., mean squared prediction error as `mspe`).
+- forecast_plots: specify the credible interval (in fractional terms) and the number of randomly chosen trajectories to show on forecast plots.
+- diagnostics: specify the model (refer to `iup.models`) and the range of forecast dates (i.e., a list of the earliest and latest) on which to perform diagnostics, as well as the types of plots and tables to create (refer to `iup.diagnostics`).
```
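The timeframe and scoring options in the config structure above can be illustrated as plain computations. The sketch below is a minimal, standard-library illustration under stated assumptions, not the `iup` implementation: it handles only day-based intervals such as `7d` (the config itself defers to the polars `date_range` interval language, which supports more units); it assumes forecast dates step from the forecast start by the `evaluation_timeframe` interval up to the shared forecast end; it takes the `scores` quantile of posterior draws as the point forecast; and it reads the `forecast_plots` credible interval as equal-tailed. All function names are hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta

# Hypothetical helpers illustrating the config options; none of these names
# come from `iup`. Only day-based intervals such as "7d" are handled.


def parse_days(interval: str) -> timedelta:
    """Parse a day-based interval string like "7d" into a timedelta."""
    if not interval.endswith("d") or not interval[:-1].isdigit():
        raise ValueError(f"this sketch only supports '<n>d' intervals, got {interval!r}")
    return timedelta(days=int(interval[:-1]))


def date_seq(start: date, end: date, interval: str) -> list[date]:
    """Dates from start to end (both inclusive), spaced by `interval`."""
    step, out, current = parse_days(interval), [], start
    while current <= end:
        out.append(current)
        current += step
    return out


def forecast_horizons(start: date, end: date, eval_interval: str) -> dict[date, int]:
    """Forecast dates spaced by `eval_interval`, all sharing the same forecast end.

    Maps each forecast date to its horizon in days. The end date itself is
    skipped because it would have a zero-day horizon (an assumption).
    """
    return {d: (end - d).days for d in date_seq(start, end, eval_interval) if d < end}


def quantile(samples: list[float], q: float) -> float:
    """Quantile of posterior draws, linearly interpolating between order statistics."""
    xs = sorted(samples)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def credible_bounds(ci: float) -> tuple[float, float]:
    """Lower and upper quantiles of an equal-tailed credible interval, e.g. 0.95."""
    tail = (1 - ci) / 2
    return tail, 1 - tail


def absolute_difference(pred: dict[date, float], obs: dict[date, float], on: date) -> float:
    """Absolute difference between predicted and observed uptake on one date."""
    return abs(pred[on] - obs[on])


def mspe(pred: dict[date, float], obs: dict[date, float]) -> float:
    """Mean squared prediction error over the dates present in both series."""
    common = pred.keys() & obs.keys()
    return sum((pred[d] - obs[d]) ** 2 for d in common) / len(common)
```

For instance, with an illustrative 14-day evaluation interval between 1 and 29 September 2023, `forecast_horizons(date(2023, 9, 1), date(2023, 9, 29), "14d")` returns `{date(2023, 9, 1): 28, date(2023, 9, 15): 14}`, i.e., two forecasts with different horizons that share one end date, and `credible_bounds(0.95)` gives the 2.5% and 97.5% quantiles for a 95% interval.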

```diff
 This repository was created for use by CDC programs to collaborate on public health related projects in support of the [CDC mission](https://www.cdc.gov/about/organization/mission.htm). GitHub is not hosted by the CDC, but is a third party website used by CDC and its partners to share information and collaborate on software. CDC use of GitHub does not imply an endorsement of any one particular service, product, or enterprise.
 
-## Public Domain Standard Notice
+### Public Domain Standard Notice
 
-This repository constitutes a work of the United States Government and is not
-subject to domestic copyright protection under 17 USC § 105. This repository is in
-the public domain within the United States, and copyright and related rights in
-the work worldwide are waived through the [CC0 1.0 Universal public domain dedication](https://creativecommons.org/publicdomain/zero/1.0/).
-All contributions to this repository will be released under the CC0 dedication. By
-submitting a pull request you are agreeing to comply with this waiver of
-copyright interest.
+This repository constitutes a work of the United States Government and is not subject to domestic copyright protection under 17 USC § 105. This repository is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the [CC0 1.0 Universal public domain dedication](https://creativecommons.org/publicdomain/zero/1.0/). All contributions to this repository will be released under the CC0 dedication. By submitting a pull request you are agreeing to comply with this waiver of copyright interest.
 
-## License Standard Notice
+### License Standard Notice
 
 This repository is licensed under ASL v2 or later.
 
-This source code in this repository is free: you can redistribute it and/or modify it under
-the terms of the Apache Software License version 2, or (at your option) any
-later version.
+This source code in this repository is free: you can redistribute it and/or modify it under the terms of the Apache Software License version 2, or (at your option) any later version.
 
-This source code in this repository is distributed in the hope that it will be useful, but WITHOUT ANY
-WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
-PARTICULAR PURPOSE. See the Apache Software License for more details.
+This source code in this repository is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the Apache Software License for more details.
 
-You should have received a copy of the Apache Software License along with this
-program. If not, see http://www.apache.org/licenses/LICENSE-2.0.html
+You should have received a copy of the Apache Software License along with this program. If not, see http://www.apache.org/licenses/LICENSE-2.0.html
 
 The source code forked from other open source projects will inherit its license.
 
-## Privacy Standard Notice
+### Privacy Standard Notice
 
-This repository contains only non-sensitive, publicly available data and
-information. All material and community participation is covered by the
-[Disclaimer](https://github.com/CDCgov/template/blob/master/DISCLAIMER.md)
-and [Code of Conduct](https://github.com/CDCgov/template/blob/master/code-of-conduct.md).
-For more information about CDC's privacy policy, please visit [http://www.cdc.gov/other/privacy.html](https://www.cdc.gov/other/privacy.html).
+This repository contains only non-sensitive, publicly available data and information. All material and community participation is covered by the [Disclaimer](https://github.com/CDCgov/template/blob/master/DISCLAIMER.md) and [Code of Conduct](https://github.com/CDCgov/template/blob/master/code-of-conduct.md). For more information about CDC's privacy policy, please visit [http://www.cdc.gov/other/privacy.html](https://www.cdc.gov/other/privacy.html).
 
-## Contributing Standard Notice
+### Contributing Standard Notice
 
-Anyone is encouraged to contribute to the repository by [forking](https://help.github.com/articles/fork-a-repo)
-and submitting a pull request. (If you are new to GitHub, you might start with a
-[basic tutorial](https://help.github.com/articles/set-up-git).) By contributing
-to this project, you grant a world-wide, royalty-free, perpetual, irrevocable,
-non-exclusive, transferable license to all users under the terms of the
-[Apache Software License v2](http://www.apache.org/licenses/LICENSE-2.0.html) or
-later.
+Anyone is encouraged to contribute to the repository by [forking](https://help.github.com/articles/fork-a-repo) and submitting a pull request. (If you are new to GitHub, you might start with a [basic tutorial](https://help.github.com/articles/set-up-git).) By contributing to this project, you grant a world-wide, royalty-free, perpetual, irrevocable, non-exclusive, transferable license to all users under the terms of the [Apache Software License v2](http://www.apache.org/licenses/LICENSE-2.0.html) or later.
 
-All comments, messages, pull requests, and other submissions received through
-CDC including this GitHub page may be subject to applicable federal law, including but not limited to the Federal Records Act, and may be archived. Learn more at [http://www.cdc.gov/other/privacy.html](http://www.cdc.gov/other/privacy.html).
+All comments, messages, pull requests, and other submissions received through CDC including this GitHub page may be subject to applicable federal law, including but not limited to the Federal Records Act, and may be archived. Learn more at [http://www.cdc.gov/other/privacy.html](http://www.cdc.gov/other/privacy.html).
 
-## Records Management Standard Notice
+### Records Management Standard Notice
 
-This repository is not a source of government records but is a copy to increase
-collaboration and collaborative potential. All government records will be
-published through the [CDC web site](http://www.cdc.gov).
+This repository is not a source of government records but is a copy to increase collaboration and collaborative potential. All government records will be published through the [CDC web site](http://www.cdc.gov).
```