End-of-season absolute difference by forecast date by Fuhan-Yang · Pull Request #269 · CDCgov/cfa-vaccination-coverage-forecasting

Fuhan-Yang · 2026-02-06T16:49:57Z

A draft to visualize how the end-of-season absolute difference (eos abs diff) changes by forecast date:

Summary of eos abs diff for all states:

The summary box plot of the eos abs diff across all states looks roughly as expected: performance is worst at the start of the season and improves over time. (Notice the drop in September; maybe that indicates the curve reaches a plateau?)
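A minimal sketch of the summary behind this box plot, assuming the eos abs diff is |predicted end-of-season uptake - observed end-of-season uptake| for each state and forecast date. The thread does not spell out the formula, and the point estimate the repository uses may differ. The rows, values, and function names below are hypothetical toy inputs, not results; each box in the plot summarizes the median and quartiles for one forecast date.

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical toy rows: (state, forecast_date, predicted EOS uptake, observed EOS uptake).
rows = [
    ("AK", "2021-07-01", 0.30, 0.22),
    ("AK", "2021-08-01", 0.25, 0.22),
    ("ND", "2021-07-01", 0.10, 0.21),
    ("ND", "2021-08-01", 0.15, 0.21),
    ("NY", "2021-07-01", 0.28, 0.26),
    ("NY", "2021-08-01", 0.27, 0.26),
]


def eos_abs_diff_by_date(rows):
    """Group |predicted - observed| end-of-season uptake by forecast date."""
    by_date = defaultdict(list)
    for _state, forecast_date, predicted, observed in rows:
        by_date[forecast_date].append(abs(predicted - observed))
    return dict(by_date)


def summarize(by_date):
    """Median and quartiles across states for each forecast date (what a box plot shows)."""
    summary = {}
    for forecast_date, diffs in sorted(by_date.items()):
        q1, _q2, q3 = quantiles(diffs, n=4)
        summary[forecast_date] = {"median": median(diffs), "q1": q1, "q3": q3}
    return summary


summary = summarize(eos_abs_diff_by_date(rows))
```

With these toy rows the median drops from about 0.08 at the first forecast date to about 0.03 at the second, which is the kind of "gets better over the season" pattern the box plot is meant to show.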

eos abs diff for individual states:

By dissecting how each state contributes to the summary plot, we can tell that some states (the first three rows) perform well: their eos abs diff is already low at season start and stays fairly consistent across the season. The last three rows show the same pattern as the summary: they perform poorly early on and improve over the season. @swo

swo · 2026-02-06T18:55:56Z

I think some of these plots (or things similar to them) are produced in https://github.com/CDCgov/cfa-immunization-uptake-projection/blob/main/scripts/plot_preds.py as `scores_increasing.svg`.

And there are some other plots that show scores by state & season.

Fuhan-Yang · 2026-02-10T16:54:41Z

Yes, there are! In addition to the overall MSPE by state, it is useful to see how the forecast changes over time for each state. Plotting that ran into a challenge: handling the forecasts made for all states across the monthly forecast dates for an entire season generates ~200 million rows of data, and Python crashes when collecting this dataset. Here are some refactorings that allow plotting a specific state in a specific season from such a huge dataset:

- Filter the data first, then calculate the predicted median, lci, and uci, and finally collect.
- Separate score plotting from `plot_preds.py`. This reduces runtime when only the scores, not the predictions, need to be plotted, and it also makes the scope of each script clear.

swo

* See #271 for my suggestion on how to do the pipeline around the plots.
* I'm still confused why you need `eos_abs_diff_by_state.svg` when you have `scores_increasing.svg`. Aren't they showing the same thing?
* I'm similarly confused about `eos_abs_diff_summary.svg`. We have only ~50 states, so it's probably better to show how each state's score improves over time, rather than summarize over them to produce the "cone" of scores in `eos_abs_diff_summary.svg`, which obscures whether, for example, each state has a steadily improving score, or whether they individually bounce around.
* I'm curious about this problem with `.collect()`. It's not a bad practice to use lazy data frames, but the data here aren't that large. In the vignette, the data files are ~1 MB. This makes me wonder if there's something else going on.
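To make the "steadily improving vs. individually bouncing around" distinction concrete, a check like the following could be applied to each state's eos abs diff trajectory ordered by forecast date. The function names and toy trajectories are hypothetical, and lower scores are assumed to be better.

```python
def is_steadily_improving(scores: list[float], tol: float = 0.0) -> bool:
    """True if the score never gets worse (by more than tol) from one forecast date to the next."""
    return all(later <= earlier + tol for earlier, later in zip(scores, scores[1:]))


def count_reversals(scores: list[float]) -> int:
    """Number of times the direction of change flips; a rough measure of bouncing around."""
    steps = [b - a for a, b in zip(scores, scores[1:]) if b != a]
    return sum(1 for s1, s2 in zip(steps, steps[1:]) if (s1 > 0) != (s2 > 0))


# Hypothetical trajectories (eos abs diff by forecast date), for illustration only.
trajectories = {
    "AK": [0.08, 0.05, 0.03, 0.02],
    "ND": [0.11, 0.04, 0.09, 0.03],
}
flags = {
    state: (is_steadily_improving(s), count_reversals(s))
    for state, s in trajectories.items()
}
```

A summary "cone" over states would look similar for both toy trajectories at the last date, while these per-state checks separate them.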

Fuhan-Yang · 2026-02-17T15:23:43Z

> See #271 for my suggestion on how to do the pipeline around the plots

The change looks good to me!

> I'm still confused why you need `eos_abs_diff_by_state.svg` when you have `scores_increasing.svg`. Aren't they showing the same thing?

`eos_abs_diff_by_state.svg` better visualizes how the score for each individual state changes over time, instead of mixing all the states together.

As above: when there are more states than in the vignette, it's hard to tell which states' scores are decreasing and which are staying stable across time.

> I'm similarly confused about `eos_abs_diff_summary.svg`. We have only ~50 states so it's probably better to show how each state's score improves over time, rather than summarize over them, to produce the "cone" of scores in `eos_abs_diff_summary.svg`, which obscures whether, for example, each state has a steadily improving score, or whether they individually bounce around.

Sounds fair to me.

> I'm curious about this problem with `.collect()`. It's not a bad practice to use lazy data frames, but the data here aren't that large. In the vignette, the data files are ~1 MB. This makes me wonder if there's something else going on.

When running all the states, the file for each forecast date is ~170 MB. Combining the files from all forecast dates (10 files) is unavoidably huge.

swo · 2026-02-18T21:45:00Z

> As above: when there are more states than in the vignette, it's hard to tell which states' scores are decreasing and which are staying stable across time.

I actually like this plot better! I can see that most states have improving scores from Aug to Oct, then weirdly a lot of them get a little worse in Nov, and then slowly improve through April. And there are a few exceptions (e.g., North Dakota). It would probably be better to highlight the ~3 states with the worst scores (i.e., ND and 2 others) and look at those.

In the other plot, I can't see individual trajectories, so you'd never be able to tell what was going on with ND, say.

> When running all the states, the file for each forecast date is ~170 MB. Combining the files from all forecast dates (10 files) is unavoidably huge.

10 × 170 MB ≈ 2 GB, which isn't that bad. What does the error look like?

(I know that Altair can't handle plotting many points with the default engine, but that's a different issue, as I understand it.)
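One way to frame the "~2 GB isn't that bad" question is to compare the on-disk total with a rough in-memory estimate for the ~200 million rows mentioned earlier in the thread. The number of numeric columns and the 8-byte width below are assumptions, not details from the thread; real memory use also depends on string columns, intermediate copies, and how compressed the files are on disk. This is only arithmetic, not a diagnosis of the crash.

```python
def total_on_disk_gb(n_files: int, mb_per_file: float) -> float:
    """Total size of the per-forecast-date files, in GB (1 GB = 1000 MB)."""
    return n_files * mb_per_file / 1000


def estimate_in_memory_gb(n_rows: int, n_numeric_cols: int, bytes_per_value: int = 8) -> float:
    """Rough lower bound for fixed-width numeric columns; ignores strings, nulls, and copies."""
    return n_rows * n_numeric_cols * bytes_per_value / 1e9


on_disk = total_on_disk_gb(10, 170)              # 10 files of ~170 MB each
in_memory = estimate_in_memory_gb(200_000_000, 4)  # assumes 4 float64 columns
```

Under these assumptions the files total about 1.7 GB on disk, while four 8-byte columns over 200 million rows alone come to about 6.4 GB once loaded.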

Fuhan-Yang · 2026-02-23T15:10:25Z

> I actually like this plot better! I can see that most states have improving scores from Aug to Oct, then weirdly a lot of them get a little worse in Nov, and then slowly improve through April. And there are a few exceptions (e.g., North Dakota). It would probably be better to highlight the ~3 states with the worst scores (i.e., ND and 2 others) and look at those.

How about color-coding the scores by state? Then we can really tell how each individual state behaves! Highlighting some states sounds good too: we can highlight the 3 states with the best scores and the 3 with the worst.

> In the other plot, I can't see individual trajectories, so you'd never be able to tell what was going on with ND, say

Fair, keeping it.

> What does the error look like?

The terminal shows that the target was killed. If the target is `output/full/plots/score_by_geo.svg`, it shows `make: *** [Makefile:40: output/full/plots/score_by_geo.svg] Killed`.
When running in a Jupyter notebook, it shows "The Kernel crashed while executing code in the current cell or a previous cell." These errors are resolved when running under this branch.
You can reproduce the error by running the code under `main`: setting `config_full.yaml` to run all the states, with forecast dates between 2021-07-01 and 2022-04-01 by 1 month, and changing `$(CONFIG)` in the Makefile to this config will do it!
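The reproduction settings imply a specific number of forecast dates. A small sketch that enumerates first-of-month dates from 2021-07-01 through 2022-04-01 shows this gives 10 forecast dates, which matches the "10 files" mentioned earlier in the thread. The helper name is hypothetical, it assumes the start date falls on the first of a month, and it is not how the repository parses `config_full.yaml`.

```python
from datetime import date


def monthly_dates(start: date, end: date) -> list[date]:
    """First-of-month dates from start to end inclusive, stepping by one month.

    Assumes start is the first day of a month.
    """
    dates = []
    year, month = start.year, start.month
    while date(year, month, 1) <= end:
        dates.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return dates


forecast_dates = monthly_dates(date(2021, 7, 1), date(2022, 4, 1))
```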

swo · 2026-02-23T21:48:15Z

My guess is that this error came from the plotting, not from loading the data. This PR also has changes to the plots, which I suspect might have solved the problem too.

What happens if you try to load the data into memory? It doesn't matter that much for this particular repository, but it's a very bad (and surprising) thing if polars sometimes crashes when trying to load 2 GB of data into memory.

Fuhan-Yang · 2026-02-24T14:43:28Z

The error appears when the lazy data frame is collected with `.collect()` before plotting. When I add `print(pred_cones)` at line 128 in `plot_preds.py` under `main`, the error appears and `pred_cones` is never printed, so the crash happens before plotting.

If the error came from plotting, Altair would suggest using VegaFusion, along with an error like `MaxRowsError`, instead of `Killed`.

swo

Returning review gated on some changes to figures.

Ping me again when it's time!

Fuhan-Yang · 2026-02-25T15:48:26Z

Added highlighting for a user-defined number of states with the best (lowest eos abs diff) and the worst (highest eos abs diff) scores at a given month. This highlights representative trajectories of eos abs diff across states:

For example, here the states with lowest eos abs diff at season start (July) are highlighted in blue, and the states with highest eos abs diff at season start (July) are highlighted in red.
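The selection step can be sketched as ranking states by their eos abs diff at the chosen month and taking the `n` lowest (best) and `n` highest (worst). The function name, the nested-dict input shape, and the toy scores are hypothetical; ties are broken alphabetically by state, and if `n` exceeds half the number of states the two lists will overlap.

```python
def best_and_worst_states(
    scores: dict[str, dict[str, float]], month: str, n: int = 3
) -> tuple[list[str], list[str]]:
    """Return (best, worst) states at `month`; lower eos abs diff is better.

    `scores` maps state -> {month: eos abs diff}. States without a score at `month` are skipped.
    """
    at_month = {state: by_month[month] for state, by_month in scores.items() if month in by_month}
    ranked = sorted(at_month, key=lambda s: (at_month[s], s))  # ascending eos abs diff
    return ranked[:n], ranked[-n:][::-1]  # worst list starts with the highest score


# Hypothetical scores at season start, for illustration only.
toy_scores = {
    "AK": {"2021-07": 0.08},
    "ND": {"2021-07": 0.11},
    "NY": {"2021-07": 0.02},
    "VT": {"2021-07": 0.05},
}
best, worst = best_and_worst_states(toy_scores, "2021-07", n=1)
```

The two lists can then drive the blue and red highlight colors, with all other states drawn in a muted color.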

- Revert to `config_vignette.yaml`: we should always use this, so that new users will be running the vignette. We should run `make CONFIG=scripts/whatever.yaml` when we want to do something different.
- Use a checkpoint flag as a target for the plotting scripts. This avoids having to keep the plot names consistent between the scripts and the Makefile, and it allows for an approach like the one we're using in `plot_preds.py`, which has dynamically named files.
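On the script side, the checkpoint-flag idea can be sketched as: write however many dynamically named plot files are needed, then touch a single flag file last, so the Makefile only has to name the flag as its target. The function name, the `.plots_done` flag name, and the placeholder SVG content are hypothetical; a real script would save Altair charts instead.

```python
import tempfile
from pathlib import Path


def write_plots_then_flag(out_dir: Path, names: list[str], flag_name: str = ".plots_done") -> Path:
    """Write one placeholder plot per name, then touch the flag file as the final step."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name in names:
        # Stand-in for something like chart.save(out_dir / f"{name}.svg").
        (out_dir / f"{name}.svg").write_text("<svg/>")
    flag = out_dir / flag_name
    flag.touch()  # touched only after every plot was written successfully
    return flag


# Usage: write two placeholder plots into a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    flag = write_plots_then_flag(Path(tmp) / "plots", ["ND", "NY"])
    written = sorted(p.name for p in flag.parent.iterdir())
```

Touching the flag last means that if the script dies partway through, the flag is not refreshed, so Make will rerun the plotting step on the next invocation.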

Fuhan-Yang requested a review from swo February 10, 2026 16:54

Fuhan-Yang marked this pull request as ready for review February 10, 2026 16:55

Fuhan-Yang force-pushed the fy_retro_lpl branch from 661729e to 4dcc7ce Compare February 11, 2026 18:36

swo force-pushed the fy_retro_lpl branch from 064ec4e to 844b4cb Compare February 12, 2026 19:23

swo requested changes Feb 13, 2026


Fuhan-Yang requested a review from swo February 17, 2026 15:27

swo reviewed Feb 24, 2026


Fuhan-Yang requested a review from swo February 25, 2026 15:48

Fuhan-Yang and others added 13 commits March 9, 2026 12:20

- change config (98f239d)
- add summary plot of eos abs diff (e8ec1d3)
- eos abs diff for each state (97b0d40)
- sort by state (4b1e09c)
- allow reading large dataset (28c475e)
- add vegafusion for plotting large dataset (5b88edd)
- single script for score plotting (7cdb767)
- makefile update (4f03b73)
- fix config bug (79c871b)
- delete summary of eos abs diff (a951351)
- add highlight on eos abs diff figure (09672c3)
- add highlight function (b34244f)

swo force-pushed the fy_retro_lpl branch from 9aebbe8 to b34244f Compare March 9, 2026 16:20

swo merged commit 65a1d62 into main Mar 9, 2026
3 checks passed

swo deleted the fy_retro_lpl branch March 9, 2026 16:20


3 participants
