
Commit 49f8921

order via factor, covid fcsts, ggplotly
1 parent 110c2f5 commit 49f8921

1 file changed (+83, -18)

scripts/reports/season_summary_2025.Rmd

@@ -79,7 +79,7 @@ covid_forecasts$forecaster %<>% case_match(
 )
 
 forecast_week <- flu_scores$forecast_date %>% unique()
-forecast_weeks_to_plot <- c(seq.Date(min(forecast_week), max(forecast_week), by = 3*7), as.Date("2025-01-18"))
+forecast_weeks_to_plot <- c(seq.Date(min(forecast_week), max(forecast_week), by = 3*7), as.Date("2025-01-18"), as.Date("2025-02-01"))
 forecast_weeks_to_plot %in% (flu_scores$forecast_date %>% unique())
 forecast_weeks_to_plot %in% (covid_scores$forecast_date %>% unique())
 ```
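For context on the line changed above: `seq.Date(..., by = 3*7)` steps every three weeks between the first and last forecast date, and the extra `as.Date()` values tack on specific weeks of interest. A minimal standalone sketch, with invented stand-ins for the real forecast dates:

```r
# Invented stand-ins for the report's actual forecast dates.
forecast_week <- as.Date(c("2024-11-23", "2024-12-14", "2025-01-18", "2025-02-01", "2025-03-29"))

# Every third week across the season, plus two hand-picked extra weeks.
forecast_weeks_to_plot <- c(
  seq.Date(min(forecast_week), max(forecast_week), by = 3 * 7),
  as.Date("2025-01-18"),
  as.Date("2025-02-01")
)

# Mirrors the `%in%` sanity checks above: each requested week should exist in
# the scored data.
forecast_weeks_to_plot %in% forecast_week
```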
@@ -221,6 +221,7 @@ flu_archive <- qs2::qs_read(here::here("flu_hosp_prod", "objects", "nhsn_archive
 flu_current <- flu_archive %>%
   epix_as_of_current() %>%
   filter(geo_value %nin% c("as", "gu", "mp", "vi"))
+flu_max <- flu_current %>% group_by(geo_value) %>% summarize(max_value = max(value))
 compute_peak_season <- function(data_current, threshold = 0.5, start_of_year = as.Date("2024-11-01")) {
   season_length <- data_current %>% pull(time_value) %>% max() - start_of_year
   data_current %>%
@@ -430,29 +431,38 @@ scored_geo <- flu_scores %>%
     mean_interval_coverage_90 = round(Mean(interval_coverage_90), 2),
   ) %>%
   left_join(state_census, by = join_by(geo_value == abbr)) %>%
+  left_join(flu_max, by = "geo_value") %>%
   ungroup()
 pop_score_order <- flu_score_summary %>% arrange(pop_norm_wis) %>% pull(id)
-scored_geo %>%
+geo_plot <-
+  scored_geo %>%
   mutate(forecaster = factor(forecaster, levels = pop_score_order)) %>%
   ggplot(aes(x = geo_value, y = pop_norm_wis, fill = pop)) +
   geom_col() +
   facet_wrap(~forecaster) +
   scale_y_continuous(breaks = scales::pretty_breaks(n=10), labels = scales::comma) +
-  scale_fill_viridis_c(breaks = scales::breaks_log(n=4), labels = scales::label_log(), transform="log") +
+  scale_fill_viridis_c(transform="log") +
   theme_bw() +
   theme(axis.text.x = element_text(angle = 90, vjust = 0.0, hjust = 0.75))
+
+ggplotly(geo_plot)
 ```
 
 #### Score Histograms
 
 The standard deviation is far too large to actually include it in any of the previous graphs and tables.
-It is routinely as large as the mean value itself.
-To try to represent this, in this tab we have the histogram of the wis, split by phase and forecaster.
+It is routinely larger than the mean WIS.
+To represent this, this tab shows the histogram of the WIS, split by phase and forecaster.
 Color below represents population, with darker blue corresponding to low `geo_value` population, and yellow representing high population (this is viridis).
 Even after normalizing by population, there is a large variation in scale for the scores.
 
+The forecasters are arranged according to mean WIS.
 Concentration towards the left corresponds to a better score; for example, `peak` is frequently a flatter distribution, which means most models are doing worse than they were during the `increasing` period.
-`climate_geo_agged` is flatter overall than `ens_ar_only`
+During the `peak`, very few forecasters have any results in the smallest bin, which implies that essentially no forecaster was appreciably accurate around the peak.
+
+In the `peak` and `decreasing` phases, the linear model simultaneously has a longer tail and a high degree of concentration otherwise, which implies it is generally right but catastrophically wrong when it misses.
+
+Comparing the `increasing` and `decreasing` phases across forecasters, `decreasing` tends to have a stronger concentration in the lowest two bins, but a much longer tail of large errors.
 
 ```{r flu_score_histogram, fig.height = 20, fig.width = 13, echo=FALSE}
 #, levels = exp(seq(log(min(pop)), log(max(pop)), length.out = 10))
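Two of the changes in the hunk above correspond to the "ggplotly" part of the commit message: the faceted plot is now assigned to `geo_plot` and passed to `ggplotly()`, which renders it as an interactive htmlwidget with hover tooltips in the HTML report. A self-contained sketch of the same pattern, using toy data rather than the report's scored forecasts:

```r
library(ggplot2)
library(plotly)

# Toy per-state scores; the report uses population-normalized WIS instead.
toy_scores <- data.frame(
  geo_value  = rep(c("ca", "pa", "wy"), times = 2),
  forecaster = rep(c("model_a", "model_b"), each = 3),
  score      = c(1.2, 2.5, 4.1, 0.9, 3.0, 5.2),
  pop        = rep(c(39e6, 13e6, 6e5), times = 2)
)

geo_plot <- ggplot(toy_scores, aes(x = geo_value, y = score, fill = pop)) +
  geom_col() +
  facet_wrap(~forecaster) +
  scale_fill_viridis_c(transform = "log") +
  theme_bw()

# Wrapping the saved plot object converts it to an interactive plot.
ggplotly(geo_plot)
```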
@@ -461,6 +471,7 @@ flu_scores %>%
   left_join(state_census, by = join_by(geo_value == abbr)) %>%
   mutate(wis = wis * 1e5/pop) %>%
   mutate(pop = factor(pop)) %>%
+  mutate(forecaster = factor(forecaster, levels = wis_score_order)) %>%
   group_by(forecaster) %>%
   mutate(phase = classify_phase(target_end_date, first_above, last_above, rel_duration, covid_flat_threshold)) %>%
   ggplot(aes(x = wis, color = pop, y = ifelse(after_stat(count) > 0, after_stat(count), NA))) +
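The added `mutate(forecaster = factor(forecaster, levels = wis_score_order))` is the "order via factor" part of the commit message: `facet_wrap()` lays out panels in factor-level order, so setting the levels to a best-to-worst ordering sorts the facets by performance instead of alphabetically. A toy illustration with invented forecaster names and scores:

```r
library(dplyr)
library(ggplot2)

# Invented mean scores; the report derives wis_score_order from its real
# score summary table.
score_summary <- tibble::tibble(
  forecaster = c("model_b", "model_a", "model_c"),
  mean_wis   = c(3.1, 1.4, 2.2)
)
wis_score_order <- score_summary %>% arrange(mean_wis) %>% pull(forecaster)

set.seed(42)
toy <- tibble::tibble(
  forecaster = rep(c("model_a", "model_b", "model_c"), each = 50),
  wis        = rexp(150)
)

# Facets now appear best-to-worst (model_a, model_c, model_b).
toy %>%
  mutate(forecaster = factor(forecaster, levels = wis_score_order)) %>%
  ggplot(aes(x = wis)) +
  geom_histogram(bins = 30) +
  facet_wrap(~forecaster)
```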
@@ -481,6 +492,17 @@ We've scaled so everything is in rates per 100k so that it's easier to actually
 Forecasters we've produced are blue, while forecasters from other teams are red.
 They are ordered by `mean_wis` score, best to worst.
 
+```{r}
+tribble(
+  ~state, ~performance, ~population,
+  "ca", "~best", "large",
+  "dc", "~worst", "small",
+  "pa", "terrible", "large",
+  "hi", "~best", "small",
+  "tx", "good", "large"
+) %>% datatable()
+```
+
 ```{r flu_plot_sample_forecast, fig.height = 20, fig.width = 13, echo=FALSE}
 plot_geos <- c("ca", "dc", "pa", "hi", "tx")
 filtered_flu_forecasts <- flu_forecasts %>%
@@ -490,7 +512,7 @@ filtered_flu_forecasts <- flu_forecasts %>%
 flu_forecast_plt <- filtered_flu_forecasts %>%
   filter(forecast_date %in% forecast_weeks_to_plot) %>%
   mutate(forecaster = factor(forecaster, levels = wis_score_order)) %>%
-  mutate(our_forecaster = forecaster %in% our_forecasters) %>%
+  mutate(our_forecaster = factor(forecaster %in% our_forecasters, levels = c(TRUE, FALSE))) %>%
   left_join(state_census, by = join_by(geo_value == abbr)) %>%
   mutate(value = value * 1e5/ pop) %>%
   pivot_wider(names_from = quantile, values_from = value) %>%
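Both sample-forecast chunks now wrap the `our_forecaster` flag in `factor(..., levels = c(TRUE, FALSE))`. The diff does not say why, but a plausible reading (an assumption, not stated in the commit) is that it pins the level order so our forecasters always take the first colour and legend slot; a bare logical sorts `FALSE` before `TRUE`. A tiny demonstration:

```r
flags <- c(TRUE, FALSE, TRUE)

# Default factor levels sort FALSE first, so FALSE would get the first
# colour in a discrete scale.
levels(factor(flags))
#> "FALSE" "TRUE"

# Explicit levels put TRUE ("ours") first and keep the mapping stable even
# when a filtered subset contains only one of the two values.
levels(factor(flags, levels = c(TRUE, FALSE)))
#> "TRUE" "FALSE"
```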
@@ -577,6 +599,7 @@ covid_archive <- qs2::qs_read(here::here("covid_hosp_prod", "objects", "nhsn_arc
 covid_current <- covid_archive %>%
   epix_as_of_current() %>%
   filter(geo_value %nin% c("as", "gu", "mp", "vi"))
+covid_max <- covid_current %>% group_by(geo_value) %>% summarize(max_value = max(value))
 covid_within_max <- compute_peak_season(covid_current)
 ```
 
@@ -678,6 +701,7 @@ covid_score_summary <- covid_scores %>%
     mean_ae = round(Mean(ae_median), 2),
     pop_norm_ae = round(Mean(ae_median*1e5/pop), 2),
     geomean_ae = round(GeoMean(ae_median, min_ae), 2),
+    mean_coverage_50 = round(Mean(interval_coverage_50), 2),
     mean_coverage_90 = round(Mean(interval_coverage_90), 2),
     n = n()
   ) %>%
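The new `mean_coverage_50` column mirrors the existing 90% coverage summary. As a generic illustration of what an interval-coverage column usually represents (this is not the report's scoring code), 50% coverage is the share of observations that fall inside the central 50% prediction interval:

```r
library(dplyr)

# Toy scored forecasts: one row per forecast with the 25%/75% quantile
# predictions and the eventually observed value (all numbers invented).
toy_scored <- tibble::tibble(
  forecaster = rep(c("model_a", "model_b"), each = 4),
  q25        = c(1, 2, 3, 4, 1, 1, 2, 2),
  q75        = c(3, 5, 6, 8, 2, 3, 4, 5),
  observed   = c(2, 6, 5, 7, 4, 2, 3, 9)
)

toy_scored %>%
  mutate(interval_coverage_50 = observed >= q25 & observed <= q75) %>%
  group_by(forecaster) %>%
  summarize(mean_coverage_50 = round(mean(interval_coverage_50), 2))
```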
@@ -814,15 +838,15 @@ score_geo <- covid_scores %>%
   left_join(state_census, by = join_by(geo_value == abbr)) %>%
   ungroup() %>%
   mutate(forecaster = factor(forecaster, levels = pop_wis_order))
-score_geo %>% filter(mean_wis > y_limit) %>% arrange(mean_wis)
+
 geo_plot <- score_geo %>%
   filter(forecaster != "climate_geo_agged") %>%
   #mutate(mean_wis = pmin(mean_wis, y_limit)) %>%
   ggplot(aes(x = geo_value, y = pop_norm_wis, fill = pop)) +
   geom_col() +
   facet_wrap(~forecaster) +
   scale_y_continuous(breaks = scales::pretty_breaks(n=10), labels = scales::comma) +
-  scale_fill_viridis_c(breaks = scales::breaks_log(n=4), labels = scales::label_log(), transform="log") +
+  scale_fill_viridis_c(transform="log") +
   theme(axis.text.x = element_text(angle = 90, vjust = 0.0, hjust = 0.75))
 
 ggplotly(geo_plot)
@@ -838,24 +862,32 @@ score_geo %>%
   scale_fill_viridis_c(breaks = scales::breaks_log(n=4), labels = scales::label_log(), transform="log") +
   theme(axis.text.x = element_text(angle = 90, vjust = 0.0, hjust = 0.75))
 ```
-#### Score histograms
+
+#### Score Histograms
 
 The standard deviation is far too large to actually include it in any of the previous graphs and tables meaningfully.
 It is routinely larger than the wis value itself.
 Like with Flu, in this tab we have the histogram of the wis, split by phase and forecaster.
 Color below represents population, with darker blue corresponding to low `geo_value` population, and yellow representing high population (this is viridis).
 Even after normalizing by population, there is a variation in scale for the scores.
 
+The forecasters are ordered according to mean WIS.
 Concentration towards the left corresponds to a better score; for example, `peak` is frequently a flatter distribution, which means most models are doing worse than they were during the `increasing` period.
-`climate_geo_agged` is flatter overall than `ens_ar_only`
 
-```{r, fig.height = 20, fig.width = 13, echo=FALSE}
-#, levels = exp(seq(log(min(pop)), log(max(pop)), length.out = 10))
+Like in Flu, during the `peak` phase nearly every forecaster misses the smallest bin, so no forecaster is appreciably accurate around the peak.
+Unlike in Flu, the `flat` phase exists and roughly resembles `decreasing` in distribution.
+`increasing` is overall a much smaller proportion of all samples.
+
+`climate_base` is the closest any of these scores come to being normally distributed.
+`climate_geo_agged` is particularly bad for Covid.
+
+```{r, fig.height = 23, fig.width = 13}
 covid_scores %>%
   left_join(covid_within_max, by = "geo_value") %>%
   left_join(state_census, by = join_by(geo_value == abbr)) %>%
   mutate(wis = wis * 1e5/pop) %>%
   mutate(pop = factor(pop)) %>%
+  mutate(forecaster = factor(forecaster, levels = wis_score_order)) %>%
   group_by(forecaster) %>%
   mutate(phase = classify_phase(target_end_date, first_above, last_above, rel_duration, covid_flat_threshold)) %>%
   ggplot(aes(x = wis, color = pop, y = ifelse(after_stat(count) > 0, after_stat(count), NA))) +
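A smaller detail shared by both histogram chunks: mapping `y = ifelse(after_stat(count) > 0, after_stat(count), NA)` turns zero-count bins into `NA` so they are dropped rather than drawn, which is mainly useful when the count axis is log-scaled (an assumption about intent; the axis scales are outside this hunk). A standalone sketch:

```r
library(ggplot2)

set.seed(1)
toy <- data.frame(wis = rexp(200))

# Bins with zero observations become NA and are skipped instead of being
# drawn as empty bars.
ggplot(toy, aes(x = wis, y = ifelse(after_stat(count) > 0, after_stat(count), NA))) +
  geom_histogram(bins = 40)
```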
@@ -876,17 +908,16 @@ We've scaled so everything is in rates per 100k so that it's easier to actually
 Forecasters we've produced are blue, while forecasters from other teams are red.
 They are ordered by `mean_wis` score, best to worst.
 
-```{r covid_plot_sample_forecast, fig.height = 20, fig.width = 13, echo=FALSE}
-plot_geos <- c("ca", "dc", "pa", "hi", "tx")
-plot_geos <- c("ca", "de", "pa", "wy")
+```{r covid_plot_sample_forecast, fig.height = 23, fig.width = 13, echo=FALSE}
+plot_geos <- c("ca", "de", "pa", "nh", "tx")
 filtered_covid_forecasts <- covid_forecasts %>%
   ungroup() %>%
   filter(quantile %in% c(0.1, 0.5, 0.9), geo_value %in% plot_geos)
 
 covid_forecast_plt <- filtered_covid_forecasts %>%
   filter(forecast_date %in% forecast_weeks_to_plot) %>%
   mutate(forecaster = factor(forecaster, levels = wis_score_order)) %>%
-  mutate(our_forecaster = forecaster %in% our_forecasters) %>%
+  mutate(our_forecaster = factor(forecaster %in% our_forecasters, levels = c(TRUE, FALSE))) %>%
   left_join(state_census, by = join_by(geo_value == abbr)) %>%
   mutate(value = value * 1e5/ pop) %>%
   pivot_wider(names_from = quantile, values_from = value) %>%
@@ -908,7 +939,41 @@ ggplotly(covid_forecast_plt)
 ```
 ### Results
 
-Some words on covid scores
+`windowed_seasonal_nssp` is a clear winner regardless of the metric used.
+`ensemble_windowed` is nearly as good, but since it is effectively averaging `windowed_seasonal_nssp` with `windowed_seasonal` and losing accuracy as a result, it is hardly worth it.
+
+The pure climate models were substantially worse for covid than for flu, at ~4.6x the best model rather than ~2x.
+Given the unusual nature of the season, this is somewhat unsurprising.
+
+To some degree this explains the poor performance of `CMU-TimeSeries`.
+You can see this by looking at the "Scores Aggregated By Forecast Date" tab, where the first 3 weeks of `CMU-TimeSeries` are significantly worse than `climate_linear`, let alone the ensemble or our best models.
+
+#### Aggregated by phase
+
+There are two tabs dedicated to this, one with and one without a separate `flat` phase, which labels an entire state as `flat` if the duration of its `peak` is too long.
+Either way, the general shape is similar to Flu, with `increasing` scores lower than `peak` scores but higher than `decreasing` scores.
+All of the phases are closer together than they were for Flu, with the best `peak`-phase forecaster nearly beating the worst `increasing`-phase forecaster.
+`flat` roughly resembles `increasing`.
+Even disregarding the climate models, the distribution within a phase is wider than it was for Flu.
+`windowed_seasonal_nssp` particularly shines during the `peak` and, to some degree, the `decreasing` phases.
+
+#### Aggregated by ahead
+
+Nothing terribly surprising here; most models' scores grow roughly linearly with the ahead.
+`windowed_seasonal_nssp` is the exception, doing comparatively worse at longer aheads.
+
+#### Aggregated by State
+
+Across all forecasters, `wy` is a particularly difficult location to forecast, while `ca` is particularly easy.
+Scores don't seem to correlate particularly well with the population of the state.
+The variation in state scores for other groups' forecasters is fairly similar to that of our non-climate forecasters.
+Both climate forecasters have a different distribution of which states they get right and which they get wrong, and they differ greatly from each other.
+
+#### Sample Forecasts
+
+The always-decreasing problem is definitely not present in these forecasts.
+If anything, our best forecasts are *too* eager to predict an increasing value, e.g. in `tx` and `ca`.
+Several of our worse forecasts are clearly caused by revision behavior.
 
 # Revision behavior and data substitution
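The phase breakdown discussed under "Aggregated by phase" above hinges on `classify_phase()`, whose definition is not part of this diff. Purely as a hypothetical sketch of the kind of rule described there (a state is labelled `flat` when its above-threshold period lasts too long relative to the season), it might look like the following; everything besides the argument list is an assumption, not the report's actual implementation:

```r
# Hypothetical sketch only: the report's real classify_phase() takes
# (target_end_date, first_above, last_above, rel_duration, covid_flat_threshold);
# its internals are not shown in this diff.
classify_phase_sketch <- function(target_end_date, first_above, last_above,
                                  rel_duration, flat_threshold) {
  dplyr::case_when(
    # Above-threshold period covers too much of the season: no usable peak.
    rel_duration > flat_threshold ~ "flat",
    target_end_date < first_above ~ "increasing",
    target_end_date > last_above  ~ "decreasing",
    TRUE                          ~ "peak"
  )
}
```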