Skip to content

Commit 55ddeee

Browse files
committed
initial decreasing forecasters rmd
1 parent cefa40c commit 55ddeee

File tree

1 file changed

+176
-0
lines changed

1 file changed

+176
-0
lines changed
Lines changed: 176 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,176 @@
1+
---
2+
title: "Decreasing Forecasters"
3+
author: Delphi Forecast Team
4+
date: "`Sys.date()`"
5+
output:
6+
html_document:
7+
code_folding: show
8+
toc: True
9+
# self_contained: False
10+
# lib_dir: libs
11+
params:
12+
disease: "covid"
13+
forecast_res: !r ""
14+
forecast_date: !r ""
15+
truth_data: !r ""
16+
---
17+
18+
$$\\[.4in]$$
19+
20+
```{r echo=FALSE}
21+
knitr::opts_chunk$set(
22+
fig.align = "center",
23+
message = FALSE,
24+
warning = FALSE,
25+
cache = FALSE
26+
)
27+
knitr::opts_knit$set(root.dir = here::here())
28+
ggplot2::theme_set(ggplot2::theme_bw())
29+
source(here::here("R/load_all.R"))
30+
```
31+
32+
Partially part of the retrospective from this year.
33+
For many of the direct forecasters, the forecast is strictly decreasing, even in the middle of the surge.
34+
This effect is most prominent in flu, but occurs somewhat in covid.
35+
We need to resolve the source of this.
36+
It is some combination of the data and the models used.
37+
38+
This notebook depends on having successfully run the `flu_hosp_explore` targets pipeline to handle the creation of the basic dataset.
39+
Accordingly, you need `.Renviron` to include `TAR_PROJECT=flu_hosp_prod`.
40+
```{r}
41+
tar_make(joined_archive_data)
42+
joined_archive <- tar_read(joined_archive_data)
43+
hhs_archive <- tar_read(hhs_archive) %>% as_epi_archive()
44+
```
45+
46+
To avoid running too frequently, we'll limit to a single forecast date just after the peak of the rate of growth, so that ~ everywhere is increasing.
47+
48+
```{r}
49+
hhs_archive %>% epix_as_of_current() %>% filter(time_value > "2023-10-01") %>% autoplot(hhs)
50+
hhs_gr <- hhs_archive %>%
51+
epix_as_of_current() %>%
52+
group_by(geo_value) %>%
53+
mutate(gr_hhs = growth_rate(hhs)) %>%
54+
filter(time_value > "2023-10-01")
55+
hhs_gr %>%
56+
arrange(gr_hhs) %>%
57+
drop_na() %>%
58+
slice_max(gr_hhs) %>%
59+
ungroup() %>%
60+
group_by(time_value) %>%
61+
summarize(nn = length(hhs))
62+
```
63+
64+
So the peak is ~ 11/15
65+
66+
```{r}
67+
forecast_date <- as.Date("2023-11-29")
68+
hhs_gr %>% autoplot(gr_hhs) +
69+
geom_vline(aes(xintercept = forecast_date), lty = 2) +
70+
labs(title = "growth rates")
71+
```
72+
73+
And most locations are still increasing 2 weeks later on the 29th, so we'll use that
74+
75+
# Some utility functions
76+
77+
Since we don't really need to run the full pipeline to get forecasts from a single day and forecaster, we build a couple of functions for inspecting forecasts.
78+
```{r}
79+
forecast_aheads <- function(forecaster, epi_data = hhs_forecast, aheads = 0:4 * 7) {
80+
all_forecasts <- map(aheads, \(ahead) forecaster(epi_data, ahead)) %>% list_rbind()
81+
all_forecasts
82+
}
83+
```
84+
85+
Here's a way to easily plot a subset of the forecasts, with bands at the 80% and 50% intervals (.1-.9 and .25-.75) against the finalized data.
86+
```{r}
87+
plot_forecasts <- function(all_forecasts,
88+
geo_values,
89+
data_archive = hhs_archive,
90+
earliest_truth_data = NULL) {
91+
if (is.null(earliest_truth_data)) {
92+
earliest_truth_data <- all_forecasts$forecast_date[[1]] - as.difftime(365, units = "days")
93+
}
94+
# transform the archive to something useful for comparison
95+
finalized_plotting <- data_archive %>%
96+
epix_as_of_current() %>%
97+
filter(time_value <= max(all_forecasts$target_end_date), geo_value %in% geo_values) %>%
98+
as_tibble() %>%
99+
mutate(forecast_date = time_value) %>%
100+
filter(time_value >= earliest_truth_data)
101+
all_forecasts %>% filter(geo_value %in% geo_values) %>%
102+
pivot_wider(names_from = quantile, values_from = value) %>%
103+
ggplot(aes(x = target_end_date, group = geo_value, fill = forecast_date)) +
104+
geom_ribbon(aes(ymin = `0.1`, ymax = `0.9`), alpha = 0.4) +
105+
geom_ribbon(aes(ymin = `0.25`, ymax = `0.75`), alpha = 0.6) +
106+
geom_line(aes(y = `0.5`, color = forecast_date)) +
107+
geom_line(
108+
data = finalized_plotting, aes(x = time_value, y = hhs)) +
109+
facet_wrap(~geo_value, scale = "free") +
110+
theme(legend.position = "none")
111+
}
112+
```
113+
114+
And a method to inspect whether things are increasing that isn't just the eyeball norm on a few of them.
115+
This calculates growth rates for each quantile and each location.
116+
```{r}
117+
get_growth_rates <- function(forecasts, quantiles = NULL, outlier_bound = 1e2, ...) {
118+
if (is.null(quantiles)) {
119+
quantiles <- forecasts$quantile %>% unique()
120+
}
121+
forecasts %>%
122+
group_by(geo_value, quantile) %>%
123+
filter(min(value) != max(value), quantile %in% quantiles) %>%
124+
mutate(growth = growth_rate(value, ...)) %>%
125+
filter(abs(growth) < outlier_bound)
126+
}
127+
```
128+
129+
# Establishing the problem
130+
131+
```{r}
132+
hhs_forecast <- hhs_archive %>% epix_as_of(forecast_date)
133+
all_forecasts <- forecast_aheads(\(x, ahead) scaled_pop(x, "hhs", ahead = ahead))
134+
default_geos <- c("ca", "fl", "ny", "pa", "tx")
135+
plot_forecasts(all_forecasts, default_geos)
136+
```
137+
138+
All the forecasts are going down rather than up, even though they have multiple weeks of data!
139+
More quantitatively, across all geos:
140+
```{r}
141+
basic_gr <- get_growth_rates(all_forecasts, quantiles = 0.5, method = "smooth_spline")
142+
basic_gr %>% arrange(desc(growth))
143+
```
144+
The only places where the growth rate is positive are american samoa and the US overall, both of which have unusual data trends (as because it is ~0, and the US because it is unusually large).
145+
As a histogram (each state is included 5 times, once per ahead):
146+
```{r}
147+
basic_gr %>% ggplot(aes(x = growth)) + geom_histogram(bins = 300)
148+
```
149+
## It goes away if we use very short windows
150+
If we limit to the last 3 weeks of data (so effectively just a linear extrapolation shared across geos), it goes away:
151+
```{r}
152+
hhs_forecast <- hhs_archive %>% epix_as_of(forecast_date)
153+
all_short_forecasts <- forecast_aheads(\(x, ahead) scaled_pop(x, "hhs", ahead = ahead, n_training=3))
154+
plot_forecasts(all_short_forecasts, default_geos)
155+
```
156+
157+
They're pretty jittery, but strictly decreasing they are not.
158+
And the corresponding growth rates:
159+
160+
```{r}
161+
short_gr <- get_growth_rates(all_short_forecasts, quantiles = 0.5, method = "smooth_spline")
162+
short_gr %>% arrange(growth) %>% ggplot(aes(x = growth)) + geom_histogram(bins = 300)
163+
```
164+
So on a day-over-day basis the growth rate is mostly increasing, with some strong positive outliers and some amount of decrease.
165+
166+
# Is it geo pooling?
167+
Let's see what happens if we restrict ourselves to training each geo separately.
168+
```{r}
169+
hhs_forecast <- hhs_archive %>% epix_as_of(forecast_date)
170+
all_geos <- hhs_forecast %>% distinct(geo_value) %>% pull(geo_value)
171+
hhs_forecast %>% filter(!is.na(hhs)) %>% group_by(geo_value) %>% summarize(n_points = n()) %>% arrange(n_points)
172+
all_geos_forecasts <- map(all_geos, \(geo) forecast_aheads(\(x, ahead) scaled_pop(x, "hhs", ahead = ahead), epi_data = hhs_forecast %>% filter(geo_value == geo)))
173+
all_geos_forecasts %>% list_rbind() %>% plot_forecasts(default_geos)
174+
```
175+
176+
And the phenomina is still happening

0 commit comments

Comments
 (0)