## The influence of model structure and geographic specificity on forecast accuracy among European COVID-19 forecasts
Katharine Sherratt (1), Rok Grah (2), Bastian Prasse (2), Friederike Becker (3), Jamie McLean (1), Sam Abbott (1), Sebastian Funk (1)
(1) Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine
(2) European Centre for Disease Prevention and Control
(3) Institute of Statistics, Karlsruhe Institute of Technology
#### Overview
- A [slide deck](https://docs.google.com/presentation/d/1BSdTEuZ_zKdU8tBFuRMmP7GwHht1D0oZSkaFWovz9ao/edit?slide=id.p#slide=id.p) offers high-level context for what we were interested in, what we did, and what we found.
#### Summary
- Accurately predicting the spread of infectious disease is essential to supporting public health during outbreaks.
However, comparing the accuracy of different forecasting models is challenging.
Existing evaluations struggle to isolate the impact of model design choices (like model structure or specificity to the forecast target) from the inherent difficulty of predicting complex outbreak dynamics.
Our research introduces a novel approach to address this by systematically adjusting for common factors affecting epidemiological forecasts, accounting for multi-layered and non-linear effects on predictive difficulty.
- We applied this approach to a large dataset of forecasts from 47 different models submitted to the European COVID-19 Forecast Hub.
We adjusted for variation across epidemic dynamics, forecast horizon, location, time, and model-specific effects.
This allowed us to isolate the impact of model structure and geographic specificity on predictive performance.
- Our findings suggest that after adjustment, apparent differences in performance between model structures became minimal, while models that were specific to a single location showed a slight performance advantage over multi-location models.
Our work highlights the importance of accounting for predictive difficulty when comparing forecasting models, and provides a framework for more robust evaluations of infectious disease predictions.
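
Forecast performance in this evaluation was assessed with the weighted interval score (WIS), which combines the absolute error of the median forecast with penalties for each central prediction interval. A minimal sketch in plain numpy (illustrative only; the function names and example numbers are ours, not the Hub's scoring code):

```python
import numpy as np

def interval_score(y, lower, upper, alpha):
    """Interval score of a central (1 - alpha) prediction interval:
    width plus penalties scaled by 2/alpha when y falls outside."""
    return (
        (upper - lower)
        + (2 / alpha) * np.maximum(lower - y, 0)
        + (2 / alpha) * np.maximum(y - upper, 0)
    )

def weighted_interval_score(y, median, lowers, uppers, alphas):
    """WIS: weighted average of the median's absolute error and the
    interval scores, with weights 1/2 and alpha_k/2 respectively."""
    total = 0.5 * np.abs(y - median)
    for lower, upper, alpha in zip(lowers, uppers, alphas):
        total += (alpha / 2) * interval_score(y, lower, upper, alpha)
    return total / (len(alphas) + 0.5)

# Hypothetical example: 120 deaths observed; forecast median 100,
# with central 50% interval [90, 110] and 90% interval [60, 140].
wis = weighted_interval_score(
    y=120, median=100,
    lowers=[90, 60], uppers=[110, 140],
    alphas=[0.5, 0.1],
)
```

Lower scores are better; because WIS penalises both wide intervals and observations falling outside them, it rewards forecasts that are simultaneously sharp and well calibrated.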
#### Deep dive
- Read the pre-print: [medRxiv](https://doi.org/10.1101/2025.04.10.25325611)
- Current working draft: [Docs](https://docs.google.com/document/d/1OOVUHR_BGWcviSNxvaHvbXD16Bb3Y_zhw--7gAGBqMk/edit#)