report disagreements (#36)

sbfnk · web-flow · commit 28fbc09f1334 · 2025-02-14T15:04:46.000Z
diff --git a/R/prep-data.R b/R/prep-data.R
@@ -9,11 +9,14 @@ classify_models <- function(file = here("data", "model-classification.csv")) {
     pivot_longer(
       -model, names_to = "classifier", values_to = "classification"
     ) |>
+    filter(!(is.na(classification) | classification == "#N/A")) |>
     group_by(model) |>
     summarise(
+      agreement = (n_distinct(classification) == 1),
       classification = names(
         sort(table(classification), decreasing = TRUE)[1]
-      ), .groups = "drop"
+      ),
+      .groups = "drop"
     ) |>
     mutate(classification = factor(
       classification,
@@ -54,7 +57,7 @@ prep_data <- function(scoring_scale = "log") {
 
   # Method type
   methods <- classify_models() |>
-    select(model, Method = classification)
+    select(model, Method = classification, agreement)
 
   # Incidence level + trend (see: R/import-data.r)
   obs <- names(scores_files) |>
diff --git a/report/results.Rmd b/report/results.Rmd
@@ -78,14 +78,16 @@ scores_over_time
 
 ```{r structures}
 structures <- scores |>
-  select(Model, Method) |>
-  distinct() |>
-  pull(Method) |>
-  table()
-structures <- structures[structures > 0]
+  select(Model, Method, agreement) |>
+  distinct()
+
+structure_count <- table(structures$Method)
+structure_count <- structure_count[structure_count > 0]
 ```
 
-We categorised `r structures[["Qualitative"]]` models that used human judgement forecasting as qualitative. We further categorised `r structures[["Statistical"]]` models as statistical, `r structures[["Semi-mechanistic"]]` as semi-mechanistic, `r structures[["Mechanistic"]]` as mechanistic and `r structures[["Agent-based"]]` as agent-based (Supplementary Table). In the volume of forecasts provided, mechanistic, semi-mechanistic, and statistical models each contributed similar numbers of forecasts with approximately one-third each. Qualitative and agent-based models provided fewer forecasts, representing only 1-2% of forecasts. 
+We categorised `r structure_count[["Qualitative"]]` models that used human judgement forecasting as qualitative. We further categorised `r structure_count[["Statistical"]]` models as statistical, `r structure_count[["Semi-mechanistic"]]` as semi-mechanistic, `r structure_count[["Mechanistic"]]` as mechanistic and `r structure_count[["Agent-based"]]` as agent-based (Supplementary Table).
+For `r sum(!structures$agreement)` (`r round(sum(!structures$agreement) / nrow(structures) * 100)`%) of the models, the researchers assigning model structures disagreed. In these cases, the final designation was the majority assignment, followed by an additional manual review that retained the majority decision in all cases.
+In terms of forecast volume, mechanistic, semi-mechanistic, and statistical models each contributed approximately one-third of all forecasts. Qualitative and agent-based models contributed fewer forecasts, representing only 1-2% of the total.
 
 On average we observed similar performance of the interval score between mechanistic and semi-mechanistic models. These performed relatively better than statistical models and worse than the qualitative and agent-based models, although in all these cases with largely overlapping variation in performance. Relative performance among modelling methods also appeared to vary over time (Figure \@ref(scores_over_time)). For example, over summer 2021 all model types saw worsening performance coinciding with the introduction of the Delta variant across Europe, but this decline was most marked among statistical models of death outcomes compared to any other model type.