The standard deviation is far too large to actually include it in any of the previous graphs and tables.
-It is routinely as large as the mean value itself.
-To try to represent this, in this tab we have the histogram of the wis, split by phase and forecaster.
+It is routinely larger than the mean WIS.
+To try to represent this, in this tab we have the histogram of the WIS, split by phase and forecaster.
Color below represents population, with darker blue corresponding to low `geo_value` population, and yellow representing high population (this is viridis).
Even after normalizing by population, there is a large variation in scale for the scores.

+The forecasters are arranged according to mean WIS.
Concentration towards the left corresponds to a better score; for example, `peak` is frequently a flatter distribution, which means most models are doing worse than they were during the `increasing` period.
-`climate_geo_agged` is flatter overall than `ens_ar_only`
+During the `peak`, very few forecasters actually have any results in the smallest bin; this implies that basically no forecasters were appreciably correct around the peak.
+
+In the `peak` and `decreasing` phases, the linear model simultaneously has a longer tail and a high degree of concentration otherwise, which implies it is generally right but catastrophically wrong when it's off.
+
+Comparing the `increasing` and `decreasing` phases across forecasters, `decreasing` tends to have a stronger concentration in the lowest two bins, but a much longer tail of large errors.
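The claim that the standard deviation is routinely larger than the mean WIS can be checked per forecaster and phase with a short summary. The following is only a sketch: the tuple layout, the function name `summarize_wis`, the forecaster labels, and every number are made up for illustration and are not results from this report.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (forecaster, phase, wis) rows; the values are invented.
scores = [
    ("forecaster_a", "increasing", 1.0),
    ("forecaster_a", "increasing", 1.1),
    ("forecaster_a", "increasing", 0.9),
    ("forecaster_a", "peak", 0.1),
    ("forecaster_a", "peak", 0.2),
    ("forecaster_a", "peak", 5.0),
]


def summarize_wis(rows):
    """Group WIS by (forecaster, phase) and compare mean to sample sd."""
    groups = defaultdict(list)
    for forecaster, phase, wis in rows:
        groups[(forecaster, phase)].append(wis)
    summary = {}
    for key, values in groups.items():
        m = mean(values)
        # The sample standard deviation needs at least two observations.
        s = stdev(values) if len(values) > 1 else 0.0
        summary[key] = {"mean": m, "sd": s, "sd_exceeds_mean": s > m}
    return summary


summary = summarize_wis(scores)
for key, stats in sorted(summary.items()):
    print(key, stats)
```

A group with one very large error among mostly small ones, like the made-up `peak` rows here, is the long-tail situation in which the standard deviation overtakes the mean.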
The standard deviation is far too large to actually include it in any of the previous graphs and tables meaningfully.
It is routinely larger than the wis value itself.
Like with Flu, in this tab we have the histogram of the wis, split by phase and forecaster.
Color below represents population, with darker blue corresponding to low `geo_value` population, and yellow representing high population (this is viridis).
Even after normalizing by population, there is a variation in scale for the scores.

+The forecasters are ordered according to mean WIS.
Concentration towards the left corresponds to a better score; for example, `peak` is frequently a flatter distribution, which means most models are doing worse than they were during the `increasing` period.
-`climate_geo_agged` is flatter overall than `ens_ar_only`
`windowed_seasonal_nssp` is a clear winner regardless of the metric used.
+`ensemble_windowed` is nearly as good, but since it is effectively averaging `windowed_seasonal_nssp` with `windowed_seasonal` and losing accuracy as a result, it is hardly worth it.
+
+The pure climate models were substantially worse for Covid than for Flu, at ~4.6x the best model, rather than ~2x.
+Given the unusual nature of the season, this is somewhat unsurprising.
+
+To some degree this explains the poor performance of `CMU-TimeSeries`.
+You can see this by looking at the "Scores Aggregated By Forecast Date" tab, where the first 3 weeks of `CMU-TimeSeries` are significantly worse than `climate_linear`, let alone the ensemble or our best models.
+
+#### Aggregated by phase
+
+There are two tabs dedicated to this, one with and one without a separate `flat` phase, which labels an entire state as `flat` if the duration of the `peak` is too long.
+Either way, the general shape is similar to Flu, with `increasing` scores lower than `peak` scores, but higher than `decreasing` scores.
+All of the phases are closer together than they were in the case of Flu, with the best `peak` phase forecaster scoring nearly as well as the worst `increasing` phase forecaster.
+`flat` roughly resembles `increasing`.
+Even disregarding the climate models, the distribution within a phase is wider than it was in the case of Flu.
+`windowed_seasonal_nssp` particularly shines during the `peak` and to some degree the `decreasing` phases.
+
+#### Aggregated by ahead
+
+Nothing terribly surprising here: most models are roughly linear in score as the ahead increases.
+`windowed_seasonal_nssp` is the exception, which does comparatively worse at further aheads.
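One way to make "roughly linear in score" concrete is to fit a straight line to the mean score at each ahead and look at the coefficient of determination. This is a sketch under assumptions: the helper `linear_fit`, the ahead values, and both score series are hypothetical and are not taken from the report's results.

```python
def linear_fit(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared). Assumes xs are not all equal
    and ys are not all equal.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical aheads and mean scores; the numbers are invented.
aheads = [0, 7, 14, 21]
linear_scores = [1.0, 1.5, 2.0, 2.5]  # grows at a constant rate
convex_scores = [1.0, 1.2, 2.0, 4.0]  # degrades faster at further aheads

_, _, r2_linear = linear_fit(aheads, linear_scores)
_, _, r2_convex = linear_fit(aheads, convex_scores)
print(f"R^2 linear: {r2_linear:.3f}, R^2 convex: {r2_convex:.3f}")
```

A forecaster whose score worsens faster at further aheads shows up as a lower R^2 for the straight-line fit, with the largest residual at the longest ahead.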
+
+#### Aggregated by State
+
+Across all forecasters, `wy` is a particularly difficult location to forecast, while `ca` is particularly easy.
+Scores don't seem to correlate particularly well with the population of the state.
+The variation in state scores for other groups' forecasters is fairly similar to our non-climate forecasters.
+Both climate forecasters have a different distribution of which states are correct and which are wrong, and differ greatly from each other.
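The remark above that scores do not correlate well with state population could be quantified with a rank correlation, which is insensitive to the large differences in population scale. The sketch below implements Spearman's coefficient in plain Python; the helper names and all of the population and score values are hypothetical placeholders, not data from this report.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-state populations and mean scores; values are invented.
populations = [580_000, 39_000_000, 30_000_000, 1_400_000, 8_900_000]
mean_scores = [2.1, 0.4, 1.3, 0.9, 1.7]
print(f"Spearman rho: {spearman(populations, mean_scores):.2f}")
```

A coefficient near zero would support the observation; values near 1 or -1 would contradict it.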
+
+#### Sample Forecasts
+
+The always-decreasing problem is definitely not present in these forecasts.
+If anything, our best forecasts are *too* eager to predict an increasing value, e.g. in `tx` and `ca`.
+Several of our worse forecasts are clearly caused by revision behavior.