Submission format
Teams are asked to provide their forecasts in a quantile-based format (even though we also accept submissions containing only point forecasts). The tabular version of the data model is a simple, long-form data format, with the following required columns: ['location', 'age_group', 'forecast_date', 'target_end_date', 'horizon', 'type', 'quantile', 'value'].
Upon submission, automated checks run to make sure your files meet the technical requirements. If problems occur, error messages will be posted below your pull request.
You can use existing forecast files in the submissions folder as examples. The columns have to contain the following information:
location: A unique ID for the location. Unlike some previous projects, we do not use states (Bundesländer) but AGI regions, in which some smaller states are merged with neighbouring larger states. We use the following labels:
- DE: Germany (national level)
- DE-BW: Baden-Württemberg
- DE-BY: Bayern
- DE-BB-BE: Berlin and Brandenburg
- DE-HE: Hessen
- DE-MV: Mecklenburg-Vorpommern
- DE-NI-HB: Niedersachsen and Bremen
- DE-NW: Nordrhein-Westfalen
- DE-RP-SL: Rheinland-Pfalz and Saarland
- DE-SN: Sachsen
- DE-ST: Sachsen-Anhalt
- DE-SH-HH: Schleswig-Holstein and Hamburg
- DE-TH: Thüringen
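As a minimal sketch, the location labels above can be kept in a lookup table so that a forecast file is checked locally before it is submitted. The names `LOCATIONS` and `is_valid_location` are hypothetical and not part of the hub's tooling; the labels themselves are the ones listed above.

```python
# Labels as read from the list above: the national level plus 12 AGI regions.
LOCATIONS = {
    "DE": "Germany (national level)",
    "DE-BW": "Baden-Württemberg",
    "DE-BY": "Bayern",
    "DE-BB-BE": "Berlin and Brandenburg",
    "DE-HE": "Hessen",
    "DE-MV": "Mecklenburg-Vorpommern",
    "DE-NI-HB": "Niedersachsen and Bremen",
    "DE-NW": "Nordrhein-Westfalen",
    "DE-RP-SL": "Rheinland-Pfalz and Saarland",
    "DE-SN": "Sachsen",
    "DE-ST": "Sachsen-Anhalt",
    "DE-SH-HH": "Schleswig-Holstein and Hamburg",
    "DE-TH": "Thüringen",
}


def is_valid_location(label: str) -> bool:
    """Return True if the label is one of the accepted location ids."""
    return label in LOCATIONS
```

Note that merged regions only exist under their combined label, so `is_valid_location("DE-BB-BE")` holds while a single-state label such as `"DE-BE"` is rejected by this check.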
age_group: One of the following to indicate the age group:
"00+" (all age groups), "00-04", "05-14", "15-34", "35-59", "60-79", "80+". Note that age groups may be defined differently for different indicators, but these are the ones currently relevant for submission.
forecast_date: The date on which the submitted forecast data was made available, in YYYY-MM-DD format. The forecast_date must match the date in the filename (the two are redundant). Under our current submission rhythm this is always a Thursday. Our submission system only accepts submissions marked with the current or previous date as the forecast_date.
target_end_date: The date corresponding to the end time of the target, in YYYY-MM-DD format. As all our targets are defined for weeks starting on Mondays and ending on Sundays, the target_end_date is always a Sunday.
Values in the horizon column must be integers. As submissions are due on Thursdays, but reporting weeks (i.e., the week definition used in RKI's surveillance data) run from Monday through Sunday, we require a convention on how to index the weeks (see Section Targets for details). In short, we denote the Sunday preceding the Thursday of submission by horizon 0 and the Sunday following the Thursday of submission by horizon 1. Sundays before horizon 0 are indexed by -1, -2, ... and Sundays after horizon 1 by 2, 3, ....
Note that in previous Hubs we used a variable target which contained entries like "N day ahead inc hosp". As all information other than the horizon would be redundant, and as this format has proven tedious to handle, we switched to an integer-valued column horizon.
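The week-indexing convention can be sketched in a few lines: horizon 0 is the Sunday four days before the Thursday of submission, and each further horizon step moves one week. The function name `target_end_date_for` is a hypothetical helper, and the example date is chosen only for illustration.

```python
from datetime import date, timedelta


def target_end_date_for(forecast_date: date, horizon: int) -> date:
    """Map a Thursday forecast_date and an integer horizon to a Sunday."""
    if forecast_date.weekday() != 3:  # Monday is 0, so Thursday is 3
        raise ValueError("forecast_date is expected to be a Thursday")
    # Horizon 0 is the Sunday preceding the Thursday of submission.
    sunday_before = forecast_date - timedelta(days=4)
    # Every horizon step shifts the target by exactly one week.
    return sunday_before + timedelta(weeks=horizon)


thursday = date(2023, 11, 16)
for h in (-1, 0, 1, 2):
    print(h, target_end_date_for(thursday, h).isoformat())
```

For the Thursday 2023-11-16, this gives 2023-11-05, 2023-11-12, 2023-11-19 and 2023-11-26 for horizons -1, 0, 1 and 2.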
type: Either "mean" or "quantile".
quantile: A value in [0.025, 0.1, 0.25, 0.5, 0.75, 0.9, 0.975], stating which quantile is displayed in this row. If type=="mean" then NA. We encourage all groups to make available all 7 quantiles.
value: A numeric value representing the value of the quantile or mean prediction.
For example, if quantile is 0.25 and value is 10, then this row is saying that the 25% quantile of the predictive distribution is 10. If type is "mean" and value is 15, then this row is saying that the predictive mean from this model is 15.
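To illustrate how the columns fit together, the following sketch builds the rows for a single location, age group and horizon: one mean row plus one row per quantile level. It assumes that files are written as CSV and that a missing quantile is written as the literal string NA; both are assumptions, so compare against the existing files in the submissions folder. The helpers `make_rows` and `to_csv` are hypothetical, and the numbers are made up for illustration.

```python
import csv
import io

COLUMNS = ["location", "age_group", "forecast_date", "target_end_date",
           "horizon", "type", "quantile", "value"]
QUANTILES = [0.025, 0.1, 0.25, 0.5, 0.75, 0.9, 0.975]


def make_rows(location, age_group, forecast_date, target_end_date, horizon,
              mean, quantile_values):
    """Build one mean row and one row per quantile level."""
    if len(quantile_values) != len(QUANTILES):
        raise ValueError("expected one value per quantile level")
    # Sanity check: quantiles of one distribution cannot decrease with level.
    if list(quantile_values) != sorted(quantile_values):
        raise ValueError("quantile values must be non-decreasing")
    base = {"location": location, "age_group": age_group,
            "forecast_date": forecast_date,
            "target_end_date": target_end_date, "horizon": horizon}
    # Assumption: NA in the quantile column is written as the string "NA".
    rows = [dict(base, type="mean", quantile="NA", value=mean)]
    for level, value in zip(QUANTILES, quantile_values):
        rows.append(dict(base, type="quantile", quantile=level, value=value))
    return rows


def to_csv(rows):
    """Serialise rows in long format with the required column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = make_rows("DE", "00+", "2023-11-16", "2023-11-19", 1,
                 mean=15, quantile_values=[5, 8, 9, 10, 12, 14, 18])
print(to_csv(rows))
```

Each combination of location, age group and horizon contributes eight rows in this layout, which keeps the file in the long format described above.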