Commit 2eba62d

Weighted scores (#4)
* add template for twcrps
* add crps for logistic distribution
* add numba functionality to compute outcome-weighted crps
* tidy documentation of owcrps_ensemble
* add numba functionality to compute threshold-weighted crps
* add equations to documentation of weighted crps
* add axis argument to owcrps_ensemble and twcrps_ensemble numba functions
* add vertically re-scaled crps numba functionality
* add vrcrps_ensemble to scoringrules init
* add weighted crps for api backends
* add weighted energy scores numba functionality
* add weighted variogram scores numba functionality
* add api functionality for threshold-weighted energy and variogram scores
* add markdown files for weighted scoring rule documentation
* change gufuncs to avoid numba warnings in weighted scores
* change indicator function latex code in weighted crps docstrings
* add tests for weighted crps
* add tests for vertically re-scaled crps
* add tests for weighted energy score
* add tests for weighted variogram scores
* add documentation for energy and variogram scores, and weighted versions
* add api functionality to compute outcome-weighted and vertically re-scaled energy score
* add api functionality to compute outcome-weighted and vertically re-scaled variogram scores
* fix bug with weight function in outcome weighted and vertically rescaled crps
* change order of dimension inputs in variogram score to match energy score
* fix bugs in weighted multivariate scores with numpy backend
1 parent a3cc01f commit 2eba62d

25 files changed

+2002
-76
lines changed

docs/api/crps/weighted.md

Lines changed: 3 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,3 @@
1+
::: scoringrules.owcrps_ensemble
2+
::: scoringrules.twcrps_ensemble
3+
::: scoringrules.vrcrps_ensemble

docs/api/energy.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

docs/api/energy/ensemble.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
::: scoringrules.energy_score

docs/api/energy/index.md

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
1+
# Energy Score
2+
3+
The energy score (ES) is a scoring rule for evaluating multivariate probabilistic forecasts.
4+
It is defined as
5+
6+
$$\text{ES}(F, \mathbf{y})= \mathbb{E} \| \mathbf{X} - \mathbf{y} \| - \frac{1}{2} \mathbb{E} \| \mathbf{X} - \mathbf{X}^{\prime} \|, $$
7+
8+
where $\mathbf{y} \in \mathbb{R}^{d}$ is the multivariate observation ($d > 1$), and
9+
$\mathbf{X}$ and $\mathbf{X}^{\prime}$ are independent random variables that follow the
10+
multivariate forecast distribution $F$ (Gneiting and Raftery, 2007)[@gneiting_strictly_2007].
11+
If the dimension $d$ were equal to one, the energy score would reduce to the continuous ranked probability score (CRPS).
12+
13+
<br/><br/>
14+
15+
## Ensemble forecasts
16+
17+
While multivariate probabilistic forecasts could belong to a parametric family of
18+
distributions, such as a multivariate normal distribution, it is more common in practice
19+
that these forecasts are ensemble forecasts; that is, the forecast consists of a
20+
predictive sample $\mathbf{x}_{1}, \dots, \mathbf{x}_{M}$,
21+
where each ensemble member $\mathbf{x}_{1}, \dots, \mathbf{x}_{M} \in \mathbb{R}^{d}$.
22+
23+
In this case, the expectations in the definition of the energy score can be replaced by
24+
sample means over the ensemble members, yielding the following representation of the energy
25+
score when evaluating an ensemble forecast $F_{ens}$ with $M$ members:
26+
27+
$$\text{ES}(F_{ens}, \mathbf{y})= \frac{1}{M} \sum_{m=1}^{M} \| \mathbf{x}_{m} - \mathbf{y} \| - \frac{1}{2 M^{2}} \sum_{m=1}^{M} \sum_{j=1}^{M} \| \mathbf{x}_{m} - \mathbf{x}_{j} \|. $$
28+
29+
<br/><br/>
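
The ensemble representation of the energy score above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the `scoringrules` implementation: the function name `energy_score_ensemble` and the array layout (members along the first axis, dimensions along the second) are assumptions made for this example.

```python
import numpy as np


def energy_score_ensemble(ens, obs):
    """Energy score of an ensemble forecast (illustrative sketch).

    ens : array of shape (M, d), one row per ensemble member (assumed layout)
    obs : array of shape (d,), the multivariate observation
    """
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    m = ens.shape[0]

    # First term: mean Euclidean distance between members and the observation.
    term_obs = np.linalg.norm(ens - obs, axis=1).mean()

    # Second term: all pairwise distances between members, shape (M, M),
    # summed and scaled by 1 / (2 M^2).
    pairwise = np.linalg.norm(ens[:, None, :] - ens[None, :, :], axis=2)
    term_spread = pairwise.sum() / (2 * m**2)

    return float(term_obs - term_spread)


# Example: a two-member ensemble in d = 2 dimensions.
ens = np.array([[0.0, 0.0], [3.0, 4.0]])
obs = np.array([0.0, 0.0])
print(energy_score_ensemble(ens, obs))
```

The pairwise-distance matrix needs memory of order $M^{2}$, which is usually acceptable for typical ensemble sizes but may call for a loop or a compiled kernel when $M$ is large.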
30+
31+
## Weighted energy scores
32+
33+
The energy score provides a measure of overall forecast performance. However, it is often
34+
the case that certain outcomes are of more interest than others, making it desirable to
35+
assign more weight to these outcomes when evaluating forecast performance. This can be
36+
achieved using weighted scoring rules. Weighted scoring rules typically introduce a
37+
weight function into conventional scoring rules, and users can choose the weight function
38+
depending on what outcomes they want to emphasise. Allen et al. (2022)[@allen2022evaluating]
39+
discuss three weighted versions of the energy score. These are all available in `scoringrules`.
40+
41+
Firstly, the outcome-weighted energy score (originally introduced by Holzmann and Klar (2017)[@holzmann2017focusing])
42+
is defined as
43+
44+
$$\text{owES}(F, \mathbf{y}; w)= \frac{1}{\bar{w}} \mathbb{E} \| \mathbf{X} - \mathbf{y} \| w(\mathbf{X}) w(\mathbf{y}) - \frac{1}{2 \bar{w}^{2}} \mathbb{E} \| \mathbf{X} - \mathbf{X}^{\prime} \| w(\mathbf{X})w(\mathbf{X}^{\prime})w(\mathbf{y}), $$
45+
46+
where $w : \mathbb{R}^{d} \to [0, \infty)$ is the non-negative weight function used to
47+
target particular multivariate outcomes, and $\bar{w} = \mathbb{E}[w(\mathbf{X})]$.
48+
As before, $\mathbf{X}, \mathbf{X}^{\prime} \sim F$ are independent.
49+
50+
Secondly, Allen et al. (2022) introduced the threshold-weighted energy score as
51+
52+
$$\text{twES}(F, \mathbf{y}; v)= \mathbb{E} \| v(\mathbf{X}) - v(\mathbf{y}) \| - \frac{1}{2} \mathbb{E} \| v(\mathbf{X}) - v(\mathbf{X}^{\prime}) \|, $$
53+
54+
where $v : \mathbb{R}^{d} \to \mathbb{R}^{d}$ is a so-called chaining function.
55+
The threshold-weighted energy score transforms the forecasts and observations according
56+
to the chaining function $v$, prior to calculating the unweighted energy score. Choosing
57+
a chaining function is generally more difficult than choosing a weight function when
58+
emphasising particular outcomes.
59+
60+
As an alternative, the vertically re-scaled energy score is defined as
61+
62+
$$\text{vrES}(F, \mathbf{y}; w, \mathbf{x}_{0})= \mathbb{E} \| \mathbf{X} - \mathbf{y} \| w(\mathbf{X}) w(\mathbf{y}) - \frac{1}{2} \mathbb{E} \| \mathbf{X} - \mathbf{X}^{\prime} \| w(\mathbf{X})w(\mathbf{X}^{\prime}) + \left( \mathbb{E} \| \mathbf{X} - \mathbf{x}_{0} \| w(\mathbf{X}) - \| \mathbf{y} - \mathbf{x}_{0} \| w(\mathbf{y}) \right) \left(\mathbb{E}[w(\mathbf{X})] - w(\mathbf{y}) \right), $$
63+
64+
where $w : \mathbb{R}^{d} \to [0, \infty)$ is the non-negative weight function used to
65+
target particular multivariate outcomes, and $\mathbf{x}_{0} \in \mathbb{R}^{d}$. Typically,
66+
$\mathbf{x}_{0}$ is chosen to be the zero vector.
67+
68+
Each of these weighted energy scores targets particular outcomes in a different way.
69+
Further details regarding the differences between these scoring rules, as well as choices
70+
for the weight and chaining functions, can be found in Allen et al. (2022). The weighted
71+
energy scores can easily be computed for ensemble forecasts by
72+
replacing the expectations with sample means over the ensemble members.
73+
74+
<br/><br/>
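
As a rough illustration of replacing the expectations with sample means, the three weighted energy scores can be sketched with NumPy as follows. The function names, argument order, and array layout are hypothetical and are not the `scoringrules` API; the weight function `w` is assumed to map an array of shape `(..., d)` to weights of shape `(...)`, and the chaining function `v` is assumed to act on the last axis.

```python
import numpy as np


def _dist(a, b):
    # Euclidean distance along the last axis, with broadcasting.
    return np.linalg.norm(a - b, axis=-1)


def energy_score(ens, obs):
    m = ens.shape[0]
    pairwise = _dist(ens[:, None], ens[None, :])  # shape (M, M)
    return float(_dist(ens, obs).mean() - pairwise.sum() / (2 * m**2))


def ow_energy_score(ens, obs, w):
    # Outcome-weighted: requires a positive mean weight over the ensemble.
    m = ens.shape[0]
    w_ens, w_obs = w(ens), w(obs)
    w_bar = w_ens.mean()
    term1 = (_dist(ens, obs) * w_ens).mean() * w_obs / w_bar
    pairwise = _dist(ens[:, None], ens[None, :]) * np.outer(w_ens, w_ens)
    term2 = pairwise.sum() * w_obs / (2 * m**2 * w_bar**2)
    return float(term1 - term2)


def tw_energy_score(ens, obs, v):
    # Threshold-weighted: transform with v, then apply the unweighted score.
    return energy_score(v(ens), v(obs))


def vr_energy_score(ens, obs, w, x0=None):
    # Vertically re-scaled: x0 defaults to the zero vector.
    x0 = np.zeros_like(obs) if x0 is None else x0
    m = ens.shape[0]
    w_ens, w_obs = w(ens), w(obs)
    term1 = (_dist(ens, obs) * w_ens).mean() * w_obs
    pairwise = _dist(ens[:, None], ens[None, :]) * np.outer(w_ens, w_ens)
    term2 = pairwise.sum() / (2 * m**2)
    term3 = ((_dist(ens, x0) * w_ens).mean() - _dist(obs, x0) * w_obs) * (
        w_ens.mean() - w_obs
    )
    return float(term1 - term2 + term3)


# Illustrative choices: emphasise outcomes whose first component exceeds 1.
ens = np.array([[0.0, 0.0], [3.0, 4.0]])
obs = np.array([0.0, 0.0])
weight = lambda x: (x[..., 0] > 1.0).astype(float)
chain = lambda x: np.maximum(x, 1.0)
print(ow_energy_score(ens, obs, weight))
print(tw_energy_score(ens, obs, chain))
print(vr_energy_score(ens, obs, weight))
```

With a constant weight $w \equiv 1$ or an identity chaining function, all three sketches reduce to the unweighted ensemble energy score, which is a convenient sanity check.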

docs/api/energy/weighted.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
::: scoringrules.owenergy_score
2+
::: scoringrules.twenergy_score
3+
::: scoringrules.vrenergy_score

docs/api/variogram.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

docs/api/variogram/ensemble.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
::: scoringrules.variogram_score

docs/api/variogram/index.md

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
1+
# Variogram Score
2+
3+
The variogram score (VS) is a scoring rule for evaluating multivariate probabilistic forecasts.
4+
It is defined as
5+
6+
$$\text{VS}_{p}(F, \mathbf{y})= \sum_{i=1}^{d} \sum_{j=1}^{d} \left( \mathbb{E} | X_{i} - X_{j} |^{p} - | y_{i} - y_{j} |^{p} \right)^{2}, $$
7+
8+
where $p > 0$, $\mathbf{y} = (y_{1}, \dots, y_{d}) \in \mathbb{R}^{d}$ is the multivariate observation ($d > 1$), and
9+
$\mathbf{X} = (X_{1}, \dots, X_{d})$ is a random vector that follows the
10+
multivariate forecast distribution $F$ (Scheuerer and Hamill, 2015)[@scheuerer_variogram-based_2015].
11+
The exponent $p$ is typically chosen to be 0.5 or 1.
12+
13+
The variogram score is less sensitive to marginal forecast performance than the energy score,
14+
and Scheuerer and Hamill (2015) argue that it should therefore be more sensitive to errors in the
15+
forecast's dependence structure.
16+
17+
<br/><br/>
18+
19+
## Ensemble forecasts
20+
21+
While multivariate probabilistic forecasts could belong to a parametric family of
22+
distributions, such as a multivariate normal distribution, it is more common in practice
23+
that these forecasts are ensemble forecasts; that is, the forecast consists of a
24+
predictive sample $\mathbf{x}_{1}, \dots, \mathbf{x}_{M}$,
25+
where each ensemble member $\mathbf{x}_{i} = (x_{i, 1}, \dots, x_{i, d}) \in \mathbb{R}^{d}$ for
26+
$i = 1, \dots, M$.
27+
28+
In this case, the expectation in the definition of the variogram score can be replaced by
29+
a sample mean over the ensemble members, yielding the following representation of the variogram
30+
score when evaluating an ensemble forecast $F_{ens}$ with $M$ members:
31+
32+
$$\text{VS}_{p}(F_{ens}, \mathbf{y})= \sum_{i=1}^{d} \sum_{j=1}^{d} \left( \frac{1}{M} \sum_{m=1}^{M} | x_{m,i} - x_{m,j} |^{p} - | y_{i} - y_{j} |^{p} \right)^{2}. $$
33+
34+
<br/><br/>
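
The ensemble representation of the variogram score above can be sketched with NumPy broadcasting. This is an illustrative example only, not the `scoringrules` implementation: the function name `variogram_score_ensemble`, the default `p=0.5`, and the array layout (members along the first axis) are assumptions.

```python
import numpy as np


def variogram_score_ensemble(ens, obs, p=0.5):
    """Variogram score of order p for an ensemble forecast (illustrative sketch).

    ens : array of shape (M, d), one row per ensemble member (assumed layout)
    obs : array of shape (d,), the multivariate observation
    """
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)

    # |x_{m,i} - x_{m,j}|^p for every member m and pair (i, j): shape (M, d, d).
    ens_diff = np.abs(ens[:, :, None] - ens[:, None, :]) ** p

    # |y_i - y_j|^p for every pair (i, j): shape (d, d).
    obs_diff = np.abs(obs[:, None] - obs[None, :]) ** p

    # Average over members, compare with the observation, square, and sum.
    return float(((ens_diff.mean(axis=0) - obs_diff) ** 2).sum())


# Example: a two-member ensemble in d = 2 dimensions with p = 1.
ens = np.array([[0.0, 1.0], [0.0, 3.0]])
obs = np.array([0.0, 0.0])
print(variogram_score_ensemble(ens, obs, p=1.0))
```

Because the double sum runs over all ordered pairs $(i, j)$, each unordered pair contributes twice and the diagonal contributes zero, exactly as in the formula.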
35+
36+
## Weighted variogram scores
37+
38+
It is often the case that certain outcomes are of more interest than others when evaluating
39+
forecast performance. These outcomes can be emphasised by employing weighted scoring rules.
40+
Weighted scoring rules typically introduce a weight function into conventional scoring rules,
41+
and users can choose the weight function depending on what outcomes they want to emphasise.
42+
Allen et al. (2022)[@allen2022evaluating] introduced three weighted versions of the variogram score.
43+
These are all available in `scoringrules`.
44+
45+
Firstly, the outcome-weighted variogram score (see also Holzmann and Klar (2017)[@holzmann2017focusing])
46+
is defined as
47+
48+
$$\text{owVS}_{p}(F, \mathbf{y}; w) = \frac{1}{\bar{w}} \mathbb{E} [ \rho_{p}(\mathbf{X}, \mathbf{y}) w(\mathbf{X}) w(\mathbf{y}) ] - \frac{1}{2 \bar{w}^{2}} \mathbb{E} [ \rho_{p}(\mathbf{X}, \mathbf{X}^{\prime}) w(\mathbf{X}) w(\mathbf{X}^{\prime}) w(\mathbf{y}) ], $$
49+
50+
where
51+
52+
$$ \rho_{p}(\mathbf{x}, \mathbf{z}) = \sum_{i=1}^{d} \sum_{j=1}^{d} \left( |x_{i} - x_{j}|^{p} - |z_{i} - z_{j}|^{p} \right)^{2}, $$
53+
54+
for $\mathbf{x} = (x_{1}, \dots, x_{d}) \in \mathbb{R}^{d}$ and $\mathbf{z} = (z_{1}, \dots, z_{d}) \in \mathbb{R}^{d}$.
55+
56+
Here, $w : \mathbb{R}^{d} \to [0, \infty)$ is the non-negative weight function used to
57+
target particular multivariate outcomes, and $\bar{w} = \mathbb{E}[w(\mathbf{X})]$.
58+
As before, $\mathbf{X}, \mathbf{X}^{\prime} \sim F$ are independent.
59+
60+
Secondly, Allen et al. (2022) introduced the threshold-weighted variogram score as
61+
62+
$$\text{twVS}_{p}(F, \mathbf{y}; v)= \sum_{i=1}^{d} \sum_{j=1}^{d} \left( \mathbb{E} | v(\mathbf{X})_{i} - v(\mathbf{X})_{j} |^{p} - | v(\mathbf{y})_{i} - v(\mathbf{y})_{j} |^{p} \right)^{2}, $$
63+
64+
where $v : \mathbb{R}^{d} \to \mathbb{R}^{d}$ is a so-called chaining function, so that
65+
$v(\mathbf{X}) = (v(\mathbf{X})_{1}, \dots, v(\mathbf{X})_{d}) \in \mathbb{R}^{d}$.
66+
The threshold-weighted variogram score transforms the forecasts and observations according
67+
to the chaining function $v$, prior to calculating the unweighted variogram score. Choosing
68+
a chaining function is generally more difficult than choosing a weight function when
69+
emphasising particular outcomes.
70+
71+
As an alternative, the vertically re-scaled variogram score is defined as
72+
73+
$$\text{vrVS}_{p}(F, \mathbf{y}; w, \mathbf{x}_{0}) = \mathbb{E} [ \rho_{p}(\mathbf{X}, \mathbf{y}) w(\mathbf{X}) w(\mathbf{y}) ] - \frac{1}{2} \mathbb{E} [ \rho_{p}(\mathbf{X}, \mathbf{X}^{\prime}) w(\mathbf{X}) w(\mathbf{X}^{\prime}) ] + \left( \mathbb{E} [ \rho_{p} ( \mathbf{X}, \mathbf{x}_{0} ) w(\mathbf{X}) ] - \rho_{p} ( \mathbf{y}, \mathbf{x}_{0}) w(\mathbf{y}) \right) \left(\mathbb{E}[w(\mathbf{X})] - w(\mathbf{y}) \right), $$
74+
75+
where $w$ and $\rho_{p}$ are as defined above, and $\mathbf{x}_{0} \in \mathbb{R}^{d}$.
76+
Typically, $\mathbf{x}_{0}$ is chosen to be the zero vector.
77+
78+
Each of these weighted variogram scores targets particular outcomes in a different way.
79+
Further details regarding the differences between these scoring rules, as well as choices
80+
for the weight and chaining functions, can be found in Allen et al. (2022). The weighted
81+
variogram scores can easily be computed for ensemble forecasts by
82+
replacing the expectations with sample means over the ensemble members.
83+
84+
<br/><br/>
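
The weighted variogram scores can be sketched for ensemble forecasts in the same way, by replacing each expectation with a sample mean. The names below (`rho`, `ow_variogram_score`, `tw_variogram_score`, `vr_variogram_score`) and the array layout are hypothetical and do not reflect the `scoringrules` API; `w` is assumed to map arrays of shape `(..., d)` to weights of shape `(...)`.

```python
import numpy as np


def _pair_diffs(x, p):
    # |x_i - x_j|^p over the last axis; output shape (..., d, d).
    return np.abs(x[..., :, None] - x[..., None, :]) ** p


def rho(x, z, p):
    # rho_p(x, z) as defined above, summed over all pairs (i, j).
    return ((_pair_diffs(x, p) - _pair_diffs(z, p)) ** 2).sum(axis=(-2, -1))


def variogram_score(ens, obs, p):
    return float(((_pair_diffs(ens, p).mean(axis=0) - _pair_diffs(obs, p)) ** 2).sum())


def ow_variogram_score(ens, obs, w, p=0.5):
    # Outcome-weighted: requires a positive mean weight over the ensemble.
    w_ens, w_obs = w(ens), w(obs)
    w_bar = w_ens.mean()
    term1 = (rho(ens, obs, p) * w_ens).mean() * w_obs / w_bar
    pairwise = rho(ens[:, None], ens[None, :], p) * np.outer(w_ens, w_ens)
    term2 = pairwise.mean() * w_obs / (2 * w_bar**2)
    return float(term1 - term2)


def tw_variogram_score(ens, obs, v, p=0.5):
    # Threshold-weighted: transform with v, then apply the unweighted score.
    return variogram_score(v(ens), v(obs), p)


def vr_variogram_score(ens, obs, w, p=0.5, x0=None):
    # Vertically re-scaled: x0 defaults to the zero vector.
    x0 = np.zeros_like(obs) if x0 is None else x0
    w_ens, w_obs = w(ens), w(obs)
    term1 = (rho(ens, obs, p) * w_ens).mean() * w_obs
    pairwise = rho(ens[:, None], ens[None, :], p) * np.outer(w_ens, w_ens)
    term2 = pairwise.mean() / 2
    term3 = ((rho(ens, x0, p) * w_ens).mean() - rho(obs, x0, p) * w_obs) * (
        w_ens.mean() - w_obs
    )
    return float(term1 - term2 + term3)


# Illustrative choices: emphasise outcomes whose second component exceeds 1.
ens = np.array([[0.0, 1.0], [0.0, 3.0]])
obs = np.array([0.0, 0.0])
weight = lambda x: (x[..., 1] > 1.0).astype(float)
chain = lambda x: np.maximum(x, 1.0)
print(ow_variogram_score(ens, obs, weight, p=1.0))
print(tw_variogram_score(ens, obs, chain, p=1.0))
print(vr_variogram_score(ens, obs, weight, p=1.0))
```

With a constant weight $w \equiv 1$, the outcome-weighted and vertically re-scaled sketches agree with the unweighted ensemble variogram score, since $\mathbb{E}[\rho_{p}(\mathbf{X}, \mathbf{y})] - \frac{1}{2} \mathbb{E}[\rho_{p}(\mathbf{X}, \mathbf{X}^{\prime})]$ collapses to the squared difference used in $\text{VS}_{p}$.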

docs/api/variogram/weighted.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
::: scoringrules.owvariogram_score
2+
::: scoringrules.twvariogram_score
3+
::: scoringrules.vrvariogram_score
