standard error does not get properly sorted in result of `compare` method

The result of the `arviz.compare` function is a dataframe with models sorted by elpd. However, the standard error in this table does not follow the same order of the models: instead it seems to be determined by the order the dictionary was defined. 

Here's a Stan model to generate some samples, and a python script to reproduce the results. The Stan model simply models normally distributed data, but the standard deviation `sigma` is assumed "known" (i.e. given in the data block). When `sigma` is small, the data has a lot of outliers and we get a large standard deviation for the elpd. 

```stan
// test-stan-model.stan
data {
    int N;
    vector[N] X;
    real<lower=0> sigma;
}

parameters {
    real mu;
}

model {
    X ~ normal(mu, sigma);
}

generated quantities {
    vector[N] log_lik;
    for ( i in 1:N ) {
        log_lik[i] = normal_lpdf(X[i] | mu, sigma);
    }
}
```
In the python script, we generate some Gaussian data and fit 2 models with a good and a bad `sigma`. We then use the `arviz.compare` function to compare these models.
```py3
import arviz
import cmdstanpy
import scipy.stats as sts

sm = cmdstanpy.CmdStanModel(stan_file="test-stan-model.stan")

data = {
    "N" : 1000,
    "X" : sts.norm.rvs(0.0, 1.0, size=1000)
}

data["sigma"] = 1.0 ## good sigma
sam1 = sm.sample(data=data)
adata1 = arviz.from_cmdstanpy(sam1)

data["sigma"] = 0.1 ## bad sigma

sam2 = sm.sample(data=data)
adata2 = arviz.from_cmdstanpy(sam2)

# compare the 2 models
comp_dict = {
    "M1" : adata1,
    "M2" : adata2,
}

print(arviz.compare(comp_dict, method="bb-pseudo-bma").to_markdown())

# compare again, but now with opposite names!!
comp_dict = {
    "M1" : adata2,
    "M2" : adata1,
}

print(arviz.compare(comp_dict, method="bb-pseudo-bma").to_markdown())
```

This is the expected result (first model comparison)

|    |   rank |   elpd_loo |    p_loo |   elpd_diff |   weight |        se |     dse | warning   | scale   |
|:---|-------:|-----------:|---------:|------------:|---------:|----------:|--------:|:----------|:--------|
| M1 |      0 |   -1438.56 |  1.01548 |         0   |        1 |   23.0421 |    0    | False     | log     |
| M2 |      1 |  -50529.1  | 99.3074  |     49090.6 |        0 | 2304.15   | 2311.18 | False     | log     |

And this is the result when we swap the model names:

|    |   rank |   elpd_loo |    p_loo |   elpd_diff |   weight |        se |     dse | warning   | scale   |
|:---|-------:|-----------:|---------:|------------:|---------:|----------:|--------:|:----------|:--------|
| M2 |      0 |   -1438.56 |  1.01548 |         0   |        1 | 2387.07   |    0    | False     | log     |
| M1 |      1 |  -50529.1  | 99.3074  |     49090.6 |        0 |   23.8712 | 2311.18 | False     | log     |

The table should be identical to the first one, except with opposite model names. However, also the `se` column has the wrong order!

I'm using arviz version 0.18.0, and python version 3.10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

standard error does not get properly sorted in result of `compare` method #2350

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

	rank	elpd_loo	p_loo	elpd_diff	weight	se	dse	warning	scale
M1	0	-1438.56	1.01548	0	1	23.0421	0	False	log
M2	1	-50529.1	99.3074	49090.6	0	2304.15	2311.18	False	log

	rank	elpd_loo	p_loo	elpd_diff	weight	se	dse	warning	scale
M2	0	-1438.56	1.01548	0	1	2387.07	0	False	log
M1	1	-50529.1	99.3074	49090.6	0	23.8712	2311.18	False	log

Uh oh!

standard error does not get properly sorted in result of compare method #2350

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

standard error does not get properly sorted in result of `compare` method #2350