
Conversation


@vincentarelbundock vincentarelbundock commented Jul 8, 2024

Install:

install.packages("topmodels", repos = "https://R-Forge.R-project.org")
remotes::install_github("vincentarelbundock/marginaleffects@topmodels")

Examples seem to work:

library("topmodels")
library("marginaleffects")

data("FIFA2018", package = "distributions3")
m <- glm(goals ~ difference, data = FIFA2018, family = poisson)
p <- promodel(m)

# Informative error
avg_slopes(p, type = "junk")
> Error in sanitize_type(model = model, type = type, calling_function = "slopes"): Assertion on 'type' failed: Must be element of set {'mean','variance','quantile','probability','density','loglikelihood','kurtosis','skewness'}, but is 'junk'.
# Mean
avg_slopes(p)
> 
>        Term    Contrast Estimate Std. Error    z Pr(>|z|)    S 2.5 % 97.5 %
>  difference mean(dY/dX)    0.536      0.143 3.74   <0.001 12.4 0.255  0.817
> 
> Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
> Type:  mean
# Probability
avg_slopes(p, type = "probability")
> 
>        Term    Contrast Estimate Std. Error     z Pr(>|z|)    S  2.5 %  97.5 %
>  difference mean(dY/dX)   -0.141     0.0333 -4.23   <0.001 15.4 -0.206 -0.0754
> 
> Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
> Type:  probability
# Probability at=3
avg_slopes(p, type = "probability", at = 3)
> 
>        Term    Contrast Estimate Std. Error     z Pr(>|z|)   S  2.5 %  97.5 %
>  difference mean(dY/dX)  -0.0605     0.0218 -2.78  0.00545 7.5 -0.103 -0.0178
> 
> Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
> Type:  probability
# Effect of a change of 2 units of `difference` 
# on the probability that teams score 3 goals or less
avg_comparisons(p,
    variables = list(difference = 2),
    type = "probability",
    at = 3)
> 
>        Term Contrast Estimate Std. Error     z Pr(>|z|)   S 2.5 %  97.5 %
>  difference mean(+2)   -0.291      0.122 -2.39   0.0168 5.9 -0.53 -0.0526
> 
> Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
> Type:  probability
# Predictions also work
predictions(p)
> 
>  Estimate Std. Error     z Pr(>|z|)     S 2.5 % 97.5 %
>     1.768      0.182  9.73   <0.001  71.9 1.412   2.12
>     0.866      0.120  7.19   <0.001  40.5 0.630   1.10
>     1.030      0.108  9.51   <0.001  68.8 0.817   1.24
>     1.486      0.120 12.36   <0.001 114.1 1.250   1.72
>     1.435      0.113 12.68   <0.001 120.0 1.214   1.66
> --- 118 rows omitted. See ?avg_predictions and ?print.marginaleffects --- 
>     1.324      0.103 12.84   <0.001 122.9 1.122   1.53
>     1.332      0.104 12.86   <0.001 123.2 1.129   1.53
>     1.149      0.102 11.27   <0.001  95.5 0.949   1.35
>     1.604      0.142 11.30   <0.001  96.0 1.326   1.88
>     0.954      0.114  8.38   <0.001  54.1 0.731   1.18
> Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, goals, difference 
> Type:  mean

@vincentarelbundock

@zeileis I have not tried too many things yet, but at first glance what we discussed seems to work well. See above for installation and examples.


zeileis commented Jul 8, 2024

Vincent, really cool, thanks!

I have not tried many things yet, but I noticed the following. While avg_slopes() gives me the same output for the original Poisson glm object m and the promodel object p, I get somewhat different uncertainty results for predictions().

Also, the type argument is a bit more flexible in procast() than what I have documented so far in the main help file. We allow: "quantile", "mean", "variance", "probability", "cdf", "density", "pdf", "pmf", "loglikelihood", "log_pdf", "distribution", "parameters", "kurtosis", "skewness".

Thus, instead of "probability" (for compatibility with the p functions) you can also use "cdf" (for compatibility with distributions3). Similarly for "density" vs. "pdf" and "pmf", and for "loglikelihood" vs. "log_pdf". I haven't decided yet which to push as the "preferred" set of type labels. (Thoughts and opinions are welcome...)
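If these labels really are one-to-one aliases, a quick sanity check on the FIFA2018 model above might look like this (a sketch; it assumes procast() dispatches identically for both labels):

```r
library("topmodels")

data("FIFA2018", package = "distributions3")
m <- glm(goals ~ difference, data = FIFA2018, family = poisson)

# If "cdf" is a pure alias for "probability", these should be identical
p1 <- procast(m, type = "probability", at = 3)
p2 <- procast(m, type = "cdf", at = 3)
all.equal(p1, p2)
```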

More tomorrow!

@vincentarelbundock

TODO:

  • Agree on a final list of type values to allow
  • Do SEs match in predictions when looking at topmodels vs raw model objects?


vincentarelbundock commented Jul 9, 2024

> I haven't decided yet which to push as the "preferred" set of type labels. (Thoughts and opinions are welcome...)

In general, I like verbosity, but in this particular case I feel that it might be best to be consistent with distributions3. Sure, there is a parallel between the first letters of "pnorm" and "probability", but the parallel is not that obvious. So it's better to be consistent with something than to invent new types altogether. But that's a weak preference. What matters most is that you stick with one convention and avoid duplication.
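For reference, these are the distributions3 generics that the "cdf" / "pdf" / "log_pdf" labels would line up with (a sketch using an arbitrary Poisson distribution, not the FIFA model):

```r
library("distributions3")

X <- Poisson(lambda = 1.5)
cdf(X, 3)          # P(X <= 3)
pdf(X, 3)          # P(X = 3); the pmf, for discrete distributions
log_pdf(X, 3)      # log of the above
quantile(X, 0.9)   # smallest k with P(X <= k) >= 0.9
```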

A lot of users are asking me to implement prediction intervals in marginaleffects. So far, that has only been possible for Bayesian models or via conformal prediction with the inferences() function.

Is this something that topmodels can help me do reliably and flexibly for many model types? I'm thinking of something like this, which appears to yield adequate 90% coverage in an ultra simple normal model:

library("topmodels")
library("marginaleffects")

sim <- function(...) {
    N <- 500
    # Simulate a simple linear-normal data set and split into train/test halves
    dat <- data.frame(x = rnorm(N))
    dat$y <- dat$x + rnorm(N)
    idx <- sample(1:N, N / 2)
    train <- dat[idx, ]
    test <- dat[-idx, ]
    m <- glm(y ~ x, data = train)
    # Build a 90% prediction interval from the 5th and 95th predictive quantiles
    p <- do.call(cbind, list(
        procast(m, newdata = test, type = "mean"),
        procast(m, newdata = test, type = "quantile", at = .05),
        procast(m, newdata = test, type = "quantile", at = .95)
    ))
    p <- setNames(p, c("estimate", "conf.low", "conf.high"))
    p$truth <- test$y
    # Share of held-out observations that fall inside the interval
    coverage <- mean(p$truth < p$conf.high & p$truth > p$conf.low)
    return(coverage)
}

mean(sapply(seq_len(100), sim))
> [1] 0.89572
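If topmodels is the right tool for this, the quantile-based interval in the simulation could presumably be factored into a small helper. The function below is purely hypothetical (not part of topmodels or marginaleffects); it just generalizes the 90% interval above to an arbitrary coverage level:

```r
library("topmodels")

# Hypothetical helper: prediction interval from procast() predictive quantiles
pred_interval <- function(model, newdata, level = 0.90) {
    alpha <- (1 - level) / 2
    data.frame(
        estimate  = procast(model, newdata = newdata, type = "mean")[[1]],
        conf.low  = procast(model, newdata = newdata, type = "quantile", at = alpha)[[1]],
        conf.high = procast(model, newdata = newdata, type = "quantile", at = 1 - alpha)[[1]]
    )
}

# Usage sketch on a toy model
m <- glm(y ~ x, data = data.frame(x = rnorm(100), y = rnorm(100)))
pred_interval(m, newdata = data.frame(x = c(-1, 0, 1)), level = 0.95)
```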
