Skip to content

TreeSHAP for conditional expectations of variable subsets  #262

@samkodes

Description

@samkodes

Hi all,

I am intrigued by the experimental result of Aas, Jullum, and Loland (https://arxiv.org/abs/1903.10464) that TreeSHAP fails to capture covariate dependence in any meaningful way. Do you have any insight into why this may be?

I ask because the conditional dependence estimation procedures shapr implements, particularly the empirical method, seem very similar to the adaptive nearest-neighbour interpretation of random forests, e.g. the causal forests used by the grf package (Athey and Wager https://arxiv.org/abs/1902.07409). A TreeSHAP-like algorithm might be an effective way of calculating the conditional expectation for a subset of variables using the adaptive neighbourhoods already learned by an underlying random forest model.

I wonder if part of the reason TreeSHAP failed the tests is that it was run on a boosted model rather than a random forest (as far as I know, boosted trees don't have a nearest-neighbour interpretation). Would this be worth investigating further?

Update:

The intuition might be that because of the scale reduction in a boosted tree ensemble, removal of some covariates tends to make the resulting expectations rather unpredictable (mostly dependant on the high-variance initial large-scale trees), whereas the redundancy of a bagged tree ensemble means that other the remaining covariates may still informatively partition the space.

If TreeSHAP on random forests does work for estimating conditional expectations, then it might be a viable option to be built into the package for non-forest underlying models. The focus would not be on estimating p(S' | S) (distribution of missing variables conditional on present variables) but instead on estimating the required conditional expectation directly - E(f(S',S)|S).

I would be happy to run some tests with random forests using the R treeshap package on the test data sets used in the paper if you could provide them?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions