TreeSHAP, libxgboost, and implications for predict function #169

@bobaronoff

I am looking to 'modernize' my approach and switch from partial dependence plots to Shapley plots. Shapley values are computationally demanding, so I would like to take advantage of the TreeSHAP algorithm that is built into libxgboost. This feature is accessible via the predict function by using the keyword parameter 'pred_contribs' (see the libxgboost predict options).

Although XGBoost.predict accepts keyword parameters, only a limited set is passed to libxgboost:

opts = Dict("type"=>(margin ? 1 : 0),
                "iteration_begin"=>ntree_lower_limit,
                "iteration_end"=>ntree_limit,
                "strict_shape"=>false,
                "training"=>training,
               ) |> JSON3.write

As a short-term solution, I can write a personalized version that allows additional keyword parameters. I also realize that the current approach reduces the risk of breaking older code.
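A minimal sketch of what such a personalized option builder might look like. The helper name `predict_options` and the `PREDICTION_TYPES` table are hypothetical and not part of XGBoost.jl. The integer codes are the ones I understand the libxgboost C API header to document for the "type" field of the predict config; please verify them against the library version in use. The sketch only builds the options Dict, which would then be serialized with `JSON3.write` as in the snippet above.

```julia
# Hypothetical lookup table (not part of XGBoost.jl). Codes are assumed to
# match the "type" field documented for XGBoosterPredictFromDMatrix in
# libxgboost's c_api.h; check them against your library version.
const PREDICTION_TYPES = Dict(
    :value               => 0,  # normal prediction
    :margin              => 1,  # raw margin
    :contribs            => 2,  # TreeSHAP feature contributions
    :approx_contribs     => 3,  # approximate contributions
    :interactions        => 4,  # SHAP interaction values
    :approx_interactions => 5,  # approximate interactions
    :leaf                => 6,  # leaf index per tree
)

# Hypothetical helper: build the options Dict for a given prediction kind.
# strict_shape defaults to true here so every kind has a predictable shape.
function predict_options(kind::Symbol = :value;
                         ntree_lower_limit::Integer = 0,
                         ntree_limit::Integer = 0,
                         strict_shape::Bool = true,
                         training::Bool = false)
    haskey(PREDICTION_TYPES, kind) ||
        throw(ArgumentError("unknown prediction kind: $kind"))
    return Dict{String,Any}(
        "type"            => PREDICTION_TYPES[kind],
        "iteration_begin" => ntree_lower_limit,
        "iteration_end"   => ntree_limit,
        "strict_shape"    => strict_shape,
        "training"        => training,
    )
end

opts = predict_options(:contribs)   # then: opts |> JSON3.write
```

Keeping the kind as a single symbol, rather than three independent Bool keywords, avoids having to decide what happens when more than one of them is set.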

There are three parameters (pred_contribs, pred_interactions, and pred_leaf) that would be handy to have available. Adding them adds complexity related to the shape of the data returned. Perhaps there is a role for a separate function, e.g. 'predict_shapley', that specifically handles these additional parameters; this would be least likely to break any pre-written code. As a new function, it would be less hassle to implement 'strict_shape=true', and users could code with it in mind. Currently multi:softmax and multi:softprob add an additional dimension and need separate coding; 'strict_shape' adds a dimension called 'group' so that all objectives return the same number of dimensions. The TreeSHAP algorithms return additional dimension(s) and, as we found with the multi: models, those arrays are row-major (the C convention) whereas Julia is column-major, so reshaping 3- (or 4-) dimensional arrays gets complicated.
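The row-major to column-major step can be sketched generically. The helper below, `from_row_major`, is a hypothetical name and not part of XGBoost.jl. It assumes libxgboost hands back a flat buffer plus the logical shape, for example (rows, groups, features + 1) for contributions with strict_shape enabled, which is my reading of the XGBoost prediction documentation. Reshaping with the dimensions reversed and then reversing the axes with `permutedims` recovers the logical indexing for any number of dimensions.

```julia
# Hypothetical helper: turn a flat, row-major (C-order) buffer with logical
# shape `dims` into a column-major Julia array indexed in the same order.
function from_row_major(flat::AbstractVector, dims::NTuple{N,Int}) where {N}
    length(flat) == prod(dims) ||
        throw(DimensionMismatch("buffer length $(length(flat)) does not match shape $dims"))
    # In C order the last index varies fastest, which is Julia's first index
    # once the dimensions are reversed.
    reversed = reshape(flat, reverse(dims))
    # Reverse the axes so that result[i, j, k] matches the C element [i][j][k].
    return permutedims(reversed, ntuple(i -> N - i + 1, N))
end

# Illustrative data only: 2 rows, 3 groups, 4 columns (features + 1).
flat = Float32.(1:24)
contribs = from_row_major(flat, (2, 3, 4))
size(contribs)      # (2, 3, 4)
contribs[1, 2, 1]   # 5.0f0, the fifth element of the C buffer
```

`permutedims` copies the data, which seems an acceptable cost here since the result no longer aliases memory owned by libxgboost. The same call would cover the 4-dimensional interaction output without extra code.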

Thank you for consideration.
