Strong dependence on using kmeans background samples for SHAP #1

Closed
@slundberg

Hey! I finally got around to playing with the examples you have here, and I noticed that you were using shap.kmeans to get the background data. Since I typically use a random sample rather than kmeans (unless I am really trying to optimize run time), I just swapped

background_distribution = shap.kmeans(xtrain,10)

for

background_distribution = shap.sample(xtrain,10)

When I did this, all the adversarial results for SHAP seemed to fall apart for COMPAS: with the adversarial model, race is still the top SHAP feature for 79% of the test dataset.

This very strong dependence on using kmeans was surprising to me, since it seems to imply SHAP is much more robust to these adversarial attacks when using a typical random background sample. Have you noticed this before, or do you have any thoughts on this? I think it is worth pointing out, but I wanted to get your feedback before suggesting to users that a random sample provides better adversarial robustness.
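For reference, here is a minimal sketch of how a "top SHAP feature" rate like the one above could be computed from a matrix of SHAP values (one row per test instance, one column per feature). The helper name `top_feature_rate`, the feature names, and the toy numbers are all hypothetical and only illustrate the metric; this is not the repository's own evaluation code, and the values are not COMPAS results.

```python
import numpy as np


def top_feature_rate(shap_values, feature_names, feature):
    """Fraction of rows whose largest absolute SHAP value belongs to `feature`.

    Hypothetical helper for illustration only.
    """
    shap_values = np.asarray(shap_values, dtype=float)
    if shap_values.ndim != 2 or shap_values.shape[1] != len(feature_names):
        raise ValueError("shap_values must have shape (n_samples, n_features)")
    # Index of the feature with the largest attribution magnitude in each row.
    top_idx = np.abs(shap_values).argmax(axis=1)
    return float(np.mean(top_idx == feature_names.index(feature)))


# Toy, made-up attributions (NOT real COMPAS output).
toy_names = ["race", "age", "priors_count"]
toy_values = np.array([
    [0.40, -0.10, 0.05],   # race has the largest magnitude
    [-0.30, 0.20, 0.10],   # race again (sign is ignored)
    [0.05, 0.02, -0.50],   # priors_count has the largest magnitude
    [0.25, -0.05, 0.01],   # race
])

rate = top_feature_rate(toy_values, toy_names, "race")
print(f"race is the top feature for {rate:.0%} of rows")
```

Running the same helper on the SHAP values obtained with each background (kmeans vs. random sample) would give directly comparable numbers, assuming both runs use the same test instances.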

Thanks!
