Strong dependence on using kmeans background samples for SHAP

Hey! I finally got around to playing with the examples you have here, and I noticed that you use `shap.kmeans` to get the background data. Since I typically use a random sample rather than kmeans (unless I am really trying to optimize run time), I just swapped
```
background_distribution = shap.kmeans(xtrain,10)
```
for
```
background_distribution = shap.sample(xtrain,10)
```

When I did this, all the adversarial results for SHAP seemed to fall apart for COMPAS: 79% of the time, race is still the top SHAP feature in the test dataset for the adversarial model.
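One way a rate like this could be computed is sketched below. It assumes the SHAP values for the test set are already in a 2-D array of shape `(n_samples, n_features)`. The helper name `top_feature_rate` and the toy array are hypothetical and only for illustration; this is not necessarily how the 79% figure was produced.

```python
import numpy as np

def top_feature_rate(shap_values, feature_index):
    """Fraction of rows in which `feature_index` has the largest |SHAP| value."""
    abs_vals = np.abs(np.asarray(shap_values, dtype=float))
    top = abs_vals.argmax(axis=1)  # index of the top-ranked feature in each row
    return float(np.mean(top == feature_index))

# Toy SHAP values (made up): 4 test rows, 3 features; column 0 stands in for race.
toy = np.array([
    [0.9, 0.1, -0.2],
    [-0.7, 0.3, 0.1],
    [0.1, 0.5, 0.2],
    [0.6, -0.2, 0.4],
])
rate = top_feature_rate(toy, feature_index=0)
print(rate)
```

Ranking by absolute value means a large negative attribution still counts as the top feature, which is probably what you want when asking whether race dominates the explanation.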

This very strong dependence on using kmeans was surprising to me, since it seems to imply SHAP is much more robust to these adversarial attacks when using a typical random background sample. Have you noticed this before, or do you have any thoughts on this? I think it is worth pointing out, but I wanted to get your feedback before suggesting to users that a random sample provides better adversarial robustness.

Thanks!
