Description
sklearn.preprocessing.OneHotEncoder exposes several parameters, such as drop and handle_unknown, which are useful for avoiding overparametrisation (e.g. of binary variables, or the dummy variable trap in linear regression) or for handling unknown values.
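To make the cases concrete, here is a minimal sketch of how these parameters change the encoded output. The toy data (a colour column and a yes/no column) is made up for illustration and is not taken from our code base.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy data (hypothetical): one 3-valued feature and one binary feature.
X = np.array(
    [["red", "yes"], ["green", "no"], ["blue", "yes"]],
    dtype=object,
)

# Full representation: one column per category (3 + 2 columns).
full = OneHotEncoder().fit(X)
# drop="first": the first category of every feature is removed (2 + 1 columns).
dropped = OneHotEncoder(drop="first").fit(X)
# drop="if_binary": only binary features lose a column (3 + 1 columns).
binary = OneHotEncoder(drop="if_binary").fit(X)
# handle_unknown="ignore": unseen categories are encoded as all zeros.
ignore = OneHotEncoder(handle_unknown="ignore").fit(X)

n_full = full.transform(X).toarray().shape[1]
n_dropped = dropped.transform(X).toarray().shape[1]
n_binary = binary.transform(X).toarray().shape[1]

# "purple" was never seen during fit, so its block contains no 1 at all.
unknown_row = ignore.transform(
    np.array([["purple", "yes"]], dtype=object)
).toarray()

print(n_full, n_dropped, n_binary)
print(unknown_row)
```

In the dropped and unknown cases a categorical block can be all zeros, so code that assumes exactly one active column per feature no longer holds.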
Currently, some explainers know how to deal with one-hot encoded features, such as AnchorTabular and Counterfactuals. Some of the utility functions they use are available here. The limitation of using the ohe flag (i.e. passing an ohe dataset) is that the explainers don't know how to deal with the cases mentioned above.
It would be good to see whether we can extend our explainers and utility functions to handle all the arguments available in sklearn.preprocessing.OneHotEncoder. Failing that, we should state explicitly in our documentation what "format" the ohe data is expected to be in (i.e. full representation only).
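If we go the documentation route, a small input check could back it up. The following is only a sketch: is_full_ohe is a hypothetical helper, not an existing function, and the cat_vars_ohe argument (first column index of each categorical block mapped to its number of categories) is an assumption about how the layout would be described.

```python
import numpy as np


def is_full_ohe(X, cat_vars_ohe):
    """Return True if every categorical block is a full one-hot encoding.

    Hypothetical helper. `cat_vars_ohe` is assumed to map the first column
    index of each block to the number of categories in that block.
    """
    X = np.asarray(X)
    for start, n_values in cat_vars_ohe.items():
        block = X[:, start:start + n_values]
        # The block must span all expected columns.
        if block.shape[1] != n_values:
            return False
        # Entries must be 0/1 and each row must have exactly one active column.
        if not np.isin(block, (0, 1)).all():
            return False
        if not (block.sum(axis=1) == 1).all():
            return False
    return True


layout = {0: 3, 3: 2}  # a 3-category block followed by a 2-category block
full_rows = np.array([[0, 0, 1, 0, 1], [1, 0, 0, 1, 0]])
# First block is all zeros, as produced by drop=... or handle_unknown="ignore".
zero_block_rows = np.array([[0, 0, 0, 0, 1]])

full_ok = is_full_ohe(full_rows, layout)
zero_ok = is_full_ohe(zero_block_rows, layout)
print(full_ok, zero_ok)
```

A check like this would let an explainer fail early with a clear message instead of silently mishandling a reduced encoding.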