Description
In the internal `valid_ifthens` function, there are two places where feature names are hard-coded, and these parts of the code fail without them. Specifically:
- In one place, there is a reference to an `age` column of `X`, which is also assumed to be of `pd.Interval` dtype. As such, if this column does not exist or is not of Interval dtype, this part of the code throws an error.
Example to reproduce:
```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

from aif360.sklearn.datasets.openml_datasets import fetch_german
from aif360.sklearn.detectors.facts import FACTS

X, y = fetch_german()
assert (X.index == y.index).all()
X.reset_index(drop=True, inplace=True)
y = y.reset_index(drop=True).map({"bad": 0, "good": 1})

# split into train-test data
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, stratify=y)

# one-hot encode the categorical features, pass the rest through unchanged
categorical_features = X.select_dtypes(include=["object", "category"]).columns.to_list()
categorical_features_onehot_transformer = ColumnTransformer(
    transformers=[
        ("one-hot-encoder", OneHotEncoder(), categorical_features)
    ],
    remainder="passthrough"
)
model = Pipeline([
    ("one-hot-encoder", categorical_features_onehot_transformer),
    ("clf", LogisticRegression(max_iter=1500))
])

# train the model
model = model.fit(X_train, y_train)

detector = FACTS(
    clf=model,
    prot_attr="sex",
    feature_weights={f: 1 for f in X.columns},
    feats_not_allowed_to_change=[]
)
detector = detector.fit(X_test)
```
The last command fails with `AttributeError: 'numpy.float64' object has no attribute 'left'`.
- In another place, the `recIsValid` function is called, which in turn also references hard-coded feature names. That function does check whether these features exist, so the code does not fail when they are absent. However, when such a feature does exist, it is assumed either to be of a certain type or to carry certain semantics.
I do not currently have a reproducible example for this one, because whether it appears depends on the exact test data. I believe, however, that this is clearly also a bug: if we want to enforce constraints, as this part of the code is trying to do, it should be done in some other, more robust way.
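To make the suggestion more concrete, here is a rough sketch of what a more robust approach might look like. It is not taken from the aif360 code base: the helper names `interval_lower_bounds`, `change_is_allowed`, and `non_decreasing` are hypothetical, and the idea of passing constraints in as a dict of predicates is only an assumption about one possible design. The first helper guards the `.left` access by checking the actual values, and the second replaces hard-coded feature names with caller-supplied rules.

```python
import pandas as pd


def interval_lower_bounds(X, column):
    """Return the sorted left endpoints of an interval-valued column.

    Hypothetical helper: returns None when the column is missing or does
    not hold pd.Interval values, instead of raising AttributeError.
    """
    if column not in X.columns:
        return None
    values = X[column].dropna().unique()
    if len(values) == 0 or not all(isinstance(v, pd.Interval) for v in values):
        return None
    return sorted(v.left for v in values)


def non_decreasing(old, new):
    """Example rule: a value may not decrease (e.g. age).

    Works for plain numbers and for pd.Interval values (compared by their
    left endpoint).
    """
    old_key = old.left if isinstance(old, pd.Interval) else old
    new_key = new.left if isinstance(new, pd.Interval) else new
    return new_key >= old_key


def change_is_allowed(feature, old, new, constraints):
    """Check a proposed feature change against caller-supplied rules.

    `constraints` maps a feature name to a predicate (old, new) -> bool.
    Features without an entry are treated as unconstrained.
    """
    rule = constraints.get(feature)
    return True if rule is None else bool(rule(old, new))


# Usage: the caller decides which features are constrained and how.
binned = pd.DataFrame({"age": pd.cut(pd.Series([20, 35, 50]), bins=[18, 30, 40, 60])})
numeric = pd.DataFrame({"age": [20.0, 35.0, 50.0]})
constraints = {"age": non_decreasing}

bounds_binned = interval_lower_bounds(binned, "age")    # interval-valued column
bounds_numeric = interval_lower_bounds(numeric, "age")  # plain floats, no error
```

With something along these lines, a numeric `age` column (as in the German credit example above) would simply skip the interval-specific logic, and any semantic constraints would be stated explicitly by the user rather than inferred from column names.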