Skip to content

FACTS method bug: code in valid_ifthens function has hard-coded feature names #531

Open
@phantom-duck

Description

@phantom-duck

In the internal valid_ifthens function, there are 2 points where there exist hard-coded feature names in the code, and these parts of the code fail without them. Specifically:

  1. here, there is a reference to an age column of X, which is also assumed to be of pd.Interval dtype. As such, if this column does not exist or is not of Interval dtype, this part of the code throws an error.

Example to reproduce:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

from aif360.sklearn.datasets.openml_datasets import fetch_german
from aif360.sklearn.detectors.facts import FACTS

X, y = fetch_german()
assert (X.index == y.index).all()
X.reset_index(drop=True, inplace=True)
y = y.reset_index(drop=True).map({"bad": 0, "good": 1})

# split into train-test data
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, stratify=y)

categorical_features = X.select_dtypes(include=["object", "category"]).columns.to_list()
categorical_features_onehot_transformer = ColumnTransformer(
    transformers=[
        ("one-hot-encoder", OneHotEncoder(), categorical_features)
    ],
    remainder="passthrough"
)
model = Pipeline([
    ("one-hot-encoder", categorical_features_onehot_transformer),
    ("clf", LogisticRegression(max_iter=1500))
])

#### train the model
model = model.fit(X_train, y_train)

detector = FACTS(
    clf=model,
    prot_attr="sex",
    feature_weights={f: 1 for f in X.columns},
    feats_not_allowed_to_change=[]
)

detector = detector.fit(X_test)

The last command fails with AttributeError: 'numpy.float64' object has no attribute 'left'

  1. At this point, the recIsValid function is used. This, in turn, here also references hard-coded feature names. Here there exist checks of whether they exist or not, so the code does not fail if they do not exist. But there are cases where if a feature exists, it is assumed either to be of a certain type or to possess certain semantics.

I do not currently have a reproducible example for this one, because whether it will appear or not depends on the exact test data. I believe, however, that it is clear this is also a bug, and if we want to enforce some constraints, such as this part of the code is trying to do, it should be done in some other, more robust way.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions