Description
First of all, thank you for all the work that has gone into this package! I hope you can help me with the following problem. I'm using the R interface, and after some initial setup problems (the default installation has incompatible versions of Python and TensorFlow) I can now access the AIF360 functions. However, either the documentation is unclear about how to use it, or the function disparate_impact_remover is broken.
In the following code I repair some data, but the disparate impact computed on the fully repaired data (repair_level = 1.0) is identical to the one computed on the unrepaired data (repair_level = 0):
load_aif360_lib()
ad <- adult_dataset()
p <- list("race", 1)
u <- list("race", 0)
#subselect
pd_conv = ad$convert_to_dataframe()
data0 = pd_conv[[1]]
data0sub = data0[,c('race','age','sex','income-per-year')]
#turn into AIF data frame
aif_df = binary_label_dataset(
  data_path = data0sub,
  favor_label = 1, unfavor_label = 0,
  unprivileged_protected_attribute = 1,
  privileged_protected_attribute = 0,
  target_column = 'income-per-year', protected_attribute = 'race')
#repair
di1 <- disparate_impact_remover(repair_level = 1.0, sensitive_attribute = "race")
rp1 <- di1$fit_transform(aif_df)
di2 <- disparate_impact_remover(repair_level = 0, sensitive_attribute = "race")
rp2 <- di2$fit_transform(aif_df)
#calc metric
bm1 = binary_label_dataset_metric(rp1, list('race', 1), list('race',0))
fl_disparate_impact1 = bm1$disparate_impact()
#calc metric
bm2 = binary_label_dataset_metric(rp2, list('race', 1), list('race',0))
fl_disparate_impact2 = bm2$disparate_impact()
> fl_disparate_impact1
[1] 0.6037688
> fl_disparate_impact2
[1] 0.6037688
Note that the subselection isn't strictly necessary, but I wanted to rule out errors in converting between R data frames and the AIF360 format, since I first noticed this problem in my own data set.
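One explanation I have not been able to rule out: if disparate_impact() is computed only from the labels and the protected attribute (that is how I read the definition, as a ratio of favorable-outcome rates), then a repair that only changes feature columns would leave it unchanged. The base-R sketch below illustrates that reading. It uses made-up toy data and a hypothetical helper, toy_disparate_impact, neither of which comes from AIF360, so it reflects my assumption about the metric rather than confirmed package behavior.

```r
# Toy data, made up for illustration only (this is not the Adult dataset).
toy <- data.frame(
  race   = c(1, 1, 1, 1, 0, 0, 0, 0),
  age    = c(25, 40, 52, 33, 61, 29, 47, 38),
  income = c(1, 1, 1, 0, 1, 0, 0, 0)
)

# Disparate impact as a ratio of favorable-label rates:
#   P(label = 1 | unprivileged) / P(label = 1 | privileged)
# Hypothetical helper, NOT part of the aif360 R package.
toy_disparate_impact <- function(df, label, attr, priv = 1, unpriv = 0) {
  rate_unpriv <- mean(df[[label]][df[[attr]] == unpriv] == 1)
  rate_priv   <- mean(df[[label]][df[[attr]] == priv] == 1)
  rate_unpriv / rate_priv
}

# Stand-in for a "repair": change a feature column, leave the labels alone.
# (Replacing age by its rank is only a placeholder, not the real algorithm.)
toy_repaired <- toy
toy_repaired$age <- rank(toy$age)

di_before <- toy_disparate_impact(toy, "income", "race")
di_after  <- toy_disparate_impact(toy_repaired, "income", "race")

# The feature column changed, but the label-based metric did not.
print(any(toy$age != toy_repaired$age))
print(identical(di_before, di_after))
```

If this reading is right, comparing the feature values of rp1 and rp2 directly, rather than the metric, would show whether the repair had any effect.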
So my question is: am I doing something wrong, or are these functions broken?
Thank you in advance!