Description
When using score_nuisances with a discrete treatment, the function does not return the correct score.
The issue comes from the inverse_onehot function in econml/utilities.py. When its input is a DataFrame generated by pandas.get_dummies(), it decodes the treatment incorrectly.
For example, with a binary treatment, labels originally coded as 0 and 1 are shifted and decoded as 1 and 2, because of the following implementation:
```python
def inverse_onehot(T):
    """
    Given a one-hot encoding of a value, return a vector reversing the encoding to get numeric treatment indices.
    Note that we assume that the first column has been removed from the input.
    """
    assert ndim(T) == 2
    # note that by default OneHotEncoder returns float64s, so need to convert to int
    return (T @ np.arange(1, T.shape[1] + 1)).astype(int)
```
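The weights start at 1, which is correct only when the first (baseline) column has been removed, as the docstring assumes. pandas.get_dummies() keeps every category column by default, so each decoded index comes out one too large: an off-by-one error.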
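As a minimal sketch, assuming only pandas and NumPy are installed, the snippet below reproduces the current decoding outside of econml. `inverse_onehot_current` is a hypothetical standalone copy of the function above, with `np.ndim` substituted for econml's own `ndim` helper.

```python
import numpy as np
import pandas as pd


def inverse_onehot_current(T):
    # Hypothetical standalone copy of the current logic; np.ndim stands in
    # for econml's ndim helper so the sketch runs without econml.
    assert np.ndim(T) == 2
    return (T @ np.arange(1, T.shape[1] + 1)).astype(int)


labels = pd.Series([0, 1, 1, 0])

# Default get_dummies keeps both columns (0 and 1): every index is shifted by one.
dummies = pd.get_dummies(labels)
decoded = inverse_onehot_current(dummies)
print(decoded.tolist())  # [1, 2, 2, 1]

# With the first column dropped, the current logic recovers the original labels.
dropped = pd.get_dummies(labels, drop_first=True)
print(inverse_onehot_current(dropped).tolist())  # [0, 1, 1, 0]
```

The second call shows the function behaves as documented when the baseline column is absent; the mismatch only appears for a full dummy matrix.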
Expected behavior
The function should return zero-based indices so that discrete treatments (e.g. 0/1) keep their original labels after decoding. A corrected implementation could look like this:
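```python
import numpy as np
import pandas as pd
import econml.utilities

def inverse_onehot(T):
    assert econml.utilities.ndim(T) == 2
    # pd.get_dummies() keeps every category column, so DataFrames get zero-based
    # weights; arrays are still assumed to have the first column removed.
    indices = (
        np.arange(0, T.shape[1])
        if isinstance(T, pd.DataFrame)
        else np.arange(1, T.shape[1] + 1)
    )
    return (T @ indices).astype(int)
```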
With this change, score_nuisances computes the correct score for discrete treatments encoded with pandas.get_dummies().
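A minimal sketch of the proposed fix in action, again assuming only pandas and NumPy: `inverse_onehot_proposed` is a hypothetical standalone copy of the corrected function, with `np.ndim` in place of `econml.utilities.ndim`. It decodes both a full dummy DataFrame and a NumPy array with the first column removed.

```python
import numpy as np
import pandas as pd


def inverse_onehot_proposed(T):
    # Hypothetical standalone copy of the proposed fix; np.ndim stands in for
    # econml.utilities.ndim so the sketch runs without econml installed.
    assert np.ndim(T) == 2
    indices = (
        np.arange(0, T.shape[1])
        if isinstance(T, pd.DataFrame)
        else np.arange(1, T.shape[1] + 1)
    )
    return (T @ indices).astype(int)


labels = pd.Series([0, 1, 1, 0])

# Full dummy DataFrame (all columns kept): zero-based weights apply.
from_frame = inverse_onehot_proposed(pd.get_dummies(labels))
print(from_frame.tolist())  # [0, 1, 1, 0]

# NumPy array with the first column removed: one-based weights still apply.
from_array = inverse_onehot_proposed(np.array([[0.0], [1.0], [1.0], [0.0]]))
print(from_array.tolist())  # [0, 1, 1, 0]
```

Note that the branch keys on the container type rather than on whether a baseline column was dropped, so a DataFrame built with `pd.get_dummies(labels, drop_first=True)` would decode every row as 0 under this version.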
Contributed by @Cantal00p, @f5ilverio