
score_nuisances with discrete treatment returns incorrect score #1006

@Cantal00p

Description

When using score_nuisances with a discrete treatment, the function does not return the correct score.

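The issue comes from the inverse_onehot function in econml/utilities.py. When it receives a DataFrame generated by pandas.get_dummies() as input, it decodes the treatment incorrectly: by default get_dummies() keeps one column per label, including the first one, whereas inverse_onehot assumes that the first column has already been removed.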
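To see the column layout involved, here is a minimal sketch using an illustrative 0/1 treatment (the data is hypothetical, not taken from the issue). By default pandas.get_dummies() keeps a column for every label, while drop_first=True removes the first one, which is the layout inverse_onehot expects.

```python
import pandas as pd

# Illustrative binary treatment coded as 0/1 (hypothetical data)
T_labels = pd.Series([0, 1, 1, 0])

# Default (drop_first=False): one column per label, including label 0
full = pd.get_dummies(T_labels)
print(list(full.columns))  # [0, 1]

# drop_first=True: the first (baseline) column is removed,
# matching the layout that inverse_onehot assumes
dropped = pd.get_dummies(T_labels, drop_first=True)
print(list(dropped.columns))  # [1]
```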

For example, with a binary treatment, labels originally coded as 0 and 1 are shifted and decoded as 1 and 2, because of the following implementation:

def inverse_onehot(T):
    """
    Given a one-hot encoding of a value, return a vector reversing the encoding to get numeric treatment indices.

    Note that we assume that the first column has been removed from the input.
    """
    assert ndim(T) == 2
    # note that by default OneHotEncoder returns float64s, so need to convert to int
    return (T @ np.arange(1, T.shape[1] + 1)).astype(int)

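Because the decoding weights start at 1, this logic introduces an off-by-one error whenever the first column is still present in the input, as it is in a default get_dummies() output.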
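The shift can be reproduced without running score_nuisances. The sketch below copies the current decoding logic into a hypothetical helper, inverse_onehot_current, and substitutes np.ndim for econml's own ndim so it runs with only NumPy and pandas; the treatment data is illustrative.

```python
import numpy as np
import pandas as pd


def inverse_onehot_current(T):
    """Hypothetical local copy of the current logic (np.ndim replaces econml's ndim)."""
    assert np.ndim(T) == 2
    # Weights start at 1 because the first (baseline) column is assumed dropped
    return (T @ np.arange(1, T.shape[1] + 1)).astype(int)


T_labels = pd.Series([0, 1, 1, 0])

# Full get_dummies() encoding: the baseline column is still present,
# so every label comes back shifted by one
full = pd.get_dummies(T_labels)
decoded = inverse_onehot_current(full)
print(decoded.tolist())  # [1, 2, 2, 1]

# With the baseline column dropped, the original labels are recovered
dropped = pd.get_dummies(T_labels, drop_first=True)
print(inverse_onehot_current(dropped).tolist())  # [0, 1, 1, 0]
```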

Expected behavior

The function should return zero-based indices, so that discrete treatments (e.g. 0/1) remain consistent after decoding. A corrected implementation could look like this:

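import numpy as np
import pandas as pd
import econml.utilities


def inverse_onehot(T):
    assert econml.utilities.ndim(T) == 2

    # A full pandas.get_dummies() DataFrame keeps the first column, so use
    # weights 0..k-1; otherwise assume the first column was dropped (1..k).
    indices = (
        np.arange(0, T.shape[1])
        if isinstance(T, pd.DataFrame)
        else np.arange(1, T.shape[1] + 1)
    )

    return (T @ indices).astype(int)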

This change ensures that score_nuisances computes the correct score when a discrete treatment is passed as a full one-hot DataFrame.
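A quick check of the proposed fix, again as a local sketch: inverse_onehot_proposed is a hypothetical name for a copy of the function above, with np.ndim standing in for econml.utilities.ndim. It exercises both branches on illustrative data.

```python
import numpy as np
import pandas as pd


def inverse_onehot_proposed(T):
    """Hypothetical local copy of the proposed fix (np.ndim replaces econml.utilities.ndim)."""
    assert np.ndim(T) == 2
    # DataFrames are assumed to be full get_dummies() encodings (zero-based
    # weights); other inputs are assumed to have the first column dropped.
    indices = (
        np.arange(0, T.shape[1])
        if isinstance(T, pd.DataFrame)
        else np.arange(1, T.shape[1] + 1)
    )
    return (T @ indices).astype(int)


T_labels = pd.Series([0, 1, 1, 0])

# Full one-hot DataFrame from get_dummies() now decodes to the original labels
from_df = inverse_onehot_proposed(pd.get_dummies(T_labels))
print(from_df.tolist())  # [0, 1, 1, 0]

# A NumPy array with the first column dropped keeps the original behavior
T_array = np.array([[0.0], [1.0], [1.0], [0.0]])
from_array = inverse_onehot_proposed(T_array)
print(from_array.tolist())  # [0, 1, 1, 0]
```

Since the branch keys on the input type rather than on the column layout, a DataFrame built with drop_first=True would also take the zero-based path and decode every row as 0, so the fix relies on DataFrames arriving as full get_dummies() encodings.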

Contributed by @Cantal00p, @f5ilverio
