Add an option to order by ascending/descending prediction in cumulative effect curves #204
Description
Describe the feature and the current state.
In the causal validation module and the curves file, it would be useful to add an ascending parameter for the cumulative effect and cumulative gain curves.
Currently, predictions are always sorted in descending order:
ordered_df = df.sort_values(prediction, ascending=False).reset_index(drop=True)
If we add an ascending: bool = False argument to cumulative_effect_curve, cumulative_gain_curve, relative_cumulative_gain_curve, and effect_curves, users could choose whether these curves are computed over the prediction column in ascending or descending order.
Will this change a current behavior? How?
No, unless the user explicitly passes ascending=True. In that case, the cumulative effect and cumulative gain curves will be computed with the prediction column sorted in ascending order.
A model's prediction is not necessarily positively related to the effect being measured. An option to reverse the ordering would let effects and gains be computed correctly when predictions and outcomes are negatively related.
One current workaround is to negate the prediction column:
df["prediction"] = -df["prediction"]
With that change, the computation gives the intended result. But this seems like a hack, and it is probably something we want to solve more cleanly.
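As a quick sanity check, the workaround and the proposed argument should produce the same row ordering. The sketch below uses a small made-up DataFrame (the column values are illustrative, not from fklearn) with distinct prediction values, so that ties do not affect the sort:

```python
import pandas as pd

# Toy data, purely illustrative; predictions are distinct to avoid tie-breaking issues.
df = pd.DataFrame({
    "prediction": [0.9, 0.1, 0.5, 0.3],
    "outcome": [1.0, 4.0, 2.0, 3.0],
})

# Workaround: flip the sign of the prediction, then keep the existing descending sort.
negated = df.assign(prediction=-df["prediction"])
workaround_order = (
    negated.sort_values("prediction", ascending=False)["outcome"].tolist()
)

# Proposed behavior: leave the prediction untouched and sort ascending instead.
proposed_order = (
    df.sort_values("prediction", ascending=True)["outcome"].tolist()
)

print(workaround_order)  # [4.0, 3.0, 2.0, 1.0]
print(proposed_order)    # [4.0, 3.0, 2.0, 1.0]
```

Both approaches visit the rows in the same order, but the second one does not mutate or copy the prediction column.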
Additional Information
The new definition of cumulative_effect_curve would look like this:
@curry
def cumulative_effect_curve(df: pd.DataFrame,
                            treatment: str,
                            outcome: str,
                            prediction: str,
                            min_rows: int = 30,
                            steps: int = 100,
                            effect_fn: EffectFnType = linear_effect,
                            ascending: bool = False) -> np.ndarray:
    """
    Orders the dataset by prediction and computes the cumulative effect curve according to that ordering

    Parameters
    ----------
    df : Pandas' DataFrame
        A Pandas' DataFrame with target and prediction scores.
    treatment : Strings
        The name of the treatment column in `df`.
    outcome : Strings
        The name of the outcome column in `df`.
    prediction : Strings
        The name of the prediction column in `df`.
    min_rows : Integer
        Minimum number of observations needed to have a valid result.
    steps : Integer
        The number of cumulative steps to iterate when accumulating the effect
    effect_fn : function (df: pandas.DataFrame, treatment: str, outcome: str) -> int or Array of int
        A function that computes the treatment effect given a dataframe, the name of the treatment column and the name
        of the outcome column.
    ascending : bool
        Whether the prediction column should be ordered ascending or not. Default is False.

    Returns
    ----------
    cumulative effect curve: Numpy's Array
        The cumulative treatment effect according to the predictions ordering.
    """
    size = df.shape[0]
    ordered_df = df.sort_values(prediction, ascending=ascending).reset_index(drop=True)
    n_rows = list(range(min_rows, size, size // steps)) + [size]
    return np.array([effect_fn(ordered_df.head(rows), treatment, outcome) for rows in n_rows])
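To illustrate how the new argument would change the result, here is a minimal, self-contained sketch. It is not the fklearn implementation: cumulative_effect_curve_sketch drops the @curry decorator, and diff_in_means_effect is a hypothetical stand-in for linear_effect that assumes a binary treatment column. The data is made up so the numbers are easy to verify by hand.

```python
import numpy as np
import pandas as pd


def diff_in_means_effect(df, treatment, outcome):
    # Hypothetical stand-in for fklearn's linear_effect: mean outcome of the
    # treated rows minus mean outcome of the control rows (binary treatment).
    treated = df.loc[df[treatment] == 1, outcome].mean()
    control = df.loc[df[treatment] == 0, outcome].mean()
    return treated - control


def cumulative_effect_curve_sketch(df, treatment, outcome, prediction,
                                   min_rows=30, steps=100,
                                   effect_fn=diff_in_means_effect,
                                   ascending=False):
    # Same body as the proposed definition, without currying.
    size = df.shape[0]
    ordered_df = df.sort_values(prediction, ascending=ascending).reset_index(drop=True)
    n_rows = list(range(min_rows, size, size // steps)) + [size]
    return np.array([effect_fn(ordered_df.head(rows), treatment, outcome) for rows in n_rows])


# Toy data: treated rows with a higher prediction have a larger outcome.
toy = pd.DataFrame({
    "prediction": [6, 5, 4, 3, 2, 1],
    "treatment":  [1, 0, 1, 0, 1, 0],
    "outcome":    [10, 0, 6, 0, 2, 0],
})

# With size=6 and steps=3, the curve is evaluated on the first 2, 4, and 6 rows.
desc_curve = cumulative_effect_curve_sketch(
    toy, "treatment", "outcome", "prediction", min_rows=2, steps=3)
asc_curve = cumulative_effect_curve_sketch(
    toy, "treatment", "outcome", "prediction", min_rows=2, steps=3, ascending=True)

print(desc_curve)  # [10.  8.  6.]
print(asc_curve)   # [2. 4. 6.]
```

Both curves end at the same value, since the last point always uses the full dataset; only the path to it depends on the ordering.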