Why do I get a different lift result from get_cumlift() than from calculating it line by line?  #706

@AmyLin0515

Describe the bug
Hi Team!
I used get_cumlift() and got the following lift for the S-Learner:
image

When I tried to reproduce this result by calculating it manually, I got values that differ from what get_cumlift() returned.

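# col is the name of the score column used for ranking; df_try holds the columns 'y' (outcome) and 'w' (treatment flag)
sorted_df = df_try.sort_values(col, ascending=False).reset_index(drop=True)
sorted_df.index = sorted_df.index + 1  # 1-based rank
# Cumulative counts of treated and control rows
sorted_df["cumsum_tr"] = sorted_df['w'].cumsum()
sorted_df["cumsum_ct"] = sorted_df.index.values - sorted_df["cumsum_tr"]
# Cumulative outcome sums for treated and control rows
sorted_df["cumsum_y_tr"] = (sorted_df['y'] * sorted_df['w']).cumsum()
sorted_df["cumsum_y_ct"] = (sorted_df['y'] * (1 - sorted_df['w'])).cumsum()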

This is what the table looks like:
image

Then I calculated the lift:

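import numpy as np
import pandas as pd

lift = []
# Mean outcome of treated rows minus mean outcome of control rows, cumulatively by rank
lift.append(sorted_df["cumsum_y_tr"] / sorted_df["cumsum_tr"] - sorted_df["cumsum_y_ct"] / sorted_df["cumsum_ct"])
lift = pd.concat(lift, join="inner", axis=1)
# Add a zero row at index 0, then fill NaN rows by linear interpolation
lift.loc[0] = np.zeros((lift.shape[1],))
lift = lift.sort_index().interpolate()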
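
For reference, the manual steps above can be condensed into one self-contained function and run on a tiny hand-checkable dataset. This is only a sketch of the calculation as written in this issue, not a statement about what get_cumlift() does internally. The function name `manual_cumlift`, the `score` column, and the toy values are made up for illustration.

```python
import pandas as pd


def manual_cumlift(df, score_col, outcome_col="y", treatment_col="w"):
    """Treated mean outcome minus control mean outcome within the top-k rows."""
    # Rank rows from highest to lowest score and use a 1-based index (k = 1..n).
    s = df.sort_values(score_col, ascending=False).reset_index(drop=True)
    s.index = s.index + 1

    cumsum_tr = s[treatment_col].cumsum()          # treated rows in the top k
    cumsum_ct = s.index.values - cumsum_tr         # control rows in the top k
    cumsum_y_tr = (s[outcome_col] * s[treatment_col]).cumsum()
    cumsum_y_ct = (s[outcome_col] * (1 - s[treatment_col])).cumsum()

    # 0/0 gives NaN while one of the two groups is still empty.
    lift = cumsum_y_tr / cumsum_tr - cumsum_y_ct / cumsum_ct

    # Anchor the curve at 0 for k = 0, then fill the NaN rows linearly.
    lift.loc[0] = 0.0
    return lift.sort_index().interpolate()


# Hypothetical toy data: four rows, already ordered by score.
toy = pd.DataFrame(
    {
        "score": [0.9, 0.8, 0.7, 0.6],
        "w": [1, 0, 1, 0],
        "y": [1, 0, 0, 1],
    }
)
result = manual_cumlift(toy, "score")
print(result.tolist())  # [0.0, 0.5, 1.0, 0.5, 0.0]
```

Note that the value at k = 1 is not computed from the data: the control group is empty there, so the raw value is NaN and the 0.5 comes from interpolating between the zero row and k = 2. Early rows of a real curve can be affected in the same way.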

This is what the final result looks like:
image

I plotted the difference between the result from get_cumlift() and my manual calculation.
image
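
Besides plotting, the gap between the two curves can be quantified numerically. The sketch below assumes both results are available as pandas Series indexed by rank. The helper name `compare_lift` and the two short example Series are hypothetical and are not taken from the data in this issue.

```python
import numpy as np
import pandas as pd


def compare_lift(lift_a, lift_b, atol=1e-9):
    """Summarize how two lift curves differ on their shared index."""
    # Keep only ranks present in both Series so a length mismatch is visible.
    a, b = lift_a.align(lift_b, join="inner")
    diff = a - b
    mismatch = ~np.isclose(a.to_numpy(), b.to_numpy(), atol=atol, equal_nan=True)
    first_bad = a.index[mismatch][0] if mismatch.any() else None
    return {
        "n_compared": len(diff),
        "max_abs_diff": float(diff.abs().max()),
        "first_mismatch": first_bad,  # first rank where the curves diverge
    }


# Hypothetical example values, chosen so the curves diverge at rank 3.
lift_manual = pd.Series([0.0, 0.5, 1.0, 0.5, 0.0])
lift_other = pd.Series([0.0, 0.5, 1.0, 0.4, 0.0])
report = compare_lift(lift_manual, lift_other)
print(report)
```

Knowing the first rank at which the curves diverge makes it easier to inspect the corresponding rows of the sorted table, and `n_compared` shows whether the two results even cover the same number of rows.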

Does anyone know why they are different?

Environment (please complete the following information):

  • OS: Windows
  • Python Version: 3.8
  • Versions of Major Dependencies (pandas, scikit-learn, cython): pandas==1.3.5, scikit-learn==1.0.2, cython==0.29.34
