Skip to content

Fix data leakage in _LifelinesAdapter when y has multiple columns#971

Open
aditi-malviya666 wants to merge 1 commit intosktime:mainfrom
aditi-malviya666:fix-lifelines-leakage
Open

Fix data leakage in _LifelinesAdapter when y has multiple columns#971
aditi-malviya666 wants to merge 1 commit intosktime:mainfrom
aditi-malviya666:fix-lifelines-leakage

Conversation

@aditi-malviya666
Copy link

Reference Issues/PRs

Fixes #947

What does this implement/fix? Explain your changes.

This PR fixes a data leakage issue in _LifelinesAdapter.

Previously, when y contained multiple columns, all columns were concatenated into the training DataFrame. Since lifelines treats any column not specified as duration_col or event_col as covariates, this caused unintended leakage of target information into the feature set.

The fix ensures that only:

  • feature columns from X
  • the first column of y (used as duration_col)
  • the event column derived from C (if provided)

are included in the DataFrame passed to the lifelines estimator.

Additionally, a warning is raised when y contains multiple columns to inform the user that only the first column is used.

Does your contribution introduce a new dependency? If yes, which one?

No new dependencies are introduced.

What should a reviewer concentrate their feedback on?

  • Correctness of DataFrame construction in _fit
  • Whether excluding extra columns from y is consistent with expected API behavior
  • Handling of censoring column (C) and compatibility with lifelines

Did you add any tests for the change?

No new tests have been added yet, but the fix has been manually validated using a reproducible example that previously demonstrated leakage.

Any other comments?

This fix prevents unintended target leakage and aligns the adapter behavior with lifelines expectations.

PR checklist

For all contributions
  • I've added myself to the list of contributors
  • The PR title starts with [BUG]
For new estimators
  • Not applicable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Data leakage in _LifelinesAdapter when y has multiple columns

1 participant