
Feature Request: Survival Analysis Support (similar to scikit-survival) #401

@HumbleHumbert

Hi there,

First off, thank you for the incredible work on polars-ds. It has been a game-changer for integrating performant data science workflows directly into the Polars ecosystem, avoiding the costly round-trip to pandas/numpy for many standard tasks.

I often work with time-to-event data (survival analysis) and currently have to break out of the Polars lazy/expression API to use libraries like scikit-survival or lifelines. This involves materializing the data and converting it to pandas/numpy, which becomes a bottleneck with larger datasets that polars-ds would otherwise handle gracefully.

Ideally, polars-ds would offer survival analysis support that mirrors some of the core functionality found in scikit-survival.

Specific features that would be high-value include:

  1. Kaplan-Meier Estimator: A way to compute survival probabilities over time, optionally per group.

  2. Nelson-Aalen Estimator: For cumulative hazard functions.

  3. Cox Proportional Hazards (Metrics): Even if full training is complex, having the ability to calculate the Concordance Index (C-index) or Log-Rank Test statistics directly as expressions would be incredibly useful for evaluating models or comparing cohorts.
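
To make the request concrete, the sketch below shows in plain Python what the three items above would be expected to compute. It is only a reference for the arithmetic, not a proposed API: the names `survival_table`, `SurvivalRow`, and `concordance_index` are hypothetical and do not exist in polars-ds, and the C-index shown is the simple pairwise (Harrell-style) form under the assumption that pairs with tied event times are skipped.

```python
from collections import Counter
from typing import NamedTuple


class SurvivalRow(NamedTuple):
    time: float
    at_risk: int       # subjects still under observation just before `time`
    events: int        # events observed exactly at `time`
    survival: float    # Kaplan-Meier estimate S(t)
    cum_hazard: float  # Nelson-Aalen estimate H(t)


def survival_table(times, events):
    """Hypothetical reference helper (not polars-ds API).

    times  -- observed durations
    events -- 1 if the event occurred, 0 if the subject was right-censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    removed = Counter(times)  # everyone leaving the risk set at each time
    observed = Counter(t for t, e in zip(times, events) if e)
    at_risk = len(times)
    survival, cum_hazard = 1.0, 0.0
    rows = []
    for t in sorted(removed):
        d = observed[t]
        survival *= 1.0 - d / at_risk  # Kaplan-Meier product term
        cum_hazard += d / at_risk      # Nelson-Aalen sum term
        rows.append(SurvivalRow(t, at_risk, d, survival, cum_hazard))
        at_risk -= removed[t]
    return rows


def concordance_index(times, events, risk_scores):
    """Pairwise C-index; assumes a higher risk score means an earlier event.

    A pair (i, j) is comparable when subject i had an event strictly
    before time j. Pairs with tied times are skipped (an assumption).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

In Polars terms, the table is a sort plus per-time aggregation followed by a cumulative product (Kaplan-Meier) and a cumulative sum (Nelson-Aalen), which is why it seems like a natural fit for expressions. The C-index is pairwise and would likely need a dedicated implementation rather than a composition of existing expressions.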

I understand this is a significant addition, but I believe it fits well with the library's goal of extending Polars for general DS use cases.
