Hi there,
First off, thank you for the incredible work on polars-ds. It has been a game-changer for integrating performant data science workflows directly into the Polars ecosystem, avoiding the costly round-trip to pandas/numpy for many standard tasks.
I often work with time-to-event data (survival analysis) and currently have to break out of the Polars lazy/expression API to use libraries like scikit-survival or lifelines. This involves materializing the data and converting it to pandas/numpy, which becomes a bottleneck with larger datasets that polars-ds would otherwise handle gracefully.
It would be great to have native survival analysis support in polars-ds. Ideally, this would mirror some of the core functionality found in scikit-survival.
Specific features that would be high-value include:
- Kaplan-Meier Estimator: A way to compute survival probabilities over time groups.
- Nelson-Aalen Estimator: For cumulative hazard functions.
- Cox Proportional Hazards (Metrics): Even if full training is complex, having the ability to calculate the Concordance Index (C-index) or Log-Rank Test statistics directly as expressions would be incredibly useful for evaluating models or comparing cohorts.
I understand this is a significant addition, but I believe it fits well with the library's goal of extending Polars for general DS use cases.