Commit c10e62f

Add scikit-survival integration (#56)
1 parent 908e6d1 commit c10e62f

21 files changed: +360 -148 lines changed

.github/workflows/test.yml

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,9 @@ on:
 jobs:
   test:
     runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.10", "3.11", "3.12"]

     steps:
       - name: Checkout code
@@ -17,7 +20,7 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v4
         with:
-          python-version: "3.9"
+          python-version: ${{ matrix.python-version }}

       - name: Install Poetry
         run: |

CHANGELOG.md

Lines changed: 31 additions & 0 deletions
@@ -1,5 +1,36 @@
 # CHANGELOG

+## v1.0.9 (Unreleased)
+
+### Features
+- export datasets to RDS files
+- test workflow runs on a Python version matrix
+- scikit-learn compatible data generator
+- compatibility helpers for lifelines and scikit-survival
+
+### Documentation
+- updated usage examples and tutorials
+
+### Misc
+- README quick example uses `covariate_range`
+
+## v1.0.8 (2025-07-30)
+
+### Documentation
+- ensure absolute path resolution in `conf.py`
+- drop unsupported theme option
+- define bibliography anchors and headings
+- fix tutorial links to non-existing docs
+- add additional references to the bibliography
+
+### Testing
+- add CLI integration test
+- expand piecewise generator test coverage
+
+### Misc
+- remove fix_recommendations.md
+
+

 ## v1.0.0 (2025-06-06)

README.md

Lines changed: 24 additions & 3 deletions
@@ -20,6 +20,8 @@
 - Mixture cure and piecewise exponential models
 - Competing risks generators (constant and Weibull hazards)
 - Command-line interface and export utilities
+- Scikit-learn compatible data generator
+- Conversion helper for scikit-survival and lifelines

 ## Installation

@@ -40,11 +42,30 @@ poetry install
 ## Quick Example

 ```python
-from gen_surv import generate
+from gen_surv import export_dataset, generate

 # basic Cox proportional hazards data
-sim = generate(model="cphm", n=100, beta=0.5, covar=2.0,
-               model_cens="uniform", cens_par=1.0)
+sim = generate(
+    model="cphm",
+    n=100,
+    beta=0.5,
+    covariate_range=2.0,
+    model_cens="uniform",
+    cens_par=1.0,
+)
+
+# save to an RDS file
+export_dataset(sim, "survival_data.rds")
+```
+
+You can also convert the resulting DataFrame for use with
+[scikit-survival](https://scikit-survival.readthedocs.io) or
+[lifelines](https://lifelines.readthedocs.io):
+
+```python
+from gen_surv import to_sksurv
+
+sks_dataset = to_sksurv(sim)
 ```

 See the [usage guide](docs/source/getting_started.md) for more examples.

TODO.md

Lines changed: 18 additions & 18 deletions
@@ -9,9 +9,9 @@ This document outlines future enhancements, features, and ideas for improving th
 - [] Add property-based tests using Hypothesis to cover edge cases
 - [] Build a CLI for generating datasets from the terminal
 - [ ] Expand documentation with multilingual support and more usage examples
-- [ ] Implement Weibull and log-logistic AFT models and add visualization utilities
+- [] Implement Weibull and log-logistic AFT models and add visualization utilities
 - [] Provide CITATION metadata for proper referencing
-- [ ] Ensure all functions include Google-style docstrings with inline comments
+- [] Ensure all functions include Google-style docstrings with inline comments

 ---

@@ -37,35 +37,35 @@ This document outlines future enhancements, features, and ideas for improving th

 - [] Add tests for each model (e.g., `test_tdcm.py`, `test_thmm.py`, `test_aft.py`)
 - [] Add property-based tests with `hypothesis`
-- [ ] Cover edge cases (e.g., invalid parameters, n=0, negative censoring)
-- [ ] Run tests on multiple Python versions (CI matrix)
+- [] Cover edge cases (e.g., invalid parameters, n=0, negative censoring)
+- [] Run tests on multiple Python versions (CI matrix)

 ---

 ## 🧠 4. Advanced Models

-- [ ] Add Piecewise Exponential Model support
-- [ ] Add competing risks / multi-event simulation
+- [] Add Piecewise Exponential Model support
+- [] Add competing risks / multi-event simulation
 - [] Implement parametric AFT models (log-normal)
-- [ ] Implement parametric AFT models (log-logistic, weibull)
+- [] Implement parametric AFT models (log-logistic, weibull)
 - [ ] Simulate time-varying hazards
 - [ ] Add informative or covariate-dependent censoring

 ---

 ## 📊 5. Visualization and Analysis

-- [ ] Create `plot_survival(df, model=...)` utilities
-- [ ] Create `describe_survival(df)` summary helpers
-- [ ] Export data to CSV / JSON / Feather
+- [] Create `plot_survival(df, model=...)` utilities
+- [] Create `describe_survival(df)` summary helpers
+- [] Export data to CSV / JSON / Feather

 ---

 ## 🌍 6. Ecosystem Integration

-- [ ] Add a `GenSurvDataGenerator` compatible with `sklearn`
-- [ ] Enable use with `lifelines`, `scikit-survival`, `sksurv`
-- [ ] Export in R-compatible formats (.csv, .rds)
+- [] Add a `GenSurvDataGenerator` compatible with `sklearn`
+- [] Enable use with `lifelines`, `scikit-survival`, `sksurv`
+- [] Export in R-compatible formats (.csv, .rds)

 ---

@@ -80,12 +80,12 @@ This document outlines future enhancements, features, and ideas for improving th
 ## 🧠 8. New Survival Models to Implement

 - [] Log-Normal AFT
-- [ ] Log-Logistic AFT
-- [ ] Weibull AFT
-- [ ] Piecewise Exponential
-- [ ] Competing Risks
+- [] Log-Logistic AFT
+- [] Weibull AFT
+- [] Piecewise Exponential
+- [] Competing Risks
 - [ ] Recurrent Events
-- [ ] Mixture Cure Model
+- [] Mixture Cure Model

 ---

docs/source/bibliography.md

Lines changed: 41 additions & 8 deletions
@@ -6,21 +6,54 @@ orphan: true

 Below is a selection of references covering the statistical models implemented in **gen_surv**.

-.. _Cox1972:
+(Cox1972)=
+## Cox (1972)
 Cox, D. R. (1972). Regression Models and Life-Tables. *Journal of the Royal Statistical Society: Series B*, 34(2), 187-220.

-.. _Farewell1982:
+(Farewell1982)=
+## Farewell (1982)
 Farewell, V.T. (1982). The Use of Mixture Models for the Analysis of Survival Data with Long-Term Survivors. *Biometrics*, 38(4), 1041-1046.

-.. _FineGray1999:
+(FineGray1999)=
+## Fine and Gray (1999)
 Fine, J.P., & Gray, R.J. (1999). A Proportional Hazards Model for the Subdistribution of a Competing Risk. *Journal of the American Statistical Association*, 94(446), 496-509.

-.. _Andersen1993:
+(Andersen1993)=
+## Andersen et al. (1993)
 Andersen, P.K., Borgan, Ø., Gill, R.D., & Keiding, N. (1993). *Statistical Models Based on Counting Processes*. Springer.

-.. _Zucchini2017:
+(Zucchini2017)=
+## Zucchini et al. (2017)
 Zucchini, W., MacDonald, I.L., & Langrock, R. (2017). *Hidden Markov Models for Time Series*. Chapman and Hall/CRC.

-- Klein, J.P., & Moeschberger, M.L. (2003). *Survival Analysis: Techniques for Censored and Truncated Data*. Springer.
-- Kalbfleisch, J.D., & Prentice, R.L. (2002). *The Statistical Analysis of Failure Time Data*. Wiley.
-- Cook, R.J., & Lawless, J.F. (2007). *The Statistical Analysis of Recurrent Events*. Springer.
+(KleinMoeschberger2003)=
+## Klein and Moeschberger (2003)
+Klein, J.P., & Moeschberger, M.L. (2003). *Survival Analysis: Techniques for Censored and Truncated Data*. Springer.
+
+(KalbfleischPrentice2002)=
+## Kalbfleisch and Prentice (2002)
+Kalbfleisch, J.D., & Prentice, R.L. (2002). *The Statistical Analysis of Failure Time Data*. Wiley.
+
+(CookLawless2007)=
+## Cook and Lawless (2007)
+Cook, R.J., & Lawless, J.F. (2007). *The Statistical Analysis of Recurrent Events*. Springer.
+
+(KaplanMeier1958)=
+## Kaplan and Meier (1958)
+Kaplan, E.L., & Meier, P. (1958). Nonparametric Estimation from Incomplete Observations. *Journal of the American Statistical Association*, 53(282), 457-481.
+(TherneauGrambsch2000)=
+## Therneau and Grambsch (2000)
+Therneau, T.M., & Grambsch, P.M. (2000). *Modeling Survival Data: Extending the Cox Model*. Springer.
+
+(FlemingHarrington1991)=
+## Fleming and Harrington (1991)
+Fleming, T.R., & Harrington, D.P. (1991). *Counting Processes and Survival Analysis*. Wiley.
+
+(Collett2015)=
+## Collett (2015)
+Collett, D. (2015). *Modelling Survival Data in Medical Research*. CRC Press.
+
+(KleinbaumKlein2012)=
+## Kleinbaum and Klein (2012)
+Kleinbaum, D.G., & Klein, M. (2012). *Survival Analysis: A Self-Learning Text*. Springer.
+

docs/source/conf.py

Lines changed: 2 additions & 3 deletions
@@ -2,8 +2,8 @@
 import sys
 from pathlib import Path

-# Add the package to the Python path
-project_root = Path(__file__).parent.parent.parent
+# Add the package to the Python path using an absolute path
+project_root = Path(__file__).resolve().parent.parent.parent
 sys.path.insert(0, str(project_root / "gen_surv"))

 # Project information
@@ -74,7 +74,6 @@
     'canonical_url': 'https://gensurvpy.readthedocs.io/',
     'analytics_id': '',
     'logo_only': False,
-    'display_version': True,
     'prev_next_buttons_location': 'bottom',
     'style_external_links': False,
     'style_nav_header_background': '#2980B9',
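
Note on the first `conf.py` hunk: calling `.resolve()` before walking up the parents matters whenever `__file__` happens to be a relative path, which can depend on how the file is loaded. A minimal sketch of the difference, using a hypothetical relative path as a stand-in for `__file__` (this is an illustration, not code from the commit):

```python
from pathlib import Path

# Hypothetical relative location, standing in for a relative __file__.
conf_file = Path("docs/source/conf.py")

# Without resolve(): three .parent hops collapse to the relative path ".".
naive_root = conf_file.parent.parent.parent

# With resolve(): the path is anchored to the current working directory
# first, so the result is absolute and independent of later chdir calls.
resolved_root = conf_file.resolve().parent.parent.parent

print(naive_root)
print(resolved_root.is_absolute())
```

With the relative form, `sys.path` would receive an entry that only works from one working directory; the resolved form avoids that.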

docs/source/getting_started.md

Lines changed: 2 additions & 2 deletions
@@ -33,8 +33,8 @@ from gen_surv import generate
 df = generate(
     model="cphm", # Model type
     n=100, # Sample size
-    beta=0.5, # Covariate effect
-    covar=2.0, # Covariate range
+    beta=0.5, # Covariate effect
+    covariate_range=2.0, # Covariate range
     model_cens="uniform", # Censoring type
     cens_par=3.0 # Censoring parameter
 )

docs/source/index.md

Lines changed: 2 additions & 2 deletions
@@ -22,7 +22,7 @@ pip install gen-surv
 Generate your first dataset:
 ```python
 from gen_surv import generate
-df = generate(model="cphm", n=100, beta=0.5, covar=2.0)
+df = generate(model="cphm", n=100, beta=0.5, covariate_range=2.0)
 ```
 ```

@@ -72,7 +72,7 @@ df = gs.generate(
     model="cphm",
     n=500,
     beta=0.5,
-    covar=2.0,
+    covariate_range=2.0,
     model_cens="uniform",
     cens_par=3.0
 )

docs/source/tutorials/basic_usage.md

Lines changed: 9 additions & 9 deletions
@@ -15,7 +15,7 @@ import pandas as pd
     model="cphm",
     n=200,
     beta=0.7,
-    covar=1.5,
+    covariate_range=1.5,
     model_cens="exponential",
     cens_par=2.0,
     seed=42 # For reproducibility
@@ -43,7 +43,7 @@ All models share these parameters:
 Each model has unique parameters. For CPHM:

 - `beta`: Covariate effect (hazard ratio = exp(beta))
-- `covar`: Range for uniform covariate generation [0, covar]
+- `covariate_range`: Range for uniform covariate generation [0, covariate_range]

 ## Censoring Mechanisms

@@ -56,7 +56,7 @@ df_uniform = generate(
     model="cphm",
     n=100,
     beta=0.5,
-    covar=2.0,
+    covariate_range=2.0,
     model_cens="uniform",
     cens_par=3.0
 )
@@ -69,7 +69,7 @@ df_exponential = generate(
     model="cphm",
     n=100,
     beta=0.5,
-    covar=2.0,
+    covariate_range=2.0,
     model_cens="exponential",
     cens_par=2.0
 )
@@ -93,8 +93,8 @@ ax1.set_ylabel('Frequency')
 ax1.set_title('Distribution of Observed Times')

 # Event rate vs covariate
-df['covar_bin'] = pd.cut(df['covariate'], bins=5)
-event_rate = df.groupby('covar_bin')['status'].mean()
+df['covariate_bin'] = pd.cut(df['covariate'], bins=5)
+event_rate = df.groupby('covariate_bin')['status'].mean()
 event_rate.plot(kind='bar', ax=ax2, rot=45)
 ax2.set_ylabel('Event Rate')
 ax2.set_title('Event Rate by Covariate Level')
@@ -105,6 +105,6 @@ plt.show()

 ## Next Steps

-- Try different models: {doc}`model_comparison`
-- Learn advanced features: {doc}`advanced_features`
-- See integration examples: {doc}`integration_examples`
+- Try different models (model_comparison)
+- Learn advanced features (advanced_features)
+- See integration examples (integration_examples)
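
Note on the parameter descriptions in this tutorial: the text states that the hazard ratio is exp(beta) and that the covariate is drawn uniformly from [0, covariate_range]. A small worked example with the tutorial's values (`beta=0.7`, `covariate_range=1.5`) may help readers interpret them. It uses only the standard library and does not call gen_surv; the "across the range" quantity is an illustration added here, assuming the usual proportional hazards form exp(beta * x).

```python
import math

beta = 0.7              # value used in the tutorial's first example
covariate_range = 1.5   # covariate assumed uniform on [0, covariate_range]

# Hazard ratio for a one-unit increase in the covariate.
hr_per_unit = math.exp(beta)

# Hazard ratio between the largest and smallest possible covariate values.
hr_across_range = math.exp(beta * covariate_range)

print(round(hr_per_unit, 3), round(hr_across_range, 3))
```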

docs/source/usage.md

Lines changed: 36 additions & 2 deletions
@@ -21,10 +21,20 @@ This will create a virtual environment and install all required packages.
 Generate datasets directly in Python:

 ```python
-from gen_surv import generate
+from gen_surv import export_dataset, generate

 # Cox Proportional Hazards example
-generate(model="cphm", n=100, model_cens="uniform", cens_par=1.0, beta=0.5, covariate_range=2.0)
+df = generate(
+    model="cphm",
+    n=100,
+    model_cens="uniform",
+    cens_par=1.0,
+    beta=0.5,
+    covariate_range=2.0,
+)
+
+# Save to RDS for use in R
+export_dataset(df, "simulated_data.rds")
 ```

 You can also generate data from the command line:
@@ -47,3 +57,27 @@ make html

 The generated files will be available under `docs/build/html`.

+## Scikit-learn Integration
+
+You can wrap the generator in a transformer compatible with scikit-learn:
+
+```python
+from gen_surv import GenSurvDataGenerator
+
+est = GenSurvDataGenerator("cphm", n=10, beta=0.5, covariate_range=1.0)
+df = est.fit_transform()
+```
+
+## Lifelines and scikit-survival
+
+Datasets generated with **gen_surv** can be directly used with
+[lifelines](https://lifelines.readthedocs.io). For
+[scikit-survival](https://scikit-survival.readthedocs.io) you can convert the
+DataFrame using ``to_sksurv``:
+
+```python
+from gen_surv import to_sksurv
+
+struct = to_sksurv(df)
+```
+
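
Note on `to_sksurv`: the diff shown here does not include the helper's implementation, so the following is only a sketch of the kind of conversion such a helper presumably performs. scikit-survival estimators expect the outcome as a NumPy structured array whose first field is a boolean event indicator and whose second field is the observed time. The function name `to_structured_survival` and its arguments are hypothetical and are not part of gen_surv.

```python
import numpy as np


def to_structured_survival(times, statuses):
    """Hypothetical stand-in for a to_sksurv-style helper (not gen_surv's code)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(statuses).astype(bool)
    if times.shape != events.shape:
        raise ValueError("times and statuses must have the same length")

    # Structured array: boolean event indicator first, observed time second.
    y = np.empty(times.shape[0], dtype=[("event", bool), ("time", float)])
    y["event"] = events
    y["time"] = times
    return y


# Three made-up observations: status 1 = event observed, 0 = censored.
y = to_structured_survival([2.5, 0.7, 4.1], [1, 0, 1])
print(y.dtype.names)
```

An array of this shape can be passed as the `y` argument to scikit-survival's `fit` methods; the real helper may also return the covariate matrix or use different field names.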
