gen_surv is a Python library for simulating survival data under a wide range of statistical models and for visualizing the results. Inspired by the R package genSurv, it offers a unified interface for generating realistic datasets for research, teaching and benchmarking.
- Cox proportional hazards model (CPHM)
- Accelerated failure time models (log-normal, log-logistic, Weibull)
- Continuous-time multi-state Markov model (CMM)
- Time-dependent covariate model (TDCM)
- Time-homogeneous hidden Markov model (THMM)
- Mixture cure and piecewise exponential models
- Competing risks generators (constant and Weibull hazards)
- Visualization helpers built on matplotlib and lifelines
- Scikit-learn compatible data generator
- Conversion utilities for scikit-survival
- Command-line interface for dataset creation and visualization
Requires Python 3.10 or later.
Install the latest release from PyPI:
pip install gen-surv
gen_surv installs matplotlib and lifelines for visualization. Support for scikit-survival is optional; install it to enable integration with the scikit-survival ecosystem or to run the full test suite:
pip install gen-surv[dev]
To develop locally with all extras:
git clone https://github.com/DiogoRibeiro7/genSurvPy.git
cd genSurvPy
poetry install --with dev
On Debian/Ubuntu, you may need build-essential gfortran libopenblas-dev to build scikit-survival.
Before committing changes, install the pre-commit hooks and run the tests:
pre-commit install
pre-commit run --all-files
pytest
Tests that depend on optional packages such as scikit-survival are skipped automatically when those packages are missing.
from gen_surv import generate, export_dataset, to_sksurv
from gen_surv.visualization import plot_survival_curve
sim = generate(
    model="cphm",
    n=100,
    beta=0.5,
    covariate_range=2.0,
    model_cens="uniform",
    cens_par=1.0,
)
plot_survival_curve(sim)
export_dataset(sim, "survival_data.rds")
# convert for scikit-survival
sks_dataset = to_sksurv(sim)
See the usage guide for more examples.
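To make concrete what a CPHM simulation with uniform censoring produces, here is a minimal standard-library sketch. It is not gen_surv's implementation: the function name `simulate_cox_exponential`, the constant baseline hazard, the covariate drawn from Uniform(0, covariate_range), the censoring time drawn from Uniform(0, cens_max), and the output keys are all assumptions made for illustration.

```python
import math
import random


def simulate_cox_exponential(n, beta, covariate_range, cens_max,
                             baseline_hazard=1.0, seed=0):
    """Simulate right-censored data under a Cox model with constant baseline hazard.

    Hypothetical helper for illustration only; it does not mirror gen_surv's API.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Assumed covariate distribution: Uniform(0, covariate_range).
        x = rng.uniform(0.0, covariate_range)
        # Proportional hazards: h(t | x) = h0 * exp(beta * x).
        rate = baseline_hazard * math.exp(beta * x)
        # Inverse transform sampling of an exponential event time.
        event_time = -math.log(1.0 - rng.random()) / rate
        # Assumed censoring mechanism: Uniform(0, cens_max).
        cens_time = rng.uniform(0.0, cens_max)
        rows.append({
            "time": min(event_time, cens_time),
            "status": 1 if event_time <= cens_time else 0,
            "x": x,
        })
    return rows


rows = simulate_cox_exponential(n=100, beta=0.5, covariate_range=2.0, cens_max=1.0)
n_events = sum(row["status"] for row in rows)
print(f"{len(rows)} subjects, {n_events} observed events")
```

Each row holds the observed time (the smaller of the event and censoring times), an event indicator, and the covariate, which is the general shape of data that survival estimators expect.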
Generate datasets and plots without writing Python code:
python -m gen_surv dataset cphm --n 1000 -o survival.csv
python -m gen_surv visualize survival.csv --output survival_plot.png
visualize accepts custom column names via --time-col and --status-col and can stratify by group with --group-col.
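A survival curve built from a time column and a status column is typically a Kaplan-Meier estimate. The sketch below computes one with only the standard library, to show the role each column plays. The column names `time` and `status`, the inline CSV, and the `kaplan_meier` helper are assumptions for illustration; they are not part of the gen_surv CLI or API, and the plot the CLI produces may differ.

```python
import csv
import io


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set.
        at_risk -= removed
    return curve


# Tiny inline CSV standing in for a generated file; column names are assumed.
csv_text = "time,status\n1.0,1\n2.0,0\n3.0,1\n4.0,1\n"
records = list(csv.DictReader(io.StringIO(csv_text)))
times = [float(r["time"]) for r in records]
events = [int(r["status"]) for r in records]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t:.1f}  S(t)={s:.3f}")
```

The censored subject at time 2.0 does not create a step in the curve, but it does shrink the risk set for later event times.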
| Model | Description |
|---|---|
| CPHM | Cox proportional hazards |
| AFT | Accelerated failure time (log-normal, log-logistic, Weibull) |
| CMM | Continuous-time multi-state Markov |
| TDCM | Time-dependent covariates |
| THMM | Time-homogeneous hidden Markov |
| Competing Risks | Multiple event types with cause-specific hazards |
| Mixture Cure | Models long-term survivors |
| Piecewise Exponential | Flexible baseline hazard via intervals |
More details on each algorithm are available on the Algorithms page. For additional background, see the theory guide.
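As one illustration of the table above, the piecewise exponential model can be sketched by inverting the cumulative hazard interval by interval. This is a standard textbook technique, not necessarily how gen_surv implements it; the function name `sample_piecewise_exponential` and its argument layout are hypothetical.

```python
import math
import random


def sample_piecewise_exponential(breakpoints, hazards, rng):
    """Draw one event time where hazards[k] applies from breakpoints[k] onward.

    breakpoints must start at 0.0; the last interval extends to infinity.
    Hypothetical helper for illustration only.
    """
    if len(breakpoints) != len(hazards) or breakpoints[0] != 0.0:
        raise ValueError("need one hazard per interval, starting at time 0.0")
    if any(h <= 0 for h in hazards):
        raise ValueError("hazards must be positive")

    # An Exp(1) draw is the cumulative hazard the subject must accumulate.
    target = -math.log(1.0 - rng.random())
    for k, rate in enumerate(hazards):
        start = breakpoints[k]
        end = breakpoints[k + 1] if k + 1 < len(breakpoints) else math.inf
        capacity = rate * (end - start)  # cumulative hazard available here
        if target <= capacity:
            return start + target / rate
        target -= capacity


rng = random.Random(42)
draws = [
    sample_piecewise_exponential([0.0, 1.0, 3.0], [0.5, 1.0, 2.0], rng)
    for _ in range(5)
]
print([round(d, 3) for d in draws])
```

With equal hazards in every interval, the sampler reduces to an ordinary exponential distribution, which gives a simple sanity check.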
Full documentation is hosted on Read the Docs. It includes installation instructions, tutorials, API references and a bibliography.
To build the docs locally:
cd docs
make html
Open build/html/index.html in your browser to view the result.
This project is licensed under the MIT License. See LICENSE for details.
If you use gen_surv in your research, please cite the project using the metadata in CITATION.cff.
Diogo Ribeiro — ESMAD - Instituto Politécnico do Porto
- ORCID: https://orcid.org/0009-0001-2022-7072
- Professional email: [email protected]
- Personal email: [email protected]
- GitHub: @DiogoRibeiro7