
gen_surv is a Python package for simulating survival data under a variety of models, inspired by the R package genSurv


DiogoRibeiro7/genSurvPy

gen_surv


gen_surv is a Python library for simulating survival data and producing visualizations under a wide range of statistical models. Inspired by the R package genSurv, it offers a unified interface for generating realistic datasets for research, teaching and benchmarking.


Features

  • Cox proportional hazards model (CPHM)
  • Accelerated failure time models (log-normal, log-logistic, Weibull)
  • Continuous-time multi-state Markov model (CMM)
  • Time-dependent covariate model (TDCM)
  • Time-homogeneous hidden Markov model (THMM)
  • Mixture cure and piecewise exponential models
  • Competing risks generators (constant and Weibull hazards)
  • Visualization helpers built on matplotlib and lifelines
  • Scikit-learn compatible data generator
  • Conversion utilities for scikit-survival
  • Command-line interface for dataset creation and visualization
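To illustrate the competing-risks idea (this is not gen_surv's own code), here is a minimal standard-library sketch that assumes constant cause-specific hazards. With constant hazards, the time to the first event of any cause is exponential with rate equal to the sum of the hazards, and the cause is drawn with probability proportional to its hazard. The function `simulate_competing_risks`, its arguments and the independent exponential censoring are hypothetical choices made for illustration.

```python
import random


def simulate_competing_risks(n, hazards, cens_rate, seed=None):
    """Hypothetical sketch: competing risks with constant cause-specific hazards.

    hazards   -- constant hazard per cause; cause k is reported as k + 1
    cens_rate -- rate of an independent exponential censoring time (assumption)
    Returns (time, status) pairs where status 0 means censored.
    """
    rng = random.Random(seed)
    total = sum(hazards)
    causes = list(range(1, len(hazards) + 1))
    records = []
    for _ in range(n):
        # First event of any cause: Exp(sum of the cause-specific hazards).
        event_time = rng.expovariate(total)
        # Cause k is chosen with probability hazards[k - 1] / total.
        cause = rng.choices(causes, weights=hazards)[0]
        cens_time = rng.expovariate(cens_rate)
        if event_time <= cens_time:
            records.append((event_time, cause))
        else:
            records.append((cens_time, 0))
    return records


data = simulate_competing_risks(n=2000, hazards=[0.5, 1.0], cens_rate=0.2, seed=1)
```

Drawing the cause independently of the time is exact only because the hazards are constant; with Weibull hazards of differing shapes the cause probabilities change over time, so a different scheme (for example, drawing one latent time per cause and taking the minimum) is needed.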

Installation

Requires Python 3.10 or later.

Install the latest release from PyPI:

pip install gen-surv

gen_surv installs matplotlib and lifelines for visualization. Support for scikit-survival is optional. To enable integration with the scikit-survival ecosystem or to run the full test suite, install the dev extra, which includes it (quoting the argument stops shells such as zsh from expanding the brackets):

pip install "gen-surv[dev]"

To develop locally with all extras:

git clone https://github.com/DiogoRibeiro7/genSurvPy.git
cd genSurvPy
poetry install --with dev

On Debian/Ubuntu you may need the system packages build-essential, gfortran and libopenblas-dev to build scikit-survival.

Development

Before committing changes, install the pre-commit hooks and run the tests:

pre-commit install
pre-commit run --all-files
pytest

Tests that depend on optional packages such as scikit-survival are skipped automatically when those packages are missing.
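One way to express such a skip with only the standard library is `unittest.skipUnless` combined with `importlib.util.find_spec`. This is a sketch; the project's actual tests may use a different mechanism (for example `pytest.importorskip`), and the test class below is hypothetical. `sksurv` is scikit-survival's import name.

```python
import importlib.util
import unittest


def has_module(name):
    """Return True if a top-level module is installed, without importing it."""
    return importlib.util.find_spec(name) is not None


class TestSksurvIntegration(unittest.TestCase):
    # Hypothetical test: skipped automatically when scikit-survival is missing.
    @unittest.skipUnless(has_module("sksurv"), "scikit-survival is not installed")
    def test_import(self):
        import sksurv  # noqa: F401
```

Running `python -m unittest` then reports the test as skipped rather than failed on machines without scikit-survival.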

Usage

Python API

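from gen_surv import generate, export_dataset, to_sksurv
from gen_surv.visualization import plot_survival_curve

# Simulate 100 observations from the Cox proportional hazards model
sim = generate(
    model="cphm",
    n=100,
    beta=0.5,
    covariate_range=2.0,
    model_cens="uniform",  # censoring distribution
    cens_par=1.0,          # parameter of the censoring distribution
)

# Plot the survival curve of the simulated data
plot_survival_curve(sim)

# Save the dataset as an R data file (.rds)
export_dataset(sim, "survival_data.rds")

# Convert for scikit-survival
sks_dataset = to_sksurv(sim)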

See the usage guide for more examples.
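For intuition about what a CPHM simulation involves, here is a standard-library sketch of the common inverse-transform approach. It assumes a single covariate drawn uniformly on [0, covariate_range], a constant baseline hazard of 1 and uniform censoring on [0, cens_par]. These mirror the argument names in the example above, but they are assumptions and not necessarily how gen_surv interprets them; `simulate_cphm` is a hypothetical function.

```python
import math
import random


def simulate_cphm(n, beta, covariate_range, cens_par, seed=None):
    """Hypothetical sketch of Cox proportional hazards data via inverse transform.

    Assumes X ~ Uniform(0, covariate_range), baseline hazard h0(t) = 1 and
    censoring C ~ Uniform(0, cens_par). Returns dicts with time, status and X.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, covariate_range)
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is finite
        # S(t | x) = exp(-t * exp(beta * x))  =>  T = -log(U) / exp(beta * x)
        event_time = -math.log(u) / math.exp(beta * x)
        cens_time = rng.uniform(0.0, cens_par)
        rows.append({
            "time": min(event_time, cens_time),
            "status": int(event_time <= cens_time),  # 1 = event, 0 = censored
            "X": x,
        })
    return rows


sim = simulate_cphm(n=100, beta=0.5, covariate_range=2.0, cens_par=1.0, seed=42)
```

With beta > 0, subjects with larger X have proportionally higher hazards and therefore tend to have shorter event times.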

Command Line

Generate datasets and plots without writing Python code:

python -m gen_surv dataset cphm --n 1000 -o survival.csv

python -m gen_surv visualize survival.csv --output survival_plot.png

The visualize command accepts custom column names via --time-col and --status-col, and can stratify the curves by group with --group-col.
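A standard nonparametric estimate behind survival-curve plots is the Kaplan-Meier estimator. As a hedged, standard-library sketch (not the package's implementation), the following reads a CSV and computes the estimate per group. The default column names `time` and `status`, the 0/1 status coding and the helper names are assumptions that parallel the --time-col, --status-col and --group-col options.

```python
import csv
import io
from collections import defaultdict


def kaplan_meier(times, statuses):
    """Return [(t, S(t))] at each distinct event time (status 1 = event)."""
    data = sorted(zip(times, statuses))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # censored subjects leave the risk set after t
    return curve


def km_from_csv(handle, time_col="time", status_col="status", group_col=None):
    """Compute one Kaplan-Meier curve per group from a CSV file-like object."""
    groups = defaultdict(lambda: ([], []))
    for row in csv.DictReader(handle):
        key = row[group_col] if group_col else "all"
        times, statuses = groups[key]
        times.append(float(row[time_col]))
        statuses.append(int(float(row[status_col])))
    return {key: kaplan_meier(t, s) for key, (t, s) in groups.items()}


sample = io.StringIO("time,status\n1,1\n2,0\n3,1\n4,1\n")
print(km_from_csv(sample))  # {'all': [(1.0, 0.75), (3.0, 0.375), (4.0, 0.0)]}
```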

Supported Models

Model                   Description
CPHM                    Cox proportional hazards
AFT                     Accelerated failure time (log-normal, log-logistic, Weibull)
CMM                     Continuous-time multi-state Markov
TDCM                    Time-dependent covariates
THMM                    Time-homogeneous hidden Markov
Competing Risks         Multiple event types with cause-specific hazards
Mixture Cure            Models long-term survivors
Piecewise Exponential   Flexible baseline hazard via intervals
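The piecewise exponential entry can be made concrete with inverse-transform sampling: draw E ~ Exp(1) and find the time at which the cumulative hazard H(t) reaches E. The sketch below assumes interval start times beginning at 0 and one constant rate per interval, with the last rate extending to infinity; the function names and argument layout are hypothetical rather than gen_surv's signature.

```python
import random


def inverse_cumulative_hazard(target, breakpoints, rates):
    """Return t such that H(t) = target for a piecewise-constant hazard.

    breakpoints -- interval start times, beginning with 0, e.g. [0, 1, 3]
    rates       -- hazard on each interval; the last rate extends to infinity
    """
    cum_hazard = 0.0
    for k, rate in enumerate(rates):
        start = breakpoints[k]
        is_last = k + 1 == len(rates)
        end = float("inf") if is_last else breakpoints[k + 1]
        if rate > 0 and (is_last or cum_hazard + rate * (end - start) >= target):
            # Solve cum_hazard + rate * (t - start) = target for t.
            return start + (target - cum_hazard) / rate
        if not is_last:
            cum_hazard += rate * (end - start)
    raise ValueError("the last rate must be positive so that an event always occurs")


def sample_piecewise_exponential(breakpoints, rates, rng):
    """Inverse-transform draw: H(T) follows Exp(1), so T = H^{-1}(E)."""
    return inverse_cumulative_hazard(rng.expovariate(1.0), breakpoints, rates)


rng = random.Random(0)
times = [sample_piecewise_exponential([0, 1, 3], [0.2, 0.5, 1.0], rng) for _ in range(1000)]
```

For example, with rates 0.2, 0.5 and 1.0 on [0, 1), [1, 3) and [3, ∞), a target of 0.5 is reached at t = 1 + (0.5 - 0.2) / 0.5 = 1.6.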

More details on each algorithm are available on the Algorithms page. For additional background, see the theory guide.
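The continuous-time multi-state Markov (CMM) model rests on a standard simulation scheme: remain in the current state for an exponential time whose rate is the total exit intensity, then jump to another state with probability proportional to its transition intensity. Below is a hedged standard-library sketch; the intensity-matrix layout, the function name `simulate_ctmc_path` and the illness-death example are illustrative assumptions, not gen_surv's interface.

```python
import random


def simulate_ctmc_path(Q, start, max_time, rng):
    """Simulate one path of a continuous-time Markov chain.

    Q        -- transition intensity matrix (off-diagonals >= 0, rows sum to 0)
    start    -- initial state index
    max_time -- end of follow-up; the path is censored there
    Returns a list of (time, state) pairs beginning with (0.0, start).
    """
    t, state = 0.0, start
    path = [(t, state)]
    while True:
        exit_rate = -Q[state][state]
        if exit_rate <= 0:  # absorbing state, e.g. death
            return path
        t += rng.expovariate(exit_rate)  # sojourn time ~ Exp(total exit rate)
        if t > max_time:  # no further transition observed during follow-up
            return path
        targets = [j for j in range(len(Q)) if j != state]
        weights = [Q[state][j] for j in targets]
        state = rng.choices(targets, weights=weights)[0]
        path.append((t, state))


# Illness-death example: 0 = healthy, 1 = ill, 2 = dead (absorbing).
Q = [[-0.3, 0.2, 0.1],
     [0.0, -0.4, 0.4],
     [0.0, 0.0, 0.0]]
path = simulate_ctmc_path(Q, start=0, max_time=10.0, rng=random.Random(3))
```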

Documentation

Full documentation is hosted on Read the Docs. It includes installation instructions, tutorials, API references and a bibliography.

To build the docs locally:

cd docs
make html

Open build/html/index.html in your browser to view the result.

License

This project is licensed under the MIT License. See LICENSE for details.

Citation

If you use gen_surv in your research, please cite the project using the metadata in CITATION.cff.

Author

Diogo Ribeiro, ESMAD - Instituto Politécnico do Porto
