@@ -1,103 +1,65 @@
-# TODO – Roadmap for gen_surv
+# gen_surv Roadmap
 
-This document outlines future enhancements, features, and ideas for improving the gen_surv package.
+This document outlines the planned development priorities for future versions of gen_surv. This roadmap will be periodically updated based on user feedback, research needs, and community contributions.
 
----
+## Short-term Goals (v1.1.x)
 
-## ✨ Priority Items
+### Additional Statistical Models
+- [ ] **Recurrent Events Model**: Generate data with multiple events per subject
+- [ ] **Time-Varying Effects**: Support for non-proportional hazards with coefficients that change over time
+- [ ] **Extended Competing Risks**: Allow for correlation between competing risks
 
-- [✅] Add property-based tests using Hypothesis to cover edge cases
-- [✅] Build a CLI for generating datasets from the terminal
-- [ ] Expand documentation with multilingual support and more usage examples
-- [✅] Implement Weibull and log-logistic AFT models and add visualization utilities
-- [✅] Provide CITATION metadata for proper referencing
-- [✅] Ensure all functions include Google-style docstrings with inline comments
+### Visualization and Analysis
+- [ ] **Enhanced Visualization Toolkit**: Add more plot types and customization options
+- [ ] **Interactive Visualizations**: Add options using Plotly for interactive exploration
+- [ ] **Data Quality Reports**: Generate reports on statistical properties of generated datasets
 
----
+### Usability Improvements
+- [ ] **Dataset Catalog**: Pre-configured parameters to mimic classic survival datasets
+- [ ] **Parameter Estimation**: Tools to estimate generation parameters from existing datasets
+- [ ] **Extended CLI**: Add more command-line options for all models
 
-## 📦 1. Interface and UX
+## Medium-term Goals (v1.2.x)
 
-- [✅] Create a `generate(..., return_type="df" | "dict")` interface
-- [✅] Add `__version__` using `importlib.metadata` or `poetry-dynamic-versioning`
-- [✅] Build a CLI with `typer` or `click`
-- [✅] Add example notebooks or scripts for each model (`examples/` folder)
+### Advanced Statistical Models
+- [ ] **Joint Longitudinal-Survival Models**: Generators for models that simultaneously handle longitudinal outcomes and time-to-event data
+- [ ] **Frailty Models**: Support for shared and nested frailty models
+- [ ] **Interval Censoring**: Support for interval-censored data generation
 
----
+### Technical Enhancements
+- [ ] **Parallel Processing**: Multi-core support for faster generation of large datasets
+- [ ] **Memory Optimization**: Streaming data generation for very large datasets
+- [ ] **Performance Benchmarks**: Systematic benchmarking of data generation speed
 
-## 📚 2. Documentation
+### Integration and Ecosystem
+- [ ] **scikit-learn Extensions**: More scikit-learn compatible estimators and transformers
+- [ ] **Stan/PyMC Integration**: Export data in formats suitable for Bayesian modeling
+- [ ] **Dashboard**: Simple Streamlit app for data exploration and generation
 
-- [✅] Add a "Model Comparison Guide" section (`index.md` + `theory.md`)
-- [✅] Add "How It Works" sections for each model (`theory.md`)
-- [✅] Include usage examples in index with real calls
-- [ ] Optional: add multilingual docs using `sphinx-intl`
+## Long-term Goals (v2.x)
 
----
+### Advanced Features
+- [ ] **Bayesian Survival Models**: Generators for Bayesian survival analysis with various priors
+- [ ] **Spatial Survival Models**: Generate survival data with spatial correlation
+- [ ] **Survival Neural Networks**: Integration with deep learning approaches to survival analysis
 
-## 🧪 3. Testing and Quality
+### Infrastructure and Performance
+- [ ] **GPU Acceleration**: Optional GPU support for large dataset generation
+- [ ] **JAX/Numba Implementation**: High-performance implementations of key algorithms
+- [ ] **R Interface**: Create an R package that interfaces with gen_surv
 
-- [✅] Add tests for each model (e.g., `test_tdcm.py`, `test_thmm.py`, `test_aft.py`)
-- [✅] Add property-based tests with `hypothesis`
-- [✅] Cover edge cases (e.g., invalid parameters, n=0, negative censoring)
-- [✅] Run tests on multiple Python versions (CI matrix)
+### Community and Documentation
+- [ ] **Interactive Tutorials**: Using Jupyter Book or similar tools
+- [ ] **Video Tutorials**: Short video demonstrations of key features
+- [ ] **Case Studies**: Real-world examples showing how gen_surv can be used for teaching or research
+- [ ] **User Showcase**: Gallery of research or teaching that uses gen_surv
 
----
+## How to Contribute
 
-## 🧠 4. Advanced Models
+We welcome contributions that help us achieve these roadmap goals! If you're interested in working on any of these features, please check the [CONTRIBUTING.md](CONTRIBUTING.md) file for guidelines and open an issue to discuss your approach before submitting a pull request.
 
-- [✅] Add Piecewise Exponential Model support
-- [✅] Add competing risks / multi-event simulation
-- [✅] Implement parametric AFT models (log-normal)
-- [✅] Implement parametric AFT models (log-logistic, weibull)
-- [ ] Simulate time-varying hazards
-- [ ] Add informative or covariate-dependent censoring
+For suggesting new features or modifications to this roadmap, please open an issue with the "enhancement" tag.
 
----
+## Version History
 
-## 📊 5. Visualization and Analysis
-
-- [✅] Create `plot_survival(df, model=...)` utilities
-- [✅] Create `describe_survival(df)` summary helpers
-- [✅] Export data to CSV / JSON / Feather
-
----
-
-## 🌍 6. Ecosystem Integration
-
-- [✅] Add a `GenSurvDataGenerator` compatible with `sklearn`
-- [✅] Enable use with `lifelines`, `scikit-survival`, `sksurv`
-- [✅] Export in R-compatible formats (.csv, .rds)
-
----
-
-## 🔁 7. Other Ideas
-
-- [ ] Add performance benchmarks for each model
-- [✅] Improve PyPI discoverability (added tags, keywords, docs)
-- [ ] Create a Streamlit or Gradio live demo
-
----
-
-## 🧠 8. New Survival Models to Implement
-
-- [✅] Log-Normal AFT
-- [✅] Log-Logistic AFT
-- [✅] Weibull AFT
-- [✅] Piecewise Exponential
-- [✅] Competing Risks
-- [ ] Recurrent Events
-- [✅] Mixture Cure Model
-
----
-
-## 🧬 9. Advanced Data Simulation Features
-
-- [ ] Recurrent events (multiple events per individual)
-- [ ] Frailty models (random effects)
-- [ ] Time-varying hazard functions
-- [ ] Multi-line start-stop formatted data
-- [ ] Competing risks with cause-specific hazards
-- [ ] Simulate violations of PH assumption
-- [ ] Grouped / clustered data generation
-- [ ] Mixed covariates: categorical, continuous, binary
-- [ ] Joint models (longitudinal + survival outcome)
-- [ ] Controlled scenarios for robustness tests
+For a detailed history of past releases, please see our [CHANGELOG.md](CHANGELOG.md).