surtvep is an R package for fitting Cox non-proportional hazards models with time-varying coefficients. Both unpenalized procedures (Newton and proximal Newton) and penalized procedures (P-splines and smoothing splines) are included using B-spline basis functions for estimating time-varying coefficients.
For penalized procedures, cross-validation, mAIC, TIC, and GIC are implemented to select tuning parameters. Utilities for post-estimation visualization, summarization, point-wise confidence intervals, and hypothesis testing are also provided.
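As a rough illustration of how a B-spline basis represents a time-varying coefficient, the sketch below evaluates beta(t) as a weighted sum of basis functions using only the `splines` package that ships with R. It does not call surtvep; the time grid, the number of basis functions `K`, and the coefficient vector `theta` are hypothetical values chosen for the example.

```r
library(splines)  # ships with base R

t_grid <- seq(0, 3, length.out = 50)   # hypothetical time grid
K      <- 6                            # assumed number of basis functions

# 50 x K matrix of cubic B-spline basis functions evaluated on the grid.
B <- bs(t_grid, df = K, intercept = TRUE)

# Hypothetical spline coefficients; in practice these are estimated.
theta <- c(0.8, 0.6, 0.3, 0.0, -0.2, -0.3)

# Time-varying coefficient: beta(t) = sum_k theta_k * B_k(t).
beta_t <- as.vector(B %*% theta)

# Equal coefficients give a constant effect, i.e. the proportional hazards case.
beta_const <- as.vector(B %*% rep(0.5, K))

print(round(head(beta_t), 3))
print(range(beta_const))
```

Because the basis functions sum to one at every time point, setting all coefficients equal recovers a constant effect, which is why the proportional hazards model is a special case of this representation.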
Large-scale time-to-event data derived from national disease registries arise rapidly in medical studies. Detecting and accounting for time-varying effects is particularly important, as time-varying effects have already been reported in the clinical literature. However, there are currently no formal R packages for estimating the time-varying effects without pre-assuming the time-dependent function. Inaccurate pre-assumptions can greatly influence the estimation, leading to unreliable results. To address this issue, we developed a time-varying model using spline terms with penalization that does not require pre-assumption of the true time-dependent function, and implemented it in R.
Our package offers several benefits over traditional methods. Firstly, traditional methods for modeling time-varying survival models often rely on expanding the original data into a repeated measurement format. However, even with moderate sample sizes, this leads to a large and computationally burdensome working dataset. Our package addresses this issue by proposing a computationally efficient Kronecker product-based proximal algorithm, which allows for the evaluation of time-varying effects in large-scale studies. Additionally, our package allows for parallel computing and can handle moderate to large sample sizes more efficiently than current methods.
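To give a sense of how quickly the expanded (repeated measurement) format grows, the following base-R sketch counts the rows such an expansion would need for a simulated data set. The sample size, event rate, and exponential follow-up times are assumptions made for illustration; the code does not use surtvep and does not build the expanded data itself.

```r
set.seed(1)
n     <- 2000
time  <- rexp(n, rate = 0.5)           # hypothetical follow-up times
event <- rbinom(n, 1, 0.8)             # 1 = event observed, 0 = censored

# Distinct event times define the intervals used by the expanded format.
event_times <- sort(unique(time[event == 1]))

# Each subject contributes one row per distinct event time up to their own
# follow-up time; findInterval() counts those event times.
rows_per_subject <- findInterval(time, event_times)
expanded_rows    <- sum(rows_per_subject)

cat("original rows:", n, "\n")
cat("expanded rows:", expanded_rows, "\n")
cat("inflation factor:", round(expanded_rows / n), "\n")
```

The row count grows roughly with the product of the sample size and the number of distinct event times, so the working data set becomes very large well before the original data does.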
In our statistical software tutorial, we address a common issue encountered when analyzing data that contain binary covariates with near-zero variation. For example, in the SEER prostate cancer data, only 0.6% of the 716,553 patients had tumors regional to the lymph nodes. In such cases, the observed information matrix of a Newton-type method may have a minimum eigenvalue close to zero and a large condition number. Inverting this nearly singular matrix can lead to numerical instability, and the corresponding Newton updates may be confined to a small neighborhood of the initial value, resulting in estimates that are far from the optimal solution. To address this problem, our proposed proximal Newton method uses a modified Hessian matrix, which allows for accurate estimation in these scenarios.
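The conditioning problem can be illustrated with a small base-R sketch. It is not the package's algorithm: the simulated data, the Gram matrix used as a stand-in for the observed information, and the ridge size `lambda` are all assumptions, and the ridge term is a generic regularization rather than the modified Hessian of the proximal Newton method.

```r
library(splines)  # ships with base R

set.seed(2022)
n    <- 5000
time <- runif(n, 0, 5)                 # hypothetical follow-up times
z    <- rbinom(n, 1, 0.006)            # rare binary covariate (about 0.6% ones)

# B-spline basis in time; the time-varying effect of z uses columns z * B_k(t).
B <- bs(time, df = 8, intercept = TRUE)
X <- z * B                             # rows are zero whenever z == 0

# Gram matrix used here as a stand-in for an observed information matrix.
H <- crossprod(X)

# Condition number from the eigenvalues, guarded against a zero minimum.
cond_number <- function(M) {
  ev <- eigen(M, symmetric = TRUE, only.values = TRUE)$values
  max(ev) / max(min(ev), .Machine$double.eps)
}

lambda  <- 1e-2                        # assumed ridge size, for illustration only
H_ridge <- H + lambda * diag(ncol(H))

cond_raw   <- cond_number(H)
cond_ridge <- cond_number(H_ridge)
cat("condition number, raw:  ", format(cond_raw, digits = 3), "\n")
cat("condition number, ridge:", format(cond_ridge, digits = 3), "\n")
```

Adding a positive multiple of the identity shifts every eigenvalue up by the same amount, so the ratio of the largest to the smallest eigenvalue shrinks and the matrix becomes safer to invert.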
| Method | Description | Example |
|---|---|---|
| Newton | Newton's method and Proximal Newton's method [1]. | tutorial |
| Newton's method with penalization | Newton's method and Proximal Newton combined with P-spline or Smoothing-spline [2]. | tutorial |
| Method | Description | Example |
|---|---|---|
| mAIC | A modified Akaike Information Criterion [1]. | tutorial |
| TIC | Takeuchi Information Criterion [1]. | tutorial |
| GIC | Generalized Information Criterion [1]. | tutorial |
| Cross Validation | Use cross validation to select the penalization coefficient [1]. | tutorial |
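The general idea of choosing a penalization coefficient with an information criterion can be sketched on a much simpler problem. The example below fits a P-spline to simulated Gaussian data and picks the penalty from a grid by minimizing an AIC-type score based on the effective degrees of freedom. This is an assumption-laden stand-in: it is not a Cox model, it does not reproduce the package's mAIC, TIC, or GIC, and the grid `lambdas` and basis size `K` are arbitrary.

```r
library(splines)  # ships with base R

set.seed(42)
n <- 200
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)   # simulated Gaussian response

K   <- 20
B   <- bs(x, df = K, intercept = TRUE)      # B-spline basis
D   <- diff(diag(K), differences = 2)       # second-order difference matrix
P   <- crossprod(D)                         # P-spline penalty matrix
BtB <- crossprod(B)
Bty <- crossprod(B, y)

lambdas <- 10^seq(-4, 3, by = 1)            # assumed grid of penalty values

fit_one <- function(lambda) {
  A     <- solve(BtB + lambda * P)          # inverse of the penalized Gram matrix
  theta <- A %*% Bty                        # penalized least-squares coefficients
  rss   <- sum((y - B %*% theta)^2)
  edf   <- sum(diag(A %*% BtB))             # trace of the hat matrix
  c(lambda = lambda, edf = edf, aic = n * log(rss / n) + 2 * edf)
}

results     <- as.data.frame(t(sapply(lambdas, fit_one)))
best_lambda <- results$lambda[which.min(results$aic)]
print(signif(results, 4))
cat("selected lambda:", best_lambda, "\n")
```

Larger penalties reduce the effective degrees of freedom, so the criterion trades goodness of fit against smoothness in the same spirit as the selection methods listed in the table.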
Here, we use the simulated data set included in our package as an example:
library(surtvep)
# Load the simulated data set shipped with the package
sim_data <- sim_data
# Extract the event indicator, the event times, and the covariate matrix
event <- sim_data[, "event"]
time <- sim_data[, "time"]
data <- sim_data[, !colnames(sim_data) %in% c("event", "time")]
# Fit the time-varying coefficient model without penalty
fit <- coxtp(event = event, z = data, time = time)
# Plot the estimated time-varying effect of covariate V1
coxtp.plot(fit, coef = "V1")
The SUPPORT dataset is available in the "surtvep" package. The following code loads the dataset as a data frame:
data("support")
| Dataset | Size | Description |
|---|---|---|
| ExampleData | 4,000 | A simulated data set containing 2 continuous variables. |
| ExampleDataBinary | 2,000 | A simulated data set containing 2 binary variables. |
| StrataExample | 2,000 | A simulated data set containing 2 binary variables. Subjects in different strata have |
| Dataset | Size | Description | Data source |
|---|---|---|---|
| SUPPORT | 1,000 | The support dataset is a random sample of 1000 patients from Phases I & II of SUPPORT (Study to Understand Prognoses Preferences Outcomes and Risks of Treatment). This dataset is very good for learning how to fit highly nonlinear predictor effects. See tutorial | source |
Note: This package is still in its early stages of development, so please don't hesitate to report any problems you may experience.
The package requires R 4.1.0 or later.
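A quick way to check the R version before installing is sketched below. It uses only base R; the variable names `min_version` and `r_ok` are illustrative.

```r
# Compare the running R version against the stated minimum (4.1.0).
min_version <- "4.1.0"
r_ok <- getRversion() >= min_version

if (r_ok) {
  message("R ", as.character(getRversion()), " meets the minimum of ", min_version, ".")
} else {
  message("R ", as.character(getRversion()), " is older than ", min_version,
          "; please upgrade R first.")
}
```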
You can install 'surtvep' via:
# Install the 'remotes' package, which is needed to install from GitHub:
install.packages("remotes")
remotes::install_github("UM-KevinHe/surtvep", ref = "openmp")
We recommend starting with the tutorial, as it provides an overview of the package's usage, including preprocessing, model training, selection of penalization parameters, and post-estimation procedures.
For the detailed tutorial and explanations of the model parameters, please go here.
If you encounter any problems or bugs, please contact us at: lfluo@umich.edu, xuetao@umich.edu
[1] Wenbo Wu, Jeremy M. G. Taylor, Andrew F. Brouwer, Lingfeng Luo, Jian Kang, Hui Jiang, and Kevin He. Scalable proximal methods for cause-specific hazard modeling with time-varying coefficients. Lifetime Data Analysis, 28(2):194-218, 2022. [paper]