
Qinmengge/surtvep

surtvep

surtvep is an R package for fitting Cox non-proportional hazards models with time-varying coefficients. Both unpenalized procedures (Newton and proximal Newton) and penalized procedures (P-splines and smoothing splines) are included, and all of them use B-spline basis functions to estimate the time-varying coefficients. For the penalized procedures, cross-validation, mAIC, TIC or GIC can be used to select the tuning parameters. Utilities for post-estimation visualization, summarization, pointwise confidence intervals and hypothesis testing are also provided.
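Each time-varying coefficient is represented as a linear combination of B-spline basis functions, β(t) = Σ_k θ_k B_k(t), so estimating β(t) reduces to estimating the spline coefficients θ. A minimal base-R sketch of that representation using `splines::bs()`; the time grid, the choice of 8 cubic basis functions and the coefficient values are illustrative assumptions, not package defaults or estimates:

```r
library(splines)

# Illustrative grid of follow-up times (hypothetical values)
time_grid <- seq(0, 10, length.out = 101)

# Cubic B-spline basis with 8 functions; intercept = TRUE keeps the full basis,
# so beta(t) can take any level rather than only deviations from zero
B <- bs(time_grid, df = 8, degree = 3, intercept = TRUE)

# Hypothetical spline coefficients for one covariate
theta <- c(0.8, 0.7, 0.5, 0.2, 0.0, -0.1, -0.1, -0.1)

# beta(t) evaluated on the grid: one value per time point
beta_t <- as.vector(B %*% theta)
```

Because the full B-spline basis sums to one at every time point, a constant θ gives a constant β(t), which recovers the proportional hazards model as a special case.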

Introduction

Large-scale time-to-event data derived from national disease registries are increasingly common in medical studies. Detecting and accounting for time-varying effects is particularly important, as such effects have already been reported in the clinical literature. However, there are currently no formal R packages for estimating time-varying effects without pre-specifying the form of the time-dependent function. An inaccurate pre-specified form can substantially distort the estimates, leading to unreliable results. To address this issue, we developed a time-varying model based on penalized spline terms that does not require the true time-dependent function to be specified in advance, and implemented it in R.

Our package offers several benefits over traditional methods. First, traditional approaches to fitting time-varying survival models often expand the original data into a repeated-measurement format. Even with moderate sample sizes, this produces a large and computationally burdensome working dataset. Our package avoids this expansion by using a computationally efficient Kronecker product-based proximal algorithm, which makes it feasible to evaluate time-varying effects in large-scale studies. In addition, the package supports parallel computing and handles moderate to large sample sizes more efficiently than current methods.
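The benefit of the Kronecker structure can be illustrated with the identity vec(Z Θ Bᵀ) = (B ⊗ Z) vec(Θ): the linear predictors of all n subjects at all T event times can be computed from the n × p covariate matrix Z, the p × K coefficient matrix Θ and the T × K spline basis B, without building the (nT) × (pK) design matrix that data expansion implies. This is a sketch of the identity only, with made-up dimensions and random values; it is not the package's proximal algorithm:

```r
library(splines)
set.seed(1)

n <- 50                                   # subjects (illustrative)
p <- 3                                    # covariates
K <- 6                                    # spline basis functions per covariate
event_times <- sort(runif(20, 0, 5))      # hypothetical distinct event times

Z <- matrix(rnorm(n * p), n, p)                           # covariates (n x p)
B <- unclass(bs(event_times, df = K, intercept = TRUE))   # basis (T x K)
Theta <- matrix(rnorm(p * K, sd = 0.1), p, K)             # coefficients (p x K)

# Compact form: n x T matrix of linear predictors eta_i(t_m)
eta_compact <- Z %*% Theta %*% t(B)

# Expanded form: the (n*T) x (p*K) design that data expansion would build
X_expanded <- kronecker(B, Z)
eta_expanded <- X_expanded %*% as.vector(Theta)

# Same predictors, but the compact form never materializes X_expanded
all.equal(as.vector(eta_compact), as.vector(eta_expanded))
dim(X_expanded)
```

Here the expanded design has 1,000 rows for only 50 subjects; with hundreds of thousands of subjects and many event times, the compact form is what keeps memory use manageable.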

In our statistical software tutorial, we address a common issue that arises when analyzing data with binary covariates that have near-zero variation. For example, in the SEER prostate cancer data, only 0.6% of the 716,553 patients had tumors that were regional to the lymph nodes. In such cases, the observed information matrix used by a Newton-type method may have a minimum eigenvalue close to zero and a large condition number. Inverting this nearly singular matrix can cause numerical instability, and the resulting Newton updates may stay confined to a small neighborhood of the initial value, yielding estimates far from the optimal solution. To address this problem, our proximal Newton method uses a modified Hessian matrix, which allows accurate estimation in these scenarios.
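The conditioning problem can be reproduced in a toy setting. The sketch below builds a spline-expanded design in which one binary covariate has about 0.6% ones, uses `crossprod(X) / n` as a stand-in for the observed information matrix (an assumption: it mimics the scale of the problem but is not the Cox partial-likelihood Hessian), and shows that adding a proximal term I/s before solving lowers the condition number. The helper `prox_newton_step()` is a hypothetical, generic proximal-point Newton update, not necessarily the exact update used in [1]:

```r
library(splines)
set.seed(2023)

n <- 20000
time <- runif(n, 0, 10)                 # hypothetical follow-up times
z_common <- rnorm(n)                    # well-spread continuous covariate
z_rare <- rbinom(n, 1, 0.006)           # binary covariate with ~0.6% ones
B <- unclass(bs(time, df = 5, intercept = TRUE))   # basis for beta(t)

# Each covariate multiplied by every basis function at the subject's time (n x 10)
X <- cbind(z_common * B, z_rare * B)

# Stand-in for the observed information matrix (NOT the Cox Hessian)
H <- crossprod(X) / n
kappa(H, exact = TRUE)        # large: the z_rare columns carry little information

# Proximal modification: add I/s before solving
s <- 1
H_prox <- H + diag(1 / s, ncol(H))
kappa(H_prox, exact = TRUE)   # far smaller, so the linear solve is stable

# Hypothetical helper: one Newton step for maximizing
# loglik(theta) - ||theta - theta_k||^2 / (2 s), evaluated at theta_k.
# grad = score vector, info = observed information (negative Hessian).
prox_newton_step <- function(theta, grad, info, s) {
  theta + solve(info + diag(1 / s, ncol(info)), grad)
}
```

A smaller s damps the step more strongly toward the current value; as s grows, the update approaches the ordinary Newton step.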

Models:

| Method | Description | Example |
|---|---|---|
| Newton | Newton's method and proximal Newton's method [1]. | tutorial |
| Newton's method with penalization | Newton's method and proximal Newton's method combined with P-splines or smoothing splines [2]. | tutorial |
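For the P-spline option, the standard P-spline construction pairs the B-spline basis with a difference penalty on adjacent coefficients, λ θᵀ Dᵀ D θ. A minimal sketch, assuming a second-order difference penalty (the order surtvep uses is an assumption here): the penalty is zero for coefficients that change linearly and grows with roughness.

```r
# Hypothetical number of basis functions per covariate
K <- 8

# Second-order difference matrix D, (K - 2) x K: row i is e_i - 2 e_{i+1} + e_{i+2}
D <- diff(diag(K), differences = 2)
P <- crossprod(D)   # penalty matrix t(D) %*% D

# Penalty lambda * theta' P theta for one covariate's spline coefficients
pspline_penalty <- function(theta, lambda, P) {
  lambda * drop(crossprod(theta, P %*% theta))
}

theta_linear <- seq(1, 0, length.out = K)       # linear trend in the coefficients
theta_wiggly <- rep(c(1, -1), length.out = K)   # alternating, very rough

pspline_penalty(theta_linear, lambda = 10, P = P)  # zero up to rounding error
pspline_penalty(theta_wiggly, lambda = 10, P = P)  # 960: six second differences of +/-4
```

With p covariates whose coefficients are ordered covariate by covariate, the full penalty matrix is the block-diagonal `kronecker(diag(p), P)`.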

Penalization Coefficient Selection Methods:

| Method | Description | Example |
|---|---|---|
| mAIC | A modified Akaike Information Criterion [1]. | tutorial |
| TIC | Takeuchi Information Criterion [1]. | tutorial |
| GIC | Generalized Information Criterion [1]. | tutorial |
| Cross-validation | Uses cross-validation to select the penalization coefficient [1]. | tutorial |
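The penalization coefficient λ controls how strongly the fit is smoothed. Information criteria such as mAIC, TIC and GIC trade goodness of fit against a complexity term, and for penalized fits a common complexity measure is the effective degrees of freedom tr((H + λP)⁻¹ H), where H is the information matrix. The exact criteria used by surtvep are defined in [1]; the sketch below, with a made-up positive definite H, only shows how the effective degrees of freedom fall from K (no penalty) toward 2 (the null space of a second-order difference penalty) as λ grows:

```r
set.seed(7)
K <- 8
D <- diff(diag(K), differences = 2)
P <- crossprod(D)                      # second-order difference penalty

# Hypothetical positive definite information matrix for one covariate
A <- matrix(rnorm(200 * K), 200, K)
H <- crossprod(A)

# Effective degrees of freedom of the penalized fit at a given lambda
edf <- function(lambda) sum(diag(solve(H + lambda * P, H)))

lambdas <- c(0, 1, 10, 100, 1000, 10000)
edfs <- sapply(lambdas, edf)
round(edfs, 2)   # starts at 8 and decreases toward 2
```

Cross-validation takes a different route: it refits the model on training folds for each candidate λ and keeps the value with the best held-out performance.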

Usage:

Here, we use the simulated dataset included in the package as an example:

library(surtvep)

# Load the simulated example data included in the package
sim_data <- sim_data
# Separate the event indicator, the event times and the covariate matrix
event <- sim_data[, "event"]
time  <- sim_data[, "time"]
z     <- sim_data[, !colnames(sim_data) %in% c("event", "time")]

# Fit the time-varying model without penalty
fit <- coxtp(event = event, z = z, time = time)
# Plot the estimated time-varying coefficient of covariate V1
coxtp.plot(fit, coef = "V1")
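Pointwise confidence intervals for a spline-based coefficient such as the one plotted above can be obtained with the delta method: if the spline coefficients θ̂ have covariance V, then Var(β̂(t)) = b(t)ᵀ V b(t), where b(t) is the row of the basis matrix at time t. The coefficient values and covariance below are made up for illustration (in a fitted model V would come from the estimated information matrix), and this generic sketch is not necessarily how surtvep computes its intervals:

```r
library(splines)
set.seed(11)

time_grid <- seq(0, 10, length.out = 50)
B <- unclass(bs(time_grid, df = 6, intercept = TRUE))   # T x K basis

# Hypothetical estimates for one covariate: spline coefficients and covariance
theta_hat <- c(0.9, 0.6, 0.3, 0.1, 0.0, -0.1)
A <- matrix(rnorm(6 * 6), 6, 6)
V <- crossprod(A) / 100                                  # positive definite

beta_hat <- drop(B %*% theta_hat)
# Var(beta_hat(t)) = b(t)' V b(t) for each row b(t) of B, i.e. diag(B V B')
se <- sqrt(rowSums((B %*% V) * B))
lower <- beta_hat - qnorm(0.975) * se
upper <- beta_hat + qnorm(0.975) * se
```

Computing `rowSums((B %*% V) * B)` gives the diagonal of B V Bᵀ without forming the full T × T matrix.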

Datasets

The SUPPORT dataset is available in the surtvep package. The following code loads it as a data frame:

data("support")

Simulated Datasets:

| Dataset | Size | Description |
|---|---|---|
| ExampleData | 4,000 | A simulated data set containing 2 continuous variables. |
| ExampleDataBinary | 2,000 | A simulated data set containing 2 binary variables. |
| StrataExample | 2,000 | A simulated data set containing 2 binary variables. Subjects in different strata have |

Real Datasets:

| Dataset | Size | Description | Data source |
|---|---|---|---|
| SUPPORT | 1,000 | A random sample of 1,000 patients from Phases I & II of SUPPORT (Study to Understand Prognoses Preferences Outcomes and Risks of Treatment). This dataset is well suited for learning how to fit highly nonlinear predictor effects. See tutorial for preprocessing. | source |

Installation

Note: This package is still in its early stages of development, so please don't hesitate to report any problems you may experience.

The package requires R 4.1.0 or later.

You can install surtvep from GitHub via:

# Install the remotes package (provides install_github), then install surtvep:
install.packages("remotes")
remotes::install_github("UM-KevinHe/surtvep", ref = "openmp")

We recommend starting with the tutorial, which gives an overview of the package's usage, including preprocessing, model training, selection of penalization parameters, and post-estimation procedures.

Detailed tutorial

For a detailed tutorial and an explanation of the model parameters, please go here.

Getting Help:

If you encounter any problems or bugs, please contact us at lfluo@umich.edu or xuetao@umich.edu.

References

[1] Wenbo Wu, Jeremy M G Taylor, Andrew F Brouwer, Lingfeng Luo, Jian Kang, Hui Jiang and Kevin He. Scalable proximal methods for cause-specific hazard modeling with time-varying coefficients. Lifetime Data Analysis, 28(2):194-218, 2022.
