Implement IBS via RiskRegression #20

@bblodfon

We now have msr("cmprsk.brier", time = ...) for a specific time point. We also need msr("cmprsk.ibs", times = ...), and we need to decide which times to use (and from which set: train/test/mix) when the user doesn't provide this parameter, possibly making use of the ntimes argument, t_max, etc. The other arguments (cause, cause_weights) stay the same.

Let's consider a single cause here (cause-specific) and an implementation written from scratch, so that it isn't restricted by the interfaces of other packages (this is also very relevant for implementing IBS for mlr3survival).

A CIF matrix comes with a predefined, train-set-based time grid (the anchors): $t_1,t_2,...,t_B$. We then need to define the integration time grid for IBS, which can use different time points from the anchors; this is the essence of why we need the times argument. Cases:

  1. If times is given (length(times) > 2), use it (interpolating CIFs as necessary using survdistr::interp_cif()). If any times > t_B, we should give a warning about extrapolation.
  2. If times is not given, we can take $t_1$ and $t_B$ from the anchors (which usually define the time range of the train set) and use the ntimes argument (default 50) to spread the integration time grid for IBS evenly between them. See also "Add restriction for unique.death.times" (imbs-hl/ranger#410 (comment)) and "Time-dependent C-index" (bcjaeger/aorsf-bench#6, for a solution based on the event quantiles). Suggestions?
  3. If t_max is defined, we can filter the times from case 1 or 2 above up to t_max (we don't need to include t_max itself, I guess).
  4. p_max could also be added (it finds, for example, the t_max that results in an 80% censoring rate in the dataset; this requires the whole task, i.e. both the train and test sets, when using $score(..., task = ...))? See doc.
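
To make cases 1-3 concrete, here is a minimal base-R sketch of how the integration grid could be built. The helper name `make_ibs_grid()` and its argument layout are hypothetical (nothing like it exists in the package yet), and dropping `t_max` itself from the grid is just the tentative choice from case 3. Case 4 (p_max) is left out since it needs the whole task.

```r
# Hypothetical helper (not part of any package): builds the IBS integration grid.
# anchors: time points of the CIF matrix columns (train-set based)
# times:   optional user-supplied integration grid
# ntimes:  number of evenly spaced points when 'times' is not given
# t_max:   optional upper cutoff (assumption: t_max itself is excluded)
make_ibs_grid = function(anchors, times = NULL, ntimes = 50, t_max = NULL) {
  anchors = sort(unique(anchors))
  t_1 = anchors[1]
  t_B = anchors[length(anchors)]

  if (!is.null(times)) {
    # Case 1: use the user-supplied grid, warn about extrapolation beyond t_B
    grid = sort(unique(times))
    if (any(grid > t_B)) {
      warning("Some 'times' exceed the last anchor t_B; CIFs would be extrapolated.")
    }
  } else {
    # Case 2: evenly spaced grid between the first and last anchor
    grid = seq(t_1, t_B, length.out = ntimes)
  }

  if (!is.null(t_max)) {
    # Case 3: keep only time points strictly below t_max
    grid = grid[grid < t_max]
  }

  grid
}

# Example: anchors from 1 to 10, 10 grid points, cut off at t_max = 5
make_ibs_grid(c(1, 5, 10), ntimes = 10, t_max = 5)
```

An event-quantile-based grid (as in the aorsf-bench discussion) would only change the `seq()` line in case 2.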

Now the problem using riskRegression::Score(list(CIF_matrix), data, times, ...) is that it expects:

  1. nrow(CIF_matrix) == nrow(data): this forces data to be the (times, event) outcomes from the test set, since the rows of CIF_matrix are the test set observations. These outcomes are then used for the IPCW calculation, so by default IPCW is computed from the test set only!
  2. ncol(CIF_matrix) == length(times): this forces the CIF matrix to be already interpolated at the times (1-1 correspondence between columns and time points).
  3. If any of the times is larger than max(data$times) (the maximum test time point), it is automatically removed, which breaks the requirement in No. 2 above!

So, to make things work with riskRegression, we probably need to take the integration time grid from the test set and confine it within $[0,t_{max}^{test}]$ (even if times is given).
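
A rough base-R sketch of that alignment step, so that the column count of the CIF matrix and the length of the time grid stay in sync. The helper `align_cif()` is hypothetical, and it assumes constant (step-function) interpolation between sorted anchors, with CIF = 0 before the first anchor and the last column carried forward beyond $t_B$; the actual interpolation would presumably go through survdistr::interp_cif() instead.

```r
# Hypothetical helper (not part of any package).
# cif:        matrix, rows = test observations, columns = anchors
# anchors:    increasing time points matching the columns of 'cif'
# grid:       requested integration time grid
# test_times: observed times in the test set
align_cif = function(cif, anchors, grid, test_times) {
  stopifnot(ncol(cif) == length(anchors), !is.unsorted(anchors))

  # Drop grid points beyond the largest test time up front, so they cannot be
  # removed later and break the columns/times correspondence
  grid = grid[grid <= max(test_times)]

  # Index of the last anchor <= each grid point (0 if before the first anchor)
  idx = findInterval(grid, anchors)

  # Assumption: CIF is 0 before the first anchor
  out = matrix(0, nrow = nrow(cif), ncol = length(grid))
  keep = idx > 0
  if (any(keep)) {
    out[, keep] = cif[, idx[keep], drop = FALSE]
  }

  list(cif = out, times = grid)
}

# Example: 2 test observations, anchors at 1, 2, 3
cif = matrix(c(0.10, 0.20, 0.30,
               0.05, 0.10, 0.40), nrow = 2, byrow = TRUE)
res = align_cif(cif, anchors = c(1, 2, 3), grid = c(0.5, 1.5, 3, 4),
                test_times = c(0.7, 3.5))
ncol(res$cif) == length(res$times)
```

The returned `cif` and `times` are meant to be passed on together, so that expectations 2 and 3 above hold by construction.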
