Implement IBS via `riskRegression` #20

Currently we have `msr("cmprsk.brier", time = ...)`, which computes the Brier score at a single time point. We also need `msr("cmprsk.ibs", times = ...)`. For that, we have to decide which `times` to use (and from which set: train, test, or a mix) when the user doesn't provide this parameter, possibly making use of the `ntimes` argument, `t_max`, etc. The [other arguments](https://mlr3cmprsk.mlr-org.com/reference/mlr_measures_cmprsk.brier.html) (`cause`, `cause_weights`) stay the same.
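For reference, here is one common way to write the cause-specific IBS in terms of the single-time Brier score. The notation is introduced here and is not taken from an existing implementation: $\tau_1 < \dots < \tau_M$ is the integration grid and $\mathrm{BS}_k(t)$ is the Brier score for cause $k$ at time $t$, as computed by `cmprsk.brier`. The IBS averages the Brier score over the grid range, approximated here with the trapezoidal rule:

$$
\mathrm{IBS}_k = \frac{1}{\tau_M - \tau_1} \int_{\tau_1}^{\tau_M} \mathrm{BS}_k(t)\, dt
\approx \frac{1}{\tau_M - \tau_1} \sum_{m=1}^{M-1} \frac{\mathrm{BS}_k(\tau_m) + \mathrm{BS}_k(\tau_{m+1})}{2}\,(\tau_{m+1} - \tau_m)
$$

Other choices are possible, e.g. integrating from $0$ and dividing by $\tau_M$, or using a step-function (left Riemann) sum. In every variant, the choice of grid discussed below directly determines the $\tau_m$.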

Let's consider a single cause here (cause-specific IBS) and aim for an implementation that is fresh and not constrained by the interfaces of other packages. This is also very relevant for implementing IBS in `mlr3survival`.

A CIF matrix comes with a predefined time grid based on the train set (the *anchors*): $t_1, t_2, \dots, t_B$. We now need to define the integration time grid for IBS, whose time points can differ from the anchors. This is essentially why we need the `times` argument. Cases:

1. If `times` is given (`length(times) > 2`), use it, interpolating the CIFs as necessary with `survdistr::interp_cif()`. If any `times > t_B`, we should warn about extrapolation.
2. If no `times` is given, we can get $t_1$ and $t_B$ from the anchors (which usually define the time range of the train set) and use the `ntimes` argument (default `50`) to spread the integration time grid evenly between them. See also https://github.com/imbs-hl/ranger/issues/410#issuecomment-1756745869 and https://github.com/bcjaeger/aorsf-bench/issues/6 (for a solution based on the event quantiles). Suggestions?
3. If `t_max` is defined, we can keep only the `times` from case 1 or 2 above that are up to `t_max` (I guess we don't need to include `t_max` itself).
4. Should `p_max` also be added? It finds the `t_max` that results in, for example, an 80% censoring rate in the dataset. This requires the whole `task`, i.e. both the train and test sets, when using `$score(..., task = ...)`. See the [doc](https://mlr3proba.mlr-org.com/reference/mlr_measures_surv.graf.html#parameter-details).
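The four cases above can be sketched as a single grid-building helper. This is a minimal sketch under stated assumptions: `ibs_time_grid()` and its argument names are hypothetical (not an existing mlr3cmprsk or survdistr function), an explicit `t_max` takes precedence over `p_max`, and `p_max` is read as "the earliest observed time at which the cumulative proportion of censored observations in the whole task reaches `p_max`". That reading is only one possible interpretation and may not match the mlr3proba definition.

```r
# Hypothetical helper (not an existing mlr3cmprsk/survdistr function) that
# builds the IBS integration time grid following cases 1-4.
#   anchors:    time points of the CIF matrix (train-set based)
#   times:      user-supplied grid (case 1), or NULL
#   ntimes:     number of grid points when `times` is NULL (case 2)
#   t_max:      optional upper cut-off (case 3), takes precedence over p_max
#   p_max:      optional censoring proportion (case 4)
#   obs_time, obs_status: outcomes of the whole task, status 0 = censored
ibs_time_grid <- function(anchors, times = NULL, ntimes = 50,
                          t_max = NULL, p_max = NULL,
                          obs_time = NULL, obs_status = NULL) {
  t_B <- max(anchors)
  if (!is.null(times)) {
    # Case 1: use the user-supplied grid
    if (length(times) <= 2) stop("`times` needs more than 2 time points")
    times <- sort(unique(times))
    if (any(times > t_B)) {
      warning("Some `times` are beyond the last anchor: CIFs will be extrapolated")
    }
  } else {
    # Case 2: `ntimes` evenly spaced points between the first and last anchor
    times <- seq(min(anchors), t_B, length.out = ntimes)
  }
  # Case 4 (assumed interpretation): t_max is the earliest observed time at
  # which the cumulative proportion of censored observations reaches p_max
  if (is.null(t_max) && !is.null(p_max)) {
    ord <- order(obs_time)
    cens_prop <- cumsum(obs_status[ord] == 0) / length(obs_time)
    idx <- which(cens_prop >= p_max)[1]
    if (is.na(idx)) {
      warning("`p_max` is never reached: no upper cut-off applied")
    } else {
      t_max <- obs_time[ord][idx]
    }
  }
  # Case 3: keep only grid points up to t_max (t_max itself is not added)
  if (!is.null(t_max)) times <- times[times <= t_max]
  times
}
```

For example, `ibs_time_grid(anchors = c(1, 10), ntimes = 10, t_max = 5.5)` gives the grid `1, 2, 3, 4, 5`. An event-quantile-based grid, as in the linked aorsf-bench issue, would only change the case 2 branch.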

The problem with using `riskRegression::Score(list(CIF_matrix), data, times, ...)` is that it expects:
 
1. `nrow(CIF_matrix) == nrow(data)`: this forces `data` to be the (time, event) outcomes of the test set, and these are also used for the IPCW calculation. So by default, IPCW is computed from the test set only, since the rows of `CIF_matrix` are the test set observations!
2. `ncol(CIF_matrix) == length(times)`: this forces the CIF matrix to already be interpolated at `times`, with a 1-1 correspondence between columns and time points.
3. Any `times` larger than `max(data$times)` (the largest test time point) are automatically removed, which then breaks requirement 2 above and causes an error!
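To satisfy requirement 2, the CIF matrix has to be evaluated at the integration grid before calling `Score()`. `survdistr::interp_cif()` is presumably the right tool for this, but since we don't rely on its exact signature here, the following is a hypothetical base-R stand-in (`interp_cif_step()` is an illustrative name). It assumes each row is a right-continuous step CIF over the anchors, equal to 0 before the first anchor and carried forward (constant extrapolation) after $t_B$. Because of point 3, the grid should already be confined to the test time range before this step.

```r
# Hypothetical stand-in for CIF interpolation (not the survdistr API).
#   cif:     matrix, rows = test observations, columns = anchors
#   anchors: sorted anchor time points t_1, ..., t_B
#   times:   integration grid to evaluate the CIFs at
interp_cif_step <- function(cif, anchors, times) {
  stopifnot(ncol(cif) == length(anchors), !is.unsorted(anchors))
  # index of the last anchor <= t; 0 when t is before the first anchor
  idx <- findInterval(times, anchors)
  # prepend a zero column so that idx 0 maps to CIF = 0, and
  # times after t_B reuse the last column (constant extrapolation)
  out <- cbind(0, cif)[, idx + 1L, drop = FALSE]
  colnames(out) <- times
  out
}
```

The result has exactly `length(times)` columns, which is the 1-1 correspondence `Score()` expects. Linear interpolation between anchors would be an alternative design choice, but it no longer yields a step function.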

So, to make things work with `riskRegression`, we probably need to take the integration time grid from the test set and confine it to $[0, t_{max}^{test}]$ (even if `times` is given).
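Here is a minimal sketch of that confinement step, assuming the test outcomes are available as a numeric vector of times (`confine_grid()` is a hypothetical name). It drops grid points outside $[0, t_{max}^{test}]$ explicitly, with a warning, instead of leaving the removal to `Score()`:

```r
# Hypothetical helper: restrict an integration grid to [0, max test time]
#   times:     candidate integration grid (e.g. user-supplied or ntimes-based)
#   test_time: observed times of the test set
confine_grid <- function(times, test_time) {
  t_max_test <- max(test_time)
  keep <- times >= 0 & times <= t_max_test
  if (any(!keep)) {
    warning(sprintf("Dropping %d time point(s) outside [0, %g]",
                    sum(!keep), t_max_test))
  }
  times[keep]
}
```

One consequence of this approach is that the upper end of the grid depends on each test set, so it can differ between resampling iterations.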
