Now we have `msr("cmprsk.brier", time = ...)` for a specific time point. We need `msr("cmprsk.ibs", times = ...)` and need to think about which `times` to take (and from which set: train/test/mix) if the user doesn't provide this parameter - possibly make use of the `ntimes` argument, `t_max`, etc. Other arguments (`cause`, `cause_weights`) stay the same.

Let's think of one cause here (cause-specific) and an implementation that is fresh and not restricted by interfaces from other packages (this is very relevant for implementing IBS for `mlr3survival` as well).

So a CIF matrix comes with a predefined train-set-based time grid (anchors) => $t_1, t_2, \ldots, t_B$. Now we need to define the integration time grid for IBS (which can use different time points from the anchors) => this is the essence of why we need to define the `times` argument. Cases:

1. If `times` is given (`length(times) > 2`), use that (interpolate CIFs as necessary using `survdistr::interp_cif()`). If any `times > t_B`, we should give a warning about extrapolation.
2. If `times` is not given, we can get the `anchors` (which usually define the time range from the train set) and use the `ntimes` argument (default `50`) to spread the integration time grid for IBS out evenly. See also "Add restriction for unique.death.times", imbs-hl/ranger#410 (comment), and "Time-dependent C-index", bcjaeger/aorsf-bench#6 (for a solution that is based on the event quantiles). Suggestions?
3. If `t_max` is defined, we can filter the `times` from case 1 or 2 above up to `t_max` (we don't need to include `t_max` itself, I guess).
4. `p_max` could also be added (finds the `t_max` that results in, for example, an 80% censoring rate in the dataset - this requires the whole `task`, i.e. both train+test set when using `$score(..., task = ...)`)? See doc.

Now the problem using `riskRegression::Score(list(CIF_matrix), data, times, ...)` is that it expects:

1. `nrow(CIF_matrix) == nrow(data)` => this forces `data` to be the (times, event) outcomes from the test set. And these are used for the IPCW calculation. So IPCW comes only from the test set by default! (as the `CIF_matrix` has the test set observations as its rows)
2. `ncol(CIF_matrix) == length(times)` => this forces the `times` to be already interpolated in the CIF matrix (and have a 1-1 correspondence with its columns)
3. If any of the `times` is larger than `max(data$times)` (max test time point), they are automatically removed, causing the No. 2 error above!

So probably to make things work with `riskRegression` we need to take the integration time grid from the test set and confine it within $[0, t_{max}^{test}]$ (even if `times` is given).
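
To make the cases above concrete, here is a minimal base-R sketch of how the integration time grid could be built. The helper name `ibs_time_grid()` and its argument layout are hypothetical and not part of any package. The sketch assumes an evenly spaced grid over the anchor range when `times` is missing, and it always drops points beyond the largest test time so that `riskRegression::Score()` would not silently remove them. CIF interpolation (e.g. via `survdistr::interp_cif()`) is deliberately left out.

```r
# Hypothetical helper (illustration only): build the IBS integration grid.
# anchors:    train-set time grid of the CIF matrix (t_1, ..., t_B)
# test_times: observed times in the test set
# times:      optional user-supplied grid (case 1)
# ntimes:     number of evenly spaced points when `times` is NULL (case 2)
# t_max:      optional upper cutoff, itself excluded (case 3)
ibs_time_grid <- function(anchors, test_times, times = NULL,
                          ntimes = 50, t_max = NULL) {
  if (is.null(times)) {
    # Case 2: spread `ntimes` points evenly over the anchor range
    times <- seq(min(anchors), max(anchors), length.out = ntimes)
  } else {
    # Case 1: use the supplied grid, warn if it goes beyond the last anchor
    times <- sort(unique(times))
    if (any(times > max(anchors))) {
      warning("Some `times` exceed the last anchor t_B; CIFs would be extrapolated.")
    }
  }

  # Case 3: keep only points strictly below t_max
  if (!is.null(t_max)) {
    times <- times[times < t_max]
  }

  # Assumption: confine the grid to the test-set range, so that no time
  # point is larger than the maximum observed test time
  times <- times[times <= max(test_times)]

  if (length(times) < 2) {
    stop("Fewer than 2 time points left for integration.")
  }
  times
}

# Usage with toy values
anchors <- c(1, 2, 4, 7, 10)
test_times <- c(0.5, 3, 6, 8)
grid_default <- ibs_time_grid(anchors, test_times, ntimes = 10)
grid_tmax <- ibs_time_grid(anchors, test_times, ntimes = 10, t_max = 5)
```

The CIF matrix would then need to be interpolated at exactly these time points before scoring, so that its columns and the grid keep a 1-1 correspondence.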