Description
Sometimes it would be useful to specify how parameters are changed inside a Learner / PipeOp, e.g. as in mlr-org/mlr3pipelines#24. A typical example is the mtry parameter of a random forest, which should range from 1 to task$ncol. It would be nice if one could introduce an mtry.pexp parameter ranging from 0 to 1, so that the actual mtry is set to round(task$ncol ^ mtry.pexp).
The $trafo function, as it currently stands, is not a good fit for this, for two reasons: (1) it operates before the Learner even sees the Task, so it cannot know task$ncol, and (2) it cannot introduce a new parameter mtry.pexp; it can only re-scale the existing mtry, which is an integer between 1 and Inf, not a real number between 0 and 1.
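To make the intended mapping concrete, here is a minimal sketch of it as a plain R function. The name mtry_from_pexp and its n_features argument are hypothetical; n_features stands in for the value the Learner would read from task$ncol at training time.

```r
# Hypothetical helper: map an exponent in [0, 1] to an integer mtry in [1, n_features].
mtry_from_pexp = function(pexp, n_features) {
  stopifnot(pexp >= 0, pexp <= 1, n_features >= 1)
  round(n_features ^ pexp)
}

mtry_from_pexp(0, 16)    # 1: a single candidate feature per split
mtry_from_pexp(0.5, 16)  # 4: the square root of the feature count
mtry_from_pexp(1, 16)    # 16: all features
```

The same exponent therefore gives a sensible mtry for tasks of any width, which is what makes mtry.pexp more convenient to tune than mtry itself.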
I think the following UI would be quite nice:
lrn = mlr_learners$get("classif.ranger")
ps = lrn$param_set$clone()
ps$subset(setdiff(ps$ids(), "mtry"))
ps$add(ParamDbl$new("mtry.pexp", 0, 1))
ps$trafo = function(x, env, param_set) {
  x$mtry = round(env$task$ncol ^ x$mtry.pexp)
  x$mtry.pexp = NULL
  x
}
lrn$param_set$add_interface(ps)  # !!
# set effective `mtry` to `round(ncol(task) ^ 0.7)` when training happens
lrn$param_set$values$mtry.pexp = 0.7
lrn$param_set$values$mtry = 3 # ERROR: mtry is not part of the interface
This would change the lrn$param_set to "look and feel" like the ps constructed / modified before, but internally the Learner (or e.g. a PipeOp) would get the parameter values as produced by the $trafo function.
A way to implement this would be the following:
- Add a private$.learnerside = NULL slot that points to the ParamSet that the Learner / PipeOp should see.
- Add a $has_interface active binding:
    has_interface = function() !is.null(private$.learnerside)
- Add a self$learnerside(last = TRUE) function that gives the ParamSet that the Learner / PipeOp should see. Because private$.learnerside could point to a ParamSet that itself has a private$.learnerside set, it should be recursive if last is TRUE, and only give the "next" learnerside if last is FALSE:
    learnerside = function(last = TRUE) {
      if (!self$has_interface) return(self)
      if (last) {
        private$.learnerside$learnerside(last = TRUE)
      } else {
        private$.learnerside
      }
    }
- Implement a private$copy_param_set() helper function. It copies all relevant items from its argument to the ParamSet itself, to turn self into an effective copy of that argument:
    copy_param_set = function(param_set) {
      private$.params = param_set$params
      private$.deps = param_set$deps
      private$.values = param_set$values
      private$.trafo = param_set$trafo
      invisible(self)
    }
- Implement the public $add_interface() function:
    add_interface = function(param_set) {
      private$.learnerside = self$clone(deep = TRUE)
      private$copy_param_set(param_set)
    }
- Implement a public $remove_interface() function:
    remove_interface = function(all = FALSE) {
      if (!self$has_interface) stop("no interface to remove")
      replace_with = self$learnerside(last = all)
      private$copy_param_set(replace_with)
      # private fields of another object are not reachable via `$`, so use its public API
      private$.learnerside = if (replace_with$has_interface) replace_with$learnerside(last = FALSE) else NULL
    }
- How does the Learner / PipeOp get its values out of this? There probably should be a $get_values() function that gets the values for the operation, which should also have the filter functionality that ids currently has:
    get_values = function(class = NULL, tags = NULL, learnerside = FALSE, env) {
      if (learnerside && self$has_interface) {
        # assigning $values triggers the feasibility check of the learner-side ParamSet
        private$.learnerside$values = self$trafo(self$values, env)
        return(private$.learnerside$get_values(
          class = class, tags = tags, learnerside = learnerside, env = env
        ))
      }
      values = self$values
      values[intersect(names(values), self$ids(class = class, tags = tags))]
    }
- Change the trafo active binding to also accept functions of the form function(x, env).
This implementation has the advantage that multiple interfaces can be "stacked" on top of each other: a user who gets a Learner does not need to know or care whether something put an interface in front of its ParamSet. When the user sets a parameter using param_set$values$param = x, the value is checked against the constraints of the interface parameter set. When the user calls lrn$train(), the train() function calls get_values(tags = "train", learnerside = TRUE, env = list(task = task)), which recurses through the interfaces that were added and sets $values in each of them after transforming. This automatically checks that the trafo function returns a feasible value for the original ParamSet.
This change would also be completely transparent to everything ParamSet is doing so far.
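The stacking behaviour described above can be tried out with a small base R sketch. It does not use paradox or R6: toy_set, set_values, add_interface and get_values are simplified, hypothetical stand-ins in which a "ParamSet" is an environment and the feasibility check only verifies parameter ids. The second interface (mtry.pct) is likewise invented for illustration.

```r
# Toy stand-in for a ParamSet (NOT the paradox class): an environment with the
# allowed ids, the current values, a trafo, and a pointer to the learner-side set.
toy_set = function(ids, trafo = NULL) {
  ps = new.env()
  ps$ids = ids
  ps$values = list()
  ps$trafo = trafo
  ps$learnerside = NULL
  ps
}

# Assign values after a (very reduced) feasibility check: only known ids are allowed.
set_values = function(ps, values) {
  unknown = setdiff(names(values), ps$ids)
  if (length(unknown) > 0) stop("unknown parameter(s): ", paste(unknown, collapse = ", "))
  ps$values = values
  invisible(ps)
}

# Make `ps` look like `interface`; keep its previous state as the learner-side set.
add_interface = function(ps, interface) {
  inner = toy_set(ps$ids, ps$trafo)
  inner$values = ps$values
  inner$learnerside = ps$learnerside
  ps$ids = interface$ids
  ps$values = interface$values
  ps$trafo = interface$trafo
  ps$learnerside = inner
  invisible(ps)
}

# Push values through every stacked trafo, checking them at each level.
get_values = function(ps, env, learnerside = FALSE) {
  if (learnerside && !is.null(ps$learnerside)) {
    set_values(ps$learnerside, ps$trafo(ps$values, env))
    return(get_values(ps$learnerside, env, learnerside = TRUE))
  }
  ps$values
}

learner_ps = toy_set(c("mtry", "num.trees"))

# First interface: replace mtry by mtry.pexp.
pexp_interface = toy_set(c("mtry.pexp", "num.trees"), trafo = function(x, env) {
  x$mtry = round(env$task$ncol ^ x$mtry.pexp)
  x$mtry.pexp = NULL
  x
})
add_interface(learner_ps, pexp_interface)

# Second, stacked interface (hypothetical): express the exponent as a percentage.
pct_interface = toy_set(c("mtry.pct", "num.trees"), trafo = function(x, env) {
  x$mtry.pexp = x$mtry.pct / 100
  x$mtry.pct = NULL
  x
})
add_interface(learner_ps, pct_interface)

set_values(learner_ps, list(mtry.pct = 50, num.trees = 100))
learner_values = get_values(learner_ps, env = list(task = list(ncol = 16)), learnerside = TRUE)
str(learner_values)  # holds num.trees = 100 and mtry = 4
```

The caller only ever touches the outermost set; each trafo output is checked against the set directly beneath it, so an infeasible intermediate value fails at the level that produced it.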
Things that I am not sure about:
- It is a bit inelegant to have the env parameter depend on what kind of object the ParamSet belongs to: Some PipeOps (e.g. PipeOpModelAvg) have parameters in a different context, where no task is present (and instead maybe a prediction). One would probably want to agree on an interface (always task in a Learner / preprocessing PipeOp, always prediction in a "post-processing" PipeOp, other contexts..?)
- There are no checks on the feasibility of the trafo function output until the actual training / predicting happens.
- Maybe one still wants to use the "train" / "predict" tags from the outside, e.g. maybe a tuning algorithm wants to train a model with one set of "train" parameters and then evaluate these with different "predict" parameters to get multiple performance datapoints with only a single train() call for efficiency. In that case it would be nice if the trafo could also respect the "train" / "predict" tags and work when only a subset of parameter values is present. In that case, the get_values would need to be adapted to only give self$values[intersect(names(self$values), set$ids(...tags = tags))] to self$trafo.
- I don't know if it would be useful to do this for ParamSetCollection. Maybe a GraphLearner would want to have an interface as well? I wouldn't know what the UI for that would look like, however. In that case it would probably be easiest to intervene with the individual PipeOps' ParamSet.
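For the "train" / "predict" point above, a trafo that tolerates a partial set of values could look roughly like the following base R sketch. The tag table param_tags, the helper ids_with_tag and the name tolerant_trafo are hypothetical, and the tags assigned to each parameter are assumptions made for illustration only.

```r
# Hypothetical tag table: which parameter ids carry which tags.
param_tags = list(
  mtry.pexp = "train",
  num.trees = "train",
  num.threads = c("train", "predict")
)

# Ids that carry a given tag (a stand-in for ids(tags = ...)).
ids_with_tag = function(tag) {
  names(Filter(function(tags) tag %in% tags, param_tags))
}

# A trafo that only rewrites mtry.pexp when it is actually present.
tolerant_trafo = function(x, env) {
  if (!is.null(x[["mtry.pexp"]])) {
    x$mtry = round(env$task$ncol ^ x[["mtry.pexp"]])
    x$mtry.pexp = NULL
  }
  x
}

values = list(mtry.pexp = 0.5, num.trees = 100, num.threads = 2)
env = list(task = list(ncol = 16))

# Filter by tag first, then transform only what is left.
train_values = tolerant_trafo(values[intersect(names(values), ids_with_tag("train"))], env)
predict_values = tolerant_trafo(values[intersect(names(values), ids_with_tag("predict"))], env)
```

Here train_values contains the derived mtry, while predict_values passes through untouched because mtry.pexp was filtered out before the trafo ran.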