Description
Sometimes it would be useful to specify how parameters are changed inside a `Learner` / `PipeOp`, e.g. as in mlr-org/mlr3pipelines#24. A typical example is the `mtry` parameter of a random forest, which should range from 1 to `task$ncol`. It would be nice if one could introduce an `mtry.pexp` parameter ranging from 0 to 1, so that the actual `mtry` is set to `round(task$ncol ^ mtry.pexp)`.
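To make the intended mapping concrete, here is a minimal base-R sketch of the transformation. The helper name `mtry_from_pexp` and its argument `n_features` are made up for illustration and are not part of paradox or mlr3; `n_features` stands in for `task$ncol`.

```r
# Hypothetical helper: map an exponent in [0, 1] to an integer mtry
# in [1, n_features], following round(task$ncol ^ mtry.pexp).
mtry_from_pexp = function(n_features, mtry_pexp) {
  stopifnot(n_features >= 1, mtry_pexp >= 0, mtry_pexp <= 1)
  as.integer(round(n_features ^ mtry_pexp))
}

mtry_from_pexp(100, 0)    # 1: a single feature per split
mtry_from_pexp(100, 0.5)  # 10: square root of the number of features
mtry_from_pexp(100, 1)    # 100: all features
```

The exponent scale covers the whole valid range of `mtry` for any task size, which is what makes it attractive as a tuning parameter.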
The `$trafo` function, as it currently stands, is not a good fit for this, because it (1) operates before the `Learner` even sees the `Task`, so it would not know about `task$ncol`, and (2) would not be able to introduce a new parameter `mtry.pexp`; it could only re-scale the existing `mtry`, which is an integer between 1 and `Inf`, not a real number between 0 and 1.
I think the following UI would be quite nice:

    lrn = mlr_learners$get("classif.ranger")
    ps = lrn$param_set$clone()
    ps$subset(setdiff(ps$ids(), "mtry"))
    ps$add(ParamDbl$new("mtry.pexp", 0, 1))
    ps$trafo = function(x, env, param_set) {
      x$mtry = round(env$task$ncol ^ x$mtry.pexp)
      x$mtry.pexp = NULL
      x
    }
    lrn$param_set$add_interface(ps) # !!
    # set effective `mtry` to `round(ncol(task) ^ 0.7)` when training happens
    lrn$param_set$values$mtry.pexp = 0.7
    lrn$param_set$values$mtry = 3 # ERROR

This would change `lrn$param_set` to "look and feel" like the `ps` constructed / modified before, but internally the `Learner` (or e.g. a `PipeOp`) would get the parameter values as produced by the `$trafo` function.
A way to implement this would be the following:
- Add a `private$.learnerside = NULL` slot that points to the `ParamSet` that the `Learner` / `PipeOp` should see.
- Add a `$has_interface` active binding:

      has_interface = function() !is.null(private$.learnerside)

- Add a `self$learnerside(last = TRUE)` function that gives the `ParamSet` that the `Learner` / `PipeOp` should see. Because `private$.learnerside` could point to a `ParamSet` that itself has a `private$.learnerside` set, it should be recursive if `last` is `TRUE`, and only give the "next" `learnerside` if `last` is `FALSE`.

      learnerside = function(last = TRUE) {
        if (!self$has_interface) return(self)
        if (last) {
          private$.learnerside$learnerside(last = TRUE)
        } else {
          private$.learnerside
        }
      }

- Implement a `private$copy_param_set()` helper function. It copies all relevant items from its argument to the `ParamSet` itself, turning `self` into an effective copy of that argument:

      copy_param_set = function(param_set) {
        private$.params = param_set$params
        private$.deps = param_set$deps
        private$.values = param_set$values
        private$.trafo = param_set$trafo
        invisible(self)
      }

- Implement the public `$add_interface()` function:

      add_interface = function(param_set) {
        private$.learnerside = self$clone(deep = TRUE)
        private$copy_param_set(param_set)
      }

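- Implement a public `$remove_interface()` function:

      remove_interface = function(all = FALSE) {
        if (!self$has_interface) stop("no interface to remove")
        replace_with = self$learnerside(last = all)
        # private fields of another object are not reachable via `$`, so ask
        # `replace_with` for its own next layer through the public methods
        next_layer = if (replace_with$has_interface) replace_with$learnerside(last = FALSE) else NULL
        private$copy_param_set(replace_with)
        private$.learnerside = next_layer
      }
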
- How does the `Learner` / `PipeOp` get its values out of this? There probably should be a `$get_values()` function that gets the values for the operation, which should also have the filter functionality that `ids` currently has.

      get_values = function(class = NULL, tags = NULL, learnerside = FALSE, env) {
        if (learnerside && self$has_interface) {
          private$.learnerside$values = self$trafo(self$values, env)
          return(private$.learnerside$get_values(
            class = class, tags = tags, learnerside = learnerside, env = env
          ))
        }
        values = self$values
        values[intersect(names(values), self$ids(class = class, tags = tags))]
      }

- Change the `trafo` active binding to also accept functions of the form `function(x, env)`.

This implementation has the advantage that multiple interfaces can be "stacked" on top of each other: a user who gets a `Learner` does not need to know or care whether something put an interface in front of its `ParamSet`. When the user sets a parameter using `param_set$values$param = x`, the value gets checked against the constraints of the interface parameter set. When they call `lrn$train()`, the `train()` function calls `get_values(tags = "train", learnerside = TRUE, env = list(task = task))`, which recurses through the different interfaces that were added and sets `$values` in each one of them after transforming. This automatically checks that the trafo function returns a feasible value for the original `ParamSet`.
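The stacking behaviour can be illustrated with a toy model in base R. This is only a sketch under simplifying assumptions: plain environments stand in for `ParamSet` objects, feasibility checking is reduced to a check of parameter names, the layers are linked explicitly instead of being swapped in by `$add_interface()`, and the names `new_toy_set`, `set_values` and `mtry.percent` are hypothetical.

```r
# Toy stand-in for a ParamSet: allowed ids, current values, and optionally a
# trafo together with the "learnerside" set that the trafo output feeds into.
new_toy_set = function(ids, trafo = NULL, learnerside = NULL) {
  set = new.env()
  set$ids = ids
  set$values = list()
  set$trafo = trafo
  set$learnerside = learnerside
  set
}

# Setting values validates them against the ids of that layer.
set_values = function(set, values) {
  unknown = setdiff(names(values), set$ids)
  if (length(unknown)) stop("unknown parameter(s): ", paste(unknown, collapse = ", "))
  set$values = values
  invisible(set)
}

# Recurse through the layers: transform, store in the next layer (which
# validates the trafo output), and continue until the innermost set.
get_values = function(set, env) {
  if (is.null(set$learnerside)) return(set$values)
  set_values(set$learnerside, set$trafo(set$values, env))
  get_values(set$learnerside, env)
}

# Innermost set: what the learner itself understands.
learner_set = new_toy_set(ids = "mtry")

# First interface: exposes mtry.pexp instead of mtry.
pexp_set = new_toy_set(
  ids = "mtry.pexp",
  trafo = function(x, env) list(mtry = round(env$task$ncol ^ x$mtry.pexp)),
  learnerside = learner_set
)

# Second interface stacked on top: a percentage instead of a fraction.
percent_set = new_toy_set(
  ids = "mtry.percent",
  trafo = function(x, env) list(mtry.pexp = x$mtry.percent / 100),
  learnerside = pexp_set
)

set_values(percent_set, list(mtry.percent = 50))
result = get_values(percent_set, env = list(task = list(ncol = 100)))
result$mtry  # 10
```

The user only ever touches the outermost layer; each trafo output is validated by the layer below it, so a trafo that produced a parameter the next layer does not know would fail at `get_values()` time.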
This change would also be completely transparent to everything `ParamSet` is doing so far.
Things that I am not sure about:
- It is a bit inelegant to have the `env` parameter depend on what kind of object the `ParamSet` belongs to: some `PipeOp`s (e.g. `PipeOpModelAvg`) have parameters in a different context, where no `task` is present (and instead maybe a `prediction`). One would probably want to agree on an interface (always `task` in a `Learner` / preprocessing `PipeOp`, always `prediction` in a "post-processing" `PipeOp`, other contexts..?).
- There are no checks on the feasibility of the trafo function output until the actual training / predicting happens.
- Maybe one still wants to use the `"train"` / `"predict"` tags from the outside, e.g. maybe a tuning algorithm wants to train a model with one set of `"train"` parameters and then evaluate it with different `"predict"` parameters, to get multiple performance datapoints from only a single `train()` call for efficiency. In that case it would be nice if the `trafo` could also respect the `"train"` / `"predict"` tags and work when only a subset of parameter values is present. `get_values` would then need to be adapted to only give `self$values[intersect(names(self$values), self$ids(..., tags = tags))]` to `self$trafo`.
- I don't know if it would be useful to do this for `ParamSetCollection`. Maybe a `GraphLearner` would want to have an interface as well? I wouldn't know what the UI for that would look like, however. In that case it would probably be easiest to intervene with the individual `PipeOp`s' `ParamSet`.
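As a rough illustration of the tag-aware variant raised in the third point, the base-R sketch below filters the values by tag before handing them to a trafo that tolerates missing parameters. The tag table, the helper `ids_with_tag` and the choice of `predict.all` as a predict-time parameter are assumptions made for the example, not existing paradox behaviour.

```r
# Hypothetical values and tags for an interface parameter set.
values = list(mtry.pexp = 0.5, predict.all = TRUE)
tags = list(mtry.pexp = "train", predict.all = "predict")

# Ids carrying a given tag (stand-in for `self$ids(tags = tag)`).
ids_with_tag = function(tag) {
  names(tags)[vapply(tags, function(t) tag %in% t, logical(1))]
}

# A trafo that works on any subset: it only rewrites mtry.pexp if present.
trafo = function(x, env) {
  if (!is.null(x$mtry.pexp)) {
    x$mtry = round(env$task$ncol ^ x$mtry.pexp)
    x$mtry.pexp = NULL
  }
  x
}

env = list(task = list(ncol = 100))

# Only the values with the requested tag reach the trafo.
train_out = trafo(values[intersect(names(values), ids_with_tag("train"))], env)
predict_out = trafo(values[intersect(names(values), ids_with_tag("predict"))], env)

train_out$mtry           # 10
names(predict_out)       # "predict.all"
```

The cost of this design is that every trafo has to be written defensively, since it can no longer assume that all parameters are present.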