Parameter transformations inside ParamSet

Sometimes it would be useful to specify how parameters are changed inside a `Learner` / `PipeOp`, e.g. as in mlr-org/mlr3pipelines#24. A typical example is the `mtry` parameter of a random forest, which should range from 1 to `task$ncol`. It would be nice if one could introduce an `mtry.pexp` parameter ranging from 0 to 1, so that the actual `mtry` is set to `round(task$ncol ^ mtry.pexp)`.

The `$trafo` function, as it currently stands, is not a good fit for this, because it (1) operates before the `Learner` even sees the `Task`, so it wouldn't know about `task$ncol`, and (2) would not be able to introduce a new parameter `mtry.pexp`; it could only re-scale the existing `mtry`, which is an integer between 1 and `Inf`, not a real number between 0 and 1.
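To make the intended mapping concrete, here is a small base-R sketch of the arithmetic only. The helper name `mtry_from_pexp` and the value `n_features = 100` are made up for illustration and stand in for whatever `task$ncol` reports; nothing here is part of `paradox`.

```r
# Hypothetical helper: map mtry.pexp in [0, 1] to an integer mtry in [1, n_features].
mtry_from_pexp = function(n_features, mtry.pexp) {
  stopifnot(n_features >= 1, mtry.pexp >= 0, mtry.pexp <= 1)
  as.integer(round(n_features ^ mtry.pexp))
}

n_features = 100  # assumed stand-in for task$ncol

mtry_from_pexp(n_features, 0)    # 1
mtry_from_pexp(n_features, 0.5)  # 10
mtry_from_pexp(n_features, 0.7)  # 25
mtry_from_pexp(n_features, 1)    # 100
```

The exponent scale spends most of the `[0, 1]` range on small `mtry` values, and it can only be evaluated once the number of features is known.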

I think the following UI would be quite nice:
```r
lrn = mlr_learners$get("classif.ranger")
ps = lrn$param_set$clone()
ps$subset(setdiff(ps$ids(), "mtry"))
ps$add(ParamDbl$new("mtry.pexp", 0, 1))
ps$trafo = function(x, env, param_set) {
  x$mtry = round(env$task$ncol ^ x$mtry.pexp)
  x$mtry.pexp = NULL
  x
}

lrn$param_set$add_interface(ps)  # !!

# set effective `mtry` to `round(ncol(task) ^ 0.7)` when training happens
lrn$param_set$values$mtry.pexp = 0.7

lrn$param_set$values$mtry = 3 # ERROR
```
This would change the `lrn$param_set` to "look and feel" like the `ps` constructed / modified before, but internally the `Learner` (or e.g. a `PipeOp`) would get the parameter values as produced by the `$trafo` function.

A way to implement this would be the following:
1. Add a `private$.learnerside = NULL` slot that points to the `ParamSet` that the `Learner` / `PipeOp` should see.
1. Add a `$has_interface` active binding:
    ```r
    has_interface = function() !is.null(private$.learnerside)
    ```
1. Add a `self$learnerside(last = TRUE)` function that gives the `ParamSet` that the `Learner` / `PipeOp` should see. Because `private$.learnerside` could point to a `ParamSet` that *itself* has a `private$.learnerside` set, it should be recursive if `last` is `TRUE`, and only give the "next" `learnerside` if `last` is `FALSE`.
    ```r
    learnerside = function(last = TRUE) {
      if (!self$has_interface)
        return(self)
      if (last) {
        private$.learnerside$learnerside(last = TRUE)
      } else {
        private$.learnerside
      }
    }
    ```
1. Implement a `private$copy_param_set()` helper function. It copies all relevant items from its argument to the `ParamSet` itself, to turn the `self` into an effective copy of that argument:
    ```r
    copy_param_set = function(param_set) {
      private$.params = param_set$params
      private$.deps = param_set$deps
      private$.values = param_set$values
      private$.trafo = param_set$trafo
      invisible(self)
    }
    ```
1. Implement the public `$add_interface()` function:
    ```r
    add_interface = function(param_set) {
      private$.learnerside = self$clone(deep = TRUE)
      private$copy_param_set(param_set)
    }
    ```
1. Implement a public `$remove_interface()` function:
    ```r
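    remove_interface = function(all = FALSE) {
      if (!self$has_interface)
        stop("no interface to remove")
      replace_with = self$learnerside(last = all)
      # private fields of `replace_with` are not reachable from here,
      # so look up its next layer through the public API
      next_layer = if (replace_with$has_interface) replace_with$learnerside(last = FALSE) else NULL
      private$copy_param_set(replace_with)
      private$.learnerside = next_layer
      invisible(self)
    }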
    ```
1. How does the `Learner` / `PipeOp` get its value out of this? There probably should be a `$get_values()` function that gets the values for the operation, which should also have the filter functionality that `ids` currently has.
    ```r
    get_values = function(class = NULL, tags = NULL, learnerside = FALSE, env) {
      if (learnerside && self$has_interface) {
        private$.learnerside$values = self$trafo(self$values, env)
        return(private$.learnerside$get_values(
          class = class, tags = tags, learnerside = learnerside, env = env
        ))
      }
      values = self$values
      values[intersect(names(values), self$ids(class = class, tags = tags))]
    }
    ```
1. Change the `trafo` active binding to also accept functions of the form `function(x, env)`.

This implementation has the advantage that multiple interfaces can be "stacked" on top of each other: A user who gets a `Learner` does not need to know or care whether something put an interface in front of its `ParamSet`. When the user sets a parameter using `param_set$values$param = x`, the value gets checked against the constraints of the interface parameter set. When the user calls `lrn$train()`, the `train()` function calls `get_values(tags = "train", learnerside = TRUE, env = list(task = task))`, which recurses through the different interfaces that were added, and sets `$values` in each one of them after transforming. This automatically checks that the trafo function returns a feasible value for the original `ParamSet`.
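The stacking behaviour can be illustrated without `R6` or `paradox`. The following is only a base-R sketch: `make_layer`, `resolve_values`, the `mtry.percent` parameter and the mock `task` are all hypothetical. Each "layer" stands in for a `ParamSet`, and its `inner` field plays the role of `private$.learnerside`.

```r
# A "layer" stands in for a ParamSet; `inner` plays the role of private$.learnerside.
make_layer = function(check, trafo = NULL, inner = NULL) {
  list(check = check, trafo = trafo, inner = inner)
}

# Recursively push values through all interfaces down to the innermost layer.
resolve_values = function(layer, values, env) {
  layer$check(values)  # feasibility check at this level
  if (is.null(layer$inner)) {
    return(values)     # innermost layer: what the Learner would see
  }
  resolve_values(layer$inner, layer$trafo(values, env), env)
}

# Innermost layer: integer-valued mtry >= 1 (stand-in for the learner's own ParamSet).
learner_layer = make_layer(
  check = function(v) stopifnot(v$mtry >= 1, v$mtry == round(v$mtry))
)

# Interface 1: mtry.pexp in [0, 1], transformed using the task.
pexp_layer = make_layer(
  check = function(v) stopifnot(v$mtry.pexp >= 0, v$mtry.pexp <= 1),
  trafo = function(x, env) {
    x$mtry = round(env$task$ncol ^ x$mtry.pexp)
    x$mtry.pexp = NULL
    x
  },
  inner = learner_layer
)

# Interface 2, stacked on top: a percentage in [0, 100] mapped to mtry.pexp.
percent_layer = make_layer(
  check = function(v) stopifnot(v$mtry.percent >= 0, v$mtry.percent <= 100),
  trafo = function(x, env) {
    x$mtry.pexp = x$mtry.percent / 100
    x$mtry.percent = NULL
    x
  },
  inner = pexp_layer
)

env = list(task = list(ncol = 100))  # mock task with only the field used here
resolved = resolve_values(percent_layer, list(mtry.percent = 70), env)
resolved$mtry  # 25
```

Because every layer checks its input before passing it on, an infeasible trafo result (or an out-of-range user value) raises an error at the layer where it occurs, which mirrors the "sets `$values` in each one of them" behaviour described above.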

*This change would also be completely transparent to everything `ParamSet` is doing so far.*

Things that I am not sure about:
* It is a bit inelegant to have the `env` parameter depend on what kind of object the `ParamSet` belongs to: Some `PipeOp`s (e.g. `PipeOpModelAvg`) have parameters in a different context, where no `task` is present (and instead maybe a `prediction`). One would probably want to agree on an interface (always `task` in a `Learner` / preprocessing `PipeOp`, always `prediction` in a "post-processing" `PipeOp`, other contexts..?)
* There are no checks on the feasibility of the trafo function output until the actual training / predicting happens.
* Maybe one still wants to use the `"train"` / `"predict"` tags from the outside, e.g. maybe a tuning algorithm wants to train a model with one set of `"train"` parameters and then evaluate it with different `"predict"` parameters to get multiple performance datapoints with only a single `train()` call for efficiency. In that case it would be nice if the `trafo` could also respect the `"train"` / `"predict"` tags and work when only a subset of parameter values is present. `get_values` would then need to be adapted to only give `self$values[intersect(names(self$values), self$ids(..., tags = tags))]` to `self$trafo`.
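
A rough base-R sketch of the tag-filtering idea from the third point, assuming a made-up tag table and hypothetical helpers `ids_with_tag` and `filter_values` (none of these are `paradox` functions). The point is that the trafo has to tolerate parameters that were filtered out before it runs:

```r
# Hypothetical tag table: which parameter carries which tags.
param_tags = list(
  mtry.pexp = "train",
  num.threads = c("train", "predict"),
  predict.all = "predict"
)

# Hypothetical helper: ids of all parameters carrying the given tag.
ids_with_tag = function(tag) {
  names(Filter(function(tags) tag %in% tags, param_tags))
}

# Hypothetical helper: keep only the values whose parameter carries the tag.
filter_values = function(values, tag) {
  values[intersect(names(values), ids_with_tag(tag))]
}

# A trafo that still works when mtry.pexp is absent from its input.
trafo = function(x, env) {
  if ("mtry.pexp" %in% names(x)) {
    x$mtry = round(env$task$ncol ^ x$mtry.pexp)
    x$mtry.pexp = NULL
  }
  x
}

env = list(task = list(ncol = 100))  # mock task with only the field used here
values = list(mtry.pexp = 0.7, num.threads = 2, predict.all = TRUE)

train_values = trafo(filter_values(values, "train"), env)
predict_values = trafo(filter_values(values, "predict"), env)

names(train_values)    # "num.threads" "mtry"
names(predict_values)  # "num.threads" "predict.all"
```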
* I don't know if it would be useful to do this for `ParamSetCollection`. Maybe a `GraphLearner` would want to have an interface as well? I wouldn't know what the UI for that would look like, however. In that case it would probably be easiest to intervene with the individual `PipeOp`s' `ParamSet`.
