
feat: add mirai parallelization #1314


Open · wants to merge 23 commits into main

Conversation

@be-marc (Member) commented May 22, 2025

No description provided.

@be-marc requested a review from Copilot on May 22, 2025 10:56
@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR adds support for running experiments in parallel using the mirai package and exposes a new "mirai" option in the learner encapsulation workflow.

  • Introduces a mirai branch in future_map to dispatch work via mirai_map/collect_mirai.
  • Updates Learner$encapsulate() to accept "mirai" as a method choice.
  • Adds mirai to Suggests in DESCRIPTION and configures a GitHub remote; updates .Rbuildignore.
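
A minimal usage sketch of the intended workflow (hedged: this assumes the dispatch behavior described above; the task, learner, and resampling are just illustrative mlr3 objects):

library(mlr3)
library(mirai)

# user-started daemons; with connections present, future_map()
# dispatches via mirai_map() instead of the future backend
daemons(2)

rr = resample(tsk("penguins"), lrn("classif.rpart"), rsmp("cv", folds = 3))

daemons(0)  # shut the daemons down again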

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

File descriptions:

  • R/helper_exec.R: Added mirai branch in future_map with mirai_map/collect_mirai.
  • R/Learner.R: Extended encapsulate() to accept "mirai" method.
  • DESCRIPTION: Added mirai to Suggests and configured a Remotes entry.
  • .Rbuildignore: Added ^attic$ to ignore list.
Comments suppressed due to low confidence (5)

R/helper_exec.R:26

  • There's no test for the new mirai branch in future_map. Add a unit test that simulates an available mirai environment and verifies that mirai_map/collect_mirai is invoked.
} else if (requireNamespace("mirai", quietly = TRUE) && mirai::status()$connections) {

R/Learner.R:556

  • The roxygen block for encapsulate() wasn't updated to mention the new mirai method. Include a description of what mirai does and any additional arguments it requires.
#' @return `self` (invisibly).

R/helper_exec.R:28

  • The code calls workhorse inside mirai_map, but FUN is the user-supplied function parameter. Replace workhorse with FUN (or clarify where workhorse is defined) to ensure the intended function is executed.
mirai::collect_mirai(mirai::mirai_map(data.table(...), workhorse, .args = c(MoreArgs, list(is_sequential = FALSE))))

R/Learner.R:557

  • You’ve added "mirai" as a valid choice, but there is no corresponding branch handling method == "mirai" in encapsulate(). Implement the execution logic for the new method or remove it until ready.
assert_choice(method, c("none", "try", "evaluate", "callr", "mirai"))

DESCRIPTION:78

  • The Remotes entry points to mlr-org/mlr3misc@mirai, which appears unrelated to the mirai package. Update this to reference the actual mirai repository (or remove if mirai is on CRAN).
    mlr-org/mlr3misc@mirai

R/helper_exec.R Outdated
@@ -23,6 +23,9 @@ future_map = function(n, FUN, ..., MoreArgs = list()) {
if (getOption("mlr3.debug", FALSE)) {
lg$info("Running experiments sequentially in debug mode with %i iterations", n)
mapply(FUN, ..., MoreArgs = MoreArgs, SIMPLIFY = FALSE, USE.NAMES = FALSE)
} else if (requireNamespace("mirai", quietly = TRUE) && mirai::status()$connections) {
Collaborator commented:

So this will automatically prefer mirai over future? Maybe there should be an option for this.

@be-marc (Member, Author) replied May 22, 2025:

Only if the user has also started daemons with mirai::daemons().
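
The check itself appears in the diff below. A hedged sketch of the suggested opt-out (the option name mlr3.prefer_mirai is hypothetical and not part of this PR):

# use mirai only when the package is installed, daemons are connected,
# and the user has not opted out via a (hypothetical) option
use_mirai = isTRUE(getOption("mlr3.prefer_mirai", TRUE)) &&
  requireNamespace("mirai", quietly = TRUE) &&
  mirai::status()$connections > 0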

@@ -23,6 +23,9 @@ future_map = function(n, FUN, ..., MoreArgs = list()) {
if (getOption("mlr3.debug", FALSE)) {
lg$info("Running experiments sequentially in debug mode with %i iterations", n)
mapply(FUN, ..., MoreArgs = MoreArgs, SIMPLIFY = FALSE, USE.NAMES = FALSE)
} else if (requireNamespace("mirai", quietly = TRUE) && mirai::status()$connections) {
lg$debug("Running resample() via mirai with %i iterations", n)
mirai::collect_mirai(mirai::mirai_map(data.table(...), FUN, .args = c(MoreArgs, list(is_sequential = FALSE))))
Collaborator commented:

Why does it call data.table(...)? And does it have any chance of getting into conflict with any of the special args of data.table() (e.g. key, keep.rownames)?

@be-marc (Member, Author) replied May 22, 2025:

mirai::mirai_map() only accepts a data.frame or matrix when you want to map over multiple inputs, and data.table handles list columns better. ... contains the iteration and learner when called by resample(), and additionally the task and resampling when called by benchmark().

> And does it have any chance of getting into conflict with any of the special args

I think not; future_map is only called internally.
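
A standalone sketch of this mapping pattern (illustrative column names, not the PR's actual grid):

library(mirai)
library(data.table)

daemons(2)

# mirai_map() maps over the rows of a data.frame or matrix, passing the
# columns as named arguments to the function; data.table is used because
# list columns can carry arbitrary R objects per row
grid = data.table(iteration = 1:3, payload = list("a", "b", "c"))
res = collect_mirai(mirai_map(grid, function(iteration, payload) {
  paste(iteration, payload)
}))

daemons(0)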

R/helper_exec.R Outdated
@@ -23,6 +23,9 @@ future_map = function(n, FUN, ..., MoreArgs = list()) {
if (getOption("mlr3.debug", FALSE)) {
lg$info("Running experiments sequentially in debug mode with %i iterations", n)
mapply(FUN, ..., MoreArgs = MoreArgs, SIMPLIFY = FALSE, USE.NAMES = FALSE)
} else if (requireNamespace("mirai", quietly = TRUE) && mirai::status()$connections) {
Member commented:

This only works with the default compute profile.

@be-marc (Member, Author) replied:

Yes, I didn't want to make it too complicated. What would be the best way to pass the compute profile? We also have to pass it to mirai_map().

Member replied:

I suggest we expose a .compute argument to benchmark() and resample().

@be-marc (Member, Author) replied:

I'm not so happy with that because we are introducing parameters for a special backend. Moreover, this argument would then have to be added to all tuning and feature selection functions.
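
For reference, a hypothetical sketch of the compute-profile alternative (the profile name "mlr3" and the forwarding into mirai_map() are assumptions from this thread, not code in this PR):

library(mirai)

# daemons can be started under a named compute profile
daemons(4, .compute = "mlr3")

# future_map() would then have to forward the same profile, e.g.:
# mirai_map(grid, FUN, .args = MoreArgs, .compute = "mlr3")

daemons(0, .compute = "mlr3")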

@sebffischer (Member) commented May 22, 2025

Most importantly: we need to ensure that using mirai_map is reproducible and gets proper RNG streams.

One way to achieve this is to use L'Ecuyer-CMRG and generate seeds using:

library(parallel)  # nextRNGStream() is in the parallel package

RNGkind("L'Ecuyer-CMRG")
set.seed(1)
s <- .Random.seed
for (i in 1:10) {
  s <- nextRNGStream(s)
  # send s to worker i as .Random.seed
}

This is taken from the parallel documentation.
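
A sketch of how this could be combined with mirai_map (an assumption about how the PR might wire it up, not code from this PR): pre-compute one stream per task and restore it on the worker before evaluating.

library(parallel)
library(mirai)

daemons(2)

RNGkind("L'Ecuyer-CMRG")
set.seed(1)
seeds = list(.Random.seed)
for (i in 2:4) seeds[[i]] = nextRNGStream(seeds[[i - 1]])

res = collect_mirai(mirai_map(
  data.frame(i = 1:4),
  function(i, seeds) {
    # restore the pre-computed stream for this task; the seed vector
    # encodes the RNG kind, so no RNGkind() call is needed on the worker
    assign(".Random.seed", seeds[[i]], envir = globalenv())
    rnorm(1)
  },
  .args = list(seeds = seeds)
))

daemons(0)

Because each stream is attached to a task index rather than to a daemon, the results do not depend on which daemon picks up which task.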

@be-marc (Member, Author) commented May 22, 2025

> we need to ensure that using mirai_map is reproducible and gets proper RNG streams.

Yes, for now I pointed out in the documentation that it does not deliver the same results as future.

> One way to achieve this is to use L'Ecuyer-CMRG and generate seeds using

Yes, I did the same in rush. Not sure why Charlie is doing it differently.

@sebffischer (Member) commented May 23, 2025

> Yes, I did the same in rush. Not sure why Charlie is doing it differently.

Actually, Charlie is also doing it. But every mirai worker instantiates the L'Ecuyer seed once at startup, so set.seed() then has no effect on the parallel workers.

But this causes e.g. the following behavior:

library(mirai)

set.seed(1)
daemons(2)
#> [1] 2
x1 = mirai_map(1, function(i) rnorm(1))[][[1]]
set.seed(1)
x2 = mirai_map(1, function(i) rnorm(1))[][[1]]
x1 == x2
#> [1] FALSE

Created on 2025-05-23 with reprex v2.1.1

I think it would be more user-friendly for us to also set the seeds again before starting the parallel work.

@sebffischer (Member) commented May 23, 2025

Another idea for optimization:

If we had more control over the daemons, e.g. via a special compute profile .mlr3, we could also globally load some packages on the workers (e.g. mlr3 itself).
Not sure whether this results in a noticeable runtime improvement, but we could explore it.

We could offer both options:
a) The user uses an existing compute profile.
b) We spawn the daemons ourselves, which is where we can add such optimizations.
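
A sketch of option b) with pre-loaded packages (mirai::everywhere() evaluates an expression on every daemon of a compute profile):

library(mirai)

daemons(4)
everywhere(library(mlr3))  # load mlr3 once per worker up front

# ... resample()/benchmark() calls here run on the warm workers ...

daemons(0)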

@be-marc (Member, Author) commented May 23, 2025

> Actually, Charlie is also doing it. But every mirai worker instantiates the L'Ecuyer seed once at startup, so set.seed() then has no effect on the parallel workers.

Does this mean that the substreams are not attached to the tasks but to the daemons? It somehow sounds like that in https://mirai.r-lib.org/reference/daemons.html#arg-seed. Otherwise, the order in which tasks are sent would have no influence.

@sebffischer (Member) commented May 23, 2025

> Does this mean that the substreams are not attached to the tasks but to the daemons? It somehow sounds like that in https://mirai.r-lib.org/reference/daemons.html#arg-seed. Otherwise, the order in which tasks are sent would have no influence.

Yes, I think so.
