Issues with foreach and SGE #686

@fkgruber

furrr works perfectly with future.batchtools. If you have a loop with 3 elements you get 3 jobs on the cluster:

library(furrr)
library(future.batchtools)
plan(batchtools_sge)
nothingness <- future_map(c(2, 2, 2), ~Sys.sleep(.x))
qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
  14375 0.50500 jobb856749 fred         r     06/03/2023 22:23:03 all.q@ipxxx     1        
  14376 0.50500 job2b8fc0d fred         r     06/03/2023 22:23:03 all.q@ipxxx     1        
  14377 0.50500 jobfdf28f1 fred         r     06/03/2023 22:23:03 all.q@ipxxx     1        

With foreach, however, I only get 1 job:

library(foreach)
library(future)
library(furrr)
library(future.batchtools)
library(doFuture)

mu <- 1.0
sigma <- 2.0
registerDoFuture()
plan(batchtools_sge)
x %<-% {
  foreach(i = 1:3) %dopar% {
    Sys.sleep(3)
    set.seed(123)
    rnorm(i, mean = mu, sd = sigma)
  }
}
qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
  14378 0.50500 job7472e4f fred         r     06/03/2023 22:26:03 all.q@ipxxx     1        

When it returns, I get the following two strange warnings:

Warning messages:
1: executing %dopar% sequentially: no parallel backend registered
2: UNRELIABLE VALUE: Future ( ) unexpectedly generated random numbers without specifying argument 'seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'seed=NULL', or set option 'future.rng.onMisuse' to "ignore".

Why does it say there is no parallel backend registered when I'm running registerDoFuture()?

Alternatively, I tried %dofuture% instead of %dopar% but it still only generates 1 job.

x %<-% {
  foreach(i = 1:10) %dofuture% {
    Sys.sleep(3)
    set.seed(123)
    rnorm(i, mean = mu, sd = sigma)
  }
}

f = futureOf(x)
resolved(f)
x
qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
  14379 0.50500 job4b3cd22 fred         r     06/03/2023 22:28:48 all.q@ipxxx     1        

This time I only get the random number warning:

Warning message:
UNRELIABLE VALUE: At least one of iterations 1-10 of the foreach() %dofuture% { … }, part of chunk #1 ( doFuture2-1 ), unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify foreach() argument '.options.future = list(seed = TRUE)'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, set option 'doFuture.rng.onMisuse' to "ignore".

I also tried other options in foreach like .options.future = list(scheduling=1) but they don't seem to have any effect.
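
For reference, options such as scheduling and seed go through the .options.future argument of foreach(), roughly as sketched below. This is a minimal sketch and not the exact code from the runs above. It assumes a local multisession plan as a stand-in for batchtools_sge so it can run without a cluster, and it calls foreach() directly instead of inside x %<-% { }. The seed = TRUE setting is the one suggested by the warning text.

```r
library(foreach)
library(future)
library(doFuture)

# Assumption: a local backend stands in for batchtools_sge so this runs anywhere
plan(multisession, workers = 2)

mu <- 1.0
sigma <- 2.0

# seed = TRUE asks for parallel-safe RNG streams, as the warning suggests,
# so set.seed() is not called inside the loop body.
# scheduling = 1 is the value mentioned above.
y <- foreach(i = 1:3, .options.future = list(seed = TRUE, scheduling = 1)) %dofuture% {
  rnorm(i, mean = mu, sd = sigma)
}
str(y)

plan(sequential)  # shut down the local workers
```
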

Is it possible that foreach somehow chunks all the iterations into one task? Or is something else not working?

Thanks
Fred
