Replies: 2 comments
Thanks for this. This is an important topic. It's a discussion that is still open and applies to all parallelization frameworks - not just futureverse.
All other backends run parallel workers in standalone R processes in the background, either on the local machine or on external machines. When those workers are spun up they inherit only a little from the parent process, because they may run on hosts completely unrelated to the machine where the main R session runs. When we know we are running on the same machine ("the local host"), we can carry forward some more settings. Specifically, localhost parallel workers are configured to use the same R package library (
For a similar reason, I don't think it is safe for a parallel worker to inherit multi-threading settings from the main R session. One might be able to argue it could be done for localhost parallelization (e.g.

While writing this, other than #388, which you've found, and futureverse/parallelly#47, I realize I don't have a public issue on the idea of forcing single-threaded(*) processing by default in parallel workers. With the major internal redesign in future 1.40.0, supporting something like

plan(multisession, workers = 3, threads_per_worker = 4)

should not be that much work anymore, and maybe the default could be (*) single or dual, depending on CPU architecture.

I agree that we should strive for a cross-platform solution, and ideally this can be encapsulated internally by the futureverse.

I should also clarify that I've punted on this topic and design decision for quite a while. This is simply because I haven't had the bandwidth, but also due to limitations in cross-platform solutions. I'm also hoping there's some precedent for this out there, in R or elsewhere, that we just haven't stumbled upon yet.

Please feel free to continue this discussion and add thoughts and comments here, because it is important, and it would be nice to settle on best practices around this.
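To make the threads-per-worker arithmetic concrete, here is a rough shell sketch of how a fixed thread budget could be split across workers and passed to a worker process. This is not a futureverse feature: `threads_per_worker` and `run_with_threads` are hypothetical helper names, and the sketch assumes the worker's numerical libraries honor the `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS` and `MKL_NUM_THREADS` environment variables, which are read when the process starts.

```shell
# threads_per_worker TOTAL WORKERS
# Integer share of TOTAL threads per worker, never below 1.
threads_per_worker() {
    total=$1
    workers=$2
    share=$(( total / workers ))
    if [ "$share" -lt 1 ]; then
        share=1
    fi
    echo "$share"
}

# run_with_threads N CMD [ARGS...]
# Run CMD with the common threading environment variables set to N.
# Assumption: the libraries used by CMD read these variables at startup.
run_with_threads() {
    n=$1
    shift
    env OMP_NUM_THREADS="$n" \
        OPENBLAS_NUM_THREADS="$n" \
        MKL_NUM_THREADS="$n" \
        "$@"
}
```

For example, `run_with_threads "$(threads_per_worker 40 10)" Rscript worker.R` would start a worker script (`worker.R` is a placeholder) capped at 4 threads. Because the cap is applied through the environment, it only affects processes started this way, not an R session that is already running.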
Hi! Thanks a lot for your detailed response, it shed light on many things! I'll try to help as much as I can, but I'm not too savvy on the subject.
I believe you're right (but I have other suggestions at the end).

Detection of number of physical cores and logical threads

I used

I checked the source code of

For Mac OS, it seems to use the canonical command.

source code detectCores (R 4.5.0)

detectCores <-
    if(.Platform$OS.type == "windows") {
        function(all.tests = FALSE, logical = TRUE) {
            ## result is # cores, logical processors.
            res <- .Call(C_ncpus, FALSE)
            res[if(logical) 2L else 1L]
        }
    } else {
        function(all.tests = FALSE, logical = TRUE) {
            ## Commoner OSes first
            ## for Linux systems, physical id is 1 for second hyperthread
            ## Irix support removed in R 4.1.0
            systems <-
                ## quoting needed for a Bourne shell
                list(linux = 'grep "^processor" /proc/cpuinfo 2>/dev/null | wc -l',
                     ## hw.physicalcpu is not documented for 10.9, but works
                     darwin = if(logical) "/usr/sbin/sysctl -n hw.logicalcpu 2>/dev/null" else "/usr/sbin/sysctl -n hw.physicalcpu 2>/dev/null",
                     solaris = if(logical) "/usr/sbin/psrinfo -v | grep 'Status of.*processor' | wc -l" else "/bin/kstat -p -m cpu_info | grep :core_id | cut -f2 | uniq | wc -l",
                     freebsd = "/sbin/sysctl -n hw.ncpu 2>/dev/null",
                     openbsd = "/sbin/sysctl -n hw.ncpuonline 2>/dev/null")
            nm <- names(systems)
            m <- pmatch(nm, R.version$os); m <- nm[!is.na(m)]
            if (length(m)) {
                cmd <- systems[[m]]
                if(!is.null(a <- tryCatch(suppressWarnings(system(cmd, TRUE)),
                                          error = function(e) NULL))) {
                    a <- gsub("^ +","", a[1])
                    if (grepl("^[1-9]", a)) return(as.integer(a))
                }
            }
            if (all.tests) {
                for (i in seq(systems))
                    for (cmd in systems[i]) { # Irix had two commands
                        if(is.null(a <- tryCatch(suppressWarnings(system(cmd, TRUE)),
                                                 error = function(e) NULL)))
                            next
                        a <- gsub("^ +","", a[1])
                        if (grepl("^[1-9]", a)) return(as.integer(a))
                    }
            }
            NA_integer_
        }
    }

The simplest alternative on Linux would be to use

lscpu | grep -i ^socket # sockets
lscpu | grep -i ^core # cores (per socket)
lscpu | grep -i ^thread # threads per core

But I don't know if

grep -i 'physical id' /proc/cpuinfo | sort -u | wc -l # sockets
grep -i 'cpu core' /proc/cpuinfo | sort -u | sed 's/[^:]*:[ ]*//' # cores (might not be sufficient, see *)
grep -i 'processor' /proc/cpuinfo | sort -u | wc -l # threads (total)
# OR
grep -ci 'cpu core' /proc/cpuinfo

* special case (?): multiple sockets with different numbers of cores on the same machine. Is it possible? If yes, is it possible that some sockets carry dual cores and others single cores on the same machine?

Thoughts

But I don't know to what extent this distinction between cores and logical threads (as specified by the OS) is relevant on Linux machines. I can perfectly well set up a plan with

And regardless of the OS, I'm not sure how relevant it is to link the putative

My feeling is that the default values could be:
# assume 40 threads
# not nested
plan(
tweak(multisession, workers = 10, threads_per_worker = 40 / 10) # = 4 per worker -> 4 * 10 = 40
)
# nested
plan(
tweak(multisession, workers = 10, threads_per_worker = 1), # single-threaded -> 30 threads available
tweak(multisession, workers = 2, threads_per_worker = 40 / (10 * 1)) # = 4 per worker
)

But I don't know 1) whether it is possible to make a future aware that it is calling another future, and 2) whether the futures of the first multisession level are using CPU resources while the nested futures are being resolved. Not necessarily, right? Some futures are non-blocking.
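Coming back to the special case raised earlier (sockets with different core counts): a rough way to avoid multiplying sockets by cores per socket is to count the distinct (physical id, core id) pairs directly. This is a minimal sketch, assuming the x86-style /proc/cpuinfo layout where each processor entry lists `physical id` before `core id`. `count_cpu_topology` is a made-up helper name, and on architectures that omit these fields it would report 0 sockets and 0 cores.

```shell
# count_cpu_topology [FILE]
# Print sockets, physical cores and logical threads found in
# /proc/cpuinfo-style text. FILE defaults to /proc/cpuinfo.
count_cpu_topology() {
    awk -F': *' '
        /^processor/   { threads++ }                  # one entry per logical CPU
        /^physical id/ { socket = $2; sockets[socket] = 1 }
        /^core id/     { cores[socket "," $2] = 1 }   # unique (socket, core) pair
        END {
            ns = 0; for (s in sockets) ns++
            nc = 0; for (c in cores) nc++
            printf "sockets=%d cores=%d threads=%d\n", ns, nc, threads
        }
    ' "${1:-/proc/cpuinfo}"
}
```

Called without an argument it reads /proc/cpuinfo; passing the path to a saved copy makes it possible to try it on output from another machine.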
I'll stop here because this is long-winded enough and I feel like I'm starting to go off on a tangent (I hope not).
Hi!
This doesn't seem to be a bug, I just wanted to report this behavior. Here is a reproducible example (with BLAS, but reproducible with OpenMP):
So unless I am missing something, the only way to preserve the value of RhpcBLASctl::blas_get_num_procs() in a multisession future is to set it inside the future, whereas it is preserved inside a multicore future. (As a side note, callr behaves like multisession.)

Why is there such a difference between the back-ends? Would it be possible to harmonize the behaviors?
I think it would be preferable to preserve the number of threads in both cases, because of nested futures:
Now, let's imagine that foo() is a function from a package parallelized with future_apply, with each apply loop calling a function that is multi-threaded with BLAS. A Windows or RStudio user could not properly manage the number of threads when calling this function inside a future (to run foo() in the background, for example). They would have to replace the nested multisession plan with a sequential plan to avoid an excessive number of threads.

I've seen this old comment. I think it would be nice to have as a developer.
Sorry for the verbosity, I hope it was clear. Any help, feedback or comment would be appreciated.
Thanks!
Best
sessionInfo()