In a multisession setting, can the connection between main session and a worker session break? #738
In multisession mode, processes communicate via sockets using the loopback interface. Can the connection between sessions break such that a future is unable to resolve, while the worker session keeps running but is incapable of receiving new tasks? If the answer is yes:
Yes, multisession and cluster futures rely on PSOCK clusters of the parallel package and, as you say, on socket connections between the main R process and the parallel workers.

The socket connection between the main R session and a parallel worker should itself be stable as long as the network connection between the two does not go down. Since multisession runs parallel workers on the local machine, it should be unlikely that such a connection becomes broken.

However, it is not just the connection that may fail. The communication protocol used by parallel's PSOCK clusters can become corrupt if either the main R session or the parallel worker is interrupted in the middle of a communication. So, if the user hits Ctrl-C while there is ongoing communication, that cluster node and the corresponding future can become corrupt and non-functional.

Another cause is the parallel worker process terminating or crashing for other reasons, e.g. running unstable code that core dumps R, running out of memory, or some "watchdog" process deciding to shut down the parallel worker because it misbehaves (too much memory, too long run-time, ...).
You can simulate a terminated worker as:

    library(future)
    plan(multisession)
    f <- future({ tools::pskill(Sys.getpid()) })
    v <- value(f)

This will throw an error, and in the next release, this error is more specific.

And in the next release, the crashed workers will be automatically restarted, and you'll be able to call …
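If the main session should keep going when a worker dies, the failure can be caught rather than left to stop the script. The following is only a minimal sketch, not part of the original reply: it reuses the self-terminating future from the example above, and it assumes that the error raised for a lost worker inherits from the future package's `FutureError` class. The variable name `res` is just illustrative.

```r
library(future)
plan(multisession, workers = 2)

# A future whose worker terminates itself, as in the example above
f <- future({ tools::pskill(Sys.getpid()) })

# Catch the failure as a condition object instead of stopping the script
res <- tryCatch(value(f), error = function(e) e)

# Assumption: errors from the future framework itself (as opposed to
# errors raised by the user's code) inherit from class "FutureError"
if (inherits(res, "FutureError")) {
  message("Future framework error: ", conditionMessage(res))
}

plan(sequential)  # shut down the remaining workers
```

Checking the class this way makes it possible to tell a lost worker apart from an ordinary error thrown by the future expression, which would be re-raised as a regular error.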
UPDATE: Since future 1.40.0 (2025-04-10), the framework detects crashed workers and restarts them automatically in the background:

    library(future)
    plan(multisession, workers = 2)
    message(sprintf("Number of free workers: %d/%d", nbrOfFreeWorkers(), nbrOfWorkers()))
    #> Number of free workers: 2/2
    f <- future({ tools::pskill(Sys.getpid()) })
    message(sprintf("Number of free workers: %d/%d", nbrOfFreeWorkers(), nbrOfWorkers()))
    #> Number of free workers: 1/2
    v <- value(f)
    #> Error: Future (NULL) of class MultisessionFuture interrupted, while running on 'localhost' (pid 252921)
    message(sprintf("Number of free workers: %d/%d", nbrOfFreeWorkers(), nbrOfWorkers()))
    #> Number of free workers: 2/2

We can also reset the crashed future:

    f <- reset(f)

Of course, as the future expression is written in this specific example, attempts to rerun it (e.g. …
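To illustrate a rerun that can actually succeed, here is a hedged sketch in which the worker crashes only on the first attempt. It assumes future >= 1.40.0, and it assumes that calling `value()` on a future returned by `reset()` relaunches it. The marker file `flag` is a hypothetical device used only to make the crash happen once.

```r
library(future)
plan(multisession, workers = 2)

# Hypothetical marker file: lets the expression crash only the first time
flag <- tempfile()

f <- future({
  if (!file.exists(flag)) {
    file.create(flag)
    tools::pskill(Sys.getpid())  # simulate a worker crash on the first run
  }
  42
})

# First attempt fails because the worker was killed; on error, reset the
# future and ask for its value again, which is assumed to relaunch it
v <- tryCatch(value(f), error = function(e) {
  f <<- reset(f)
  value(f)
})

plan(sequential)  # shut down the workers
unlink(flag)
```

A single retry like this only makes sense when the crash is transient. If the expression itself is what kills the worker, every rerun will fail the same way.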