Description
Idea: Create a shared memory area for each worker, shared with the coordination process. Before each execution, the worker stores the current binary in its shared memory area. The coordination process periodically checks the liveness of each worker. When a configurable amount of time passes without a status update, the worker is killed, the binary in its shared memory area is saved as a timeout, and the worker is restarted.
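A minimal sketch of what one worker's slot could look like, in Python. The slot layout (a timestamp and a length, followed by the binary), the names `store_binary`, `read_slot`, `is_stale` and `create_slot`, and the 64 KiB slot size are all assumptions made for illustration, not part of the proposal. It uses wall-clock time so that timestamps written by the worker are comparable in the coordination process.

```python
import struct
import time
from multiprocessing import shared_memory

# Hypothetical slot layout: 8-byte timestamp (double), 4-byte length,
# then the binary itself.
HEADER = struct.Struct("<dI")
SLOT_SIZE = 1 << 16  # assumed upper bound for header plus one binary


def create_slot(name):
    """Coordinator side: create the shared memory area for one worker."""
    return shared_memory.SharedMemory(name=name, create=True, size=SLOT_SIZE)


def store_binary(buf, binary, now=None):
    """Worker side: record the binary and a timestamp before executing it."""
    if HEADER.size + len(binary) > len(buf):
        raise ValueError("binary does not fit into the slot")
    now = time.time() if now is None else now
    buf[HEADER.size:HEADER.size + len(binary)] = binary
    HEADER.pack_into(buf, 0, now, len(binary))


def read_slot(buf):
    """Coordinator side: return (timestamp, binary) stored in a slot."""
    stamp, length = HEADER.unpack_from(buf, 0)
    return stamp, bytes(buf[HEADER.size:HEADER.size + length])


def is_stale(buf, timeout, now=None):
    """Coordinator side: True if the last update is older than `timeout` seconds."""
    now = time.time() if now is None else now
    stamp, _ = read_slot(buf)
    return now - stamp > timeout
```

The functions accept any writable buffer, so the worker and the coordination process can each pass the `buf` attribute of their `SharedMemory` handle. When `is_stale` returns true, the coordination process would kill the worker, write the binary returned by `read_slot` to wherever timeouts are collected, and start a new worker.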
TODO: Find a way to detect memory exhaustion and handle it in a similar way (assuming the process gets killed when memory is exhausted). For example, analyzing the return code for fatal error signals might be a solution.
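One way the return-code idea might look, sketched in Python under the assumption of a POSIX system where workers are started with `subprocess` (which reports a child killed by signal N as return code -N). The function name `classify_exit`, the category strings, and the choice of signals are hypothetical. On Linux the out-of-memory killer terminates a process with SIGKILL, so a SIGKILL that the coordination process did not send itself is a hint, not proof, of memory exhaustion.

```python
import signal

# Signals treated as a crash of the worker (an illustrative selection).
CRASH_SIGNALS = {signal.SIGSEGV, signal.SIGABRT, signal.SIGBUS,
                 signal.SIGILL, signal.SIGFPE}


def classify_exit(returncode):
    """Map a subprocess return code to a coarse category."""
    if returncode is None:
        return "running"   # the process has not exited yet
    if returncode == 0:
        return "ok"
    if returncode > 0:
        return "error"     # ordinary non-zero exit status
    sig = -returncode      # negative value: terminated by this signal
    if sig == signal.SIGKILL:
        return "killed"    # possibly the OOM killer, or the watchdog itself
    if sig in CRASH_SIGNALS:
        return "crash"
    return "signal"
```

Because the timeout handling also kills workers, the coordination process would need to remember which kills it issued itself before treating a `"killed"` result as memory exhaustion.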