Allow configuring the cluster worker ping timeout#23
Open
Allow configuring the cluster worker ping timeout#23
Conversation
The worker liveness ping timeout was a hard-coded 10s `private const` in
`ContextClusterWorker`. Applications that legitimately do synchronous
blocking work longer than 10s (e.g. PDO drivers, sync Redis clients,
remote `file_get_contents`) cannot service pings during the blocked
window, so the watcher terminates them mid-request even though the
worker is making progress.
This change exposes the timeout as an optional `$workerPingTimeout`
constructor parameter on `ClusterWatcher`, threaded through to
`ContextClusterWorker`. The default value is `10`, preserving existing
behaviour for all call sites that don't pass the new parameter.
The constant is renamed from the private `PING_TIMEOUT` to public
`DEFAULT_PING_TIMEOUT` so it remains the single source of truth for the
default and is referenceable from `ClusterWatcher`'s constructor
signature.
Validation: `$workerPingTimeout < 1` throws `\ValueError`.
Sample usage:
$watcher = new ClusterWatcher(
script: __DIR__ . '/server.php',
logger: $logger,
workerPingTimeout: 45,
);
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
ContextClusterWorker(the watcher's per-worker bookkeeping object) hard-codes a 10-second ping timeout:This works well when worker handlers are non-blocking. In real-world PHP applications, however, parts of the request lifecycle are synchronous and blocking:
While such code is blocking, the worker process cannot service the ping, so its `lastActivity` is not refreshed. Within ~10s the watcher declares the worker dead and closes its context. From the operator's point of view, the symptom is `Worker N died unexpectedly: The context stopped responding` even though the worker was making progress on a single (slow but legitimate) request.
This forces applications that have legitimate >10s blocking work to either (a) avoid `amphp/cluster` entirely, (b) split that work into async/queued jobs (a substantial refactor), or (c) vendor-patch `PING_TIMEOUT`. Option (c) is what real users end up doing.
Proposed change
Make the ping timeout a configurable parameter on `ClusterWatcher`, threaded through to the internal `ContextClusterWorker`. Default value stays `10` (no behaviour change for existing users).
Public API
```php
$watcher = new ClusterWatcher(
script: DIR . '/server.php',
logger: $logger,
workerPingTimeout: 45, // accommodate legitimate long-blocking work
);
```
Internal change
The hard-coded `private const PING_TIMEOUT` becomes `public const DEFAULT_PING_TIMEOUT` so it remains the single source of truth and is referenceable from `ClusterWatcher`'s constructor signature. `ContextClusterWorker` accepts an optional `int $pingTimeout` constructor parameter and uses it in `EventLoop::repeat()` and the activity comparison.
Validation
`workerPingTimeout < 1` throws `\ValueError` from `ClusterWatcher`'s constructor.
Backwards compatibility
Test plan
I'm happy to add a unit test for the constructor validation if you'd like — wanted to keep the diff minimal for first review.
Open questions for the maintainer
Why we are filing this
We run a cluster of ReactPHP HTTP workers that share long-lived state through Doctrine ORM (synchronous PDO). Specific reporting endpoints have a legitimate execution time that exceeds 10s; today the watcher treats them as dead workers and recycles them mid-response. Bumping `PING_TIMEOUT` is the smallest correct change. We're happy to iterate on the design if any of the choices above don't fit.