Allow configuring the cluster worker ping timeout by odahcam · Pull Request #23 · amphp/cluster

odahcam · 2026-05-07T18:14:54Z

Problem

ContextClusterWorker (the watcher's per-worker bookkeeping object) hard-codes a 10-second ping timeout:

```php
// src/Internal/ContextClusterWorker.php
final class ContextClusterWorker extends AbstractLogger implements ClusterWorker
{
    private const PING_TIMEOUT = 10;
    // ...
    public function run(): void
    {
        $watcher = EventLoop::repeat(self::PING_TIMEOUT / 2, weakClosure(function (): void {
            if ($this->lastActivity < \time() - self::PING_TIMEOUT) {
                $this->close();
                return;
            }
            // ...
        }));
    }
}
```

This works well when worker handlers are non-blocking. In real-world PHP applications, however, parts of the request lifecycle are synchronous and blocking:

PDO drivers (`pdo_mysql`, `pdo_pgsql`, …) — they block the event loop until the database responds.
`file_get_contents()` against remote URLs.
Synchronous Redis clients (e.g. Predis without the async loop adapter).
CPU-bound work (image processing, PDF rendering).

While such code is blocking, the worker process cannot service the ping, so its `lastActivity` is not refreshed. Once more than 10 seconds pass without activity, the watcher's next check (it runs every `PING_TIMEOUT / 2` = 5 seconds) declares the worker dead and closes its context. From the operator's point of view, the symptom is `Worker N died unexpectedly: The context stopped responding` even though the worker was making progress on a single (slow but legitimate) request.
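To make the timing concrete, here is a minimal sketch of the check cadence in plain PHP. It is a simplified model, not the library's code: `closedAt()` is a hypothetical helper, the worker is assumed to have been last active at t = 0, and it then blocks for `$blockSeconds` without answering any ping.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical model of the watcher's periodic check (not amphp/cluster code).
 * Returns the time (in seconds) at which the worker would be closed,
 * or null if it resumes before any check finds it stale.
 */
function closedAt(int $pingTimeout, int $blockSeconds): ?float
{
    // Same cadence as EventLoop::repeat(self::PING_TIMEOUT / 2, ...) in the excerpt above.
    $interval = $pingTimeout / 2;
    $lastActivity = 0; // assumption: last activity recorded at t = 0

    // Only checks that fire while the worker is still blocked can find it stale.
    for ($now = $interval; $now < $blockSeconds; $now += $interval) {
        // Same comparison as in run(): lastActivity < time() - timeout.
        if ($lastActivity < $now - $pingTimeout) {
            return $now;
        }
    }

    return null;
}

var_dump(closedAt(10, 20)); // float(15): checks at 5s and 10s pass, the 15s check closes the worker
var_dump(closedAt(30, 20)); // NULL: the only check during the block (15s) is within the timeout
```

Under these assumptions a 20-second blocking call is cut off with the default timeout but survives with a 30-second one, which matches the manual scenario in the test plan.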

This forces applications that have legitimate >10s blocking work to either (a) avoid `amphp/cluster` entirely, (b) split that work into async/queued jobs (a substantial refactor), or (c) vendor-patch `PING_TIMEOUT`. Option (c) is what real users end up doing.

Proposed change

Make the ping timeout a configurable parameter on `ClusterWatcher`, threaded through to the internal `ContextClusterWorker`. Default value stays `10` (no behaviour change for existing users).

Public API

```php
$watcher = new ClusterWatcher(
    script: __DIR__ . '/server.php',
    logger: $logger,
    workerPingTimeout: 45, // accommodate legitimate long-blocking work
);
```

Internal change

The hard-coded `private const PING_TIMEOUT` becomes `public const DEFAULT_PING_TIMEOUT` so it remains the single source of truth and is referenceable from `ClusterWatcher`'s constructor signature. `ContextClusterWorker` accepts an optional `int $pingTimeout` constructor parameter and uses it in `EventLoop::repeat()` and the activity comparison.
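The threading described above could look roughly like the following sketch. `SketchWatcher` and `SketchWorker` are hypothetical stand-ins written for illustration, not the real `ClusterWatcher` and `ContextClusterWorker`; the methods `checkInterval()`, `isStale()`, and `createWorker()` are likewise assumptions that isolate the two places where the timeout is used.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for ContextClusterWorker (illustration only).
final class SketchWorker
{
    // Public so the watcher's constructor signature can reference the default.
    public const DEFAULT_PING_TIMEOUT = 10;

    public function __construct(
        private readonly int $pingTimeout = self::DEFAULT_PING_TIMEOUT,
    ) {
    }

    /** Interval that would be passed to EventLoop::repeat(). */
    public function checkInterval(): float
    {
        return $this->pingTimeout / 2;
    }

    /** The activity comparison, using the property instead of the constant. */
    public function isStale(int $lastActivity, int $now): bool
    {
        return $lastActivity < $now - $this->pingTimeout;
    }
}

// Hypothetical stand-in for ClusterWatcher (illustration only).
final class SketchWatcher
{
    public function __construct(
        private readonly int $workerPingTimeout = SketchWorker::DEFAULT_PING_TIMEOUT,
    ) {
    }

    /** Passes the configured timeout on to each worker it creates. */
    public function createWorker(): SketchWorker
    {
        return new SketchWorker($this->workerPingTimeout);
    }
}

$default = (new SketchWatcher())->createWorker();
$relaxed = (new SketchWatcher(workerPingTimeout: 45))->createWorker();

var_dump($default->checkInterval()); // float(5)
var_dump($relaxed->checkInterval()); // float(22.5)
```

Note that an odd `int` timeout such as 45 already produces a fractional check interval, which is fine because `EventLoop::repeat()` accepts a float.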

Validation

`workerPingTimeout < 1` throws `\ValueError` from `ClusterWatcher`'s constructor.
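As a rough illustration of that rule, the guard might be written as below. `validatePingTimeout()` and `rejects()` are hypothetical helpers and the error message is invented; in the PR the check lives in `ClusterWatcher`'s constructor.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical guard mirroring the validation rule: values below 1 are rejected.
 */
function validatePingTimeout(int $workerPingTimeout): int
{
    if ($workerPingTimeout < 1) {
        // Message text is an assumption, not taken from the PR.
        throw new \ValueError('Worker ping timeout must be at least 1 second');
    }

    return $workerPingTimeout;
}

/** Small helper for demonstration: true if the value is rejected. */
function rejects(int $value): bool
{
    try {
        validatePingTimeout($value);
        return false;
    } catch (\ValueError) {
        return true;
    }
}

var_dump(rejects(0));  // bool(true)
var_dump(rejects(45)); // bool(false)
```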

Backwards compatibility

Default value is `10`, matching the previous hard-coded constant.
New parameter is optional; existing call sites continue to work without change.
The renamed constant (`PING_TIMEOUT` → `DEFAULT_PING_TIMEOUT`) was `private`, so no public API depended on its name.

Test plan

Existing tests continue to pass (defaults are unchanged).
Manual: run a worker that `sleep(20)`s with `workerPingTimeout=10` (current behaviour: dies after ~10s) and `workerPingTimeout=30` (does not die).

I'm happy to add a unit test for the constructor validation if you'd like — wanted to keep the diff minimal for first review.

Open questions for the maintainer

Should the parameter be on `ClusterWatcher`'s constructor (proposed), or on a builder/factory? The constructor is consistent with how `IpcHub` etc. are passed today.
Naming: `workerPingTimeout` vs. `pingTimeout` vs. `pingTimeoutSeconds`. Open to bikeshedding.
Should the value be `int` (seconds) or `float` (sub-second granularity)? `EventLoop::repeat()` accepts a float; the rest of the public API uses `int` for time values.
CLI: `vendor/bin/cluster` could grow a `--worker-ping-timeout=` flag — happy to do that in a follow-up if you'd take it here.

Why we are filing this

We run a cluster of ReactPHP HTTP workers that share long-lived state through Doctrine ORM (synchronous PDO). Specific reporting endpoints have a legitimate execution time that exceeds 10s; today the watcher treats them as dead workers and recycles them mid-response. Making the ping timeout configurable is the smallest correct change. We're happy to iterate on the design if any of the choices above don't fit.


odahcam wants to merge 1 commit into amphp:2.x from odahcam:feature/configurable-ping-timeout