
Commit 0c49266

[zephyr] Raise worker idle poll backoff cap from 1.0s to 5.0s (#5051)
`_poll_loop` backed off up to 1.0s between `pull_task` calls when no task was available. Each task now runs in a fresh subprocess taking roughly 1s, so re-polling every second caused busy-waiting between subprocess launches. The cap was set before subprocess-per-shard isolation landed in #4522 and was never revisited; 5.0s matches the typical subprocess task duration.

Each `pull_task` RPC that returns `None` still has to go through the full coordinator path: RPC deserialization, lock acquisition, dict lookups, lock release, serialization. With 64 idle workers polling every 1.0s, that is 64 wasted RPCs/second. At a 5.0s cap it drops to ~13/second.

The coordinator also receives ~13 heartbeat RPCs/second from those same 64 workers (one per worker per 5s heartbeat interval), so idle polling at 1.0s generated more traffic than the heartbeats themselves. Raising the cap brings the two rates roughly in line.

Whether this is perceptible depends on worker count. With 16 workers it's noise either way. With 128+ idle workers in a straggler tail it could show up as a few percent of coordinator CPU. The coordinator is provisioned small (2g RAM, 1 CPU by default from 6c0b22c), so any reduction in unnecessary RPC handling there is genuinely useful.
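
The rates quoted above follow from dividing the worker count by the per-worker interval. A minimal sketch of that arithmetic, assuming every idle worker has already reached the backoff cap and sends exactly one RPC per interval (no jitter, no task arrivals); `idle_rpc_rate` is a hypothetical helper for illustration, not part of zephyr:

```python
def idle_rpc_rate(workers: int, interval_s: float) -> float:
    """Steady-state RPCs per second when each worker sends one RPC per interval.

    Hypothetical helper: assumes all workers are idle and already at the cap.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return workers / interval_s


# Figures from the commit message: 64 idle workers.
poll_rate_old_cap = idle_rpc_rate(64, 1.0)  # polling at the old 1.0s cap
poll_rate_new_cap = idle_rpc_rate(64, 5.0)  # polling at the new 5.0s cap
heartbeat_rate = idle_rpc_rate(64, 5.0)     # one heartbeat per worker per 5s

print(poll_rate_old_cap, poll_rate_new_cap, heartbeat_rate)
```

Under these assumptions the new polling rate equals the heartbeat rate, which is the "closer to the same rate" point made above.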
1 parent ff9b2a9 commit 0c49266

1 file changed

Lines changed: 1 addition & 1 deletion

lib/zephyr/src/zephyr/execution.py

@@ -1122,7 +1122,7 @@ def _heartbeat_loop(
     def _poll_loop(self, coordinator: ActorHandle) -> None:
         """Pure polling loop. Exits on SHUTDOWN signal, coordinator death, or shutdown event."""
         task_count = 0
-        backoff = ExponentialBackoff(initial=0.1, maximum=1.0)
+        backoff = ExponentialBackoff(initial=0.1, maximum=5.0)
 
         future: ActorFuture | None = None
         future_start = 0.0
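
To illustrate what changing `maximum` does to the idle delay sequence, here is a minimal sketch of a capped exponential backoff. It is not zephyr's `ExponentialBackoff`: the class `CappedBackoff`, its doubling `factor`, and the method names are assumptions made for illustration only, and the real class may grow differently or add jitter.

```python
class CappedBackoff:
    """Hypothetical capped exponential backoff (assumed factor of 2, no jitter)."""

    def __init__(self, initial: float, maximum: float, factor: float = 2.0) -> None:
        if initial <= 0 or maximum < initial or factor <= 1:
            raise ValueError("need 0 < initial <= maximum and factor > 1")
        self.initial = initial
        self.maximum = maximum
        self.factor = factor
        self._current = initial

    def next_interval(self) -> float:
        """Return the delay to sleep now, then grow the next one up to the cap."""
        interval = self._current
        self._current = min(self._current * self.factor, self.maximum)
        return interval

    def reset(self) -> None:
        """Drop back to the initial delay, e.g. after a task was received."""
        self._current = self.initial


def idle_delays(maximum: float, polls: int) -> list[float]:
    """Delays between consecutive empty polls for a given cap."""
    backoff = CappedBackoff(initial=0.1, maximum=maximum)
    return [backoff.next_interval() for _ in range(polls)]


print(idle_delays(1.0, 8))  # old cap
print(idle_delays(5.0, 8))  # new cap
```

With doubling, the two sequences are identical for the first few empty polls and only diverge once the delay would exceed 1.0s, so a worker that finds a task quickly behaves the same under either cap; only workers that stay idle for several polls settle at the longer interval.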
