Commit c942a6b
[client] Bound read timeout on streaming API requests to detect dead connections

`sky status`/`sky logs`/`sky jobs logs` stream their results from the API
server's `/api/stream` (and `/logs`, `/provision_logs`, `/jobs/logs`) endpoints.
The client made these streaming reads with no read timeout
(`timeout=(connect, None)`), so when a streaming connection silently died -- a
proxy/CDN dropped the long-lived connection, or the peer went away mid-stream --
the client blocked in `recv()` forever. We observed `sky status` wedged for
~20 min in a single call (kernel stack stuck in `tcp_recvmsg`), while a fresh
`sky status` returned instantly, confirming a dead connection rather than a
slow server.

The server already sends a heartbeat every 30s on these streams
(`stream_utils.py:_HEARTBEAT_INTERVAL`, #5750) to keep them active through
idle-timeout proxies. This adds the missing client-side counterpart: a read
timeout of 4x the heartbeat interval (120s). On a healthy stream, each 30s
heartbeat resets the read timer, so the timeout never fires; a dead stream
raises `ReadTimeout`, which these readers already retry
(`retry_transient_errors`) on a fresh connection.
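
The timer-reset behaviour described above can be illustrated with a minimal, self-contained sketch that uses only the standard library's `socket` module rather than the actual client code. All names here (`HEARTBEAT_INTERVAL`, `READ_TIMEOUT`, `_silent_peer`, `read_until_dead`) are hypothetical, and the interval is scaled down from 30s to 0.1s so the demo finishes quickly; only the 4x ratio is taken from this commit. The exception raised here is `socket.timeout`, standing in for the HTTP client's `ReadTimeout`.

```python
import socket
import threading
import time
from typing import Tuple

HEARTBEAT_INTERVAL = 0.1               # stand-in for the server's 30s heartbeat
READ_TIMEOUT = 4 * HEARTBEAT_INTERVAL  # same 4x ratio as the commit (120s there)


def _silent_peer(listener: socket.socket, heartbeats: int,
                 release: threading.Event) -> None:
    """Send a few heartbeats, then go silent while keeping the socket open."""
    conn, _ = listener.accept()
    with conn:
        for _ in range(heartbeats):
            conn.sendall(b"hb\n")
            time.sleep(HEARTBEAT_INTERVAL)
        # Simulate a dead connection: no more data, but no close either.
        release.wait()


def read_until_dead(heartbeats: int = 3) -> Tuple[int, bool]:
    """Return (heartbeats received, whether the read timeout fired)."""
    listener = socket.create_server(("127.0.0.1", 0))
    release = threading.Event()
    peer = threading.Thread(target=_silent_peer,
                            args=(listener, heartbeats, release),
                            daemon=True)
    peer.start()
    received, timed_out = 0, False
    try:
        # The timeout given to create_connection also bounds each recv().
        with socket.create_connection(listener.getsockname(),
                                      timeout=READ_TIMEOUT) as client:
            try:
                while True:
                    chunk = client.recv(1024)
                    if not chunk:  # peer closed the stream cleanly
                        break
                    # Each arriving heartbeat restarts the recv() timer.
                    received += chunk.count(b"\n")
            except socket.timeout:
                # No data for READ_TIMEOUT seconds: treat the stream as dead.
                timed_out = True
    finally:
        release.set()
        peer.join(timeout=1)
        listener.close()
    return received, timed_out
```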

Scope: the retry-wrapped sync console-log streams -- `tail_logs`,
`tail_provision_logs`, `stream_and_get` (sky/client/sdk.py) and `jobs.tail_logs`
(sky/jobs/client/sdk.py). Left for follow-up (each needs its own retry wrapper
first; otherwise a timeout would surface as an error instead of triggering a
retry): `serve.tail_logs`, the async client streams, and
`jobs.download_logs_streaming`. Heartbeat-less endpoints (`/api/get` long-poll,
`format=plain` log streams) keep an unbounded read timeout, since they may
legitimately block with no intermediate output.
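
The scoping rule above (bounded read for heartbeat-bearing streams, unbounded otherwise) can be sketched as a small helper that builds the `(connect, read)` timeout tuple. This is an illustration under stated assumptions, not the code in this commit: `stream_timeout`, `has_heartbeat`, and the constant names are hypothetical; only the 30s heartbeat, the 4x multiplier, and the `(connect, None)` form come from the message above.

```python
from typing import Optional, Tuple

HEARTBEAT_INTERVAL_SECONDS = 30   # server heartbeat interval, per the message
READ_TIMEOUT_MULTIPLIER = 4       # read timeout = 4x heartbeat = 120s


def stream_timeout(connect_timeout: float,
                   has_heartbeat: bool) -> Tuple[float, Optional[float]]:
    """Build a (connect, read) timeout tuple for a streaming request.

    Heartbeat-bearing streams get a bounded read timeout; heartbeat-less
    endpoints keep read=None because they may legitimately stay silent.
    """
    if has_heartbeat:
        read_timeout: Optional[float] = (HEARTBEAT_INTERVAL_SECONDS *
                                         READ_TIMEOUT_MULTIPLIER)
    else:
        read_timeout = None
    return (connect_timeout, read_timeout)
```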

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

1 parent 7ee947c commit c942a6b
4 files changed
Lines changed: 155 additions & 642 deletions
Diff hunks (file names and line contents not preserved):

| File | Original lines | New lines | Change |
|---|---|---|---|
| 1 | (none) | 66-77 | 12 lines added |
| 2 | 1168 | 1168 | 1 line replaced |
| 2 | 1228 | 1228 | 1 line replaced |
| 2 | 2445 | 2445 | 1 line replaced |
| 3 | 546 | 546-547 | 1 line replaced by 2 |