Remove global read_timeout default, add watcher-level idle timeout by doxxx93 · Pull Request #1945 · kube-rs/kube

doxxx93 · 2026-02-23T23:49:06Z

Motivation

The default config.read_timeout (295s) is applied at the hyper-timeout connector level, which enforces an idle timeout on all TCP I/O indiscriminately. This breaks long-lived connections like exec, attach, and port-forward — if there's no stdin/stdout activity for 295s, the connection is killed with a broken pipe.

The Go client has no global read timeout. Watch streams rely on the server-side timeoutSeconds parameter instead.

Verified on a kind cluster (v1.35.0): exec connections die after idle with read_timeout set, but survive indefinitely with None.

Solution

Default read_timeout to None in all Config constructors, matching the Go client behavior.
Add a watcher-level idle timeout (next_with_idle_timeout) that wraps stream.next() with tokio::time::timeout. The timeout is set to the server-side timeoutSeconds + 5s margin, so watches still recover from dead connections where the server's close never arrives.

This way each timeout is owned by the layer that actually needs it:

exec/attach/portforward: no timeout (works indefinitely)
watch streams: watcher manages its own idle timeout (reconnects on network failure)
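The idle-timeout idea above can be illustrated with a dependency-free, blocking analogue. This is a sketch and not the PR's code: the real helper wraps `stream.next()` in `tokio::time::timeout`, whereas here a `std::sync::mpsc::Receiver` stands in for the watch stream and `recv_timeout` stands in for the async timeout. `IDLE_MARGIN`, `idle_timeout`, and `demo` are hypothetical names; only `next_with_idle_timeout` and the 5s margin come from the description.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::time::Duration;

/// Margin added on top of the server-side watch timeout (5s, per the description).
const IDLE_MARGIN: Duration = Duration::from_secs(5);

/// Idle timeout for a watch whose server-side `timeoutSeconds` is `server_timeout_secs`.
fn idle_timeout(server_timeout_secs: u32) -> Duration {
    Duration::from_secs(u64::from(server_timeout_secs)) + IDLE_MARGIN
}

/// Blocking stand-in for `next_with_idle_timeout`.
/// Returns `Some(event)` if one arrives within `idle`, and `None` if the
/// stream ended or stayed silent for too long. The caller treats `None`
/// as "reconnect".
fn next_with_idle_timeout<T>(events: &Receiver<T>, idle: Duration) -> Option<T> {
    match events.recv_timeout(idle) {
        Ok(event) => Some(event),
        // Nothing arrived in time: the connection is presumed dead.
        Err(RecvTimeoutError::Timeout) => None,
        // The sender was dropped: the stream closed normally.
        Err(RecvTimeoutError::Disconnected) => None,
    }
}

/// Walks through the three situations: data available, silent connection,
/// and a closed stream. A short idle window keeps the demo fast.
fn demo() -> (Option<&'static str>, Option<&'static str>, Option<&'static str>) {
    let idle = Duration::from_millis(50);
    let (tx, rx) = mpsc::channel();

    tx.send("Added pod-a").unwrap();
    let with_data = next_with_idle_timeout(&rx, idle);

    // The sender is still alive but silent, like a connection whose
    // server-side close never arrives.
    let dead = next_with_idle_timeout(&rx, idle);

    drop(tx);
    let ended = next_with_idle_timeout(&rx, idle);

    (with_data, dead, ended)
}
```

With a server-side timeout of 290s, for example, `idle_timeout(290)` gives 295s. Because the timeout lives in the helper, connections that never call it (exec, attach, port-forward) are unaffected.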

Signed-off-by: doxxx93 <doxxx93@gmail.com>

codecov · 2026-02-24T00:41:05Z

Codecov Report

❌ Patch coverage is 73.91304% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.5%. Comparing base (cfa38f2) to head (872c203).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
kube-runtime/src/watcher.rs	74.0%	6 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #1945     +/-   ##
=======================================
- Coverage   76.5%   76.5%   -0.0%     
=======================================
  Files         89      89             
  Lines       8566    8587     +21     
=======================================
+ Hits        6547    6563     +16     
- Misses      2019    2024      +5

Files with missing lines	Coverage Δ
kube-client/src/config/mod.rs	`54.7% <ø> (ø)`
kube-runtime/src/watcher.rs	`58.3% <74.0%> (+1.9%)`	⬆️

... and 6 files with indirect coverage changes



doxxx93 · 2026-03-19T09:59:15Z

next_with_idle_timeout had no tests, so patch coverage was at 33%. Added 3 tests using #[tokio::test(start_paused = true)] — tokio auto-advances virtual time when the runtime is idle, so the 295s timeout completes instantly.

returns_item_when_stream_has_data — data available, returns immediately
returns_none_on_dead_connection — stream hangs forever, timeout fires → None (triggers reconnect)
returns_none_when_stream_ends — stream closes normally → None

clux

Thanks for this. Sorry it took me a while to get to it.
Left a few comments on some documentation choices, but otherwise the wrapping function here is a very nice solution.


Closes #292.

The shared kube-rs client set `read_timeout = 30s`, which kube-client forwards to hyper-timeout's per-read timer on the underlying I/O. Any long-lived idle stream (`pods.log_stream`, `pods.exec`) was therefore torn down after ~30s of silence and surfaced in the UI as "ServiceError: error reading a body from connection".

Leave `read_timeout` and `write_timeout` unset on the shared client so streams stay open until the user, the pod, or the API server closes them. This matches the new kube-rs upstream default (PR kube-rs/kube#1945) and the behavior of kubectl / the Go client. `connect_timeout = 10s` is preserved so bad clusters still fail fast at connection time. Per-request deadlines for non-streaming calls already live at the call site via `tokio::time::timeout` (see `test_connection`).

Apply the same policy in both client paths:

- `init_with_context`, used by the Tauri app via `commands/clusters.rs`
- `init`, used by the MCP server via `mcp/server.rs::connect_to_cluster`

A previously-stale `#[allow(dead_code)]` on `init` was removed; the function is in fact wired up through MCP.

Adds a regression test (`shared_client_has_no_read_timeout`) that pins the timeout policy so a future change cannot silently reintroduce the bug.

Follow-up tracked in #296: replace the raw `ServiceError` banner with an inline "Stream ended" notice plus a Reconnect button, since the same error surface still fires for legitimate stream ends (pod terminated, kubelet idle timeout, LB cuts, etc.).
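The timeout policy in this commit message can be pinned down in a few lines. This is a minimal sketch under stated assumptions: `ClientTimeouts` and `shared_client_timeouts` are hypothetical stand-ins for the three timeout fields on the client configuration, not the downstream project's code or the kube-rs API.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the three timeout settings named above.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ClientTimeouts {
    connect_timeout: Option<Duration>,
    read_timeout: Option<Duration>,
    write_timeout: Option<Duration>,
}

/// The policy described above: fail fast when connecting, but never cut an
/// established stream just because it has been idle.
fn shared_client_timeouts() -> ClientTimeouts {
    ClientTimeouts {
        connect_timeout: Some(Duration::from_secs(10)),
        read_timeout: None,
        write_timeout: None,
    }
}
```

A regression test in the spirit of `shared_client_has_no_read_timeout` would assert that `read_timeout` and `write_timeout` are `None` and that `connect_timeout` is still 10s.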

fix: remove default read timeout to support long-lived connections

ba0a8e6

Signed-off-by: doxxx93 <doxxx93@gmail.com>

doxxx93 requested a review from clux February 24, 2026 00:17

doxxx93 added the blocked (awaiting upstream work), client (kube Client related), and runtime (controller runtime related) labels and removed the blocked (awaiting upstream work) label Feb 24, 2026

ian-stclab mentioned this pull request Feb 26, 2026

docs: add SSA patterns, error handling, and troubleshooting enhancements kube-rs/website#88

Draft

doxxx93 added 2 commits March 19, 2026 18:38

test(watcher): add tests for idle timeout behavior with streams

1e95801

Signed-off-by: doxxx93 <doxxx93@gmail.com>

Merge branch 'main' into fix/read-timeout-per-layer

7d74703

clux added this to the 4.0.0 milestone Mar 27, 2026

clux added the changelog-change (changelog change category for prs) label Mar 27, 2026

clux added 2 commits March 27, 2026 11:59

Merge branch 'main' into fix/read-timeout-per-layer

d9beb9b

Merge branch 'main' into fix/read-timeout-per-layer

1d0e5a4