Remove global read_timeout default, add watcher-level idle timeout #1945

Merged
clux merged 6 commits into kube-rs:main from doxxx93:fix/read-timeout-per-layer
Mar 27, 2026

@doxxx93 doxxx93 commented Feb 23, 2026

Member

Fixes #1798

Motivation

The default config.read_timeout (295s) is applied at the hyper-timeout connector level, which enforces an idle timeout on all TCP I/O indiscriminately. This breaks long-lived connections like exec, attach, and port-forward — if there's no stdin/stdout activity for 295s, the connection is killed with a broken pipe.

The Go client has no global read timeout. Watch streams rely on the server-side timeoutSeconds parameter instead.

Verified on a kind cluster (v1.35.0): exec connections die after idle with read_timeout set, but survive indefinitely with None.
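
To make the failure mode concrete, here is a toy model in Rust of a per-read idle timer. It is not kube-rs or hyper-timeout code: the function name `first_fatal_gap` and the silence durations are invented for illustration, and only the 295s value comes from the description above.

```rust
use std::time::Duration;

/// Toy model of a per-read idle timer (hypothetical helper, not a kube-rs API).
/// `gaps` are the silent periods between reads on one connection.
/// Returns the index of the first gap that trips the timer, or `None`
/// if the connection survives every gap.
fn first_fatal_gap(gaps: &[Duration], read_timeout: Option<Duration>) -> Option<usize> {
    // With no read timeout configured, nothing can trip.
    let limit = read_timeout?;
    gaps.iter().position(|gap| *gap >= limit)
}

fn main() {
    // Made-up exec session: 10s, 120s, then 600s without any stdin/stdout traffic.
    let gaps: Vec<Duration> = [10u64, 120, 600]
        .iter()
        .map(|secs| Duration::from_secs(*secs))
        .collect();

    // Old default: the third silence exceeds 295s, so the connection is killed.
    assert_eq!(first_fatal_gap(&gaps, Some(Duration::from_secs(295))), Some(2));

    // New default (`None`): the same session is never cut by the client.
    assert_eq!(first_fatal_gap(&gaps, None), None);
}
```

The model treats every connection the same way, which is the point: a connector-level timer cannot tell an idle exec session from a dead watch.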

Solution

  1. Default read_timeout to None in all Config constructors, matching the Go client behavior.

  2. Add a watcher-level idle timeout (next_with_idle_timeout) that wraps stream.next() with tokio::time::timeout. The timeout is set to the server-side timeoutSeconds + 5s margin, so watches still recover from dead connections where the server's close never arrives.

This way each timeout is owned by the layer that actually needs it:

  • exec/attach/portforward: no timeout (works indefinitely)
  • watch streams: watcher manages its own idle timeout (reconnects on network failure)
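
A rough, std-only sketch of the watcher-level idea follows. It is not the code from this PR: the real helper is async and wraps `stream.next()` in `tokio::time::timeout`, whereas this stand-in swaps in a blocking `std::sync::mpsc` channel and `recv_timeout` so that it runs without an async runtime. The `idle_timeout` helper, the `IDLE_MARGIN` constant name, and the 290s server-side value are assumptions made for the example.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::time::Duration;

/// Margin added on top of the server-side watch timeout (5s per the PR description).
const IDLE_MARGIN: Duration = Duration::from_secs(5);

/// Idle window for a watch: server-side `timeoutSeconds` plus the margin.
/// (Hypothetical helper; the name is not taken from kube-rs.)
fn idle_timeout(server_timeout_secs: u64) -> Duration {
    Duration::from_secs(server_timeout_secs) + IDLE_MARGIN
}

/// Blocking stand-in for the async `next_with_idle_timeout`.
/// Returns `Some(item)` if an item arrives within `idle`, and `None` if the
/// stream ended or stayed silent for the whole window. In both `None` cases
/// the caller is expected to reconnect.
fn next_with_idle_timeout<T>(stream: &Receiver<T>, idle: Duration) -> Option<T> {
    match stream.recv_timeout(idle) {
        Ok(item) => Some(item),
        // Dead connection: nothing arrived and the server's close never came.
        Err(RecvTimeoutError::Timeout) => None,
        // Normal end of stream: the sending side was closed.
        Err(RecvTimeoutError::Disconnected) => None,
    }
}

fn main() {
    // Assumed server-side timeout of 290s gives a 295s idle window.
    assert_eq!(idle_timeout(290), Duration::from_secs(295));

    // Short windows keep the demo fast; a real watch would use `idle_timeout(..)`.
    let window = Duration::from_millis(50);
    let (tx, rx) = mpsc::channel();

    tx.send("event").unwrap();
    assert_eq!(next_with_idle_timeout(&rx, window), Some("event")); // data available

    assert_eq!(next_with_idle_timeout(&rx, window), None); // sender alive but silent

    drop(tx);
    assert_eq!(next_with_idle_timeout(&rx, window), None); // stream closed normally
}
```

Collapsing both the timeout and the end-of-stream case into `None` keeps the caller simple: either way the watch is restarted, and only the watcher pays for the timer.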

doxxx93 requested a review from clux February 24, 2026 00:17
doxxx93 added the labels blocked (awaiting upstream work), client (kube Client related), and runtime (controller runtime related), and removed the label blocked (awaiting upstream work), Feb 24, 2026

codecov Bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 73.91304% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.5%. Comparing base (cfa38f2) to head (872c203).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| kube-runtime/src/watcher.rs | 74.0% | 6 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1945     +/-   ##
=======================================
- Coverage   76.5%   76.5%   -0.0%     
=======================================
  Files         89      89             
  Lines       8566    8587     +21     
=======================================
+ Hits        6547    6563     +16     
- Misses      2019    2024      +5     
| Files with missing lines | Coverage Δ |
|---|---|
| kube-client/src/config/mod.rs | 54.7% <ø> (ø) |
| kube-runtime/src/watcher.rs | 58.3% <74.0%> (+1.9%) ⬆️ |

... and 6 files with indirect coverage changes

doxxx93 commented Mar 19, 2026

Member Author

next_with_idle_timeout had no tests, so patch coverage was at 33%. Added 3 tests using #[tokio::test(start_paused = true)] — tokio auto-advances virtual time when the runtime is idle, so the 295s timeout completes instantly.

  • returns_item_when_stream_has_data — data available, returns immediately
  • returns_none_on_dead_connection — stream hangs forever, timeout fires → None (triggers reconnect)
  • returns_none_when_stream_ends — stream closes normally → None

clux added this to the 4.0.0 milestone Mar 27, 2026
clux added the changelog-change (changelog change category for prs) label Mar 27, 2026

@clux clux left a comment

Member

Thanks for this. Sorry it took me a while to get to it.
I left a few comments on some documentation choices, but otherwise the wrapping function here is a very nice solution.

Signed-off-by: doxxx93 <doxxx93@gmail.com>
@clux clux merged commit 8cb0d01 into kube-rs:main Mar 27, 2026
19 checks passed
atilladeniz added a commit to atilladeniz/Kubeli that referenced this pull request Apr 26, 2026
Closes #292.

The shared kube-rs client set `read_timeout = 30s`, which kube-client forwards to hyper-timeout's per-read timer on the underlying I/O. Any long-lived idle stream (`pods.log_stream`, `pods.exec`) was therefore torn down after ~30s of silence and surfaced in the UI as "ServiceError: error reading a body from connection".

Leave `read_timeout` and `write_timeout` unset on the shared client so streams stay open until the user, the pod, or the API server closes them. This matches the new kube-rs upstream default (PR kube-rs/kube#1945) and the behavior of kubectl / the Go client. `connect_timeout = 10s` is preserved so bad clusters still fail fast at connection time. Per-request deadlines for non-streaming calls already live at the call site via `tokio::time::timeout` (see `test_connection`).

Apply the same policy in both client paths:
- `init_with_context`, used by the Tauri app via `commands/clusters.rs`
- `init`, used by the MCP server via `mcp/server.rs::connect_to_cluster`

A previously-stale `#[allow(dead_code)]` on `init` was removed; the function is in fact wired up through MCP. Adds a regression test (`shared_client_has_no_read_timeout`) that pins the timeout policy so a future change cannot silently reintroduce the bug.

Follow-up tracked in #296: replace the raw `ServiceError` banner with an inline "Stream ended" notice plus a Reconnect button, since the same error surface still fires for legitimate stream ends (pod terminated, kubelet idle timeout, LB cuts, etc.).

Labels

changelog-change (changelog change category for prs), client (kube Client related), runtime (controller runtime related)

Development

Successfully merging this pull request may close these issues.

High default config.read_timeout delays client recovery

2 participants