Skip to content

Enable HTTP/2 support in kube-client#1823

Draft
frobware wants to merge 4 commits into
kube-rs:mainfrom
frobware:enable-http2
Draft

Enable HTTP/2 support in kube-client#1823
frobware wants to merge 4 commits into
kube-rs:mainfrom
frobware:enable-http2

Conversation

@frobware
Copy link
Copy Markdown

This change enables HTTP/2 support in the client, allowing multiple requests to multiplex over a single connection. The implementation simply adds the http2 feature flag to hyper and its related dependencies, then enables HTTP/2 alongside HTTP/1 in the rustls connector configuration.

I noticed that I had 400+ open connections to the API server when streaming logs from all pods in my cluster, each requiring its own TCP handshake and TLS negotiation. With HTTP/2 enabled, I now only have a single connection handling all those streams, eliminating the overhead and avoiding potential resource exhaustion from hitting connection limits.

@clux
Copy link
Copy Markdown
Member

clux commented Sep 11, 2025

wow, really?! i wasn't aware that the apiserver supported http2 for this, but sure enough, there are flags for it.

I do see it gives us some UpgradeConnection(ProtocolSwitch(500)) errors related to http upgrades for websocket related (exec/attach) stuff though 🤔

@frobware
Copy link
Copy Markdown
Author

I do see it gives us some UpgradeConnection(ProtocolSwitch(500)) errors related to http upgrades for websocket related (exec/attach) stuff though 🤔

I'll take a look over the regressions, but it won't be immediately.

@clux clux added the changelog-add changelog added category for prs label Sep 11, 2025
@clux
Copy link
Copy Markdown
Member

clux commented Sep 11, 2025

Appreciated. No rush.

From a quick googling around, this comment in client-go possibly implies some mutual exclusivity in usage, and perhaps we need to set Request::version_mut to override back to 1.1 for websocket stuff in kube-core's Request constructors. I could be wrong though.

@clux
Copy link
Copy Markdown
Member

clux commented Jan 21, 2026

hey there. do you think you'll have time to come back at this at some point? i can open an issue for someone else to tackle if not.

@frobware
Copy link
Copy Markdown
Author

hey there. do you think you'll have time to come back at this at some point? i can open an issue for someone else to tackle if not.

I’d like to say “yes”, but realistically it may be 1 to 2 months before I can get back to this.

For context: this work matters to me because I’ve been building a tool called kat (tailing Kubernetes pod logs). Its default behaviour is to tail all containers across all pods and namespaces, and without connection reuse (HTTP/2) it’s easy to run into file-descriptor limits on some clusters I connect to. So I definitely care about getting HTTP/2 right here - I’m just not sure when I’ll be able to actively work on it again.

If someone else is able to pick this up in the meantime, that would be hugely appreciated. If not, I may be able to return to it towards late March / April.

@clux
Copy link
Copy Markdown
Member

clux commented Jan 29, 2026

Thanks for the response. I have opened #1923 for now to let people experiment if the interest in there.

@clux clux marked this pull request as draft January 29, 2026 18:48
@uchiha-vivek
Copy link
Copy Markdown
Contributor

@clux I will try to explore it after going through all the above comments.
Will brainstorm in #1923

@frobware
Copy link
Copy Markdown
Author

I hope to start looking at this again in the next few weeks. 🤞

Add http2 feature to hyper, hyper-util, and hyper-rustls to enable
HTTP/2 ALPN support in the kube-client. This complements the
enable_http2() API call to provide full HTTP/2 multiplexing
capability for Kubernetes API connections.

Signed-off-by: Andrew McDermott <aim@frobware.com>
This change adds HTTP/2 ALPN support to the rustls HTTPS connector by
calling enable_http2() in addition to enable_http1(). This enables
HTTP/2 multiplexing for Kubernetes API connections, reducing the
number of TCP connections from potentially hundreds to a single
multiplexed connection per API server.

Previously, kube-rs only supported HTTP/1.1 connections, resulting in
a new TCP connection for each concurrent API request. With HTTP/2
enabled, multiple requests can be multiplexed over a single connection,
improving performance and reducing resource usage.

Signed-off-by: Andrew McDermott <aim@frobware.com>
@frobware
Copy link
Copy Markdown
Author

Sorry for the silence here -- finally came back to it. The branch is rebased on main, and I've pushed a stab at the routing split that #1923 hinted at (commit c64e575).

The shape is two transports inside Client. The primary one is h2-capable and carries TokioTimer plus HTTP/2 keep-alive PINGs (interval 30s, while-idle on), which is the bit that should actually help the HAProxy / 35s idle-timeout case. The secondary one is HTTP/1.1-only and is wired into Client::connect(), so exec/attach/port-forward stop being attempted over h2 -- that's what was producing the UpgradeConnection(ProtocolSwitch(...)) failures the integration job was hitting.

One thing that bit me along the way: hyper-rustls' enable_http1()-only builder leaves alpn_protocols empty, which silently lets the server pick h2 anyway. The h1 sibling has to advertise http/1.1 explicitly. Rather than push for an upstream change in hyper-rustls (and to keep Config::tls_server_name honoured on the upgrade path), I ended up shipping a small H1OnlyHttpsConnector in kube-client/src/client/tls.rs that mirrors hyper-rustls' TCP-then-TLS dance over public types. tokio-rustls becomes a direct dep gated on rustls-tls (already in the dep tree transitively).

openssl ALPN parity I've left for a follow-up -- openssl was already h1-in-practice (no ALPN at all), so it's not touched here.

One breaking change worth calling out: Config gains a public disable_http2: bool runtime escape hatch (default false, so HTTP/2 stays on). It matches the existing disable_compression style, but it's an exhaustive-struct-literal break for downstream code that constructs Config by hand. I don't have any desire to break the API -- I just wanted to see if this approach could pass CI. If reviewers would rather, I'm happy to move the flag onto ClientBuilder instead -- that's genuinely non-breaking, it just doesn't thread through Config::infer cleanly.

Important caveat: this has had genuinely minimal exposure. It is largely a stab in the dark to see if I could get CI green. Locally I've run the integration suite, the upgrade-path examples (pod_exec, pod_attach, pod_portforward*, pod_shell*, log_stream), and a cargo hack feature powerset -- all clean. But I haven't even pointed my own kat log streamer at it (the thing that made me care about h2 in the first place), and nothing here exercises the long-running watch behind HAProxy from #1923.

The previous commit on this branch enabled HTTP/2 on the rustls
connector by switching ALPN from `enable_http1()` alone to
`enable_http1().enable_http2()`.  REST traffic, watch streams and
log streaming benefit from HTTP/2 multiplexing, but exec, attach
and port-forward use HTTP/1.1 connection upgrades, which are
unrepresentable on an HTTP/2 connection.  After ALPN negotiates
`h2`, the apiserver answered upgrade requests with 4xx/5xx and the
integration suite started failing with
`UpgradeConnection(ProtocolSwitch(...))`.

Fix this by giving `Client` two transports with separate connection
pools and ALPN policies:

* a primary, h2-capable transport for normal traffic, configured
  with `TokioTimer` and HTTP/2 keep-alive PINGs (interval 30s,
  while-idle on) so watch streams survive idle-killing
  intermediaries such as HAProxy;
* an HTTP/1.1-only transport used by `Client::connect()`, the only
  caller of `hyper::upgrade::on` in the workspace, hard-routed
  there via a new `upgrade_inner` field on `Client`.

A subtle point governs the h1-only path on rustls.  hyper-rustls'
builder asserts that `alpn_protocols.is_empty()` when accepting a
rustls config, and only `enable_http2()` populates the list.  The
`enable_http1()`-only builder path therefore leaves ALPN empty,
which means no ALPN extension is sent on the wire and a modern
apiserver may still negotiate HTTP/2 -- the very condition we are
trying to avoid.  The h1-only sibling must advertise `http/1.1`
*explicitly*.

The natural workaround -- `HttpsConnector::from((http,
Arc::new(rustls_config)))` with `alpn_protocols =
vec![b"http/1.1".to_vec()]` -- bypasses the builder's assertion but
also drops `Config::tls_server_name`, because the `From` impl
constructs the connector with the default server-name resolver and
the resolver field is private.  Rather than depend on an upstream
hyper-rustls change, ship a small `H1OnlyHttpsConnector<H>` in
`kube-client/src/client/tls.rs` that mirrors the public TCP-then-TLS
dance from `hyper_rustls::HttpsConnector::call`, sets the explicit
ALPN advertisement, and resolves SNI via `Config::tls_server_name`
when set or from the URI host otherwise.  This adds `tokio-rustls`
as a direct dep gated on `rustls-tls` (already in the dep tree
transitively via hyper-rustls).

Other invariants worth preserving:

* `Config::auth_layer()` and `Config::extra_headers_layer()` are
  computed once and *cloned* into both transport stacks.  Calling
  `auth_layer()` twice would mint independent `RefreshableToken`
  state, so each transport would refresh tokens on its own and
  could diverge under exec-plugin or token-file auth.  `AuthLayer`
  gains `#[derive(Clone)]` for this; the inner `Either` was
  already trivially cloneable.

* `Config` gains a public `disable_http2: bool` field as a runtime
  escape hatch.  When set, the primary transport falls back to the
  h1-only client; the two-client shape stays the same so the upgrade
  path is unaffected.  HTTP/2 stays on by default; the field
  matches the existing `disable_compression` style.

* `ClientBuilder` grows `with_upgrade_service`, paired with a new
  `Client::new_with_upgrade` constructor for custom-service users
  who supply their own service stack and need to opt in to the
  split.  The single-service `Client::new` keeps its signature; it
  internally clones the buffered handle into `upgrade_inner` so
  back-compat is preserved.

Verified against k3d v1.34.1 with the integration suite, the
`pod_exec`, `pod_attach`, `pod_portforward*`, `pod_shell*` and
`log_stream` examples, and a full 1200-combo `cargo hack` feature
powerset.

Signed-off-by: Andrew McDermott <aim@frobware.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 1, 2026

Codecov Report

❌ Patch coverage is 75.15924% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.5%. Comparing base (819d08a) to head (59ea267).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
kube-client/src/client/builder.rs 77.8% 14 Missing ⚠️
kube-client/src/client/tls.rs 72.1% 12 Missing ⚠️
kube-client/src/client/config_ext.rs 57.2% 9 Missing ⚠️
kube-client/src/client/mod.rs 86.7% 4 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1823     +/-   ##
=======================================
+ Coverage   76.5%   76.5%   +0.1%     
=======================================
  Files         89      89             
  Lines       8658    8772    +114     
=======================================
+ Hits        6616    6704     +88     
- Misses      2042    2068     +26     
Files with missing lines Coverage Δ
kube-client/src/client/middleware/mod.rs 93.4% <ø> (ø)
kube-client/src/config/mod.rs 54.6% <ø> (ø)
kube-client/src/client/mod.rs 79.5% <86.7%> (+2.0%) ⬆️
kube-client/src/client/config_ext.rs 54.1% <57.2%> (+1.8%) ⬆️
kube-client/src/client/tls.rs 82.6% <72.1%> (-3.4%) ⬇️
kube-client/src/client/builder.rs 71.5% <77.8%> (+1.8%) ⬆️

... and 7 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@frobware
Copy link
Copy Markdown
Author

frobware commented May 1, 2026

Pointed kat (built against this branch) at an OpenShift apiserver overnight, about 10 hours. kat is the log streamer this PR is partly aimed at. Around 890k log lines streamed in the window. Connection count was one at start of the run, five when I next looked this morning, and three a few minutes later -- not a continuous trace, but the working set is clearly small and multiplexed. One of the connections from the morning snapshot has stayed up across the whole period (same source port from start to finish). Pre-PR I'd have been opening a fresh TCP+TLS handshake per pod-container as they churned, which was the original itch.

Worth being precise about the topology. There are two HAProxies in the cluster but only one is in kat's data path: the external lbx-ocp-api LB does TCP pass-through with SNI-based routing across the three masters, and TLS/ALPN terminate on the apiserver itself. The relevant config:

defaults
  log global
  option dontlognull
  timeout connect 10s
  timeout client 1h
  timeout server 1h
  timeout tunnel 1h

frontend fe_kubeapi
  mode tcp
  bind :6443
  tcp-request inspect-delay 5s
  tcp-request content accept if { req.ssl_hello_type 1 }
  # SNI-based routing to per-cluster backends

So this run doesn't exercise an L7 HAProxy with an aggressive timeout client -- the in-cluster OpenShift router does have those (timeout client 30s, mode http) but it fronts application Routes, not the apiserver, so kat doesn't transit it. The lukasstockner / #1923 case still hasn't been exercised end-to-end.

It's a useful data point that the multiplexing savings hold for hours of real traffic, with no errors or stalls. For an extra poke at robustness I SIGSTOP'd the process for a while to let the kernel buffer up, then fg'd it; hyper drained the backlog and carried on without complaint.

A separate thing worth discussing: the keep-alive interval is currently hard-coded to 30s in build_main_service. Against a 30s timeout client the PING has no margin; against the 1h timeouts in my LB it's wasteful chatter. There's probably a case for making it a knob on Config (default 30s, None to disable), same shape as disable_http2 -- but happy to defer that conversation for the moment.

@frobware
Copy link
Copy Markdown
Author

frobware commented May 1, 2026

A bit of belt-and-braces while I'm here -- corroborating the apiserver's protocol behaviour with curl, independent of any kube-rs code, so the assumptions the PR rests on are visible. Two probes against the same external HAProxy LB: once with the per-cluster apiserver hostname (the path kat actually takes) and once with the LB's own hostname (to demonstrate the SNI policy in the rejection direction).

Probe 1: per-cluster apiserver via the LB

curl -sk <opts> -o /dev/null -w '%{http_version} %{http_code}' https://api.ocp421.int.frobware.com:6443/version

Curl options What curl offers in ALPN Expectation Observed
--http2 h2,http/1.1 server picks h2 http_version=2 http_code=200
--http1.1 http/1.1 server picks http/1.1 http_version=1.1 http_code=200
--http2-prior-knowledge h2 server picks h2 http_version=2 http_code=200

These three rows mirror the two ALPN advertisements this PR makes (h2,http/1.1 on the main transport, http/1.1 on the upgrade transport) plus the h2-only case for completeness. The apiserver picks h2 when offered, falls back to h1 cleanly when only h1 is offered. That is the whole mechanism the dual-transport split depends on.

Verbose for the --http2 row to show ALPN explicitly:

* ALPN: curl offers h2,http/1.1
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 / X25519MLKEM768 / RSASSA-PSS
* ALPN: server accepted h2
* using HTTP/2
> GET /version HTTP/2
> Host: api.ocp421.int.frobware.com:6443
< HTTP/2 200
< audit-id: ...
< content-type: application/json

Probe 2: the LB hostname directly (negative test)

curl -sk --http2 https://lbx-ocp-api.int.frobware.com:6443/version

Curl options Expectation Observed
--http2 TLS handshake rejected; SNI not in routing map unexpected eof while reading (TLS decode error)
--http1.1 same http_version=0 http_code=000
--http2-prior-knowledge same http_version=0 http_code=000

Verbose for --http2:

*   Trying 192.168.7.43:6443...
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (OUT), TLS alert, decode error (562):
* TLS connect error: error:0A000126:SSL routines::unexpected eof while reading
* closing connection #0

The HAProxy SNI map only forwards traffic for known per-cluster SNIs (api.ocp420.*, api.ocp421.*, api.ocp422.*); anything else hits bk_reject_tcp. So the LB enforces SNI policy at TCP/TLS, before any HTTP-level negotiation has happened. Worth knowing: if a kube-rs caller against this kind of LB sets the wrong tls_server_name (or none, when one is required), the failure is at TLS handshake, not at the HTTP layer -- which is exactly where the in-tree H1OnlyHttpsConnector honours Config::tls_server_name.

Signed-off-by: Eirik A <sszynrae@gmail.com>
@clux
Copy link
Copy Markdown
Member

clux commented May 5, 2026

Thanks a lot for this. I think it looks very reasonable at a high level view. Will have to dig in a bit more on some of these bits (as time allows).

One breaking change worth calling out: Config gains a public disable_http2: bool runtime escape hatch (default false, so HTTP/2 stays on). It matches the existing disable_compression style, but it's an exhaustive-struct-literal break for downstream code that constructs Config by hand. I don't have any desire to break the API -- I just wanted to see if this approach could pass CI. If reviewers would rather, I'm happy to move the flag onto ClientBuilder instead -- that's genuinely non-breaking, it just doesn't thread through Config::infer cleanly.

Not a problem. A #[non_exhaustive] attr was added to Config for 4.0 anyway in 819d08a. Which is helpful now because I expect we might not have enough time to test this properly for 4.0, but if this is the only breaking change, we can in theory add it in 4.1 if it's deemed safe.

Have resolved some tiny Cargo.toml conflicts on your branch as a result of 819d08a. (feel free to over smash over it, but this stuff is squash-merged at the end anyway)

Copy link
Copy Markdown
Member

@clux clux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick comment on timeouts

Comment on lines +228 to +231
// transports. Calling `auth_layer()` twice would mint independent
// `RefreshableToken` state per transport, so each path would refresh
// tokens on its own and they'd diverge under exec-plugin or token-file
// auth.
Copy link
Copy Markdown
Member

@clux clux May 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really appreciate these detailed comments here.

helps the complexity go down a little more easily.

let mut connector = TimeoutConnector::new(connector);
connector.set_connect_timeout(config.connect_timeout);
connector.set_read_timeout(config.read_timeout);
connector.set_write_timeout(config.write_timeout);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The write timeout here is default 295s which is a number very specific to long watches. I am not sure it makes sense for websocket calls. But I don't have a good recommendation either yet.

@frobware
Copy link
Copy Markdown
Author

frobware commented May 5, 2026

I started on

commit d9f74781aec98413f32df48e6bfb1b701072e580 (HEAD -> enable-http2)
Author: Andrew McDermott <aim@frobware.com>
Date:   Thu Apr 30 23:28:07 2026 +0100

    Add openssl ALPN parity for the dual-transport split

    The previous commit gave the rustls path two transports with explicit
    ALPN advertisements but left openssl untouched: openssl users got the
    existing no-ALPN-default behaviour on both transports, which meant no
    HTTP/2 benefit on the main path and only accidental HTTP/1.1 on the
    upgrade path.

    Bring openssl to parity:

but haven't pushed the commit yet as a) it was getting late and b) ZERO testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog-add changelog added category for prs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants