Connection rotation / max lifetime

We run a lot of bursty applications on Kubernetes. They tend to look like 1–3 coordinator pods and 4→128 worker pods (depending on load). A problem I've consistently encountered across systems, made worse by this setup, is that connection pools don't respond to scale-up events: the `reqwest` pool on the coordinator pods keeps sending work to the same initial 4 worker pods even when those are saturated and another 124 pods are idle waiting for work.

The reason is that DNS resolution happens once per connection establishment. The k8s Service round-robins pod IPs across DNS responses, but as long as the existing connections stay alive and keep getting reused out of the pool, the client never asks DNS again and never sees the new pods. To redistribute load we need to *close* connections periodically so that new ones get established against freshly resolved (and round-robined) endpoints.
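To make the stickiness concrete, here is a toy model of the situation described above. It is not `reqwest` code: the function name, the round-robin "resolver", and the request-count rotation trigger are all assumptions made for illustration. Each pooled connection stays pinned to the worker it was established against, and only a rotation causes it to be re-pinned.

```rust
/// Toy model (hypothetical, not reqwest internals): counts how many
/// distinct workers receive traffic after a scale-up.
/// `rotate_every = None` models a pool that never closes live connections.
fn distinct_workers_hit(
    connections: usize,
    workers_at_start: usize,
    workers_after_scale_up: usize,
    requests: usize,
    rotate_every: Option<usize>,
) -> usize {
    // Stand-in for round-robin endpoint selection: every newly
    // established connection is handed the next worker in sequence.
    let mut next_worker = 0usize;

    // Connections are established before the scale-up, so they can
    // only land on the initial workers.
    let mut pinned: Vec<usize> = Vec::with_capacity(connections);
    for _ in 0..connections {
        pinned.push(next_worker % workers_at_start);
        next_worker += 1;
    }

    let mut served = vec![0usize; connections];
    let mut hit = vec![false; workers_after_scale_up];

    // All requests are sent after the scale-up has completed.
    for r in 0..requests {
        let c = r % connections; // spread requests over pooled connections
        hit[pinned[c]] = true;
        served[c] += 1;

        if let Some(limit) = rotate_every {
            if served[c] >= limit {
                // "Close and reconnect": the replacement connection is
                // established against the scaled-up worker set.
                pinned[c] = next_worker % workers_after_scale_up;
                next_worker += 1;
                served[c] = 0;
            }
        }
    }

    hit.iter().filter(|h| **h).count()
}
```

With 4 connections, 4 initial workers and 128 workers after scale-up, `distinct_workers_hit(4, 4, 128, 1000, None)` only ever touches the 4 original workers, while passing `Some(10)` for `rotate_every` spreads the same 1000 requests over far more of them.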

There are two complementary places to fix this:

1. **Server side** — the server pressures or forces clients to rotate (HTTP/1 `Connection: close`, HTTP/2 `GOAWAY`). I've opened a sibling issue against `axum-server` for that. Tonic already supports this via [`Server::max_connection_age`](https://docs.rs/tonic/latest/tonic/transport/struct.Server.html#method.max_connection_age).
2. **Client side** — the client retires pooled connections after a bounded lifetime or request count, even if the server hasn't asked it to.

The server-side fix only works if you control the server and it has the feature. The client-side fix is the one you reach for when you're talking to a server you don't control (or one that doesn't bother). Database connection pools have had this for years; see [`max_lifetime` in sqlx](https://docs.rs/sqlx/latest/sqlx/pool/struct.PoolOptions.html#method.max_lifetime).

`reqwest` today exposes `pool_idle_timeout` and `pool_max_idle_per_host`, but neither of these helps if the connections are actively being used — which is exactly the case where load redistribution matters most.

## Proposed API

```rust
use std::time::Duration;

let client = reqwest::Client::builder()
    // Soft cap on total connection lifetime. Once a pooled connection
    // reaches this age, it is not returned to the pool after its current
    // request completes; it is closed instead. This forces the next
    // request to that host to re-resolve DNS and establish a new
    // connection (which is the mechanism that lets k8s Service
    // round-robin route traffic to newly scaled-up pods).
    .pool_max_connection_age(Duration::from_secs(10 * 60))

    // Per-connection random jitter added to `pool_max_connection_age`.
    // Without jitter, a burst of connections established at the same
    // moment all expire at the same moment, producing a synchronized
    // reconnect storm.
    .pool_max_connection_age_jitter(Duration::from_secs(60))

    // Optional: retire a connection after it has served this many
    // requests, regardless of age. Useful when traffic is uneven —
    // an age cap alone may not rotate a heavily-used connection
    // often enough.
    .pool_max_requests_per_connection(10_000)

    .build()?;
```
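For the jitter specifically, one way the per-connection expiry could be computed is sketched below. This is only an illustration under stated assumptions: `sample_jitter` and `retire_at` are hypothetical names, and the std-only randomness trick (hashing with a freshly seeded `RandomState`) is a stand-in for whatever RNG the pool would actually use. The jitter is sampled once, when the connection is established, and added on top of the age cap.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::time::{Duration, Instant};

/// Hypothetical helper: pick a jitter somewhere in `[0, max_jitter]`.
fn sample_jitter(max_jitter: Duration) -> Duration {
    // Each `RandomState::new()` carries different random keys, so the
    // hash of a fixed input is a cheap source of a pseudo-random u64.
    // A real implementation would more likely use a proper RNG.
    let mut hasher = RandomState::new().build_hasher();
    hasher.write_u8(0);
    let fraction = hasher.finish() as f64 / u64::MAX as f64; // in [0.0, 1.0]
    max_jitter.mul_f64(fraction)
}

/// Hypothetical helper: the instant after which a connection should no
/// longer be returned to the pool. Computed once per connection so that
/// connections opened in the same burst expire at different moments.
fn retire_at(established: Instant, max_age: Duration, max_jitter: Duration) -> Instant {
    established + max_age + sample_jitter(max_jitter)
}
```

With the values from the proposal (a 10 minute age cap and 60 seconds of jitter), every connection would get a retirement time between 10 and 11 minutes after it was established.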

## Design notes / open questions

- **Naming.** I've prefixed with `pool_` to match the existing `pool_idle_timeout` / `pool_max_idle_per_host` naming.
- **Where to enforce.** The natural place is at pool checkin, with a matching check at checkout: when a connection comes back to the pool, check its age and request count and drop it instead of reinserting; when a connection is taken out of the pool, skip (and close) it if it aged out while sitting idle. This avoids interrupting in-flight requests and keeps the implementation simple. It may belong upstream in `hyper-util`'s pool rather than purely in reqwest; happy to file there instead / additionally if that's preferred.
- **HTTP/2 multiplexing.** For HTTP/2 there's typically one connection per host carrying many concurrent streams. The age cap still works (close the connection once all streams finish, or after a grace period), but `pool_max_requests_per_connection` is more ambiguous — do we count streams? I'd suggest "yes, streams count as requests" and call it out in the doc, but it's worth a design decision.
- **Interaction with `pool_idle_timeout`.** These are orthogonal: idle timeout retires unused connections, age cap retires *used* ones. Both should be settable independently.
- **Defaults.** I'd leave all of these `None` by default to preserve current behavior. Opt-in only.
- **Relationship to retries / failures.** Closing a connection at the pool layer should be invisible to callers — the next request just establishes a new one. No retry semantics needed.
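The enforcement idea from the notes above could look roughly like the following. This is a minimal sketch, not `hyper-util`'s actual pool: `RotationPolicy`, `ConnState`, `checkin` and `checkout` are hypothetical names, the idle list is a plain `Vec`, jitter is left out, and it models the HTTP/1 case of one request per checkout (for HTTP/2 the counter would be bumped per stream instead).

```rust
use std::time::{Duration, Instant};

/// Hypothetical policy; `None` everywhere keeps today's behavior.
#[derive(Clone, Copy, Default)]
struct RotationPolicy {
    max_age: Option<Duration>,
    max_requests: Option<u64>,
}

/// Hypothetical per-connection bookkeeping kept next to the pooled connection.
struct ConnState {
    established: Instant,
    requests_served: u64,
}

impl RotationPolicy {
    /// True once the connection has hit either configured limit.
    fn should_retire(&self, conn: &ConnState, now: Instant) -> bool {
        let too_old = self.max_age.map_or(false, |max| {
            now.saturating_duration_since(conn.established) >= max
        });
        let too_many = self
            .max_requests
            .map_or(false, |max| conn.requests_served >= max);
        too_old || too_many
    }
}

/// Called when a request finishes. The in-flight request is never
/// interrupted; the connection just isn't reinserted if it is due.
fn checkin(idle: &mut Vec<ConnState>, mut conn: ConnState, policy: &RotationPolicy, now: Instant) {
    conn.requests_served += 1;
    if !policy.should_retire(&conn, now) {
        idle.push(conn);
    }
    // Otherwise `conn` is dropped here, which stands in for closing it.
}

/// Called when a request needs a connection. Connections that aged out
/// while idle are discarded; `None` means "establish a new connection".
fn checkout(idle: &mut Vec<ConnState>, policy: &RotationPolicy, now: Instant) -> Option<ConnState> {
    while let Some(conn) = idle.pop() {
        if !policy.should_retire(&conn, now) {
            return Some(conn);
        }
    }
    None
}
```

Because the decision is made only at checkin and checkout, callers never observe a closed connection: an expired one is simply absent from the pool, and the next request falls through to the normal connect path.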
