Connection rotation / max lifetime #3753

@adriangb

Description

We run a lot of bursty applications on Kubernetes. They tend to look like a low-replica coordinator (1–3 pods) and anywhere from 4 to 128 worker pods, depending on load. A problem I've consistently encountered across systems, and one this setup makes worse, is that connection pools don't respond to scale-up events: the connection pools (e.g. reqwest's) on the coordinator pods keep sending work to the same initial 4 worker pods even when those are saturated and the other 124 pods sit idle waiting for work. To get work to the new pods we'd need to establish new connections so that the k8s Service round-robins them across the full pod set.

Unfortunately most HTTP clients don't have any sort of connection rotation feature. But servers can pressure clients to rotate connections — or force them to — by closing connections after some bounded lifetime. This works for both HTTP/1 (Connection: close on the next response, then close) and HTTP/2 / gRPC (GOAWAY with a grace period for in-flight streams).
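
For what it's worth, the mechanism is already reachable today if you drop down to hyper's per-connection API. Here's a minimal sketch, assuming hyper 1.x + hyper-util; MAX_AGE and GRACE are illustrative values, and the http2 builder has an analogous graceful_shutdown that sends GOAWAY:

use std::convert::Infallible;
use std::time::Duration;

use http_body_util::Full;
use hyper::body::Bytes;
use hyper::server::conn::http1;
use hyper::service::service_fn;
use hyper::{Request, Response};
use hyper_util::rt::TokioIo;
use tokio::net::TcpListener;

const MAX_AGE: Duration = Duration::from_secs(10 * 60);
const GRACE: Duration = Duration::from_secs(30);

async fn hello(_: Request<hyper::body::Incoming>) -> Result<Response<Full<Bytes>>, Infallible> {
    Ok(Response::new(Full::new(Bytes::from("hello"))))
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:3000").await?;
    loop {
        let (stream, _) = listener.accept().await?;
        tokio::spawn(async move {
            let conn = http1::Builder::new()
                .serve_connection(TokioIo::new(stream), service_fn(hello));
            tokio::pin!(conn);
            tokio::select! {
                // Connection ended on its own (client closed, error, ...).
                _ = conn.as_mut() => {}
                // Age limit reached: disable keep-alive so the next response
                // carries `Connection: close`, then give in-flight work up to
                // GRACE before dropping the connection.
                _ = tokio::time::sleep(MAX_AGE) => {
                    conn.as_mut().graceful_shutdown();
                    let _ = tokio::time::timeout(GRACE, conn.as_mut()).await;
                }
            }
        });
    }
}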

I'll open a sibling issue against reqwest, but I think it's valuable to have this option on the server as well:

  1. Belt and suspenders / defense in depth. Even if the client supports rotation, having the server enforce it bounds the worst case.
  2. The author of the server is not always the author of the client. The client's connection pool may not have these options, or the client's author may not be aware of how the server / infra works.

Database connection pools have long had similar options (max_lifetime in sqlx). On the gRPC side, tonic already exposes exactly this via Server::max_connection_age (plus max_connection_age_grace and max_connection_idle) — so there's clear precedent for this feature living in an axum-/hyper-adjacent server crate. Envoy has max_connection_duration for the same reason.
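
Concretely, the prior art looks like this (sketches of the current public APIs as I understand them; exact availability varies by version, so check the docs):

use std::time::Duration;

// sqlx: recycle pooled DB connections after 30 minutes.
let pool_opts = sqlx::postgres::PgPoolOptions::new()
    .max_lifetime(Duration::from_secs(30 * 60));

// tonic: server-enforced connection age for gRPC.
let server = tonic::transport::Server::builder()
    .max_connection_age(Duration::from_secs(10 * 60));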

Proposed API

use std::time::Duration;
use axum_server::ConnectionLimits;

let limits = ConnectionLimits::default()
    // Soft cap on total connection lifetime. When reached:
    //   - HTTP/1: set `Connection: close` on the next response, then close
    //     once the current request finishes.
    //   - HTTP/2: send GOAWAY; new streams are refused, existing streams
    //     get up to `max_connection_age_grace` to finish.
    .max_connection_age(Duration::from_secs(10 * 60))

    // Per-connection random jitter added to `max_connection_age`. Critical
    // for avoiding synchronized reconnect storms when many connections
    // were established at the same time (e.g. right after a deploy).
    .max_connection_age_jitter(Duration::from_secs(60))

    // Hard cap on how long to wait for in-flight work after the soft age
    // limit fires before forcibly closing. Mirrors tonic's grace.
    .max_connection_age_grace(Duration::from_secs(30))

    // Optional idle timeout — close connections that have been quiet for
    // this long.
    .max_connection_idle(Duration::from_secs(5 * 60));

axum_server::bind(addr)
    .connection_limits(limits)
    .serve(app.into_make_service())
    .await?;

This works for HTTP/1 and HTTP/2 (including gRPC) — the mechanism differs but the user-facing knob is the same.

Design notes / open questions

  • Naming. I've cribbed from tonic intentionally so the vocabulary is familiar to anyone who has tuned a gRPC server.
  • Builder method vs. config struct. I'd lean config struct since there are already ~4 related knobs and more (per-connection request counts, etc.) will likely follow, but a flat builder works too.
  • Jitter as a first-class knob. Without it, every connection opened in the same second tears down in the same second N minutes later — exactly the thundering herd we're trying to avoid. Worth making it hard to forget; see the sketch after this list.
  • max_requests_per_connection? Tempting, but interacts awkwardly with HTTP/2 multiplexing (do streams count, or just requests? what about long-lived server-streamed responses?). I'd propose age-based first and leave count-based as a follow-up unless there's a concrete need.
  • Relationship to graceful shutdown. The grace-period machinery overlaps a lot with what Handle::graceful_shutdown already does per-server; an implementation could likely share most of it, just scoped per-connection.
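
The jitter math is trivial but worth pinning down. A sketch with a hypothetical helper, assuming the rand crate's 0.8-style API (which can sample Duration ranges):

use std::time::Duration;
use rand::Rng;

// Spread rotation out: each connection lives max_age + uniform(0, jitter),
// so connections opened at the same instant don't all close at the same instant.
fn effective_max_age(max_age: Duration, jitter: Duration) -> Duration {
    max_age + rand::thread_rng().gen_range(Duration::ZERO..=jitter)
}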


Labels: A-axum, C-feature-request
