Connection rotation / max lifetime #3753

@adriangb

Description

We run a lot of bursty applications on Kubernetes. They tend to look like a low-replica coordinator (1–3 pods) and anywhere from 4 to 128 worker pods, depending on load. A problem I've consistently encountered across systems, and one this setup makes worse, is that connection pools don't respond to scale-up events: the connection pools (e.g. reqwest's) on the coordinator pods keep sending work to the same initial 4 worker pods even when those are saturated and the other 124 pods sit idle waiting for work. To get work to the new pods we'd need to establish new connections so that the k8s Service round-robins them across the full pod set.

Unfortunately most HTTP clients don't have any sort of connection rotation feature. But servers can pressure clients to rotate connections — or force them to — by closing connections after some bounded lifetime. This works for both HTTP/1 (Connection: close on the next response, then close) and HTTP/2 / gRPC (GOAWAY with a grace period for in-flight streams).
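
For what it's worth, the mechanism is already reachable today if you drop down to hyper's per-connection API. Here's a minimal sketch, assuming hyper 1.x + hyper-util; MAX_AGE and GRACE are illustrative values, and the http2 builder has an analogous graceful_shutdown that sends GOAWAY:

use std::convert::Infallible;
use std::time::Duration;

use http_body_util::Full;
use hyper::body::Bytes;
use hyper::server::conn::http1;
use hyper::service::service_fn;
use hyper::{Request, Response};
use hyper_util::rt::TokioIo;
use tokio::net::TcpListener;

const MAX_AGE: Duration = Duration::from_secs(10 * 60);
const GRACE: Duration = Duration::from_secs(30);

async fn hello(_: Request<hyper::body::Incoming>) -> Result<Response<Full<Bytes>>, Infallible> {
    Ok(Response::new(Full::new(Bytes::from("hello"))))
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:3000").await?;
    loop {
        let (stream, _) = listener.accept().await?;
        tokio::spawn(async move {
            let conn = http1::Builder::new()
                .serve_connection(TokioIo::new(stream), service_fn(hello));
            tokio::pin!(conn);
            tokio::select! {
                // Connection ended on its own (client closed, error, ...).
                _ = conn.as_mut() => {}
                // Age limit reached: disable keep-alive so the next response
                // carries `Connection: close`, then give in-flight work up to
                // GRACE before dropping the connection.
                _ = tokio::time::sleep(MAX_AGE) => {
                    conn.as_mut().graceful_shutdown();
                    let _ = tokio::time::timeout(GRACE, conn.as_mut()).await;
                }
            }
        });
    }
}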

I'll open a sibling issue against reqwest, but I think it's valuable to have this option on the server as well:

  1. Belt and suspenders / defense in depth. Even if the client supports rotation, having the server enforce it bounds the worst case.
  2. The author of the server is not always the author of the client. The client's connection pool may not have these options, or the client's author may not be aware of how the server / infra works.

Database connection pools have long had similar options (max_lifetime in sqlx). On the gRPC side, tonic already exposes exactly this via Server::max_connection_age (plus max_connection_age_grace and max_connection_idle) — so there's clear precedent for this feature living in an axum-/hyper-adjacent server crate. Envoy has max_connection_duration for the same reason.
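
Concretely, the prior art looks like this (sketches of the current public APIs as I understand them; exact availability varies by version, so check the docs):

use std::time::Duration;

// sqlx: recycle pooled DB connections after 30 minutes.
let pool_opts = sqlx::postgres::PgPoolOptions::new()
    .max_lifetime(Duration::from_secs(30 * 60));

// tonic: server-enforced connection age for gRPC.
let server = tonic::transport::Server::builder()
    .max_connection_age(Duration::from_secs(10 * 60));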

Proposed API

use std::time::Duration;
use axum_server::ConnectionLimits;

let limits = ConnectionLimits::default()
    // Soft cap on total connection lifetime. When reached:
    //   - HTTP/1: set `Connection: close` on the next response, then close
    //     once the current request finishes.
    //   - HTTP/2: send GOAWAY; new streams are refused, existing streams
    //     get up to `max_connection_age_grace` to finish.
    .max_connection_age(Duration::from_secs(10 * 60))

    // Per-connection random jitter added to `max_connection_age`. Critical
    // for avoiding synchronized reconnect storms when many connections
    // were established at the same time (e.g. right after a deploy).
    .max_connection_age_jitter(Duration::from_secs(60))

    // Hard cap on how long to wait for in-flight work after the soft age
    // limit fires before forcibly closing. Mirrors tonic's grace.
    .max_connection_age_grace(Duration::from_secs(30))

    // Optional idle timeout — close connections that have been quiet for
    // this long.
    .max_connection_idle(Duration::from_secs(5 * 60));

axum_server::bind(addr)
    .connection_limits(limits)
    .serve(app.into_make_service())
    .await?;

This works for HTTP/1 and HTTP/2 (including gRPC) — the mechanism differs but the user-facing knob is the same.

Design notes / open questions

  • Naming. I've cribbed from tonic intentionally so the vocabulary is familiar to anyone who has tuned a gRPC server.
  • Builder method vs. config struct. I'd lean config struct since there are already ~4 related knobs and more (per-connection request counts, etc.) will likely follow, but a flat builder works too.
  • Jitter as a first-class knob. Without it, every connection opened in the same second tears down in the same second N minutes later — exactly the thundering herd we're trying to avoid. Worth making it hard to forget; see the sketch after this list.
  • max_requests_per_connection? Tempting, but interacts awkwardly with HTTP/2 multiplexing (do streams count, or just requests? what about long-lived server-streamed responses?). I'd propose age-based first and leave count-based as a follow-up unless there's a concrete need.
  • Relationship to graceful shutdown. The grace-period machinery overlaps a lot with what Handle::graceful_shutdown already does per-server; an implementation could likely share most of it, just scoped per-connection.
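
The jitter math is trivial but worth pinning down. A sketch with a hypothetical helper, assuming the rand crate's 0.8-style API (which can sample Duration ranges):

use std::time::Duration;
use rand::Rng;

// Spread rotation out: each connection lives max_age + uniform(0, jitter),
// so connections opened at the same instant don't all close at the same instant.
fn effective_max_age(max_age: Duration, jitter: Duration) -> Duration {
    max_age + rand::thread_rng().gen_range(Duration::ZERO..=jitter)
}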


Labels: A-axum, C-feature-request
