S3 client can stall on PutObject after long idle gap (pooled connection reuse / configuration) #1417

@quinnjr

Summary

put_object().send().await on an aws_sdk_s3::Client can appear to hang when the same client is used for get_object, then sits unused through a long period of local work (no HTTP I/O), and is then used for put_object against the same endpoint. The future may not complete until an operation-level timeout fires (e.g. TimeoutConfig::operation_timeout), surfacing an error like "request has timed out".

This is most plausibly explained by HTTP connection pooling: the client reuses an idle pooled connection that the peer or path has already closed (or half-closed), while the pool’s idle retention (Hyper’s default is 90s when pool_timer is configured) may still treat the connection as reusable. That is a client lifecycle / default tuning issue relative to server-side idle behavior, not necessarily incorrect PutObject logic.

get_object can succeed; the problematic phase is reuse after a gap.
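To make the suspected mechanism concrete, here is a minimal standard-library sketch of a connection that the peer closes while the client still holds it. It is not the SDK's or Hyper's pool logic. The names IdleState, probe_idle and idle_then_probe are hypothetical, the loopback server stands in for S3, and the probe assumes the protocol sends nothing unsolicited on an idle connection.

```rust
use std::io::{self, ErrorKind};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// What a liveness probe observed on an idle connection (hypothetical type).
#[derive(Debug, PartialEq)]
enum IdleState {
    /// Nothing arrived within the probe window: the socket still looks open.
    StillOpen,
    /// The peer closed (EOF) or reset the connection while it sat idle.
    ClosedByPeer,
}

/// Probe an idle connection without consuming data, using a short blocking peek.
/// `window` must be non-zero (set_read_timeout rejects a zero duration).
fn probe_idle(stream: &TcpStream, window: Duration) -> io::Result<IdleState> {
    stream.set_read_timeout(Some(window))?;
    let mut buf = [0u8; 1];
    match stream.peek(&mut buf) {
        // A zero-length read means the peer sent FIN: reuse would fail.
        Ok(0) => Ok(IdleState::ClosedByPeer),
        // Unsolicited bytes are pending; the connection is at least open.
        Ok(_) => Ok(IdleState::StillOpen),
        // The read timed out (WouldBlock on Unix, TimedOut on Windows).
        Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => {
            Ok(IdleState::StillOpen)
        }
        Err(e) if matches!(e.kind(), ErrorKind::ConnectionReset | ErrorKind::ConnectionAborted) => {
            Ok(IdleState::ClosedByPeer)
        }
        Err(e) => Err(e),
    }
}

/// Open a loopback connection whose server side closes after `server_idle`,
/// leave it unused for `gap` (the "local work" phase), then probe it.
fn idle_then_probe(server_idle: Duration, gap: Duration) -> io::Result<IdleState> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let server = thread::spawn(move || {
        if let Ok((conn, _)) = listener.accept() {
            thread::sleep(server_idle);
            drop(conn); // server-side idle close: sends FIN to the client
        }
    });
    let stream = TcpStream::connect(addr)?;
    thread::sleep(gap); // client does no I/O on the connection during the gap
    let state = probe_idle(&stream, Duration::from_millis(50));
    drop(stream);
    let _ = server.join();
    state
}
```

Calling idle_then_probe with a gap longer than the server's idle limit reports ClosedByPeer even though the client never saw an error during the gap, which is the state a pool can be in when it hands out a retained connection.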

Likely classification

  • Application configuration: Workloads with large gaps between sequential calls on one S3Client should consider shorter pool_idle_timeout on a custom HTTP connector, operation timeouts, a fresh client (or connector) for the upload phase, or not sharing one client across long CPU-bound sections.
  • Documentation gap: The AWS Rust SDK could call out this pattern explicitly (long idle between requests → tune pool idle or avoid reuse).
  • Optional product change: Whether default pool idle timeout should be more conservative than Hyper’s 90s is a tradeoff (fewer dead reuse attempts vs more connection churn) for maintainers to decide—not a given “bug fix.”
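The tradeoff in the last bullet can be illustrated with a toy model. This is a sketch under stated assumptions, not a measurement: ReuseStats and simulate_gaps are hypothetical names, the peer-side idle limit is an invented parameter (the issue does not establish S3's actual value), and each gap is classified independently.

```rust
use std::time::Duration;

/// Outcome counts for a sequence of idle gaps between requests (hypothetical type).
#[derive(Debug, Default, PartialEq)]
struct ReuseStats {
    /// Pooled connection reused while the peer still had it open.
    live_reuses: u32,
    /// Pool still retained the connection, but the peer had already closed it.
    dead_reuses: u32,
    /// Pool had already evicted the connection, so a new one was opened.
    fresh_connects: u32,
}

/// Classify each idle gap given the client's pool idle timeout and an
/// ASSUMED peer-side idle limit. Simplification: one connection, no retries.
fn simulate_gaps(gaps: &[Duration], pool_idle: Duration, peer_idle: Duration) -> ReuseStats {
    let mut stats = ReuseStats::default();
    for &gap in gaps {
        if gap >= pool_idle {
            stats.fresh_connects += 1; // client gave up on the connection first
        } else if gap >= peer_idle {
            stats.dead_reuses += 1; // peer gave up first; client does not know
        } else {
            stats.live_reuses += 1;
        }
    }
    stats
}
```

With gaps of 5s, 30s and 120s and an assumed 20s peer limit, a 90s pool idle timeout gives one live reuse, one dead reuse and one fresh connect, while a 15s pool idle timeout gives one live reuse and two fresh connects. The dead reuse disappears at the cost of extra connection setup, which is the churn the bullet refers to.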

Environment

  • Runtime: Linux x86_64 (Docker: CUDA base image + Rust binary from rust:1-bookworm)
  • Region: us-west-2
  • Credentials: IAM task role (AWS Batch on EC2)
  • Dependencies observed: aws-sdk-s3 1.126.x / 1.127.x when unconstrained; hang also seen with aws-sdk-s3 pinned <1.127 (1.126.x). Client from aws_config::defaults(BehaviorVersion::latest()).load() + aws_sdk_s3::Client::new(&cfg).

An older production image built around 2026-03-11 (older resolved SDK versions) did not show this for the same application pattern.

Reproduction sketch

  1. Single aws_sdk_s3::Client from default config.
  2. get_object, consume body (success).
  3. 30 seconds or more of CPU-heavy work without using the S3 client.
  4. put_object with ByteStream::from_path(...) and .send().await.

Observed: .await blocks for a long time; with TimeoutConfig::operation_timeout set (e.g. 300s), the call eventually fails with "request has timed out".

Expected (from an app perspective): Either successful upload, or a fast, clear connection error and retry on a new connection—without requiring every user to discover pool behavior by production incident.
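The four steps above could be turned into a small program along these lines. This is an untested sketch, not the original application: the REPRO_* environment variable names are hypothetical, the 60-second sleep only stands in for the CPU-heavy phase, and it assumes aws-config, aws-sdk-s3 and tokio (with the macros and rt-multi-thread features) as dependencies.

```rust
use std::time::{Duration, Instant};

use aws_config::BehaviorVersion;
use aws_sdk_s3::{primitives::ByteStream, Client};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical inputs; substitute your own bucket, keys and local file.
    let bucket = std::env::var("REPRO_BUCKET")?;
    let src_key = std::env::var("REPRO_SRC_KEY")?;
    let dst_key = std::env::var("REPRO_DST_KEY")?;
    let upload_path = std::env::var("REPRO_UPLOAD_PATH")?;

    // Step 1: a single client built from the default config.
    let cfg = aws_config::defaults(BehaviorVersion::latest()).load().await;
    let client = Client::new(&cfg);

    // Step 2: download and fully consume the body.
    let obj = client
        .get_object()
        .bucket(&bucket)
        .key(&src_key)
        .send()
        .await?;
    let bytes = obj.body.collect().await?.into_bytes();
    println!("downloaded {} bytes", bytes.len());

    // Step 3: a long gap with no S3 traffic. The sleep is a stand-in for
    // CPU-heavy work and runs off the async executor threads.
    tokio::task::spawn_blocking(|| std::thread::sleep(Duration::from_secs(60))).await?;

    // Step 4: upload on the same client and report how long send() took.
    let started = Instant::now();
    let body = ByteStream::from_path(&upload_path).await?;
    let result = client
        .put_object()
        .bucket(&bucket)
        .key(&dst_key)
        .body(body)
        .send()
        .await;
    println!(
        "put_object returned after {:?}, ok = {}",
        started.elapsed(),
        result.is_ok()
    );
    result?;
    Ok(())
}
```

Whether the stall appears will depend on the environment (network path, resolved SDK versions, gap length), so a clean run of this sketch does not rule the behavior out.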

Additional observations

  • aws s3 cp to the same bucket from the same task succeeds quickly, which points to a fresh connection or a different HTTP stack rather than an IAM problem or an S3 outage.
  • Pinning aws-sdk-s3 < 1.127 alone did not resolve the stall in our tests—consistent with the issue being below the S3 crate (HTTP pool / hyper), not a single crate version regression.
  • Mitigations that worked for us: TimeoutConfig on the config loader; performing uploads via aws s3 cp subprocess; optionally new client for upload phase.
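Two of these mitigations (TimeoutConfig on the config loader, and a new client for the upload phase) might look like the fragment below. The function name upload_phase_client and all timeout values are illustrative assumptions, and the per-attempt timeout is an addition beyond what the issue tested. The sketch also assumes that a client built from a newly loaded config does not share pooled connections with the earlier client, which matches the mitigation described above but is not verified here.

```rust
use std::time::Duration;

use aws_config::{timeout::TimeoutConfig, BehaviorVersion};

/// Build a separate client for the upload phase with explicit timeouts.
/// Hypothetical helper; tune the values to the size of your uploads.
async fn upload_phase_client() -> aws_sdk_s3::Client {
    let timeouts = TimeoutConfig::builder()
        // Bound how long establishing a connection may take.
        .connect_timeout(Duration::from_secs(5))
        // Bound a single attempt so a stalled attempt ends sooner
        // (illustrative value; too low a cap can break large uploads).
        .operation_attempt_timeout(Duration::from_secs(60))
        // Bound the whole operation, including any retries.
        .operation_timeout(Duration::from_secs(300))
        .build();

    let cfg = aws_config::defaults(BehaviorVersion::latest())
        .timeout_config(timeouts)
        .load()
        .await;

    aws_sdk_s3::Client::new(&cfg)
}
```

Building this client only after the long local-work phase keeps the upload from starting on a connection that sat idle through the gap, while the timeouts cap how long a stalled request can block.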

Suggested follow-ups for maintainers

  1. Docs: Add guidance for long idle gaps between calls on one client (tune pool_idle_timeout via custom HTTP client, or separate clients / phases).
  2. Examples: Optional example of aws_config + custom aws_smithy_http_client::Connector with explicit pool_idle_timeout.
  3. Defaults: Evaluate whether Smithy’s default when pool_idle_timeout is unset should differ from Hyper’s 90s (tradeoff vs dead reuse)—not assumed without measurement.

References

  • Related prototype note (default pool idle): discussion in this thread.
  • Internal context: OCR pipeline — download PDF → GPU/CPU processing → upload SQLite + PDF.

Thank you for considering documentation and defaults; we’re happy to help validate doc changes or a minimal repro crate if useful.
