Summary
S3Client::put_object().send().await can appear to hang when the same client is used for get_object, then left idle during a long period of local work (no HTTP I/O), then used for put_object to the same endpoint. The future may not complete until an operation-level timeout fires (e.g. TimeoutConfig::operation_timeout), producing errors such as "request has timed out".
This is most plausibly explained by HTTP connection pooling: the client reuses an idle pooled connection that the peer or network path has already closed (or half-closed), while the pool’s idle retention (Hyper’s default pool_idle_timeout is 90s, enforced when a pool_timer is configured) may still treat the connection as reusable. That makes it a client lifecycle / default-tuning issue, a mismatch between client-side idle retention and server-side idle behavior, rather than necessarily incorrect PutObject logic.
get_object can succeed; the problematic phase is reuse after a gap.
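The mechanism above can be illustrated with a toy model in plain Rust (no SDK types). This is a sketch under stated assumptions, not Hyper's actual pool code: client_idle_timeout stands for the pool's idle retention (Hyper's default is 90s), and peer_idle_limit is a hypothetical point after which the server or network path drops the connection without the client noticing. The model ignores the case where the client does observe the peer's close and discards the connection itself.

```rust
use std::time::Duration;

/// Outcome when the client needs a connection after an idle gap (toy model).
#[derive(Debug, PartialEq, Eq)]
pub enum Reuse {
    /// The pool already evicted the idle connection, so a new one is opened.
    FreshConnection,
    /// The pooled connection is reused and the peer still has it open.
    LiveReuse,
    /// The pooled connection is reused, but the peer or path already dropped it.
    DeadReuse,
}

/// `client_idle_timeout`: how long the pool keeps an idle connection.
/// `peer_idle_limit`: hypothetical time after which the peer silently drops it.
pub fn classify(gap: Duration, client_idle_timeout: Duration, peer_idle_limit: Duration) -> Reuse {
    if gap >= client_idle_timeout {
        // Pool retention expired: no stale connection is handed out.
        Reuse::FreshConnection
    } else if gap < peer_idle_limit {
        // Both sides still consider the connection open.
        Reuse::LiveReuse
    } else {
        // Pool still holds the connection, but the peer has given up on it.
        Reuse::DeadReuse
    }
}
```

With the 90s retention and a hypothetical 20s peer limit, a 45s gap lands in the dead-reuse window; shortening the retention below the peer limit moves the same gap to a fresh connection.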
Likely classification
- Application configuration: Workloads with large gaps between sequential calls on one S3Client should consider a shorter pool_idle_timeout on a custom HTTP connector, operation timeouts, a fresh client (or connector) for the upload phase, or not sharing one client across long CPU-bound sections.
- Documentation gap: The AWS Rust SDK could call out this pattern explicitly (long idle gaps between requests → tune the pool idle timeout or avoid reusing the client).
- Optional product change: Whether the default pool idle timeout should be more conservative than Hyper’s 90s is a tradeoff (fewer dead-connection reuse attempts vs. more connection churn) for maintainers to decide; it is not a given “bug fix.”
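For the "operation timeouts" option, a minimal sketch of setting TimeoutConfig on the config loader, assuming aws-config and aws-sdk-s3 as dependencies. The function name and all durations are hypothetical placeholders, not tuned recommendations. Adding operation_attempt_timeout below operation_timeout is an assumption beyond what is reported here: the intent is that a stalled attempt ends early enough for the retry strategy to possibly try again, but whether such a timeout is retried should be confirmed for the SDK version in use. The per-attempt limit must also leave room for a legitimately slow large upload.

```rust
use std::time::Duration;

use aws_config::{timeout::TimeoutConfig, BehaviorVersion};
use aws_sdk_s3::Client;

/// Hypothetical helper: an S3 client whose calls cannot wait indefinitely
/// on a stalled connection. Durations are illustrative only.
pub async fn s3_client_with_timeouts() -> Client {
    let timeouts = TimeoutConfig::builder()
        // Bound TCP/TLS connection establishment.
        .connect_timeout(Duration::from_secs(5))
        // Bound each individual attempt; must exceed a legitimate upload's duration.
        .operation_attempt_timeout(Duration::from_secs(120))
        // Hard ceiling for the whole operation, including any retries.
        .operation_timeout(Duration::from_secs(300))
        .build();

    let cfg = aws_config::defaults(BehaviorVersion::latest())
        .timeout_config(timeouts)
        .load()
        .await;
    Client::new(&cfg)
}
```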
Environment
- Runtime: Linux x86_64 (Docker: CUDA base image + Rust binary from rust:1-bookworm)
- Region: us-west-2
- Credentials: IAM task role (AWS Batch on EC2)
- Dependencies observed: aws-sdk-s3 1.126.x / 1.127.x when unconstrained; the hang was also seen with aws-sdk-s3 pinned to <1.127 (1.126.x). Client built from aws_config::defaults(BehaviorVersion::latest()).load() + aws_sdk_s3::Client::new(&cfg).
An older production image built around 2026-03-11 (older resolved SDK versions) did not show this for the same application pattern.
Reproduction sketch
- A single aws_sdk_s3::Client from default config.
- get_object, consume body (success).
- ~30+ seconds (or longer) of CPU-heavy work without using the S3 client.
- put_object with ByteStream::from_path(...) and .send().await.
Observed: .await blocks for a long time; with TimeoutConfig::operation_timeout set (e.g. 300s), the call eventually fails with "request has timed out".
Expected (from an app perspective): Either a successful upload, or a fast, clear connection error and a retry on a new connection, without requiring every user to discover pool behavior through a production incident.
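The reproduction steps above could be turned into a standalone program along these lines. This is a sketch, assuming aws-config, aws-sdk-s3 (default features; ByteStream::from_path needs the rt-tokio feature) and tokio (macros, rt-multi-thread, time) as dependencies. The bucket, keys, local file, and 60s gap are hypothetical values, and tokio::time::sleep stands in for the CPU-heavy phase.

```rust
use std::time::{Duration, Instant};

use aws_config::BehaviorVersion;
use aws_sdk_s3::{primitives::ByteStream, Client};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical names: replace with a real bucket, object, and local file.
    let bucket = "example-bucket";
    let input_key = "input.pdf";
    let output_key = "output.sqlite";
    let local_output = "output.sqlite";
    // Hypothetical idle gap between the download and the upload.
    let idle = Duration::from_secs(60);

    let cfg = aws_config::defaults(BehaviorVersion::latest()).load().await;
    let client = Client::new(&cfg);

    // Step 1: download and fully consume the body so the connection can return to the pool.
    let obj = client.get_object().bucket(bucket).key(input_key).send().await?;
    let bytes = obj.body.collect().await?.into_bytes();
    println!("get_object: {} bytes", bytes.len());

    // Step 2: idle gap with no S3 I/O.
    tokio::time::sleep(idle).await;

    // Step 3: upload with the same client and measure how long send() takes.
    let started = Instant::now();
    let body = ByteStream::from_path(local_output).await?;
    client
        .put_object()
        .bucket(bucket)
        .key(output_key)
        .body(body)
        .send()
        .await?;
    println!("put_object completed in {:?}", started.elapsed());
    Ok(())
}
```

Varying the gap (for example below and above 90s) would help confirm whether the stall tracks the pool's idle retention.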
Additional observations
- aws s3 cp to the same bucket from the same task succeeds quickly, consistent with a fresh connection / different stack rather than an IAM problem or S3 outage.
- Pinning aws-sdk-s3 < 1.127 alone did not resolve the stall in our tests, consistent with the issue sitting below the S3 crate (HTTP pool / hyper) rather than a single crate-version regression.
- Mitigations that worked for us: TimeoutConfig on the config loader; performing uploads via an aws s3 cp subprocess; optionally, a new client for the upload phase.
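A minimal sketch of the aws s3 cp subprocess workaround using only the standard library, assuming the AWS CLI is on PATH in the container and resolves the same task-role credentials. The function names s3_cp_command and upload_via_cli are hypothetical. Building the Command separately from running it keeps the argument list testable.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Builds `aws s3 cp <local> s3://<bucket>/<key>` without running it.
pub fn s3_cp_command(local: &Path, bucket: &str, key: &str) -> Command {
    let mut cmd = Command::new("aws");
    cmd.arg("s3")
        .arg("cp")
        .arg(local)
        .arg(format!("s3://{bucket}/{key}"));
    cmd
}

/// Runs the upload and turns a non-zero exit status into an error.
pub fn upload_via_cli(local: &Path, bucket: &str, key: &str) -> io::Result<()> {
    let status = s3_cp_command(local, bucket, key).status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("aws s3 cp exited with {status}"),
        ))
    }
}
```

Command::status blocks the calling thread; from async code, tokio::process::Command or tokio::task::spawn_blocking would avoid stalling a runtime worker.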
Suggested follow-ups for maintainers
- Docs: Add guidance for long idle gaps between calls on one client (tune pool_idle_timeout via a custom HTTP client, or use separate clients / phases).
- Examples: Optional example of aws_config + a custom aws_smithy_http_client::Connector with an explicit pool_idle_timeout.
- Defaults: Evaluate whether Smithy’s default when pool_idle_timeout is unset should differ from Hyper’s 90s (a tradeoff against dead reuse); this should not be assumed without measurement.
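To make the defaults tradeoff concrete, a toy tally in plain Rust (hypothetical names) over a sequence of idle gaps between requests on one client. pool_idle_timeout mirrors the setting under discussion; peer_idle_limit is a hypothetical stand-in for server or path idle behavior, which is exactly the value that would need measuring. Each gap is assumed to be followed by one request.

```rust
use std::time::Duration;

/// Counts of connection outcomes over a schedule of idle gaps (toy model).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Tally {
    pub new_connections: u32,
    pub live_reuses: u32,
    pub dead_reuses: u32,
}

/// Classifies the request after each gap and accumulates the outcomes.
pub fn tally(gaps: &[Duration], pool_idle_timeout: Duration, peer_idle_limit: Duration) -> Tally {
    let mut t = Tally::default();
    for &gap in gaps {
        if gap >= pool_idle_timeout {
            // Evicted by the pool: costs a new connection (churn).
            t.new_connections += 1;
        } else if gap < peer_idle_limit {
            // Still open on both sides: cheap reuse.
            t.live_reuses += 1;
        } else {
            // Kept by the pool after the peer dropped it: the stall case.
            t.dead_reuses += 1;
        }
    }
    t
}
```

With a hypothetical 20s peer limit and gaps of 1, 2, 45, 3, 60 and 200 seconds, a 90s retention yields 1 new connection, 3 live reuses and 2 dead reuses, while a 15s retention yields 3 new connections, 3 live reuses and 0 dead reuses: fewer dead reuses, more churn.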
References
- Related prototype note (default pool idle): discussion in this thread.
- Internal context: OCR pipeline — download PDF → GPU/CPU processing → upload SQLite + PDF.
Thank you for considering documentation and defaults; we’re happy to help validate doc changes or a minimal repro crate if useful.