
[Cosmos] Support configurable max retry attempts/wait time for rate-limited (HTTP 429) requests #4543

@kundadebdatta


Summary

The Rust Cosmos DB driver and SDK should let callers configure how many times a throttled (HTTP 429, rate-limited) request is retried, and how long retrying may take in aggregate — matching the .NET SDK's CosmosClientOptions.MaxRetryAttemptsOnRateLimitedRequests and MaxRetryWaitTimeOnRateLimitedRequests.

Today the throttle retry budget is hard-coded (9 attempts / 30s), with no way to tune it per-request or per-client.

Motivation

  • .NET parity — applications migrating from the .NET SDK expect to control 429 retry behavior.
  • Latency-sensitive workloads — some callers want to fail fast (0 retries) and handle 429s themselves rather than absorbing up to 30s of internal backoff.
  • Bulk/ingestion workloads — others want to extend the retry budget.

Proposed API

Driver (azure_data_cosmos_driver) — OperationOptions

| Field | Type | Env var | Default | Notes |
| --- | --- | --- | --- | --- |
| max_throttle_retry_count | Option<u32> | AZURE_COSMOS_MAX_THROTTLE_RETRY_COUNT | 9 | 0 disables throttle retries (first 429 surfaces to caller) |
| max_throttle_retry_wait_time | Option<Duration> | (none) | 30s | No env var; Duration is not FromStr-parseable |

Both fields resolve through the standard operation → account → runtime → env layering (the env layer applies only to the retry count) and are threaded into the transport-level throttle retry loop.
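The layering could be sketched roughly as follows. This is an illustration only, not the driver's code: `ThrottleRetryOptions`, `parse_retry_count`, `env_layer`, and `resolve` are hypothetical names, and the sketch assumes that each field resolves independently, that the most specific layer wins, and that an unparseable env value is ignored.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the two proposed fields; not the real driver type.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct ThrottleRetryOptions {
    pub max_throttle_retry_count: Option<u32>,
    pub max_throttle_retry_wait_time: Option<Duration>,
}

/// Defaults stated in this issue (9 attempts / 30s).
pub const DEFAULT_MAX_THROTTLE_RETRY_COUNT: u32 = 9;
pub const DEFAULT_MAX_THROTTLE_RETRY_WAIT_TIME: Duration = Duration::from_secs(30);

/// Parses the raw value of AZURE_COSMOS_MAX_THROTTLE_RETRY_COUNT.
/// Assumption: an unparseable value is ignored rather than reported as an error.
pub fn parse_retry_count(raw: Option<&str>) -> Option<u32> {
    raw.and_then(|s| s.trim().parse::<u32>().ok())
}

/// Builds the env layer. Only the retry count has an env var in the proposal.
pub fn env_layer() -> ThrottleRetryOptions {
    let raw = std::env::var("AZURE_COSMOS_MAX_THROTTLE_RETRY_COUNT").ok();
    ThrottleRetryOptions {
        max_throttle_retry_count: parse_retry_count(raw.as_deref()),
        max_throttle_retry_wait_time: None,
    }
}

/// Resolves each field independently: the first layer that sets it wins.
/// `layers` is ordered most specific first (operation, account, runtime, env).
pub fn resolve(layers: &[ThrottleRetryOptions]) -> (u32, Duration) {
    let count = layers
        .iter()
        .find_map(|l| l.max_throttle_retry_count)
        .unwrap_or(DEFAULT_MAX_THROTTLE_RETRY_COUNT);
    let wait = layers
        .iter()
        .find_map(|l| l.max_throttle_retry_wait_time)
        .unwrap_or(DEFAULT_MAX_THROTTLE_RETRY_WAIT_TIME);
    (count, wait)
}
```

Under these assumptions, an operation layer that sets only the count to 0 combined with a runtime layer that sets both fields would resolve to the operation's count and the runtime's wait time.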

SDK (azure_data_cosmos)

  • Per-request: settable via the re-exported OperationOptions / OperationOptionsBuilder.

  • Client-wide: new CosmosClientBuilder methods

    • with_max_retry_attempts_on_throttled_requests(u32)
    • with_max_retry_wait_time_on_throttled_requests(Duration)

    forwarded to the driver's runtime layer at build() time.
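A minimal model of the client-wide wiring might look like this. `ClientBuilderSketch` and `RuntimeRetryDefaults` are hypothetical stand-ins, not the real `CosmosClientBuilder` or driver runtime types; only the two method names come from the proposal, and the by-value `self` receiver is an assumption.

```rust
use std::time::Duration;

/// Hypothetical runtime-layer values the builder would forward to the driver.
#[derive(Debug, Default, PartialEq)]
pub struct RuntimeRetryDefaults {
    pub max_throttle_retry_count: Option<u32>,
    pub max_throttle_retry_wait_time: Option<Duration>,
}

/// Stand-in for CosmosClientBuilder; only the two proposed methods are modeled.
#[derive(Debug, Default)]
pub struct ClientBuilderSketch {
    runtime: RuntimeRetryDefaults,
}

impl ClientBuilderSketch {
    pub fn new() -> Self {
        Self::default()
    }

    /// Records the client-wide retry count; 0 would disable throttle retries.
    pub fn with_max_retry_attempts_on_throttled_requests(mut self, attempts: u32) -> Self {
        self.runtime.max_throttle_retry_count = Some(attempts);
        self
    }

    /// Records the client-wide cap on cumulative backoff.
    pub fn with_max_retry_wait_time_on_throttled_requests(mut self, wait: Duration) -> Self {
        self.runtime.max_throttle_retry_wait_time = Some(wait);
        self
    }

    /// In the proposal, build() forwards these values to the driver's runtime
    /// layer; this sketch simply returns them.
    pub fn build(self) -> RuntimeRetryDefaults {
        self.runtime
    }
}
```

Leaving both methods uncalled keeps the fields as `None`, so the lower layers and the documented defaults would still apply.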

Behavior notes

  • max_throttle_retry_count = 0 → the first 429 propagates with no retry.
  • max_throttle_retry_wait_time caps cumulative backoff; once the next delay would exceed the budget, the 429 propagates.
  • The service x-ms-retry-after-ms header is still honored (and capped per-retry) within these limits.
  • This is the transport-level 429 retry loop — distinct from the cross-region failover classifier that treats 429/3092 SystemResourceUnavailable as a failover signal.
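The per-429 decision described in these notes could be sketched as a pure function. All names here are hypothetical. The issue does not state the value of the per-retry cap or what delay is used when the service sends no x-ms-retry-after-ms header, so both appear as explicit, assumed parameters.

```rust
use std::time::Duration;

/// Hypothetical resolved budget for the transport-level 429 retry loop.
pub struct ThrottleBudget {
    pub max_retry_count: u32,
    pub max_wait_time: Duration,
    /// Assumption: cap applied to each individual delay (value not given in the issue).
    pub per_retry_cap: Duration,
    /// Assumption: delay used when the service sends no retry-after header.
    pub fallback_delay: Duration,
}

/// Returns the delay to sleep before the next retry, or None when the 429
/// should propagate to the caller.
pub fn next_throttle_delay(
    budget: &ThrottleBudget,
    retries_so_far: u32,
    waited_so_far: Duration,
    retry_after: Option<Duration>,
) -> Option<Duration> {
    // Count budget: with max_retry_count == 0 the first 429 propagates.
    if retries_so_far >= budget.max_retry_count {
        return None;
    }
    // Honor the service hint, capped per retry.
    let delay = retry_after
        .unwrap_or(budget.fallback_delay)
        .min(budget.per_retry_cap);
    // Time budget: stop once the next delay would exceed the cumulative cap.
    if waited_so_far.saturating_add(delay) > budget.max_wait_time {
        return None;
    }
    Some(delay)
}
```

Keeping the decision free of I/O and sleeping makes the "disable (0)", "custom count cap", and "custom wait-time cap" cases straightforward to unit test without a live service.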

Acceptance criteria

  • Driver OperationOptions exposes both fields with documented defaults and env var.
  • Throttle retry loop honors configured count + wait time; 0 disables retries.
  • SDK exposes per-request override and client-wide builder methods.
  • Unit tests: disable (0), custom count cap, custom wait-time cap, option round-trip + env-var parsing, SDK builder wiring.
  • Docs (ConfigurationOptions.md) and CHANGELOG entries for both crates.
