Skip to content

estuary-cdk: improve HTTP connection timeout retry logic#3067

Merged
Alex-Bair merged 1 commit intomainfrom
bair/estuary-cdk-improve-retry-logic
Jul 17, 2025
Merged

estuary-cdk: improve HTTP connection timeout retry logic#3067
Alex-Bair merged 1 commit intomainfrom
bair/estuary-cdk-improve-retry-logic

Conversation

@Alex-Bair
Copy link
Copy Markdown
Member

@Alex-Bair Alex-Bair commented Jul 17, 2025

Description:

My earlier attempt at retrying connection timeout errors in #3065 wasn't thorough enough. Connectors still encountered ConnectionTimeoutErrors when establishing the connection in request_stream and request_lines, and asyncio.TimeoutErrors when reading the response body in request. After digging into the specific errors further, I learned that:

  • aiohttp.client_exceptions.ConnectionTimeoutError is raised when establishing the connection. It is not raised after the connection is established and we've begun receiving chunks from the server.
  • asyncio.TimeoutError is raised when the aiohttp client timeout is reached after we've started reading chunks of the body but before we've received the final chunk.

With this knowledge, my previous assertion in e7c8c7d about it not being safe to retry timeout errors in request_stream and request_lines isn't precisely right. When we want to stream a response body, it's safe to retry timeouts when establishing the connection but not once we return the headers and body generator. Since aiohttp.client_excepitons.ConnectionTimeoutError is a subclass of asyncio.TimeoutError, and request_stream/request_lines establish the HTTP connection but don't iterate through the body, it's safe to use a general _retry_on_timeout wrapper for all of HTTPSession's public methods to retry any asyncio.TimeoutErrors. This means that all of the public methods retry connection establishment errors, but only request retries timeout errors when processing the response body.

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

Tested behavior with a simple script. Confirmed:

  • All HTTP methods retry aiohttp.client_exceptions.ConnectionTimeoutErrors and asyncio.TimeoutErrors before consuming the body.
  • Only the request method retries asyncio.TimeoutErrors while reading the response body.
  • Retries are attempted up to 3 times, then errors are bubbled up if there are 4 consecutive timeout errors.

I also cleaned up a couple unused imports & variables in http.py while I was in there.


This change is Reviewable

My earlier attempt at retrying connection timeout errors wasn't thorough
enough. Connectors still encountered connection timeout errors when
establishing the connection in `request_stream` and `request_lines`, and
also when processing the response in `request`. After digging into the
specific errors further, I learned that:
- `aiohttp.client_exceptions.ConnectionTimeoutError` is raised when
  _establishing_ the connection. It is not raised after the connection
  is established and we've begun receiving chunks from the server.
- `asyncio.TimeoutError` is raised when the aiohttp client timeout is
  reached after we've started receiving chunks from the server but
  before we've received the final chunk.

With this knowledge, my previous assertion in
e7c8c7d about it not being safe to
retry timeout errors in `request_stream` and `request_lines` isn't precisely
right. It's safe to retry timeouts when establishing the connection, but
not once we return the headers and body generator. Since
`aiohttp.client_excepitons.ConnectionTimeoutError` is a subclass of
`asyncio.TimeoutError` and `request_stream`/`request_lines` establish
the HTTP connection but don't process the body (that's the
responsibility of individual connectors), it's safe to use a general
`_retry_on_timeout` wrapper for all of HTTPSession's public methods to
retry any `asyncio.TimeoutError`s. This means that all of the public
methods retry connection establishment errors, but only `request`
retries timeout errors when processing the response body.

Co-Authored-By: Claude <noreply@anthropic.com>
@Alex-Bair Alex-Bair marked this pull request as ready for review July 17, 2025 18:46
@Alex-Bair Alex-Bair requested a review from williamhbaker July 17, 2025 18:47
Copy link
Copy Markdown
Member

@williamhbaker williamhbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Alex-Bair Alex-Bair merged commit 02a20fe into main Jul 17, 2025
97 of 104 checks passed
@Alex-Bair Alex-Bair deleted the bair/estuary-cdk-improve-retry-logic branch July 17, 2025 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants