
feat(openai): prevent silent streaming hangs in ChatOpenAI#36949

Merged
Mason Daugherty (mdrxy) merged 13 commits into master from david/04-22/openai-streaming-reliability
Apr 23, 2026

Conversation

Asamu David (phvash), Contributor, commented Apr 22, 2026

Important

Behavior change on upgrade: minor version bump (1.1.16 → 1.2.0).

Streaming calls now raise StreamChunkTimeoutError (a TimeoutError subclass — existing except TimeoutError: / except asyncio.TimeoutError: handlers catch it) after 120s of content silence instead of hanging forever. Opt out with stream_chunk_timeout=None or LANGCHAIN_OPENAI_STREAM_CHUNK_TIMEOUT_S=0.
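As a minimal sketch of why existing handlers keep working (the PR's actual class lives in langchain-openai; the names below just mirror the description in this PR and are not its real implementation):

```python
# Hypothetical sketch: StreamChunkTimeoutError as described above is a
# TimeoutError subclass carrying structured attributes.
class StreamChunkTimeoutError(TimeoutError):
    """Raised after `timeout_s` seconds of content silence on a stream."""

    def __init__(self, timeout_s: float, model_name: str, chunks_received: int):
        super().__init__(
            f"No streamed chunk for {timeout_s}s "
            f"(model={model_name}, chunks_received={chunks_received})"
        )
        self.timeout_s = timeout_s
        self.model_name = model_name
        self.chunks_received = chunks_received

err = StreamChunkTimeoutError(120.0, "gpt-4o", 7)
# Pre-existing `except TimeoutError:` handlers catch it; on Python 3.11+
# asyncio.TimeoutError is an alias of the builtin TimeoutError, so
# `except asyncio.TimeoutError:` handlers catch it too.
assert isinstance(err, TimeoutError)
```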

Kernel-level TCP keepalive / TCP_USER_TIMEOUT are applied via a custom httpx transport. httpx disables its env-proxy auto-detection (HTTP_PROXY / HTTPS_PROXY / ALL_PROXY / NO_PROXY and macOS/Windows system proxy) whenever a transport is supplied, so to avoid silently breaking enterprise proxy users, ChatOpenAI now detects the "proxy-env-shadow" shape at construction and skips the custom transport entirely when all of these hold:

  • http_socket_options left at default (None)
  • No http_client or http_async_client supplied
  • No openai_proxy supplied
  • A proxy env var / system proxy is visible to httpx

When that shape is detected, the instance falls back to pre-PR behavior and env-proxy auto-detection still applies. A one-time INFO log records the bypass.
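The detection logic can be sketched roughly as follows (names are illustrative, not the PR's actual code; system-proxy detection on macOS/Windows is omitted for brevity):

```python
import os

# Proxy-related env vars httpx consults, per the description above.
PROXY_ENV_VARS = (
    "HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY",
    "http_proxy", "https_proxy", "all_proxy", "no_proxy",
)

def should_skip_custom_transport(http_socket_options, http_client,
                                 http_async_client, openai_proxy):
    """True when the 'proxy-env-shadow' shape holds: everything left at
    defaults AND a proxy env var is visible, so installing a custom
    transport would silently disable httpx's env-proxy auto-detection."""
    defaults_only = (
        http_socket_options is None
        and http_client is None
        and http_async_client is None
        and not openai_proxy
    )
    proxy_visible = any(os.environ.get(v) for v in PROXY_ENV_VARS)
    return defaults_only and proxy_visible
```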

Users who explicitly set http_socket_options=[...] alongside an env proxy still get the shadowed behavior with a one-time WARNING log — they opted in. Full opt-outs below.


Streaming chat completions can hang forever when the underlying TCP connection silently dies mid-stream (idle NAT/LB timeouts, sandboxed runtimes killing long-lived connections, peer gone without a FIN or RST). httpx's read timeout doesn't help here because it's reset by any bytes arriving on the socket, including OpenAI's SSE keepalive comments, so a stream that's quiet on content but still producing keepalives looks alive forever.

This PR adds two knobs to ChatOpenAI, both on by default with opt-outs:

  • stream_chunk_timeout (default 120s): wraps the async streaming iterator in asyncio.wait_for per chunk. Measures the gap between parsed SSE chunks, so keepalives don't reset it. Fires on genuine content silence and raises StreamChunkTimeoutError — a TimeoutError subclass carrying timeout_s, model_name, and chunks_received as structured attributes (mirrored in the WARNING log's extra=) for alerting without message-regex. Override with the kwarg or LANGCHAIN_OPENAI_STREAM_CHUNK_TIMEOUT_S.
  • http_socket_options: applies SO_KEEPALIVE + TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT + TCP_USER_TIMEOUT on Linux (macOS equivalents where available). On platforms missing some options, they're dropped silently and the remaining set still does useful work.
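The per-chunk timeout in the first bullet can be sketched as a thin wrapper over the parsed-chunk iterator (a simplified illustration, not the PR's actual code): because the timer restarts per parsed SSE chunk, keepalive comments that never produce a chunk cannot reset it.

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")

async def iter_with_chunk_timeout(
    stream: AsyncIterator[T], timeout_s: float
) -> AsyncIterator[T]:
    """Yield chunks, raising a timeout if the gap between two *parsed*
    chunks exceeds timeout_s (raw socket bytes don't reset the clock)."""
    it = stream.__aiter__()
    while True:
        try:
            chunk = await asyncio.wait_for(it.__anext__(), timeout=timeout_s)
        except StopAsyncIteration:
            return
        yield chunk

async def _demo() -> list:
    async def quiet_stream():
        yield "a"
        await asyncio.sleep(10)  # simulate genuine content silence
        yield "b"

    out = []
    try:
        async for chunk in iter_with_chunk_timeout(quiet_stream(), timeout_s=0.05):
            out.append(chunk)
    except asyncio.TimeoutError:
        out.append("timed out")
    return out

result = asyncio.run(_demo())
```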

Pool limits are set explicitly on the custom transport to mirror the openai SDK — without that, passing transport= to httpx.AsyncClient silently shrinks the connection pool.
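A sketch of the socket-options side, assuming the "drop missing options silently" behavior described above; the option names are standard `socket` module constants, but the tuning values here are illustrative, not the PR's defaults:

```python
import socket

def keepalive_socket_options():
    """Build (level, optname, value) tuples for httpx's socket_options.
    Options absent on this platform are skipped, so the remaining set
    still does useful work (mirroring the behavior described above)."""
    opts = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
    for name, value in (
        ("TCP_KEEPIDLE", 60),          # idle seconds before first probe (Linux)
        ("TCP_KEEPINTVL", 10),         # seconds between probes
        ("TCP_KEEPCNT", 3),            # failed probes before giving up
        ("TCP_USER_TIMEOUT", 90_000),  # ms unacked data may linger (Linux)
    ):
        if hasattr(socket, name):
            opts.append((socket.IPPROTO_TCP, getattr(socket, name), value))
    return opts
```

Wiring it into a transport (assuming httpx ≥ 0.26, which added `socket_options` on transports) would look like `httpx.AsyncHTTPTransport(limits=httpx.Limits(...), socket_options=keepalive_socket_options())`, with the limits set explicitly to mirror whatever your installed openai SDK version uses, per the pool-shrinking caveat above.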

Behavior change

The default-shape proxy-env bypass (above) covers the common enterprise case. Beyond that:

  • Connections that would previously have hung forever will now error out via StreamChunkTimeoutError.
  • Users who explicitly opt into http_socket_options while also relying on env proxies will see a one-time WARNING and lose env-proxy auto-detection — the custom transport shadows it. This is the original shipped behavior, retained for anyone who wants socket tuning on top of an env-proxied setup.

Full opt-outs:

  • stream_chunk_timeout=None or LANGCHAIN_OPENAI_STREAM_CHUNK_TIMEOUT_S=0
  • http_socket_options=() or LANGCHAIN_OPENAI_TCP_KEEPALIVE=0
  • Supply your own http_client and http_async_client. http_socket_options is applied per client side: passing only one client still leaves the default builder (with its socket options) in place for the other side. Supply both, or combine with http_socket_options=(), to take full control.

Unparseable or negative values for the LANGCHAIN_OPENAI_* env vars fall back to the default with a WARNING log rather than silently being accepted, so a misconfigured environment still boots but the fallback is discoverable.
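That parsing rule can be sketched as follows (an illustrative helper, not the PR's actual code; "0 means disabled" follows the opt-out semantics above):

```python
import logging
import os

logger = logging.getLogger(__name__)

def read_timeout_env(name: str, default: float):
    """Read a timeout from an env var. Unparseable or negative values fall
    back to the default with a WARNING; 0 means 'disabled' (returns None)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        logger.warning("Invalid %s=%r; falling back to %s", name, raw, default)
        return default
    if value < 0:
        logger.warning("Negative %s=%r; falling back to %s", name, raw, default)
        return default
    return None if value == 0 else value
```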

github-actions bot added labels on Apr 22, 2026: feature (for PRs that implement a new feature), integration (provider partner package integration), internal, openai (`langchain-openai` package issues & PRs), size: XL (1000+ LOC)
@mdrxy Mason Daugherty (mdrxy) merged commit 4000c22 into master Apr 23, 2026
90 checks passed
@mdrxy Mason Daugherty (mdrxy) deleted the david/04-22/openai-streaming-reliability branch April 23, 2026 00:28
