feat(anthropic): prevent silent streaming hangs in ChatAnthropic #36964

Closed
Mason Daugherty (mdrxy) wants to merge 2 commits into master from mdrxy/anthropic-hangs
Conversation

@mdrxy
Member

Port of #36949 from langchain-openai to langchain-anthropic. Streaming ChatAnthropic calls could hang forever when the underlying TCP connection silently died mid-stream (idle NAT/LB timeouts, sandboxed runtimes killing long-lived connections, peer gone without a FIN or RST). httpx's read timeout doesn't help because it resets on any bytes arriving on the socket, including the Messages API's event: ping SSE heartbeats — a stream that's silent on content but still pinging looks alive forever. This adds two bounded-hang knobs with safe defaults.
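
The heartbeat problem described above can be reproduced without a network. The following is a minimal simulation under stated assumptions: the event dictionaries and the names `heartbeat_only_stream`, `content_events`, and `next_with_timeout` are hypothetical and are not part of httpx, the anthropic SDK, or this PR. It shows that a per-read deadline keeps being satisfied by pings, while a deadline measured between content events does fire.

```python
import asyncio


async def heartbeat_only_stream():
    """Simulated SSE feed (hypothetical): pings keep arriving, content never does."""
    while True:
        await asyncio.sleep(0.01)
        yield {"type": "ping"}


async def content_events(raw):
    """Drop pings so that only content events reach the consumer."""
    async for event in raw:
        if event["type"] != "ping":
            yield event


async def next_with_timeout(iterator, timeout_s):
    """Wait for the next item, but for no longer than timeout_s seconds."""
    return await asyncio.wait_for(iterator.__anext__(), timeout=timeout_s)


async def demo():
    # Byte-level view: each read returns quickly, so a read timeout never fires.
    raw = heartbeat_only_stream()
    first = await next_with_timeout(raw, 0.5)
    await raw.aclose()

    # Content-level view: pings are filtered out, so the deadline is reached.
    filtered = content_events(heartbeat_only_stream())
    try:
        await next_with_timeout(filtered, 0.1)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return first, timed_out
```

Running `asyncio.run(demo())` returns the first ping together with a flag indicating that the content-level wait timed out.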

Changes

  • Add stream_chunk_timeout to ChatAnthropic (default 120s, None/0 to disable, env override LANGCHAIN_ANTHROPIC_STREAM_CHUNK_TIMEOUT_S). Wraps the async SDK stream iterator in asyncio.wait_for per chunk. Measures the gap between parsed SSE chunks, so the anthropic SDK's ping-filter keeps heartbeats from resetting the timer. Sync stream() is untouched.
  • Add StreamChunkTimeoutError, a subclass of both asyncio.TimeoutError and TimeoutError on Python 3.10 and 3.11+ (via a dynamic base-class tuple), so existing except TimeoutError: handlers still catch it. It carries the structured attributes timeout_s, model_name, and chunks_received, which are mirrored in a WARNING log's extra= so alerts can key on fields instead of regex-matching the message. Exported from the top-level package.
  • Add http_socket_options to ChatAnthropic. Defaults to SO_KEEPALIVE + TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT + TCP_USER_TIMEOUT on Linux (macOS equivalents where available, Windows falls back to SO_KEEPALIVE only). Unsupported options are probed against a throwaway socket and silently dropped so the default set is non-fatal across platforms. Env overrides LANGCHAIN_ANTHROPIC_TCP_KEEPALIVE (kill-switch), LANGCHAIN_ANTHROPIC_TCP_KEEPIDLE, LANGCHAIN_ANTHROPIC_TCP_KEEPINTVL, LANGCHAIN_ANTHROPIC_TCP_KEEPCNT, LANGCHAIN_ANTHROPIC_TCP_USER_TIMEOUT_MS. Unparseable or negative env values fall back with a discoverable WARNING rather than silently applying a surprising default.
  • Enterprise-proxy-safe default shape. A custom httpx transport disables httpx's env-proxy auto-detection. To avoid silently breaking users relying on HTTP_PROXY / HTTPS_PROXY / ALL_PROXY / system proxies, ChatAnthropic detects the "proxy-env-shadow" pattern and skips the custom transport entirely when http_socket_options is at default (None), no anthropic_proxy is supplied, and a proxy env var / system proxy is visible to httpx. A one-time INFO records the bypass. Users who explicitly set http_socket_options=[...] alongside an env proxy get the original shadowing behavior with a one-time WARNING.
  • Cached _resolved_socket_options property threads the resolved option tuple into both _client and _async_client so the bypass / shadow warnings fire once per instance, not once per client build. When anthropic_proxy and socket options are combined, the proxy is wrapped in httpx.Proxy(...) on the transport and the Client-level proxy= key is popped to avoid httpx's double-configuration error.
  • Pool limits set explicitly on the custom transport (httpx.Limits(max_connections=1000, max_keepalive_connections=100, keepalive_expiry=5.0)) — matches anthropic.DEFAULT_CONNECTION_LIMITS. Without this, passing transport= to httpx.AsyncClient silently shrinks the connection pool.
  • Negative constructor values for stream_chunk_timeout (e.g., hydrated from YAML/JSON configs) fall back to the default with a WARNING via a pydantic field_validator, rather than silently treating negatives as an opt-out. None and 0 remain the documented off-switches.
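
The per-chunk timeout and the dual-base exception from the list above can be sketched roughly as follows. This is an illustration, not the PR's code: `SketchChunkTimeoutError`, `iter_with_chunk_timeout`, `stalling_stream`, `collect`, and the `"example-model"` default are all hypothetical names, and the real implementation may differ in structure and logging.

```python
import asyncio

# On Python 3.11+ asyncio.TimeoutError is the builtin TimeoutError; on 3.10 the
# two are distinct classes, so both must be listed as bases.
_BASES = (
    (TimeoutError,)
    if asyncio.TimeoutError is TimeoutError
    else (asyncio.TimeoutError, TimeoutError)
)


class SketchChunkTimeoutError(*_BASES):
    """Hypothetical stand-in for the PR's StreamChunkTimeoutError."""

    def __init__(self, timeout_s, model_name, chunks_received):
        super().__init__(
            f"no chunk from {model_name} within {timeout_s}s "
            f"(received {chunks_received} so far)"
        )
        self.timeout_s = timeout_s
        self.model_name = model_name
        self.chunks_received = chunks_received


async def iter_with_chunk_timeout(source, timeout_s, model_name="example-model"):
    """Yield chunks from source, bounding the wait for each one."""
    iterator = source.__aiter__()
    chunks_received = 0
    while True:
        try:
            if timeout_s:  # None or 0 disables the bound
                chunk = await asyncio.wait_for(iterator.__anext__(), timeout_s)
            else:
                chunk = await iterator.__anext__()
        except StopAsyncIteration:
            return
        except asyncio.TimeoutError as exc:
            # Caveat of this sketch: a timeout raised by the source itself
            # would be re-labelled here as well.
            raise SketchChunkTimeoutError(
                timeout_s, model_name, chunks_received
            ) from exc
        chunks_received += 1
        yield chunk


async def stalling_stream():
    """Two chunks, then silence, imitating a connection that died mid-stream."""
    yield "hello"
    yield "world"
    await asyncio.sleep(3600)


async def collect():
    seen = []
    try:
        async for chunk in iter_with_chunk_timeout(stalling_stream(), 0.05):
            seen.append(chunk)
    except TimeoutError as err:  # a pre-existing handler shape still matches
        return seen, err
    return seen, None
```

Because the timer restarts only when a chunk is actually yielded, the error reports how far the stream got before it stalled, which is what the `chunks_received` attribute is meant to convey.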

Behavior change

  • Async streaming calls that would previously have hung forever now raise StreamChunkTimeoutError after 120s of content silence. Existing except TimeoutError: / except asyncio.TimeoutError: handlers catch it unchanged.
  • Opt-outs: stream_chunk_timeout=None (or …_TIMEOUT_S=0), http_socket_options=() (or …_TCP_KEEPALIVE=0).
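
For the socket-level knob, the probe-and-drop behavior and the empty-tuple opt-out can be sketched with the standard library alone. The function names (`candidate_keepalive_options`, `probe_supported`, `resolve_options`) and the numeric values are assumptions for illustration; they are not the PR's defaults or helpers.

```python
import socket


def candidate_keepalive_options(idle_s=60, interval_s=10, count=3, user_timeout_ms=120_000):
    """Build a keepalive option list; the values here are illustrative only."""
    options = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
    tcp_level = (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", count),
        ("TCP_USER_TIMEOUT", user_timeout_ms),
    )
    for name, value in tcp_level:
        # Constants missing from this platform's socket module are skipped.
        constant = getattr(socket, name, None)
        if constant is not None:
            options.append((socket.IPPROTO_TCP, constant, value))
    return options


def probe_supported(options):
    """Try each option on a throwaway socket and keep those the OS accepts."""
    supported = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        for level, name, value in options:
            try:
                probe.setsockopt(level, name, value)
            except OSError:
                continue  # unsupported here: drop it rather than fail
            supported.append((level, name, value))
    return tuple(supported)


def resolve_options(user_options=None):
    """None means probed defaults; an empty tuple means keepalive is off."""
    if user_options is None:
        return probe_supported(candidate_keepalive_options())
    return tuple(user_options)
```

A tuple in this `(level, option, value)` shape is what httpx transports accept through their `socket_options` argument in recent httpx versions, so the resolved value could be handed to a transport as-is; how this PR wires it up is not shown here.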

@github-actions (bot) added the labels anthropic (`langchain-anthropic` package issues & PRs), feature (for PRs that implement a new feature; not a feature request), integration (PR related to a provider partner package integration), internal, and size: XL (1000+ LOC) on Apr 23, 2026.