Problem (Why)
In the infra-bemisc deployment, client connections from omni-gateway (Deno) to ldj.frontdoorhd.com (behind haproxy + netius proxy_c) are being killed every 5-10 minutes with no prior notice, despite KEEPALIVE_TIMEOUT=3600. HAProxy's idle timeout is set to 2 hours, ruling it out as the cause. Diagnosing this is difficult because the DIAG HTTP server only exposes currently active connections — once a connection closes, all context is lost. Log-based diagnostics have proven inefficient due to high traffic volume. We need a way to inspect recently closed connections and their close reasons via the DIAG HTTP endpoint to identify the root cause of these disconnections.
Description (What)
Add a ring buffer of recently closed connections to the DIAG system, capturing close reason, timestamps, duration, last activity time, error details, and paired connection ID (for proxy correlation). Expose this via a new GET /connections/closed endpoint on DiagApp. Close reasons will be string constants (e.g., "timeout", "client_eof", "upstream_error", "error", "explicit"). The ring buffer defaults to 512 entries, configurable via DIAG_CLOSED_MAX. Tracking is active when running under DIAG mode. The endpoint returns the full buffer, most recent first.
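For illustration only, a single entry in the closed-connection buffer might look like the following; every field name beyond those listed above, and all values, are placeholders rather than a final schema:

```json
{
  "id": "connection-1842",
  "address": ["10.0.0.12", 51234],
  "close_reason": "timeout",
  "close_timestamp": 1718000312.48,
  "duration": 312.36,
  "last_activity_timestamp": 1718000310.02,
  "error": null,
  "paired_id": "connection-1843"
}
```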
Implementation (How)
1. Define close reason constants in src/netius/base/conn.py — add string constants: "timeout", "client_eof", "upstream_error", "error", "explicit", "unknown" (and others as needed); covered, together with item 2, by the constants sketch after this list
2. Add close_reason field to BaseConnection — initialize to None, set before close() is called; include close_reason, close_timestamp, and last_activity_timestamp in info_dict() output
3. Implement a ring buffer for closed connections in src/netius/base/common.py (or a new utility) — use collections.deque(maxlen=N) with max size from DIAG_CLOSED_MAX conf (default 512); store a snapshot dict of the connection's info_dict() plus close metadata at close time (see the ring buffer sketch after this list, which also covers item 9)
4. Hook into Base.on_connection_d() — when DIAG is active, capture the closed connection's info dict (including close reason, close timestamp, connection duration, last activity timestamp, error details) and append to the ring buffer (see the on_connection_d() sketch after this list)
5. Propagate close reasons at all close call sites — audit BaseConnection.close(), timeout handlers, EOF/error handlers in src/netius/base/common.py, and ensure each sets close_reason before closing
6. Propagate close reasons in the proxy server — in src/netius/servers/proxy.py, set appropriate close reasons in _on_prx_close() (upstream error), _on_raw_close() (tunnel close), on_connection_d(), on_stream_d(); include the paired/correlated connection ID in the close metadata (see the proxy sketch after this list, which also covers item 7)
7. Add paired connection ID to proxy close records — when a proxy frontend or backend connection closes, include the paired connection's ID (from conn_map) in the close snapshot so frontend/backend closures can be correlated
8. Add a GET /connections/closed endpoint to DiagApp in src/netius/base/diag.py — return the full ring buffer contents as JSON, most recent first (see the endpoint sketch after this list)
9. Add DIAG_CLOSED_MAX conf support — read from netius conf system, default to 512, used to size the deque
10. Test — add tests for the ring buffer behavior (overflow, ordering), close reason propagation, and the new DIAG endpoint
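A minimal sketch of items 1 and 2, written stand-alone so it runs; the real change would extend the existing BaseConnection in src/netius/base/conn.py, and the module-level constant names (and the create_timestamp field used for duration) are assumptions:

```python
import time

# close reason constants (module-level names are an assumption)
TIMEOUT = "timeout"
CLIENT_EOF = "client_eof"
UPSTREAM_ERROR = "upstream_error"
ERROR = "error"
EXPLICIT = "explicit"
UNKNOWN = "unknown"

class BaseConnection:
    def __init__(self):
        self.create_timestamp = time.time()   # hypothetical, kept for duration
        self.close_reason = None              # set to one of the constants before close()
        self.close_timestamp = None           # filled in when the connection closes
        self.last_activity_timestamp = self.create_timestamp

    def close(self, reason=EXPLICIT):
        # record only the first reason; a later close() must not overwrite it
        if self.close_reason is None:
            self.close_reason = reason
        self.close_timestamp = time.time()

    def info_dict(self, full=False):
        # the real info_dict() carries many more fields; only the new
        # diagnostics fields from item 2 are shown here
        return dict(
            close_reason=self.close_reason,
            close_timestamp=self.close_timestamp,
            last_activity_timestamp=self.last_activity_timestamp,
        )
```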
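The ring buffer sketch for items 3 and 9, assuming the netius.conf() accessor (with a cast) that netius uses elsewhere; closed_connections is a placeholder name for wherever the buffer ends up living:

```python
import collections

import netius

# DIAG_CLOSED_MAX sizes the ring buffer, defaulting to 512 entries
max_closed = netius.conf("DIAG_CLOSED_MAX", 512, cast=int)

# a bounded deque silently drops the oldest snapshot on overflow, so
# memory use stays constant no matter how many connections churn through
closed_connections = collections.deque(maxlen=max_closed)
```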
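A sketch of the on_connection_d() hook for item 4; it is shown as a plain function that would live on Base in src/netius/base/common.py, and the diag flag name is an assumption:

```python
import time

def on_connection_d(self, connection):
    # ... existing teardown logic runs first ...
    if not self.diag:  # only track closures under DIAG mode
        return
    # after item 2, info_dict() already carries close_reason,
    # close_timestamp and last_activity_timestamp; duration is derived here
    snapshot = connection.info_dict()
    closed = connection.close_timestamp or time.time()
    snapshot["duration"] = closed - connection.create_timestamp
    snapshot["close_reason"] = connection.close_reason or "unknown"
    self.closed_connections.append(snapshot)
```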
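A proxy-side sketch for items 6 and 7; _on_prx_close() and conn_map come from the issue itself, but the exact signature, the map orientation, and the paired_id field are assumptions to verify against src/netius/servers/proxy.py:

```python
def _on_prx_close(self, client, _connection):
    # a back-end (upstream) connection closed: record why, and tag both
    # sides with the peer's ID so frontend/backend records can be correlated
    connection = self.conn_map.get(_connection, None)
    _connection.close_reason = "upstream_error"
    if connection:
        _connection.paired_id = id(connection)   # hypothetical field
        connection.paired_id = id(_connection)
    # ... existing cleanup (unset the mapping, close the front-end) ...
```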
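Finally, the endpoint sketch for item 8, assuming DiagApp keeps its appier-based routing; the self.system reference back to the owning server is an assumption:

```python
import appier

class DiagApp(appier.APIApp):
    @appier.route("/connections/closed", "GET")
    def closed_connections(self):
        # the deque appends on close, so reversing it yields the most
        # recent closure first; list() makes a JSON-serializable copy
        return list(reversed(self.system.closed_connections))
```

Returning the whole buffer keeps the handler trivial; with the 512-entry default the payload should stay small enough that pagination is unnecessary.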