fix(dlt-receive): set SO_RCVTIMEO and SO_KEEPALIVE to enable -r reconnect on half-open TCP #845

Open
aki1770-del wants to merge 1 commit into COVESA:master from aki1770-del:fix/dlt-client-reconnect-halfopen-817

fix(dlt-receive): set SO_RCVTIMEO and SO_KEEPALIVE in dlt_client_connect to enable -r reconnect on half-open TCP

Fixes #817.

Problem

dlt-receive -r <interval> is supposed to reconnect automatically when the
DLT daemon becomes unavailable. This works correctly for clean disconnects
(RST/FIN received): dlt_receiver_receive() returns ≤0, dlt_client_main_loop
exits, and the reconnect loop fires after the specified interval.

However, when the network path fails silently (half-open TCP, common after
VM/container restarts and network partitions), the kernel socket stays in
ESTABLISHED state. dlt_receiver_receive() calls recv(), which blocks
forever, so the reconnect interval is never consulted.

Fix

Three files changed:

include/dlt/dlt_client.h: Add recv_timeout_sec field to DltClient
struct (0 = no timeout, backwards-compatible default).

src/lib/dlt_client.c: In dlt_client_connect(), after blocking mode
is restored on a successful TCP connect:

  • If client->recv_timeout_sec > 0: call setsockopt(SO_RCVTIMEO) with the
    configured timeout. When recv() times out it returns -1 with errno set
    to EAGAIN (or EWOULDBLOCK), which dlt_receiver_receive() maps to a ≤0
    return, causing dlt_client_main_loop to exit and the reconnect loop to
    fire.
  • Always set SO_KEEPALIVE (plus TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT
    where available) as a second line of defence.
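
The two socket options described above can be sketched as a standalone helper. This is a minimal illustration, not the code in the PR: the function name configure_recv_timeout_and_keepalive and the keepalive values (10 s idle, 5 s interval, 3 probes) are assumptions chosen for the example, and the TCP_KEEP* options are treated as best-effort because not every platform defines them.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper (not from the PR): apply a receive timeout and TCP
 * keepalive to an already-created socket. Returns 0 on success, -1 if a
 * mandatory option could not be set. */
int configure_recv_timeout_and_keepalive(int fd, int recv_timeout_sec)
{
    /* 0 means "no timeout", which keeps the old blocking behaviour. */
    if (recv_timeout_sec > 0) {
        struct timeval tv;
        tv.tv_sec = recv_timeout_sec;
        tv.tv_usec = 0;
        if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
            return -1;
    }

    /* Keepalive is enabled unconditionally as a second line of defence. */
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;

#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    /* Illustrative values only; failures here are ignored (best-effort). */
    int idle = 10;  /* seconds of idle time before the first probe */
    int intvl = 5;  /* seconds between probes */
    int cnt = 3;    /* unanswered probes before the connection is dropped */
    (void)setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
    (void)setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
    (void)setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt));
#endif

    return 0;
}
```

Keepalive alone only detects a dead peer after the idle time plus all probes have elapsed, which is why the receive timeout is the primary mechanism here and keepalive the fallback.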

src/console/dlt-receive.c: Before the reconnect loop, when -r is
active, set dltclient.recv_timeout_sec = rvalue / 1000 (minimum 5 s) so
the socket timeout matches the reconnect interval.
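
The millisecond-to-second conversion with its 5 s floor can be written as a small pure function. This is a sketch under the assumption that rvalue holds the -r interval in milliseconds, as the rvalue / 1000 expression implies; the names recv_timeout_from_reconnect_ms and MIN_RECV_TIMEOUT_SEC are hypothetical and do not appear in the PR.

```c
/* Hypothetical constant: the 5 s floor mentioned in the PR description. */
#define MIN_RECV_TIMEOUT_SEC 5

/* Hypothetical helper: derive the socket receive timeout (seconds) from
 * the reconnect interval (milliseconds), never going below the floor. */
int recv_timeout_from_reconnect_ms(int rvalue_ms)
{
    int sec = rvalue_ms / 1000; /* integer division truncates toward zero */
    return sec < MIN_RECV_TIMEOUT_SEC ? MIN_RECV_TIMEOUT_SEC : sec;
}
```

With this rule, an interval of 5000 ms maps to 5 s, 30000 ms maps to 30 s, and anything shorter than 5000 ms is clamped to 5 s.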

Behaviour

| Scenario | Before | After |
| --- | --- | --- |
| Clean disconnect (RST/FIN) | Reconnect works | Unchanged |
| Half-open TCP (-r not set) | recv() blocks forever | recv() blocks forever (no change; timeout = 0) |
| Half-open TCP (-r 5000 = 5 s) | recv() blocks forever, -r ignored | recv() times out after 5 s, reconnect fires |
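
The "After" behaviour can be reproduced locally without a DLT daemon. The sketch below is an illustration only: it uses a socketpair() whose far end never writes as a stand-in for a silent half-open peer, and the function name recv_times_out_on_silent_peer is hypothetical. It shows that with SO_RCVTIMEO set, a blocking recv() returns -1 with errno EAGAIN or EWOULDBLOCK instead of hanging.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if recv() timed out, 0 if it returned for
 * any other reason, -1 on setup failure. timeout_ms must be > 0, since a
 * zero timeval means "block forever". */
int recv_times_out_on_silent_peer(long timeout_ms)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* The peer (sv[1]) stays open but never sends, so without the timeout
     * this call would block indefinitely. */
    char buf[16];
    ssize_t n = recv(sv[0], buf, sizeof(buf), 0);
    int saved_errno = errno; /* capture before close() can overwrite it */

    close(sv[0]);
    close(sv[1]);

    return (n < 0 && (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK)) ? 1 : 0;
}
```

A caller that treats this -1 result as "connection lost" gets the same control flow as a clean disconnect, which is what lets the existing reconnect loop take over.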

AI-assisted — authored with Claude, reviewed by Komada.

…nect on half-open TCP

Fixes COVESA#817. With half-open TCP (VM/container restart, network partition),
recv() blocks forever and the -r reconnect interval is never consulted.

Fix: add recv_timeout_sec field to DltClient (default 0 = no change).
In dlt_client_connect(): if recv_timeout_sec > 0, set SO_RCVTIMEO;
always set SO_KEEPALIVE + TCP_KEEPIDLE/INTVL/CNT where available.
In dlt-receive.c: wire recv_timeout_sec = rvalue/1000 (min 5s) before
reconnect loop when -r is active.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Akihiko Komada <aki1770@gmail.com>

Development

Successfully merging this pull request may close these issues.

[BUG] dlt-receive -r reconnect does not work when server connection hangs (no socket timeout / keepalive)
