fix(dlt-receive): set SO_RCVTIMEO and SO_KEEPALIVE to enable -r reconnect on half-open TCP #845
…nect on half-open TCP

Fixes COVESA#817.

With half-open TCP (VM/container restart, network partition), recv() blocks forever and the -r reconnect interval is never consulted.

Fix:
- Add a recv_timeout_sec field to DltClient (default 0 = no change).
- In dlt_client_connect(): if recv_timeout_sec > 0, set SO_RCVTIMEO; always set SO_KEEPALIVE + TCP_KEEPIDLE/INTVL/CNT where available.
- In dlt-receive.c: wire recv_timeout_sec = rvalue/1000 (min 5s) before the reconnect loop when -r is active.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Akihiko Komada <aki1770@gmail.com>
Hi @aki1770-del
Thanks @minminlittleshrimp — regression evidence below. It's from a clean local build on x64 Linux (CMake with the CI flags); CI on the PR is still pending, so this is the local run, not a green-CI signal. Build: clean — Existing test suite (regression): dlt-receive Control clients / backwards-compat (the now-unconditional dlt-viewer: it doesn't link Caveats: local run, not CI-green yet; the half-open recovery is a scripted simulation + On the |
fix(dlt-receive): set SO_RCVTIMEO and SO_KEEPALIVE in dlt_client_connect to enable -r reconnect on half-open TCP
Fixes #817.
Problem
dlt-receive -r <interval> is supposed to reconnect automatically when the DLT daemon becomes unavailable. This works correctly for clean disconnects (RST/FIN received): dlt_receiver_receive() returns ≤0, dlt_client_main_loop exits, and the reconnect loop fires after the specified interval.

However, when the network path fails silently (half-open TCP, common in VM/container restarts and network partitions), the kernel socket stays in ESTABLISHED state. dlt_receiver_receive() calls recv(), which blocks forever. The reconnect interval is never consulted.
Fix
Three files changed:
- include/dlt/dlt_client.h: add a recv_timeout_sec field to the DltClient struct (0 = no timeout, the backwards-compatible default).
- src/lib/dlt_client.c: in dlt_client_connect(), after blocking mode is restored on a successful TCP connect:
  - If client->recv_timeout_sec > 0: call setsockopt(SO_RCVTIMEO) with the configured timeout. When recv() times out it returns -1 with errno set to EAGAIN, which dlt_receiver_receive() maps to a ≤0 return, causing dlt_client_main_loop to exit and the reconnect loop to fire.
  - Always: set SO_KEEPALIVE (+ TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT where available) as a second line of defence.
- src/console/dlt-receive.c: before the reconnect loop, when -r is active, set dltclient.recv_timeout_sec = rvalue / 1000 (minimum 5 s) so the socket timeout matches the reconnect interval.
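For reviewers who want to see the socket options in isolation, here is a minimal sketch of the three steps described above. It is not the code from this PR: the helper names (recv_timeout_from_interval_ms, set_recv_timeout, enable_keepalive) are hypothetical, the keepalive values (10 s idle, 5 s interval, 3 probes) are illustrative placeholders, and it assumes the -r interval is given in milliseconds, as the rvalue / 1000 conversion suggests.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: convert a reconnect interval in milliseconds
 * (assumed unit of -r) to a receive timeout in seconds, with a 5 s floor. */
static int recv_timeout_from_interval_ms(int interval_ms)
{
    int sec = interval_ms / 1000;
    return (sec < 5) ? 5 : sec;
}

/* Hypothetical helper: apply SO_RCVTIMEO only when a timeout is requested.
 * timeout_sec == 0 leaves recv() blocking indefinitely (unchanged behaviour).
 * Returns 0 on success, -1 on setsockopt() failure. */
static int set_recv_timeout(int fd, int timeout_sec)
{
    struct timeval tv;

    if (timeout_sec <= 0)
        return 0;

    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: enable TCP keepalive as a second line of defence.
 * The per-socket tuning options are not available on every platform,
 * so they are guarded and treated as best effort. */
static int enable_keepalive(int fd)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        return -1;

#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    {
        /* Illustrative values only, not taken from the PR. */
        int idle = 10;  /* seconds of idle time before the first probe */
        int intvl = 5;  /* seconds between probes */
        int cnt = 3;    /* unanswered probes before the connection is dropped */

        (void)setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
        (void)setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
        (void)setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt));
    }
#endif

    return 0;
}
```

With a timeout set, a recv() on a silent connection returns -1 with errno set to EAGAIN (or EWOULDBLOCK) once the timeout expires, which is the condition the reconnect loop relies on. SO_RCVTIMEO fires on any idle period, not only on a dead peer, so a quiet but healthy connection would also be torn down and re-established after the timeout.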
Behaviour
| Scenario | Before | After |
| --- | --- | --- |
| -r not set | recv() blocks forever | recv() blocks forever (no change: timeout = 0) |
| -r 5000 (= 5 s) | recv() blocks forever, -r ignored | recv() times out after 5 s, reconnect fires |

AI-assisted: authored with Claude, reviewed by Komada.