fix(signalk): add mDNS settle delay and backoff to fix Ethernet race #851

Open
dirkwa wants to merge 1 commit into SignalK:main from dirkwa:fix/mdns-ethernet-race
Conversation

@dirkwa
Contributor

@dirkwa dirkwa commented Mar 11, 2026

On Ethernet, DHCP completes faster than on WiFi, which means the first MDNS.queryService() call fires before the mDNS stack has joined its multicast groups on the new interface. This causes the server discovery to silently return 0 results even when the Signal K server is present, and the 2-second flat retry loop spins indefinitely without connecting.

Two-part fix:

  1. Settle delay: subscribe to ARDUINO_EVENT_ETH_GOT_IP and ARDUINO_EVENT_WIFI_STA_GOT_IP. On each event, reset mdns_ready_ and arm a 4-second one-shot timer via event_loop()->onDelay(). mDNS queries are blocked until the timer fires.

  2. Exponential backoff: after a failed mDNS query, double the retry interval (starting at 5s, capped at 60s) instead of hammering queryService() every 2 seconds.

Both the settle timer and the backoff counter are reset when the network gets a new IP (e.g. after a cable reconnect), so recovery is still fast.
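
The two parts above can be modelled off-device as a small policy object. This is only a sketch under assumptions, not the code in this PR: `MdnsRetryPolicy` and its method names are hypothetical, and it tracks a deadline instead of arming a one-shot timer through `event_loop()->onDelay()`. The constants are the values quoted in the description.

```cpp
#include <algorithm>
#include <cstdint>

// Values taken from the PR description.
constexpr uint32_t kSettleMs = 4000;        // settle delay after a new IP
constexpr uint32_t kInitialRetryMs = 5000;  // first backoff interval
constexpr uint32_t kMaxRetryMs = 60000;     // backoff cap

// Hypothetical model of the described behaviour; not the PR's implementation.
class MdnsRetryPolicy {
 public:
  // Call when the interface gets a new IP: restarts the settle window
  // and resets the backoff interval.
  void on_got_ip(uint32_t now_ms) {
    has_ip_ = true;
    settle_deadline_ms_ = now_ms + kSettleMs;
    retry_ms_ = kInitialRetryMs;
  }

  // Queries are blocked until the settle window has elapsed.
  // The signed difference keeps the comparison valid across millis() wraparound.
  bool query_allowed(uint32_t now_ms) const {
    return has_ip_ &&
           static_cast<int32_t>(now_ms - settle_deadline_ms_) >= 0;
  }

  // Call after a query that returned 0 results. Returns the delay to wait
  // before the next attempt and doubles the interval, up to the cap.
  uint32_t on_query_failed() {
    uint32_t delay_ms = retry_ms_;
    retry_ms_ = std::min(retry_ms_ * 2, kMaxRetryMs);
    return delay_ms;
  }

 private:
  bool has_ip_ = false;
  uint32_t settle_deadline_ms_ = 0;
  uint32_t retry_ms_ = kInitialRetryMs;
};
```

With these values, consecutive failures yield delays of 5 s, 10 s, 20 s, 40 s, then 60 s from there on, and a new IP event brings the next delay back to 5 s.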

@dirkwa dirkwa force-pushed the fix/mdns-ethernet-race branch from 99ff1d0 to bf92d1c Compare March 11, 2026 17:00
@mairas
Collaborator

mairas commented Mar 14, 2026

Claude-generated review, curated:

Review Summary

The core finding: the 4-second hard-coded settle delay is both unnecessary and harmful.

The key argument against kMdnsSettleMs

The PR already adds exponential backoff (5s → 10s → … → 60s). If the first mDNS query fires before the multicast group join completes and returns 0 results, the backoff simply retries at 5s — well after mDNS is ready. The settle delay adds 4 seconds of unconditional latency to every boot on every platform, including WiFi where the race doesn't even manifest, to avoid a single harmless failed query that the backoff already handles.

A simpler and better fix: just the backoff, with a lower initial interval (e.g., 2s instead of 5s) for faster recovery.

Code quality

  1. DRY violation: The two Network.onEvent lambdas are identical 15-line blocks. Should be one lambda registered for both event types.
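
The shape of that refactor might look like the sketch below. It uses stand-in types so it compiles off-device: `NetEvent`, `EventBus`, and `register_got_ip_handler` are hypothetical and only mimic the role of `Network.onEvent` and the two `ARDUINO_EVENT_*_GOT_IP` IDs. The handler body is a placeholder counter, not the PR's lambda.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Stand-in for the two "got IP" event IDs named in the PR.
enum class NetEvent { EthGotIp, WifiStaGotIp };

// Minimal stand-in for an event registry with per-event handlers.
class EventBus {
 public:
  using Handler = std::function<void(NetEvent)>;

  void on_event(Handler handler, NetEvent id) {
    handlers_.emplace_back(id, std::move(handler));
  }

  // Invokes every handler registered for the given event ID.
  void emit(NetEvent id) {
    for (auto& entry : handlers_) {
      if (entry.first == id) entry.second(id);
    }
  }

 private:
  std::vector<std::pair<NetEvent, Handler>> handlers_;
};

// Defines the handler once and registers it for both event types,
// instead of writing two identical lambdas.
inline void register_got_ip_handler(EventBus& bus, int& got_ip_count) {
  auto on_got_ip = [&got_ip_count](NetEvent) { ++got_ip_count; };
  bus.on_event(on_got_ip, NetEvent::EthGotIp);
  bus.on_event(on_got_ip, NetEvent::WifiStaGotIp);
}
```

Any later change to the handler then happens in one place and applies to both interfaces.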

Recommended alternative

Drop the settle delay entirely. Keep only the exponential backoff with a 2-second initial interval.
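
To see what a backoff-only schedule would look like, the helper below lists when attempts occur. It is an illustrative sketch: `attempt_times_ms` is a hypothetical name, and it assumes every attempt fails immediately, the first query fires at time 0, and the interval doubles up to the 60-second cap.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Returns the times (ms since the first query) at which attempts are made,
// assuming each attempt fails at once and the interval doubles up to cap_ms.
std::vector<uint32_t> attempt_times_ms(uint32_t initial_ms, uint32_t cap_ms,
                                       int attempts) {
  std::vector<uint32_t> times;
  uint32_t t = 0;
  uint32_t interval = initial_ms;
  for (int i = 0; i < attempts; ++i) {
    times.push_back(t);
    t += interval;
    interval = std::min(interval * 2, cap_ms);
  }
  return times;
}
```

With a 2-second initial interval, `attempt_times_ms(2000, 60000, 7)` gives attempts at 0, 2, 6, 14, 30, 62, and 122 seconds, so the first retry lands 2 seconds after the initial query instead of waiting for a fixed settle period.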
