Commit 42a21f2
[Darwin] Coalesce mDNS resolve cancel/restart so in-flight results aren't dropped
User-visible failure: on Darwin, every reconnect to a Matter node after
ChipDnssdResolveNoLongerNeeded shows ~1s of extra NodeID-resolve latency.
Inbound mDNS resolve answers already queued on the dnssd socket get discarded,
so the next resolve has to start from scratch.
Root cause: when the consumer counter drops to zero we immediately call
Finalize -> DNSServiceRefDeallocate. Per the dnssd contract,
DNSServiceRefDeallocate discards any events queued on that connection but not
yet read. A second observation from the mDNS owner is that "starting and
stopping queries doesn't query harder" -- a tight cancel-then-restart for the
same instance name is strictly worse than letting the existing query run.
Fix: introduce a per-ResolveContext deferred-teardown window (default 500ms)
before the actual DNSServiceRefDeallocate. Inside the window, one of three
things happens:
- a queued read indicator dispatches the result through DispatchSuccess
  (which cancels the timer);
- a new ChipDnssdResolve for the same instance name reuses the existing
  context and bumps the counter back to 1, skipping
  DNSServiceCreateConnection / DNSServiceResolve entirely;
- otherwise the timer fires OnResolveDeferredTeardown ->
  Finalize(CHIP_ERROR_CANCELLED), preserving the existing failure-path
  contract upper layers rely on.
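The coalescing behavior described in the Fix paragraph can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the code in this commit: SketchResolver, SketchContext, Tick and the manually advanced clock are hypothetical names, and no dnssd calls are made. It models just the consumer counter, the 500ms window, and reuse of an existing context for the same instance name.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <optional>
#include <string>

using Millis = std::chrono::milliseconds;

// Hypothetical stand-in for a callback-based ResolveContext.
struct SketchContext
{
    int consumers = 0;                // consumer counter
    std::optional<Millis> teardownAt; // set only while the deferred window is open
};

// Hypothetical model of the deferred-teardown bookkeeping; time is passed in
// explicitly so the behavior is deterministic.
class SketchResolver
{
public:
    explicit SketchResolver(Millis window = Millis(500)) : mWindow(window) {}

    // Returns true if an existing context was reused (no new query started).
    bool Resolve(const std::string & instance)
    {
        auto it = mContexts.find(instance);
        if (it != mContexts.end())
        {
            it->second.teardownAt.reset(); // cancel any pending teardown
            it->second.consumers++;
            return true;
        }
        mContexts[instance].consumers = 1;
        mQueriesStarted++; // stands in for starting a fresh query
        return false;
    }

    // Dropping the last consumer opens the window instead of tearing down.
    void NoLongerNeeded(const std::string & instance, Millis now)
    {
        auto it = mContexts.find(instance);
        if (it == mContexts.end() || it->second.consumers == 0)
            return;
        if (--it->second.consumers == 0)
            it->second.teardownAt = now + mWindow;
    }

    // Finalizes every context whose window has expired; returns how many.
    int Tick(Millis now)
    {
        int finalized = 0;
        for (auto it = mContexts.begin(); it != mContexts.end();)
        {
            if (it->second.teardownAt && *it->second.teardownAt <= now)
            {
                it = mContexts.erase(it); // stands in for the real teardown
                finalized++;
            }
            else
            {
                ++it;
            }
        }
        return finalized;
    }

    int QueriesStarted() const { return mQueriesStarted; }

private:
    Millis mWindow;
    std::map<std::string, SketchContext> mContexts;
    int mQueriesStarted = 0;
};
```

In this model, a Resolve that arrives 100ms after the last NoLongerNeeded reuses the context and starts no second query, while a context left idle for the full window is finalized exactly once.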
Carve-out: delegate-based ResolveContexts (callback == nullptr, used by
MTRCommissionableBrowser) are NOT subject to deferred teardown. The browser
churns OnBrowseAdd/OnBrowseRemove for the same instance name on the order of
microseconds while a device is being discovered; holding the underlying
DNSServiceRef alive across that churn starves DNSServiceGetAddrInfo of a
chance to deliver before the next remove arrives, which manifested as
MTRCommissionableBrowserTests/test005 timing out under TSAN. The NodeID
reconnect bug this PR fixes is on the callback-based path.
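The carve-out amounts to a single predicate on the context. The following is a minimal sketch of that decision only; SketchResolveContext, SketchResolveCallback and ChooseTeardownMode are hypothetical names, and the real callback signature is not shown in this message.

```cpp
#include <cassert>

// Hypothetical callback type, assumed for illustration.
using SketchResolveCallback = void (*)(void * context);

enum class TeardownMode
{
    Immediate, // tear down synchronously, as before this change
    Deferred   // hold the context open for the deferred window
};

struct SketchResolveContext
{
    SketchResolveCallback callback = nullptr; // nullptr means delegate-based
};

// Delegate-based contexts (callback == nullptr) are never deferred.
inline TeardownMode ChooseTeardownMode(const SketchResolveContext & ctx)
{
    return ctx.callback == nullptr ? TeardownMode::Immediate : TeardownMode::Deferred;
}
```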
Blast radius is confined to the Darwin dnssd platform layer.
Tests in src/platform/tests/TestDnssd.cpp pin:
- ReusesContextWithinDeferredWindow (callback-based coalescing)
- DelegateBasedResolveIsNotDeferred (delegate-based synchronous teardown)
- CancelStillPropagatesIfNoInFlightResult (timer fires once if no follow-up)
- Multi-sibling rescue, scope mismatch refusal, mismatched-callback rebind,
shared-counter invariants, repeated toggle within window, etc.
rdar://1762638761
parent 8a162c6, commit 42a21f2
4 files changed
Lines changed: 1839 additions & 11 deletions