You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[MTRCommissioning] Add watchdog for client failure to advance past PASE
If a client (Home app, HomeSmartMatter appex, etc.) implements
commissioning:paseSessionEstablishmentComplete: but then drops the flow
without ever calling commissionNodeWithID: or cancelCommissioningForNodeID:,
the MTRCommissioningOperation sits in isWaitingAfterPASEEstablished = YES
forever. MTRDeviceController_Concrete keeps the operation alive via
currentInternalCommissioning, so every subsequent commissioning attempt hits
the busy check in startWithController and fails with CHIP_ERROR_BUSY (0xDB).
The user's symptom is "Matter accessories refuse to connect" indefinitely
until the daemon (or app) is restarted.
Add a 5-minute dispatch_source_t watchdog armed when we set
isWaitingAfterPASEEstablished = YES and cancelled when:
- the property is set back to NO (via the new custom setter that fires
whenever commissionNodeWithID: succeeds on the controller),
- the operation is stopped,
- any terminal _dispatchCommissioningError path runs,
- or the operation deallocates.
On expiry, the watchdog stops the in-flight commissioning via
stopCommissioning:forCommissioningID: (which routes through StopPairing in
both the legacy isInternallyCreated:YES flow AND the public-API flow) and
routes a CHIP_ERROR_TIMEOUT through the standard _dispatchCommissioningError
path so client state can settle back to "no commissioning in flight". Five
minutes is comfortably longer than the commissionee fail-safe (60-180
seconds) but short enough that the user is unblocked within a single
sit-down troubleshooting session.
Also add a defensive secondary patch in
setupCommissioningSessionWithPayload:newNodeID:error: -- if a stale
internally-created commissioning is still parked at the post-PASE waiting
state with a *different* setup payload, treat the new call as an implicit
cancel of the previous one. Without this, the new call would have failed
with CHIP_ERROR_BUSY because the controller still considered the prior
commissioning in flight. The implicit cancel propagates any StopPairing
failure back to the caller rather than silently starting a new
commissioning that is about to hit the same busy state. The implicit
cancel needs the prior operation's commissioningID, so a small read-only
commissioningID accessor is added to MTRCommissioningOperation_Internal.h.
Threading:
- The isWaitingAfterPASEEstablished setter writes the flag synchronously
under an os_unfair_lock so a caller that immediately reads it back
observes the updated value, then bounces only the watchdog teardown
side effect onto _delegateQueue (which is the queue that owns the
dispatch_source_t timer).
- The watchdog event_handler consults isWaitingAfterPASEEstablished as
a late-fire guard: if the client has legitimately advanced past the
post-PASE waiting state but a timer block was already posted to the
queue ahead of the cancel, the handler swallows the late fire instead
of routing a spurious CHIP_ERROR_TIMEOUT into a now-active flow.
- -stop and _dispatchCommissioningError clear
isWaitingAfterPASEEstablished synchronously before bouncing the
watchdog cancel, so a timer event already enqueued on _delegateQueue
is suppressed by the late-fire guard.
- The fire path uses stopCommissioning:forCommissioningID: rather than
cancelCommissioningForNodeID: so StopPairing runs on the CHIP layer
in both the isInternallyCreated:YES and isInternallyCreated:NO
flows, and checks the BOOL return so that a watchdog firing against
a commissioning that has already been replaced does not surface a
spurious timeout into the successor.
- If dispatch_source_create fails when arming the watchdog, the
establishment-done path now routes a CHIP_ERROR_NO_MEMORY through
the standard failure path rather than entering the unbounded wait
state silently.
rdar://171083184
0 commit comments