Fix config entry FAILED_UNLOAD on hung BLE disconnect, add disconnect tracing #359
Merged
ac95f0d to 52bc7de
BleakClient.disconnect() can block until the connect timeout (default 20s) when a write-with-response is still pending on the transport after a mid-auth BLE drop, which is common through an ESPHome proxy. Connection.disconnect() awaited it with no deadline, so async_unload_entry could outrun HA's unload window and leave the entry in ConfigEntryState.FAILED_UNLOAD - which only a full restart recovers from, re-triggering the same flap.

Every client teardown now goes through a bounded _disconnect_client() helper (asyncio.timeout, DISCONNECT_TIMEOUT = 5s) that swallows the usual "already down" errors and, on timeout, gives up waiting and lets the caller finish local cleanup - the transport drains on its own once the drop is detected. This enforces the deadline in the library where the hang actually happens, so all five teardown paths benefit and no async_unload_entry guard is needed.

While here, make disconnects traceable from a diagnostics download alone (no debug logging required), since the original report was hard to pin down:

- disconnect() and the bleak disconnected() callback record the caller chain as the reason on the DISCONNECTING/DISCONNECTED entries in connection_state_history, so a requested disconnect (unload, reload) is distinguishable from an unsolicited bleak drop.
- _disconnect_client() records each outcome (ok/timeout/already_down) with its trigger into a bounded disconnect_log that is included in the diagnostics dump, so timeouts and which path caused them are visible across every teardown path.

Resolves #338. Reimplements #339 with the fix moved into the library and the root cause framed correctly.
Should also address #366 #367