Fix config entry FAILED_UNLOAD on hung BLE disconnect, add disconnect tracing#359

Merged
rabits merged 2 commits into rabits:main from GnoX:fix/bound-disconnect-timeout
Jun 5, 2026

Conversation

@GnoX
Collaborator

@GnoX GnoX commented May 29, 2026

BleakClient.disconnect() can block until the connect timeout (default 20s) when a write-with-response is still pending on the transport after a mid-auth BLE drop, which is common through an ESPHome proxy. Connection.disconnect() awaited it with no deadline, so async_unload_entry could outrun HA's unload window and leave the entry in ConfigEntryState.FAILED_UNLOAD. Only a full restart recovers from that state, and the restart re-triggers the same flap.

Every client teardown now goes through a bounded _disconnect_client() helper (asyncio.timeout, DISCONNECT_TIMEOUT = 5s) that swallows the usual "already down" errors. On timeout it gives up waiting and lets the caller finish local cleanup; the transport drains on its own once the drop is detected. This enforces the deadline in the library where the hang actually happens, so all five teardown paths benefit and no async_unload_entry guard is needed.

While here, make disconnects traceable from a diagnostics download alone (no debug logging required), since the original report was hard to pin down:

  • disconnect() and the bleak disconnected() callback record the caller chain as the reason on the DISCONNECTING / DISCONNECTED entries in connection_state_history, so a requested disconnect (unload, reload) is distinguishable from an unsolicited bleak drop.
  • _disconnect_client() records each outcome (ok / timeout / already_down) with its trigger into a bounded disconnect_log that is included in the diagnostics dump, so timeouts and which path caused them are visible across every teardown path.
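The two tracing points above could look something like the sketch below. The PR text does not say how the caller chain is captured or how the log is stored, so `caller_chain`, the use of `traceback.extract_stack`, the `deque(maxlen=...)` bound, and the record fields are all assumptions chosen for illustration.

```python
import traceback
from collections import deque

# Assumption: a small fixed bound; the real limit is not stated in the PR.
disconnect_log: deque = deque(maxlen=3)


def caller_chain(depth: int = 2) -> str:
    """Return the names of the innermost `depth` callers, innermost first."""
    frames = traceback.extract_stack()[:-1]  # drop caller_chain's own frame
    return " <- ".join(frame.name for frame in reversed(frames[-depth:]))


def record_disconnect(outcome: str) -> None:
    """Append one outcome (ok / timeout / already_down) with its trigger."""
    # depth=3 covers record_disconnect, disconnect, and disconnect's caller.
    disconnect_log.append({"outcome": outcome, "trigger": caller_chain(depth=3)})


def disconnect() -> None:
    """Hypothetical teardown path that logs who asked for the disconnect."""
    record_disconnect("ok")


def unload_entry() -> None:
    """Hypothetical requested-disconnect path (for example, an unload)."""
    disconnect()


if __name__ == "__main__":
    for _ in range(5):
        unload_entry()
    # Only the newest three records survive because of maxlen=3.
    for record in disconnect_log:
        print(record)
```

A bounded deque keeps the diagnostics dump small even on a link that flaps for days, at the cost of losing the oldest entries.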

Resolves #338. Reimplements #339 with the fix moved into the library and the root cause framed correctly.
Should also address #366 and #367.

@GnoX GnoX self-assigned this May 29, 2026
@GnoX GnoX added bug Something isn't working enhancement New feature or request labels May 29, 2026
@GnoX GnoX changed the title Bound BLE client disconnect and trace disconnect triggers Fix config entry FAILED_UNLOAD on hung BLE disconnect, add disconnect tracing May 29, 2026
@GnoX GnoX force-pushed the fix/bound-disconnect-timeout branch from ac95f0d to 52bc7de Compare May 29, 2026 21:59
@GnoX GnoX marked this pull request as ready for review June 5, 2026 16:47
@GnoX GnoX requested a review from rabits June 5, 2026 16:47
Owner

@rabits rabits left a comment

Thank you! Looks great

@rabits rabits merged commit 5dee961 into rabits:main Jun 5, 2026
4 checks passed
@GnoX GnoX deleted the fix/bound-disconnect-timeout branch June 5, 2026 16:57
Labels

bug Something isn't working enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

[Bug]: SHP2/DPU enters ConfigEntryState.FAILED_UNLOAD during BLE flap — async_unload_entry blocks 20 s on BleakClient.disconnect()

2 participants