
feat(observability): lifecycle markers + cancel-leak fix #316

Merged
formatBCE merged 1 commit into dev from feat/lifecycle-observability on May 2, 2026


@teancom (Contributor) commented May 2, 2026

I added a bunch of logging over the past few days to track down errors. These are the log lines I thought were worth shipping to help with future issue triage, along with a fix for a small leak in sendRequest.

Robot:

Adds diagnostic logging across KtorServiceClient and AuthenticationManager to make stuck-reconnect / stuck-CarPlay sessions traceable from log captures.

  • stateLabel / dcsLabel for compact, grep-stable session-state tags
  • Transport state-edge logs (Connected / Reconnecting / Failed / Disconnected)
  • isReadyForCommands edge collector
  • Explicit "no recovery taken" branch in onAppForeground for the actually-stuck case (state != Connected, no savedInfo); happy path stays silent so logs aren't polluted with non-events
  • "Reconnect not applicable" branch in onExternalConsumerActive
  • sendRequest[msgId] markers: start/sent/resumed at .d (high volume), cancelled/transport=null at .i (rare, informational)
  • AuthMgr per-branch trace via Logger.withTag("AuthMgr")
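
A rough sketch of how compact state labels and an edge collector could fit together. Everything here is hypothetical: the `TransportState` type, the label strings, and `logStateEdges` are illustrative stand-ins, since the PR does not show the real types in KtorServiceClient. The log sink is a plain `(String) -> Unit` in place of the project's `Logger`, and the sketch assumes kotlinx.coroutines 1.6 or later on the classpath.

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.distinctUntilChanged
import kotlinx.coroutines.flow.flowOf
import kotlinx.coroutines.flow.map
import kotlinx.coroutines.runBlocking

// Hypothetical transport states; the client's real state type is not shown in the PR.
sealed interface TransportState {
    object Connected : TransportState
    data class Reconnecting(val attempt: Int) : TransportState
    data class Failed(val reason: String) : TransportState
    object Disconnected : TransportState
}

// Compact, grep-stable tag: a fixed key plus an upper-case token, with no
// free-form detail (attempt counts, error text) that would vary between runs.
fun stateLabel(state: TransportState): String = when (state) {
    TransportState.Connected -> "state=CONNECTED"
    is TransportState.Reconnecting -> "state=RECONNECTING"
    is TransportState.Failed -> "state=FAILED"
    TransportState.Disconnected -> "state=DISCONNECTED"
}

// Edge collector: distinctUntilChanged drops consecutive duplicate labels, so
// only transitions are logged, not every re-emission of the same state.
suspend fun logStateEdges(states: Flow<TransportState>, log: (String) -> Unit) {
    states
        .map { stateLabel(it) }
        .distinctUntilChanged()
        .collect { label -> log("transport edge -> $label") }
}

// Usage: repeated and same-kind states collapse to one line per edge.
fun demoEdges(): List<String> = runBlocking {
    val lines = mutableListOf<String>()
    logStateEdges(
        flowOf(
            TransportState.Connected,
            TransportState.Connected,
            TransportState.Reconnecting(attempt = 1),
            TransportState.Reconnecting(attempt = 2),
            TransportState.Failed(reason = "timeout"),
        ),
        log = { lines.add(it) },
    )
    lines
}
```

Deduplicating on the label rather than the raw state is a deliberate choice in this sketch: `Reconnecting(1)` followed by `Reconnecting(2)` is treated as one edge, which keeps logs quiet during a long retry loop at the cost of hiding the attempt count.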

Bug fix: sendRequest now installs invokeOnCancellation to remove the rpcEngine callback if the parent coroutine is cancelled (e.g. a caller's withTimeoutOrNull fires). Without this, any request whose response never arrives leaves an orphaned callback registered for the lifetime of the session.
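
A minimal sketch of the cancellation-cleanup pattern described above, assuming kotlinx.coroutines on the classpath. `RpcEngine`, `addCallback`, `removeCallback`, `deliver`, `pendingCount`, and the `send` parameter are hypothetical stand-ins; the real rpcEngine API is not shown in this PR. Only `suspendCancellableCoroutine`, `invokeOnCancellation`, and `withTimeoutOrNull` are actual kotlinx.coroutines APIs.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import kotlin.coroutines.resume
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.suspendCancellableCoroutine
import kotlinx.coroutines.withTimeoutOrNull

// Hypothetical stand-in for rpcEngine: pending response callbacks keyed by msgId.
class RpcEngine {
    private val pending = ConcurrentHashMap<Int, (String) -> Unit>()

    fun addCallback(msgId: Int, callback: (String) -> Unit) { pending[msgId] = callback }
    fun removeCallback(msgId: Int) { pending.remove(msgId) }
    fun pendingCount(): Int = pending.size

    // Called when a response frame arrives; each callback fires at most once.
    fun deliver(msgId: Int, payload: String) { pending.remove(msgId)?.invoke(payload) }
}

suspend fun sendRequest(engine: RpcEngine, msgId: Int, send: (Int) -> Unit): String =
    suspendCancellableCoroutine<String> { cont ->
        engine.addCallback(msgId) { payload -> cont.resume(payload) }
        // The fix: if the caller's coroutine is cancelled (e.g. its
        // withTimeoutOrNull fires), unregister the callback so it cannot
        // outlive the request.
        cont.invokeOnCancellation { engine.removeCallback(msgId) }
        send(msgId)
    }

// Usage: a reply that never arrives. The caller gives up after 50 ms and the
// callback is removed rather than lingering for the rest of the session.
fun demoTimeout(): Pair<String?, Int> = runBlocking {
    val engine = RpcEngine()
    val result = withTimeoutOrNull(50) {
        sendRequest(engine, msgId = 7) { /* transport write; no reply ever comes */ }
    }
    result to engine.pendingCount()
}

// Usage: a reply delivered synchronously resumes the caller and clears the entry.
fun demoReply(): Pair<String, Int> = runBlocking {
    val engine = RpcEngine()
    val reply = sendRequest(engine, msgId = 8) { id -> engine.deliver(id, "ok:$id") }
    reply to engine.pendingCount()
}
```

Without the `invokeOnCancellation` line, `demoTimeout` would still return `null`, but the engine would report one pending callback that nothing will ever remove. A late response after cancellation is harmless here: `deliver` removes the entry first, and resuming an already-cancelled `CancellableContinuation` is ignored.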


Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@teancom self-assigned this May 2, 2026
@teancom added the bug (Something isn't working) label May 2, 2026
@formatBCE (Collaborator) left a comment


Looks good - let's see how it runs ;)

@formatBCE merged commit 85452a4 into dev May 2, 2026
3 checks passed
@formatBCE deleted the feat/lifecycle-observability branch May 2, 2026 04:35

Labels

bug Something isn't working


2 participants