Skip to content

fix(signaling): cancel in-flight connect on disconnect#1093

Merged
SERDUN merged 5 commits into
developfrom
fix/signaling-disconnect-race-condition
Apr 10, 2026
Merged

fix(signaling): cancel in-flight connect on disconnect#1093
SERDUN merged 5 commits into
developfrom
fix/signaling-disconnect-race-condition

Conversation

@SERDUN

@SERDUN SERDUN commented Apr 10, 2026

Copy link
Copy Markdown
Member

Problem

SignalingModuleImpl.disconnect() was a no-op when called while a connect() was in-flight (WebSocket handshake pending). This left the module in a state where the next connect() call was silently dropped.

Root cause

disconnect() checked only _client to decide whether there was anything to do:

Future<void> disconnect() async {
  final client = _client;
  if (client == null) return;  // ← no-op when handshake is in-flight
  ...
}

Between connect() starting and _clientFactory(...) completing, _client is null while a connect is actively in progress. If disconnect() is called in this window, it returns without stopping the in-flight async task. The subsequent connect() is then silently dropped because the module still considers itself busy.

Failure chain (observed on Pixel 9, Android 16)

18:42:18  connect() → starts _connectAsync, awaiting WebSocket handshake
18:42:19  notifyAppPaused → disconnect() → _client==null → return (no-op)
           connect is still in-flight
18:42:21  notifyAppResumed → connect() → module thinks it's busy → dropped
18:42:33  createCall: routing state not available after 10s

The lifecycle bounce (active → paused → resumed) is triggered on Android 16 when the user taps the call button — Android Telecom API fires the transition automatically. The pause hits exactly while the WebSocket handshake is pending.


Fix

Replaced the two-field approach (bool _connecting + connect-in-progress tracking) with a single Object? _connectToken field — the "latest wins" pattern for non-cancellable Dart Futures.

Object? _connectToken; // null = idle, non-null = connect in progress

How it works:

connect() mints a fresh Object() and passes it to _connectAsync:

void connect() {
  if (_disposed || _connectToken != null) return;
  final token = _connectToken = Object();
  unawaited(_connectAsync(token));
}

disconnect() sets _connectToken = null — one line that both cancels the in-flight connect and allows the next connect() to proceed:

Future<void> disconnect() async {
  _connectToken = null;
  final client = _client;
  if (client == null) return;
  ...
}

_connectAsync checks token identity after every suspension point. If the token no longer matches, the connect was superseded — it cleans up silently without emitting events:

Future<void> _connectAsync(Object connectToken) async {
  try {
    final client = await _clientFactory(...);

    if (_connectToken != connectToken || _disposed) {
      await client.disconnect(...); // clean up, no events
      return;
    }

    _client = client;
    _emit(SignalingConnected());
  } finally {
    if (_connectToken == connectToken) _connectToken = null;
  }
}

The finally guard ensures a superseded _connectAsync does not clear the token that belongs to a newer active connect.


Tests

Four unit tests in signaling_module_impl_test.dart using _ControlledFactory — a test helper with per-call Completer and call counter to control factory resolution timing precisely:

Test Verifies
BUG: second connect() silently dropped factory.callCount == 2 after disconnect + reconnect
BUG: stale connect sets _client after disconnect isConnected == false when stale factory resolves
after fix: second connect() reaches the factory factory called again after disconnect
after fix: stale result discarded, no SignalingConnected no event emitted for superseded connect

All 4 pass.


Diagnostic logs

D/SignalingModuleImpl: disconnect: no active client, in-flight connect cancelled
D/SignalingModuleImpl: _connectAsync: stale connect discarded

Both lines appearing in a log session confirm the race was triggered and correctly handled.


Scope

  • SignalingModuleImpl internals only — no public API changes, no event type changes
  • No changes to SignalingReconnectController, hub layer, or any caller

SERDUN added 3 commits April 10, 2026 12:32
… counter

disconnect() returned early when _client==null (connect still awaiting
factory), leaving _connecting=true and the in-flight _connectAsync
running. The next connect() was silently dropped, and if the stale
factory eventually resolved, the module set _client and emitted
SignalingConnected even though disconnect() had been called.

Root cause: the guard in disconnect() (`if (client == null) return`)
prevented it from resetting _connecting or invalidating the async task.

Fix: introduce a monotonically increasing _generation counter.
- connect() captures the current generation before starting _connectAsync
- disconnect() increments _generation and resets _connecting=false
- _connectAsync checks its generation at every suspension point; if it
  no longer matches, it cleans up its client without emitting events
- the finally block only resets _connecting for the current generation

Adds four tests reproducing the race: two document the broken behavior
(expected to pass after the fix), two verify correct behavior.
@SERDUN SERDUN changed the title fix(signaling): cancel in-flight connect on disconnect via generation counter fix(signaling): cancel in-flight connect on disconnect Apr 10, 2026
@SERDUN SERDUN requested a review from Copilot April 10, 2026 09:49

This comment was marked as resolved.

@SERDUN SERDUN marked this pull request as ready for review April 10, 2026 10:05
@SERDUN SERDUN requested a review from digiboridev April 10, 2026 10:29
@SERDUN SERDUN merged commit e15bd6b into develop Apr 10, 2026
1 check passed
@SERDUN SERDUN deleted the fix/signaling-disconnect-race-condition branch April 10, 2026 10:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants