
feat(android): replace FGS with direct WebSocket for push-bound mode#1235

Merged
SERDUN merged 11 commits into `develop` from `fix/push-bound-direct-handoff`
Apr 30, 2026

SERDUN (Member) commented Apr 29, 2026

Summary

Replaces the Foreground Service hub with a direct WebSocket for `pushBound` mode on Android.
The WebSocket now runs directly in the calling isolate — identical to how iOS has always worked.
FGS is retained only for `persistent` mode.

Motivation

1. Persistent notification in push-bound mode

In the FGS hub model, Android requires a visible foreground-service notification for the entire
duration of the push isolate's lifetime — including while the user is actively on the call.
This notification cannot be suppressed while the FGS is running.
In direct mode, push-bound runs without a foreground service, so no persistent notification is shown.

2. Ability to fully exclude FGS from the build

With FGS only used for `persistent` mode, apps that ship exclusively with push-bound can
remove the `<service android:name=".SignalingService" .../>` declaration from
`AndroidManifest.xml` entirely. This eliminates the manifest permission surface and avoids
Android 12+ foreground-service start restrictions (`ForegroundServiceStartNotAllowedException`,
`ForegroundServiceDidNotStartInTimeException`) for push-bound users.

3. Eliminates an entire class of FGS crashes

The FGS hub introduced a set of crashes that cannot be fixed cleanly:

  • `ForegroundServiceStartNotAllowedException` — Android 12+ background start restriction
  • `ForegroundServiceDidNotStartInTimeException` — stop/start races during logout
  • Hub polling loop, watchdog, stale port detection: workaround machinery kept alive for a
    service that is disposable by design (`START_NOT_STICKY`, `stopSelf` on `onTaskRemoved`)

Architecture

```
Before (FGS hub):
Push → push isolate → startForegroundService() → FGS engine (WS) ← Activity

After (direct):
Push → push isolate → WebSocket (in push isolate)
Activity → WebSocket (in Activity)
```

No FGS, no IsolateNameServer hub, no background FlutterEngine (~60 MB RAM saved).

Push isolate ↔ non-push isolate handoff

Both isolates open independent WebSockets. Role detection is based purely on whether
`setHandoffCallback()` was called before `_startDirect()` — not on which isolate is running:

| `_handoffCallback` | Role | Behaviour |
|---|---|---|
| set | Push isolate | Registers a `ReceivePort` under `kPushHandoffPortName` in `IsolateNameServer` |
| `null` | Non-push isolate | On `SignalingConnected`, looks up the port and sends `null` |
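The role rule can be modeled in a few lines. This is a language-neutral sketch in Python, not the plugin's Dart code: `NameServer` stands in for Flutter's `IsolateNameServer`, `Port` for a `ReceivePort`/`SendPort` pair, and `SignalingIsolate`, `PUSH_HANDOFF_PORT_NAME` and all method names are hypothetical. Only the rule itself (role derived from whether a handoff callback was set before start) comes from the PR.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical value; stands in for the plugin's kPushHandoffPortName constant.
PUSH_HANDOFF_PORT_NAME = "push_handoff_port"


class Port:
    """Stand-in for a ReceivePort/SendPort pair: delivers each message to a handler."""

    def __init__(self, handler: Callable[[object], None]) -> None:
        self._handler = handler

    def send(self, message: object) -> None:
        self._handler(message)


class NameServer:
    """Process-wide name -> port registry, modeled on Flutter's IsolateNameServer."""

    def __init__(self) -> None:
        self._ports: Dict[str, Port] = {}

    def register(self, name: str, port: Port) -> bool:
        # Refuse to overwrite an existing mapping.
        if name in self._ports:
            return False
        self._ports[name] = port
        return True

    def lookup(self, name: str) -> Optional[Port]:
        return self._ports.get(name)

    def remove(self, name: str) -> bool:
        return self._ports.pop(name, None) is not None


class SignalingIsolate:
    """One isolate's signaling state. The role is derived, never passed in."""

    def __init__(self, names: NameServer) -> None:
        self._names = names
        self._handoff_callback: Optional[Callable[[], None]] = None
        self.role: Optional[str] = None

    def set_handoff_callback(self, callback: Callable[[], None]) -> None:
        self._handoff_callback = callback

    def start_direct(self) -> None:
        callback = self._handoff_callback
        if callback is not None:
            # Push isolate: publish a port the other isolate can signal through.
            self.role = "push"
            self._names.register(PUSH_HANDOFF_PORT_NAME, Port(lambda _msg: callback()))
        else:
            self.role = "non-push"

    def on_signaling_connected(self) -> None:
        if self.role != "non-push":
            return
        port = self._names.lookup(PUSH_HANDOFF_PORT_NAME)
        if port is not None:
            port.send(None)  # payload is irrelevant; arrival is the signal

    def dispose(self) -> None:
        # Port cleanup when the push session ends.
        if self.role == "push":
            self._names.remove(PUSH_HANDOFF_PORT_NAME)


names = NameServer()
took_over: List[bool] = []

push = SignalingIsolate(names)
push.set_handoff_callback(lambda: took_over.append(True))
push.start_direct()

app = SignalingIsolate(names)  # no callback set, so it becomes the non-push side
app.start_direct()
app.on_signaling_connected()

print(push.role, app.role, took_over)  # push non-push [True]
push.dispose()
```

Deriving the role from the callback, rather than from a flag naming the isolate, keeps the rule valid for whichever isolate happens to receive the push.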

Two mechanisms close the push isolate early:

IsolateNameServer handoff: the non-push isolate sends `null` to the push isolate's port on
`SignalingConnected`. The push isolate receives the signal → `notifyActivityTookOver()` →
`_complete()`, which cancels the reconnect timer before it fires.

4441 fast path: the server sends `controllerForceAttachClose` when it detects a duplicate
session. `PushNotificationIsolateManager` handles `SignalingDisconnected(code: 4441)` as an
early-exit signal via `_complete()`.

Whichever path fires first closes the push session.
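The "whichever fires first" rule can be sketched as an idempotent completion, assuming `_complete()` ignores any signal that arrives after the first (the PR does not spell this out). Python `asyncio` stands in for Dart futures; `PushCallSession` and the timings are hypothetical.

```python
import asyncio
from typing import List

FORCE_ATTACH_CLOSE = 4441  # close code the PR associates with controllerForceAttachClose


class PushCallSession:
    """Hypothetical model of the push isolate's two early-exit paths."""

    def __init__(self) -> None:
        self.finished: asyncio.Future = asyncio.get_running_loop().create_future()
        self.completions: List[str] = []

    def _complete(self, reason: str) -> None:
        # Idempotent: the first signal completes the session, later ones are no-ops.
        if self.finished.done():
            return
        self.completions.append(reason)
        self.finished.set_result(reason)

    def notify_activity_took_over(self) -> None:
        # Local path: the non-push isolate signalled through the handoff port.
        self._complete("handoff")

    def on_signaling_disconnected(self, code: int) -> None:
        # Server path: a duplicate session was detected and this socket was evicted.
        if code == FORCE_ATTACH_CLOSE:
            self._complete("4441")


async def scenario(handoff_at: float, close_at: float) -> List[str]:
    session = PushCallSession()
    loop = asyncio.get_running_loop()
    loop.call_later(handoff_at, session.notify_activity_took_over)
    loop.call_later(close_at, session.on_signaling_disconnected, FORCE_ATTACH_CLOSE)
    await session.finished
    # Let the slower signal arrive too, to show it does not complete the session twice.
    await asyncio.sleep(max(handoff_at, close_at))
    return session.completions


print(asyncio.run(scenario(handoff_at=0.01, close_at=0.05)))  # ['handoff']
print(asyncio.run(scenario(handoff_at=0.05, close_at=0.01)))  # ['4441']
```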

Reliability

The `SignalingModuleImpl` (WebSocket logic) is unchanged — same code, same behaviour.
Two additions on top:

  • 4441 handling: follows the existing foreground-service disconnect flow — when the server
    signals that another session has taken over, the push isolate closes cleanly without treating
    it as an error.
  • Handoff callback: a faster, local signal that does not depend on the server sending 4441.
    Avoids forcing the call through an error path if the server is slow or the 4441 arrives after
    the non-push isolate is already handling the call.
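The 4441 handling amounts to classifying disconnects. In this hypothetical Python sketch, 4441 completes the session without recording an error, a late 4441 after the handoff has already closed the session is ignored, and any other code (1006, the RFC 6455 abnormal-closure code, is used as an example) is assumed to take the normal retry path. `Outcome`, `PushSession` and `on_disconnected` are illustrative names, not part of `SignalingModuleImpl`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

FORCE_ATTACH_CLOSE = 4441


class Outcome(Enum):
    TAKEN_OVER = "taken_over"  # another session owns the call: clean exit
    RECONNECT = "reconnect"    # ordinary drop: assumed to go through retry/error handling
    IGNORED = "ignored"        # session already completed (e.g. by the handoff signal)


@dataclass
class PushSession:
    completed: bool = False
    last_error: Optional[str] = None


def on_disconnected(session: PushSession, code: int) -> Outcome:
    if session.completed:
        # A late 4441 after the local handoff must not push the call into an error path.
        return Outcome.IGNORED
    if code == FORCE_ATTACH_CLOSE:
        session.completed = True  # complete without recording an error
        return Outcome.TAKEN_OVER
    session.last_error = f"signaling closed with code {code}"
    return Outcome.RECONNECT


s = PushSession()
print(on_disconnected(s, FORCE_ATTACH_CLOSE), s.last_error)  # Outcome.TAKEN_OVER None

s = PushSession(completed=True)  # the handoff already closed the session
print(on_disconnected(s, FORCE_ATTACH_CLOSE))  # Outcome.IGNORED

s = PushSession()
print(on_disconnected(s, 1006))  # Outcome.RECONNECT
```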

Files changed

| File | Change |
|---|---|
| `isolate_manager.dart` | `notifyActivityTookOver()`, 4441 fast path in `SignalingDisconnected` handler; updated docs |
| `background_isolate_callbacks.dart` | `setModuleFactory()` registered in push isolate; `setHandoffCallback()` always registered; FGS doc references removed |
| `bootstrap.dart` | removed `setPushBoundStrategy()` call |
| `environment_config.dart` | removed `WEBTRIT_PUSH_BOUND_USE_DIRECT` |
| `signaling_service_platform.dart` | removed `attach()` and `setPushBoundStrategy()` |
| `signaling_service.dart` | removed `attach()` and `setPushBoundStrategy()` static wrappers |
| `constants.dart` (android) | `kPushHandoffPortName` |
| `plugin.dart` (android) | `_startDirect()` unconditional for pushBound; handoff port registration/cleanup; non-push isolate signal send; removed `pushBoundUseDirect` flag, `setPushBoundStrategy()`, and `attach()` |
| `docs/signaling_architecture_target.md` | updated dependency diagram; added handoff mechanism section |

SERDUN added 3 commits April 29, 2026 15:03
… Android

Adds WEBTRIT_PUSH_BOUND_USE_DIRECT dart-define flag. When true, push-bound
incoming calls on Android bypass the FGS and run the WebSocket directly in
the calling isolate, eliminating ForegroundServiceStartNotAllowedException
and the persistent notification. Behaviour is identical to the iOS path.

Default is false: existing FGS behaviour is preserved unless the flag is set.

Dart isolates do not share static memory. The flag set in bootstrap.dart was
invisible to the push isolate, causing it to start the FGS while the Activity
used the direct WebSocket. This created a 4441 reconnect loop: the server
kicked the FGS connection when the direct WS arrived, the FGS scheduled a
safety reconnect, server kicked that too, and so on.

Fix: call setPushBoundStrategy() in _getOrInit() before the signaling service
is created. EnvironmentConfig.PUSH_BOUND_USE_DIRECT is a compile-time const —
it has the correct value in all isolates without any IPC.

…onnect loop

When the Activity's WebSocket connected in direct push-bound mode, the push isolate
had no way to know: it waited the full 20 s timeout and reconnected, kicking
the Activity's connection and creating an infinite loop.

Fix: push isolate registers a ReceivePort under kPushHandoffPortName in
IsolateNameServer (detected by the presence of a handoffCallback set via
setHandoffCallback). Activity's _startDirect() sends null to that port on
SignalingConnected. On receipt, notifyActivityTookOver() completes the push
isolate's run() future early, letting _disposeContext() cancel the module and
its reconnect timer before they fire.

Does not depend on server sending 4441 — works regardless of server config.
4441 detection (controllerForceAttachClose → _complete()) kept as fast path.
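The loop and its fix can be reproduced with a timer and an event. In this Python sketch (hypothetical names, with the 20 s timeout scaled down), the push side arms a reconnect timer. Without a handoff the timer fires, which in the real app evicted the Activity's socket; with a handoff, `run()` completes early and the disposal step cancels the timer before it fires. The real reconnect timer lives inside the signaling module; here it is simplified to a single `call_later`.

```python
import asyncio
from typing import List, Optional

# Scaled-down stand-in for the 20 s timeout; only the ordering matters here.
RECONNECT_AFTER = 0.2


async def run_push_isolate(handoff: asyncio.Event, log: List[str]) -> None:
    loop = asyncio.get_running_loop()
    # Safety reconnect; if it fires, the server evicts the other isolate's socket.
    timer = loop.call_later(RECONNECT_AFTER, log.append, "reconnect (evicts the other socket)")
    try:
        await asyncio.wait_for(handoff.wait(), timeout=RECONNECT_AFTER * 2)
        log.append("handoff received")
    except asyncio.TimeoutError:
        log.append("no handoff")
    finally:
        # _disposeContext() analogue: cancel pending work once run() completes.
        timer.cancel()
        log.append("disposed")


async def main(handoff_after: Optional[float]) -> List[str]:
    log: List[str] = []
    handoff = asyncio.Event()
    if handoff_after is not None:
        asyncio.get_running_loop().call_later(handoff_after, handoff.set)
    await run_push_isolate(handoff, log)
    await asyncio.sleep(RECONNECT_AFTER)  # a cancelled timer must stay silent
    return log


print(asyncio.run(main(handoff_after=0.05)))  # ['handoff received', 'disposed']
print(asyncio.run(main(handoff_after=None)))
# ['reconnect (evicts the other socket)', 'no handoff', 'disposed']
```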
SERDUN changed the title from `fix(push-bound-direct): IsolateNameServer handoff to prevent reconnect loop` to `feat(android): direct WebSocket strategy for push-bound mode (no FGS)` Apr 29, 2026
SERDUN added the `draft` label Apr 29, 2026
SERDUN added 2 commits April 29, 2026 20:15
… pushBound direct

When updateMode(pushBound) was called after persistent mode, _hubManager was
still active with its FGS WebSocket. _startDirect() then opened a second
WebSocket in the same process — two simultaneous connections triggered the
reconnect loop. Fix: tear down _hubManager and stop the FGS service before
_startDirect() when the target mode is direct pushBound.

Removes `pushBoundUseDirect` flag and `setPushBoundStrategy()` entirely.
pushBound now unconditionally uses the direct WebSocket path (no FGS),
matching how iOS has always worked. FGS is kept only for persistent mode.

Removes:
- `EnvironmentConfig.PUSH_BOUND_USE_DIRECT` compile-time const
- `WebtritSignalingService.setPushBoundStrategy()` static wrapper
- `SignalingServicePlatform.setPushBoundStrategy()` no-op default
- `WebtritSignalingServiceAndroid.pushBoundUseDirect` static field
- All `&& pushBoundUseDirect` conditions in plugin.dart
- `setPushBoundStrategy()` calls in bootstrap.dart and background_isolate_callbacks.dart
- `if (PUSH_BOUND_USE_DIRECT)` guard around `setHandoffCallback` — now always registered
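A hypothetical model of the mode-switch rule from the two commits above: switching to `pushBound` stops the FGS hub before opening the direct socket, so the plugin never holds the hub connection and a direct connection at the same time (the PR identifies that overlap as the trigger for the reconnect loop). Class and method names are invented for illustration, and the `persistent` branch closing the direct socket is an assumption; the PR only describes the persistent → pushBound direction.

```python
from typing import List


class AndroidSignalingModel:
    """Hypothetical model of mode switching on Android after this change."""

    def __init__(self) -> None:
        self.open_connections: List[str] = []
        self.fgs_running = False

    def update_mode(self, mode: str) -> None:
        if mode == "persistent":
            self._stop_direct()  # assumed symmetric; not described in the PR
            self._start_fgs()
        elif mode == "pushBound":
            # Tear down the FGS hub first, then go direct (push-bound never uses the FGS).
            self._stop_fgs()
            self._start_direct()
        else:
            raise ValueError(f"unknown mode: {mode}")

    def _start_fgs(self) -> None:
        if not self.fgs_running:
            self.fgs_running = True
            self.open_connections.append("fgs")

    def _stop_fgs(self) -> None:
        if self.fgs_running:
            self.fgs_running = False
            self.open_connections.remove("fgs")

    def _start_direct(self) -> None:
        if "direct" not in self.open_connections:
            self.open_connections.append("direct")

    def _stop_direct(self) -> None:
        if "direct" in self.open_connections:
            self.open_connections.remove("direct")


signaling = AndroidSignalingModel()
signaling.update_mode("persistent")
signaling.update_mode("pushBound")
print(signaling.open_connections, signaling.fgs_running)  # ['direct'] False
```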
SERDUN changed the title from `feat(android): direct WebSocket strategy for push-bound mode (no FGS)` to `feat(android): replace FGS with direct WebSocket for push-bound mode` Apr 29, 2026
WebTrit deleted a comment from claude (bot) Apr 29, 2026
SERDUN added 6 commits April 29, 2026 20:47
attach() was designed for the FGS hub model where the Activity needed to
connect to an already-running hub without calling start() again. In practice
it was never wired up in app code — connect() always handled the full flow.
With pushBound now using direct WebSockets, there is no hub to attach to,
making the method permanently obsolete.

1. signaling_service.dart: remove stale setPushBoundStrategy paragraphs
   from setHandoffCallback docstring (referenced non-existent [useDirect])

2. isolate_manager.dart: update four stale FGS/hub references to reflect
   direct WebSocket model; add comment on _releaseCall branch explaining
   why it is safe when Activity takes over before IncomingCallEvent arrives

The push isolate is a separate Dart isolate with its own static state and never
receives the setModuleFactory() call made in bootstrap.dart (Activity isolate).
Without this, _startDirect() throws StateError when connect() fires
from run() because _factory is null in the push isolate.
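Why the push isolate needs its own `setModuleFactory()` call, as a Python model: each isolate gets its own copy of static state, represented here by one `IsolateStatics` instance per isolate. `StateError`, `IsolateStatics` and the factory return values are stand-ins, not the plugin's types. The same per-isolate rule is why the earlier `setPushBoundStrategy()` flag set in bootstrap.dart was invisible to the push isolate.

```python
from typing import Callable, Optional


class StateError(Exception):
    """Stand-in for Dart's StateError."""


class IsolateStatics:
    """Static state as seen by one isolate. Each isolate gets its own instance,
    the way each Dart isolate gets its own copy of every static variable."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.module_factory: Optional[Callable[[], str]] = None  # hypothetical _factory analogue

    def set_module_factory(self, factory: Callable[[], str]) -> None:
        self.module_factory = factory

    def start_direct(self) -> str:
        if self.module_factory is None:
            raise StateError(f"{self.name}: module factory was never set in this isolate")
        return self.module_factory()


main_isolate = IsolateStatics("main")
push_isolate = IsolateStatics("push")

# bootstrap.dart runs only in the main (Activity) isolate.
main_isolate.set_module_factory(lambda: "module@main")

try:
    push_isolate.start_direct()
except StateError as error:
    print("before fix:", error)  # before fix: push: module factory was never set in this isolate

# The fix: register the factory from the push isolate's own entry point as well.
push_isolate.set_module_factory(lambda: "module@push")
print("after fix:", push_isolate.start_direct())  # after fix: module@push
```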
…d FGS references

SignalingServiceModuleAdapter was never implemented; WebtritSignalingService
implements SignalingModule directly. attach() was removed. PushNotificationIsolateManager
now uses a direct WebSocket (no FGS); FGS is persistent mode only.

…and docs

Replace "Activity" with "non-push isolate" in all comments and log messages —
the handoff mechanism is determined by _handoffCallback != null, not by which
Flutter isolate is running. Add a dedicated section to signaling_architecture_target.md
explaining the IsolateNameServer-based handoff sequence and role detection logic.
WebTrit deleted a comment from claude (bot) Apr 30, 2026
SERDUN removed the `draft` label Apr 30, 2026
SERDUN marked this pull request as ready for review April 30, 2026 05:57
SERDUN requested a review from digiboridev April 30, 2026 06:06
SERDUN merged commit 32e9b77 into develop Apr 30, 2026
2 checks passed