fix(signaling): parallelize Pigeon IPC calls in _startService() to reduce FGS delay#1101
Merged
SERDUN merged 2 commits intoApr 10, 2026
Merged
Conversation
…duce FGS delay Replace three sequential await calls with Future.wait() so startService() (which calls startForegroundService()) is dispatched without waiting for three Binder round-trips first. All four calls share the same BinaryMessenger, which delivers messages to the Kotlin main-thread Looper in FIFO order. startService() is always processed after the credential writes, so synchronizeIsolate() reads correct data on the first attempt. Under memory pressure (Xiaomi HyperOS), each sequential Binder round-trip adds ~100–300 ms. Removing three of them saves ~300–900 ms before startForegroundService() is called, widening the margin within the vendor-specific FGS promotion window.
…quential Address review concerns: - Separate credential writes (Future.wait) from startService() (sequential await) so startService() is only called after all SharedPreferences writes complete - Removes the cross-channel FIFO ordering assumption: credentials are independent of each other and run concurrently; startService() is explicitly sequenced after Future.wait() resolves - A failure in any credential write now prevents startService() from being dispatched, preserving the original error-isolation behavior Still removes two sequential Binder round-trips (~200–600 ms under memory pressure) before startForegroundService() is called.
SERDUN
added a commit
that referenced
this pull request
Apr 11, 2026
…n Xiaomi (#1099) * chore(signaling): umbrella branch for Xiaomi FGS crash fixes * fix(signaling): call startForeground() first in onCreate() to reduce FGS timeout risk (#1100) Move startForeground() to be the first call after super.onCreate(), before any SharedPreferences IPC. Under memory pressure (Xiaomi HyperOS), the SharedPrefs read in getCallbackDispatcher() can block the main thread for 50-500ms, pushing startForeground() past Xiaomi's aggressive ~1.2s window and causing ForegroundServiceDidNotStartInTimeException. startForeground() has no dependency on FlutterEngineHelper or callbackHandle — the engine starts in onStartCommand(), which runs after onCreate() completes. * fix(signaling): parallelize Pigeon IPC calls in _startService() to reduce FGS delay (#1101) * fix(signaling): parallelize Pigeon IPC calls in _startService() to reduce FGS delay Replace three sequential await calls with Future.wait() so startService() (which calls startForegroundService()) is dispatched without waiting for three Binder round-trips first. All four calls share the same BinaryMessenger, which delivers messages to the Kotlin main-thread Looper in FIFO order. startService() is always processed after the credential writes, so synchronizeIsolate() reads correct data on the first attempt. Under memory pressure (Xiaomi HyperOS), each sequential Binder round-trip adds ~100–300 ms. Removing three of them saves ~300–900 ms before startForegroundService() is called, widening the margin within the vendor-specific FGS promotion window. * fix(signaling): parallelize credential writes, keep startService() sequential Address review concerns: - Separate credential writes (Future.wait) from startService() (sequential await) so startService() is only called after all SharedPreferences writes complete - Removes the cross-channel FIFO ordering assumption: credentials are independent of each other and run concurrently; startService() is explicitly sequenced after Future.wait() resolves - A failure in any credential write now prevents startService() from being dispatched, preserving the original error-isolation behavior Still removes two sequential Binder round-trips (~200–600 ms under memory pressure) before startForegroundService() is called. * fix(signaling): persistent mode FGS recovery after OS kill (#1103) * fix(signaling): persistent mode FGS recovery after OS kill Part 1 — FCM path: add WebtritSignalingService.restoreService() static method (Pigeon connect() → Kotlin) called from onPushNotificationSyncCallback finally block after _disposeContext(). Restores the persistent WebSocket for future calls when FGS is dead and FCM push is the first trigger. Part 2 — No-GMS path: add SignalingRestartWorker (WorkManager one-shot) enqueued from onDestroy (15 s) and onTaskRemoved (1 s) in persistent mode. Result.retry() handles Android 12+ ForegroundServiceStartNotAllowedException. SignalingRestartWorker.remove() called first in stopService() to prevent restart after explicit logout. Also fixes thread-safety bug: isRunning is now @volatile. Closes bug-3-no-persistent-mode-recovery. * fix(signaling): address Copilot review comments on FGS recovery - cancelUniqueWork instead of cancelAllWorkByTag in SignalingRestartWorker.remove() - doWork(): retry only on ForegroundServiceStartNotAllowedException, failure() on all others - doWork(): gate on tenantId/token in addition to coreUrl - doWork(): Log.e with throwable for full stack trace - connect() in Plugin: add tenantId/token guards; catch ForegroundServiceStartNotAllowedException natively and enqueue WorkManager instead of propagating to Dart - onDestroy/onTaskRemoved: gate WorkManager enqueue on coreUrl+dispatcher to skip after logout - background_isolate_callbacks: fix comment (exception handled natively), pass (e, st) to logger - Add unit tests: plugin_test restoreService channel wiring, signaling_service_test delegation * fix(signaling): address second round Copilot review comments - SignalingRestartWorker: update KDoc to reflect actual retry logic (retry only on ForegroundServiceStartNotAllowedException, failure() on others) - SignalingRestartWorker: Log.w for transient ForegroundServiceStartNotAllowedException, Log.e only for permanent failures — reduces noise in normal retry scenarios - SignalingForegroundService.onDestroy: add tenantId/token guards to align with connect() and doWork() credential checks - SignalingForegroundService.onTaskRemoved: same tenantId/token guard alignment * fix(signaling): remove direct API-31 class reference for minSdk-26 safety Replace `is ForegroundServiceStartNotAllowedException` checks with javaClass.name string comparison in WebtritSignalingServicePlugin and SignalingRestartWorker. Removes direct import of the API-31+ framework class, eliminating any risk of class-verification errors on pre-31 ART. Also removes the now-redundant Build.VERSION.SDK_INT guard and imports. Remove stale inline comment from background_isolate_callbacks.dart. * fix(signaling): restore idiomatic API-31 exception check via @RequiresApi helper javaClass.name string comparison is fragile, not idiomatic, and misses subclasses. Replace with a @RequiresApi(S) helper that holds the `is` check in API-guarded code, satisfying Lint while keeping type safety. The outer Build.VERSION.SDK_INT >= S guard means the helper is never called on pre-31 devices, so no class-loading risk exists. * fix: push isolate reuses FGS hub WebSocket (1-socket invariant) (#1104) * fix: push isolate reuses FGS hub WebSocket instead of opening its own PushNotificationIsolateManager previously always created a SignalingModuleImpl directly, opening a second WebSocket when the FGS hub was already running. This violated the 1-WebSocket invariant. The fix adds createPushIsolateModule() to SignalingServicePlatform and WebtritSignalingService. On Android, the method checks IsolateNameServer for a live FGS hub: if found and acknowledged, a SignalingHubModule is returned and no new connection is opened. When no hub is active (app killed), falls back to a direct SignalingModuleImpl. PushNotificationIsolateManager now exposes an init() method for async hub discovery, called from _getOrInit() before run(). run() skips connect() when the module reports isConnected (hub reuse path). * logs: add diagnostic logs for push isolate signaling module resolution Adds log lines to verify the 1-socket invariant at runtime: - _initSignaling(): logs module type and isConnected after createPushIsolateModule resolves, making hub-reuse vs direct-socket path visible in logcat - close(): logs module type and pending request count on teardown - _getOrInit(): logs before and after init() so init timing is traceable These logs confirm whether the hub path or fallback path is taken on each push notification, and that teardown completes cleanly. * fix: push isolate starts FGS and waits for hub instead of direct socket Previously createPushIsolateModule fell back immediately to a direct SignalingModuleImpl when no hub was running (pushBound mode, first push after app close). This violated the 1-WebSocket invariant: both the push isolate and the Activity would open their own WebSocket connections. New flow: 1. Hub already running -> reuse (persistent mode or second push) 2. No hub -> startFgsOnly(pushBound) -> poll IsolateNameServer until hub registers and acknowledges -> return SignalingHubModule 3. Hub not available after 10 s -> emergency fallback to direct module The push isolate now always shares the same FGS WebSocket with the Activity, matching the intended architecture. * fix: address Copilot review comments on push isolate hub reuse - _tryConnectHub: yield to event loop after ack so hub replay events (SignalingConnected etc.) are processed before returning the module. Prevents run() from calling connect() on an already-live hub session due to isConnected being transiently false while replay is in flight. - run(): throw StateError when called before init() instead of silently leaving _completer unresolved and hanging the caller until timeout. - lefthook.yml: pass {1} to commit-msg-check.sh so the hook reads the actual commit message file (git hook argv) instead of git log -1. * refactor: unify push isolate signaling with Activity via WebtritSignalingService Push isolate now uses WebtritSignalingService(mode: pushBound) directly, the same mechanism the Activity uses. HubConnectionManager inside the service handles FGS start, hub discovery, and auto-reconnect if the hub is killed between push arrival and Activity open. Removes createPushIsolateModule from platform API — it was a one-shot workaround that bypassed HubConnectionManager and had no reconnect logic. Removes _tryConnectHub, _startFgsOnly, and related constants from the Android plugin. The single code path eliminates the duplication. Lifecycle boundary remains unchanged: push mode — FGS lives from push arrival through Activity close persistent — FGS lives indefinitely * fix: address remaining Copilot review comments - Remove unnecessary async from _initSignaling() — nothing is awaited - Rename log field hubConnected -> isConnected (accurate with WebtritSignalingService) - Always call connect() in run() — WebtritSignalingService.connect() is idempotent via _startPending/_isConnected guard; the old conditional was left over from SignalingHubModule direct usage - Fix class doc: qualify 1-WebSocket claim as Android-only - Fix _initSignaling doc: remove auto-reconnect claim, HubConnectionManager handles FGS start and hub discovery, not reconnect after disconnect - commit-msg-check.sh: add guard for missing or unreadable $1 argument * docs: fix stale comments after signaling unification - init() doc: hub discovery and FGS start happen in connect() via HubConnectionManager, not in init() as the old comment stated - background_isolate_callbacks: same correction for _getOrInit comment - commit-msg-check.sh: add set -euo pipefail for consistency with other scripts * refactor: replace nullable signalingModule with late field and remove unnecessary async from init() - _signalingModule/_signalingSubscription: SignalingModule? → late SignalingModule; eliminates null-aware ?. access throughout the class since run() requires init() first - Add _initialized bool flag so close() can guard disposal without accessing late fields prematurely - init(): Future<void> async => _initSignaling() → void init() (sync, no await needed) - run(): null check replaced with !_initialized guard; remove intermediate `module` local - All ?. accesses on _signalingModule replaced with direct . access
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
_startService()inplugin.dartcalledstartForegroundService()(via_hostApi.startService()) only after completing three sequential Pigeonawaitcalls. Eachawaitis a full Binder round-trip: Dart → BinaryMessenger → Kotlin main thread → Binder reply → Dart resumes.Under Xiaomi HyperOS memory pressure, each round-trip adds ~100–300 ms. Three sequential calls add ~300–900 ms before
startForegroundService()is even called — consuming a significant portion of Xiaomi's aggressive ~1.2s FGS promotion window.Fix
The three credential writes are independent — they write to separate SharedPreferences keys with no ordering dependency between them. They are parallelized via
Future.wait().startService()remains a separate sequentialawaitafterFuture.wait()resolves, preserving the guarantee that credentials are persisted before the service starts.Result
startForegroundService()is calledstartService()is explicitly sequenced afterFuture.wait()— no cross-channel ordering assumptionstartService()from being dispatched, preserving the original error-isolation behaviorPart of
Umbrella: #1099