fix: prevent WorkManager APPEND chain poisoning in background worker#404
fix: prevent WorkManager APPEND chain poisoning in background worker#404imnexerio wants to merge 2 commits into
Conversation
- Change ExistingWorkPolicy from APPEND to APPEND_OR_REPLACE so a failed or cancelled chain is replaced instead of permanently blocking future work. - Return Result.success() with failure output data instead of Result.failure() when engine initialization fails, preventing chain poisoning while preserving observability through output data (engine_init_failed flag + error message). - Add coroutine Mutex to synchronize engine initialization, preventing concurrent workers from racing on the init check. - Destroy FlutterEngine on the main thread via mainHandler.post() as required by the FlutterEngine API. - Preserve pending callback queue across engine resets so earlier workers' callbacks can still be dispatched after a successful re-initialization. Fixes: WorkManager chain state is persisted in its SQLite database and survives process death, force stop, and device reboots. Once any doWork() returned Result.failure(), all subsequently appended items were immediately cancelled, permanently breaking background callbacks until the user manually cleared app data.
|
To view this pull requests documentation preview, visit the following URL: docs.page/abausg/home_widget~404 Documentation is deployed and generated using docs.page. |
WalkthroughAdded thread-safe synchronization around Flutter engine initialization using a mutex guard. Enhanced error handling distinguishes between cancellation and general exceptions, with a new Changes
Sequence DiagramsequenceDiagram
participant Worker as HomeWidget<br/>BackgroundWorker
participant Mutex as engineMutex
participant Engine as Flutter<br/>Engine
participant MainThread as Main<br/>Thread
participant Queue as Callback<br/>Queue
Worker->>Mutex: Acquire lock for initialization
Mutex-->>Worker: Lock acquired
Worker->>Engine: Initialize engine
alt Initialization Success
Engine-->>Worker: Engine ready
Worker->>Queue: Dispatch pending callbacks
Queue-->>Engine: Execute callbacks
else CancellationException
Engine-->>Worker: Cancelled
MainThread->>Engine: resetEngine()
Engine->>Queue: Preserve callbacks
Worker->>Worker: Rethrow exception
else General Exception
Engine-->>Worker: Error
MainThread->>Engine: resetEngine()
Engine->>Queue: Preserve callbacks
Worker->>Queue: Queue callback with error data
Worker-->>Worker: Return success (non-fatal)
end
alt Engine Null After Init
Worker->>Queue: Queue callback data
Worker-->>Worker: Return success with error
end
Mutex->>Worker: Release lock
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR addresses a WorkManager “work chain poisoning” failure mode in HomeWidgetBackgroundWorker by preventing a single engine initialization failure from permanently cancelling all future background callbacks.
Changes:
- Switches the unique-work enqueue policy from
ExistingWorkPolicy.APPENDtoAPPEND_OR_REPLACEto allow recovery from failed/cancelled chains. - Prevents engine init failures from returning
Result.failure()(and instead returnsResult.success()while queuing the callback payload). - Introduces
engineMutexand aresetEngine()helper, plus a defensive null check aroundengineafter initialization.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| val engineToDestroy = engine | ||
| engine = null | ||
| serviceStarted.set(false) |
There was a problem hiding this comment.
resetEngine() updates shared state (serviceStarted, and indirectly affects how the shared queue is processed) without taking the same monitor lock (synchronized(serviceStarted)) that guards reads/writes elsewhere. This can lead to inconsistent interleavings (e.g., a worker observing serviceStarted=true while another thread resets it) and makes the synchronization scheme harder to reason about. Consider synchronizing on the same lock inside resetEngine() when mutating serviceStarted (and any other shared state coordinated with it).
| val engineToDestroy = engine | |
| engine = null | |
| serviceStarted.set(false) | |
| val engineToDestroy: FlutterEngine? | |
| synchronized(serviceStarted) { | |
| engineToDestroy = engine | |
| engine = null | |
| serviceStarted.set(false) | |
| } |
| // Queue this work item's data so it is not silently lost. When a | ||
| // subsequent worker succeeds in starting the engine and Dart calls | ||
| // backgroundInitialized, the queued callback will be dispatched. | ||
| val data = inputData.getString(DATA_KEY) ?: "" | ||
| val args = listOf(HomeWidgetPlugin.getHandle(context), data) | ||
| synchronized(serviceStarted) { | ||
| queue.add(args) | ||
| } |
There was a problem hiding this comment.
On engine initialization failure, this work item’s args are added to the static in-memory queue and the worker returns Result.success(). If initialization keeps failing permanently (e.g., missing/stale callback handle), every new work will keep appending to queue and nothing will ever drain it, which can grow memory unbounded over time. Consider adding a bounded queue (drop oldest/newest), deduplicating payloads, or persisting a limited backlog to disk instead of accumulating indefinitely in RAM.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt (1)
80-87: Return structured output in the null-engine fallback for observability parity.Line 86 returns plain success, so this abnormal state is harder to detect downstream compared with the exception path.
📊 Suggested adjustment
- return Result.success() + return Result.success( + Data.Builder() + .putBoolean("engine_null_after_init", true) + .build() + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt` around lines 80 - 87, In the HomeWidgetBackgroundWorker null-engine fallback (inside the block that logs "Flutter engine is null after initialization, queuing callback data"), return a structured WorkManager output instead of plain Result.success(): build a Data payload (using the same DATA_KEY/value and HomeWidgetPlugin.getHandle(context)) that includes a status or error key (e.g., "status":"engine_null") and the queued data, and then return Result.success(theData) so downstream observability can detect this abnormal path; update the return in that block and reference variables/methods: inputData, DATA_KEY, HomeWidgetPlugin.getHandle(context), queue, and serviceStarted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt`:
- Around line 51-55: The pending callback queue in HomeWidgetBackgroundWorker
(the shared "queue" accessed inside synchronized(serviceStarted)) is unbounded
and can grow indefinitely; fix this by introducing a fixed maximum size (e.g.,
MAX_QUEUE_SIZE) and enforcing it wherever items are added (the enqueue sites
around DATA_KEY handling and the failure/null-engine paths referenced in the
diff and lines ~54, ~84, ~164), dropping oldest entries or rejecting new ones
deterministically when the limit is reached; also ensure the
reset/initialization code (the code that currently "keeps that queue across
resets") clears or reinitializes the queue atomically to prevent carrying over
stale items.
- Around line 184-193: Update the WorkManager dependency so
ExistingWorkPolicy.APPEND_OR_REPLACE used with WorkManager.getInstance(...) is
available: replace the dynamic dependency "androidx.work:work-runtime-ktx:2.+"
with a fixed version >= 2.4.0 (for example 2.8.1) in the module's Gradle
dependencies, then sync/rebuild so the call to
ExistingWorkPolicy.APPEND_OR_REPLACE when enqueuing UNIQUE_WORK_NAME works at
compile time.
---
Nitpick comments:
In
`@packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt`:
- Around line 80-87: In the HomeWidgetBackgroundWorker null-engine fallback
(inside the block that logs "Flutter engine is null after initialization,
queuing callback data"), return a structured WorkManager output instead of plain
Result.success(): build a Data payload (using the same DATA_KEY/value and
HomeWidgetPlugin.getHandle(context)) that includes a status or error key (e.g.,
"status":"engine_null") and the queued data, and then return
Result.success(theData) so downstream observability can detect this abnormal
path; update the return in that block and reference variables/methods:
inputData, DATA_KEY, HomeWidgetPlugin.getHandle(context), queue, and
serviceStarted.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt
| val data = inputData.getString(DATA_KEY) ?: "" | ||
| val args = listOf(HomeWidgetPlugin.getHandle(context), data) | ||
| synchronized(serviceStarted) { | ||
| queue.add(args) | ||
| } |
There was a problem hiding this comment.
Bound the pending callback queue to avoid unbounded memory growth.
Line 54 and Line 84 enqueue on failure/null-engine paths, and Line 164 keeps that queue across resets. If initialization remains broken, this can grow indefinitely and cause memory pressure.
♻️ Proposed fix
@@
- synchronized(serviceStarted) {
- queue.add(args)
- }
+ enqueuePending(args)
@@
- synchronized(serviceStarted) {
- queue.add(args)
- }
+ enqueuePending(args)
@@
- queue.add(args)
+ enqueuePending(args)
@@
private const val UNIQUE_WORK_NAME = "home_widget_background"
private const val CHANNEL_NAME = "home_widget/background"
+ private const val MAX_PENDING_CALLBACKS = 256
@@
private val queue = ArrayDeque<List<Any>>()
@@
+ private fun enqueuePending(args: List<Any>) {
+ synchronized(serviceStarted) {
+ if (queue.size >= MAX_PENDING_CALLBACKS) {
+ queue.removeFirst()
+ Log.w(TAG, "Pending callback queue full; dropping oldest item")
+ }
+ queue.addLast(args)
+ }
+ }Also applies to: 83-85, 164-166
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt`
around lines 51 - 55, The pending callback queue in HomeWidgetBackgroundWorker
(the shared "queue" accessed inside synchronized(serviceStarted)) is unbounded
and can grow indefinitely; fix this by introducing a fixed maximum size (e.g.,
MAX_QUEUE_SIZE) and enforcing it wherever items are added (the enqueue sites
around DATA_KEY handling and the failure/null-engine paths referenced in the
diff and lines ~54, ~84, ~164), dropping oldest entries or rejecting new ones
deterministically when the limit is reached; also ensure the
reset/initialization code (the code that currently "keeps that queue across
resets") clears or reinitializes the queue atomically to prevent carrying over
stale items.
| // Use APPEND_OR_REPLACE instead of APPEND to prevent work chain | ||
| // poisoning. With APPEND, if any work item in the chain fails, all | ||
| // subsequently appended items are immediately cancelled by WorkManager. | ||
| // This failure state is persisted in WorkManager's SQLite database and | ||
| // survives process death, force stop, and device reboots — permanently | ||
| // breaking background callbacks until the user clears app data. | ||
| // APPEND_OR_REPLACE replaces the chain if it has failed or been | ||
| // cancelled, allowing recovery without user intervention. | ||
| WorkManager.getInstance(context) | ||
| .enqueueUniqueWork(UNIQUE_WORK_NAME, ExistingWorkPolicy.APPEND, workRequest) | ||
| .enqueueUniqueWork(UNIQUE_WORK_NAME, ExistingWorkPolicy.APPEND_OR_REPLACE, workRequest) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Finding build.gradle and build.gradle.kts files =="
find . -type f \( -name "build.gradle" -o -name "build.gradle.kts" -o -name "libs.versions.toml" \) | head -20
echo
echo "== Searching for WorkManager dependency declarations =="
rg -n --no-heading 'androidx\.work|work[-._]runtime|work["\s]' --type gradle --type toml 2>/dev/null || rg -n --no-heading 'androidx\.work|work[-._]runtime' 2>/dev/null || true
echo
echo "== Checking for APPEND_OR_REPLACE usage =="
rg -n --no-heading 'APPEND_OR_REPLACE|ExistingWorkPolicy' 2>/dev/null || trueRepository: ABausG/home_widget
Length of output: 1283
🌐 Web query:
androidx.work ExistingWorkPolicy APPEND_OR_REPLACE release version history
💡 Result:
ExistingWorkPolicy.APPEND_OR_REPLACE was added in AndroidX WorkManager 2.4.0 (released July 22, 2020). The WorkManager 2.4.0 release notes list it under API changes (“Add ExistingWorkPolicy.APPEND_OR_REPLACE…”). [1]
The enum value is part of androidx.work.ExistingWorkPolicy in androidx.work:work-runtime. [2]
Relevant 2.4.0 train (dates): 2.4.0-alpha01 (Mar 4, 2020), 2.4.0-alpha02 (Apr 1, 2020), 2.4.0-alpha03 (Apr 29, 2020), 2.4.0-beta01 (May 20, 2020), 2.4.0-rc01 (Jun 24, 2020), 2.4.0 (Jul 22, 2020). [1]
Sources:
[1] WorkManager release notes (Android Developers) (developer.android.com)
[2] ExistingWorkPolicy API reference (Android Developers) (developer.android.com)
Citations:
- 1: https://developer.android.com/jetpack/androidx/releases/work
- 2: https://developer.android.com/reference/androidx/work/ExistingWorkPolicy?utm_source=openai
Pin WorkManager version to 2.4.0 or higher to support APPEND_OR_REPLACE.
The main module uses androidx.work:work-runtime-ktx:2.+, a dynamic version that can resolve to 2.0–2.3.x, which do not include ExistingWorkPolicy.APPEND_OR_REPLACE (introduced in 2.4.0). Change the dependency to androidx.work:work-runtime-ktx:2.4.0 or higher (e.g., 2.8.1 like the example module) to guarantee compatibility.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@packages/home_widget/android/src/main/kotlin/es/antonborri/home_widget/HomeWidgetBackgroundWorker.kt`
around lines 184 - 193, Update the WorkManager dependency so
ExistingWorkPolicy.APPEND_OR_REPLACE used with WorkManager.getInstance(...) is
available: replace the dynamic dependency "androidx.work:work-runtime-ktx:2.+"
with a fixed version >= 2.4.0 (for example 2.8.1) in the module's Gradle
dependencies, then sync/rebuild so the call to
ExistingWorkPolicy.APPEND_OR_REPLACE when enqueuing UNIQUE_WORK_NAME works at
compile time.
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find and display build.gradle files in the android directory
find packages/home_widget/android -name "build.gradle" -o -name "build.gradle.kts" | xargs -I{} sh -c 'echo "=== {} ===" && cat "{}"'
Length of output: 1376
@ABausG, confirmed — here's the exact line to fix in packages/home_widget/android/build.gradle:
- implementation "androidx.work:work-runtime-ktx:2.+"
+ implementation "androidx.work:work-runtime-ktx:2.9.1"2.9.1 is a recent stable release well above the 2.4.0 minimum needed for APPEND_OR_REPLACE. Any version >= 2.4.0 will work, but pinning to a specific stable version is generally safer than a dynamic 2.+ selector to avoid unexpected regressions from future minor releases.
ᓚᘏᗢ
|
@imnexerio can you please stop closing and reopening the same PR over and over again? Also please share what exactly the issue is you are seeing. (Not the cause as described in the PR title but how this manifests for you/users) |
| // Use APPEND_OR_REPLACE instead of APPEND to prevent work chain | ||
| // poisoning. With APPEND, if any work item in the chain fails, all | ||
| // subsequently appended items are immediately cancelled by WorkManager. | ||
| // This failure state is persisted in WorkManager's SQLite database and | ||
| // survives process death, force stop, and device reboots — permanently | ||
| // breaking background callbacks until the user clears app data. | ||
| // APPEND_OR_REPLACE replaces the chain if it has failed or been | ||
| // cancelled, allowing recovery without user intervention. | ||
| WorkManager.getInstance(context) | ||
| .enqueueUniqueWork(UNIQUE_WORK_NAME, ExistingWorkPolicy.APPEND, workRequest) | ||
| .enqueueUniqueWork(UNIQUE_WORK_NAME, ExistingWorkPolicy.APPEND_OR_REPLACE, workRequest) |
| // in WorkManager's database and survives process restarts, effectively | ||
| // breaking all future background callbacks until app data is cleared. | ||
| // Output data carries the failure signal for observability. | ||
| return Result.success( |
There was a problem hiding this comment.
This is still not clear to me why this would poison the work manager?
Would this be solved by just the change of doing APPEND_OR_REPLACE?
There was a problem hiding this comment.
The issue was that WorkManager's APPEND policy permanently poisons the work chain when any item fails. The fix uses APPEND_OR_REPLACE so failed chains are replaced, but we still return success() with failure metadata (engine_init_failed flag) to preserve observability. The Mutex ensures thread-safe initialization, and queuing the callback data on failure ensures no callback data is lost—they'll be dispatched once the engine recovers. The latest commit adds defensive checks to handle edge cases where the engine could become null between initialization and use.
Previously when it fails it won't reset until we wipe the app's data completely.
There was a problem hiding this comment.
That I got thanks again!
Can you explain again why only having the APPEND_OR_REPLACE would not solve it for the next run?
To me it feels wrong to mark it as success even when nothing actually ran.
Can you maybe share a repo where you have a reproducible example of this breaking in the current form?
There was a problem hiding this comment.
Current execution (fails):
├─ Engine init fails
├─ Return Result.failure()
└─ Chain becomes FAILED (persisted in WorkManager DB)
User interaction triggers next work:
├─ App enqueues new work with APPEND_OR_REPLACE
├─ WorkManager sees chain is FAILED, so replaces it ✓
└─ New work runs... but what about the callback data from step 1?
The problem: The callback data from the first failed work item is silently lost because:
It was never queued for dispatch
It's not in the work request (consumed by the failed execution)
There's no mechanism to retry it
So APPEND_OR_REPLACE fixes the chain going forward, but the first callback is gone forever.
see this : imnexerio@0e7f102
here is the app which was causing problem with home_widget: ^0.9.0 : https://github.com/imnexerio/revix (see the current public release app or repo at that point)
here are the details of failure :
User taps "Refresh Widget" → RefreshService.kt:96
└─ HomeWidgetBackgroundIntent.send()
└─ WorkManager.enqueueUniqueWork(..., APPEND, ...)
└─ HomeWidgetBackgroundWorker.doWork()
├─ Flutter engine init fails (process death, low memory, etc.)
└─ Returns Result.failure() ❌ POISONS CHAIN
Next user action → RecordUpdateService.kt:143
└─ HomeWidgetBackgroundIntent.send()
└─ WorkManager.enqueueUniqueWork(..., APPEND, ...)
├─ Looks up chain by UNIQUE_WORK_NAME
├─ Sees FAILED status (in SQLite DB)
└─ IMMEDIATELY CANCELS new work ❌ NO EXECUTION
Result: All future callbacks silently fail until user clears app data
this is the fix :
try {
initializeFlutterEngine()
} catch (e: Exception) {
resetEngine()
// Queue the data so it's not lost
queue.add(inputData)
// Return success to keep chain alive + metadata
return Result.success(
Data.Builder()
.putBoolean("engine_init_failed", true)
.putString("error", e.message)
.build()
)
}
Problem
HomeWidgetBackgroundWorker uses ExistingWorkPolicy.APPEND with enqueueUniqueWork(). If initializeFlutterEngine() throws (e.g., stale callback handle, lookup failure), doWork() returns Result.failure().
With APPEND, a failed work item cancels all subsequently appended work items. This failure state is persisted in WorkManager's SQLite database and survives process death, force stop, and device reboots — permanently breaking all background callbacks until the user manually clears app data.
Root Cause
The combination of ExistingWorkPolicy.APPEND + Result.failure() creates an unrecoverable state:
1=Engine init fails → Result.failure() returned
2=WorkManager marks the chain as failed
3=All future enqueueUniqueWork(APPEND) calls append to the failed chain → immediately cancelled
4=No background callbacks execute until WorkManager's DB is wiped (clear app data)
Fix
1=APPEND → APPEND_OR_REPLACE — If the chain is failed/cancelled, new work replaces it instead of being appended to a dead chain
2=Result.failure() → Result.success() on engine init failure — prevents chain poisoning in the first place
Added resetEngine() — properly destroys a partially initialized FlutterEngine and resets static state so the next work item retries from clean state
3=Added null check for engine after init block — graceful handling if engine is null