Description
I’m running a tight two-step flow using callback streaming:
- Step1: strict JSON-only output (machine-parsed)
- Step2: follow-up generation consuming Step1 JSON immediately
In this setup, intermittent streaming/termination edge cases become much more frequent and can deadlock the pipeline because Step1 cannot be safely finalized.
This looks like either a bug or a contract ambiguity around:
- how to interpret/assemble streamed `onMessage(...)` chunks for strict JSON, and
- whether each streaming call is guaranteed to deliver a terminal callback.
Summary
Two back-to-back streaming requests with minimal delay:
- Step1: “JSON-only” output (strict, machine-parsed)
- Step2: follow-up generation using Step1 JSON as input
Failures appear primarily in Step1. Importantly, I do not observe Step1 fragments mixing into Step2 callbacks.
The pipeline deadlocks when Step1 is either:
- not parseable as JSON due to streaming corruption/duplication/garbage, or
- never formally terminated, because neither `onDone()` nor `onError(Throwable)` arrives.
What I observe (intermittent)
A) Step1 output becomes impossible to treat as strict JSON
- Step1 streaming sometimes produces unexpected garbage/duplicated fragments.
- The assembled transcript becomes invalid JSON (stray text, broken quoting/braces, repeated endings, etc.).
- Since JSON never becomes valid, Step1 cannot be accepted as complete and Step2 cannot safely start.
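Since Step1 can only be accepted once its transcript is valid JSON, the acceptance check has to tolerate leading/trailing noise. A minimal sketch of that gate (my own illustration, not part of LiteRT‑LM; it assumes chunks are appended in order into one buffer, and scans for the first balanced, string-aware `{...}` span):

```java
// Accept Step1 only once the assembled transcript contains one balanced
// JSON object; text outside that span is treated as streaming noise.
final class JsonExtractor {
    /** Returns the first balanced {...} span, or null if none is complete yet. */
    static String extractObject(CharSequence transcript) {
        int depth = 0, start = -1;
        boolean inString = false, escaped = false;
        for (int i = 0; i < transcript.length(); i++) {
            char c = transcript.charAt(i);
            if (inString) {
                // Braces inside JSON strings must not affect the depth count.
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
                continue;
            }
            if (c == '"') inString = true;
            else if (c == '{') { if (depth++ == 0) start = i; }
            else if (c == '}' && depth > 0 && --depth == 0) {
                return transcript.subSequence(start, i + 1).toString();
            }
        }
        return null; // object not started, or not yet complete
    }
}
```

A `null` result cannot distinguish "still streaming" from "stalled", which is exactly why the watchdog question below matters.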
B) “Logical completion” vs terminal callbacks can diverge
- Streaming may stop producing meaningful content (or stop entirely), but:
  - `onDone()` is never delivered, and
  - `onError(Throwable)` is also never delivered.
This makes it hard to distinguish:
- “Step1 completed but produced invalid JSON” vs
- “Step1 stalled internally”
Either way, the pipeline can hang.
Expected behavior
- Streaming should be stable enough to reconstruct strict JSON when the prompt requests JSON-only output.
- Each request should reach exactly one terminal outcome (`onDone` once or `onError` once).
- After termination (success/failure/cancel), no further callbacks for that request should arrive.
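For reference, this is the exactly-once contract I currently enforce on the caller side while the library-level guarantee is unclear. The wrapper is my own (hypothetical) code; only the callback names mirror the API described above:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Caller-side guard: the first terminal callback (onDone or onError) wins,
// and any chunk or terminal arriving after finalization is dropped.
final class TerminalGuard {
    private final AtomicBoolean finalized = new AtomicBoolean(false);

    /** Call from onMessage: deliver the chunk only while the request is live. */
    boolean acceptChunk() { return !finalized.get(); }

    /** Call from onDone/onError: returns true only for the first terminal. */
    boolean tryFinalize() { return finalized.compareAndSet(false, true); }
}
```

`compareAndSet` makes the finalization race-free even if `onDone` and a watchdog-triggered cancellation fire concurrently.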
High-level correlations
- Back-to-back requests with minimal delay (tight Step1 → Step2)
- Watchdogs/timeouts trigger and `cancelProcess()`/`close()` is called while callbacks may still be in flight
- Lifecycle teardown/recreate happens during or between steps
Questions
- Is it guaranteed that each streaming request ends in exactly one terminal callback?
- Can `onMessage(...)` chunks ever include non-new content (replay/flush artifacts) that complicates strict JSON reconstruction?
- In a tight Step1 → Step2 chain, is there a recommended isolation pattern (e.g., wait for the terminal callback, recreate the `Conversation`, etc.)?
Current mitigation
- Single-flight gating (no overlap between runs)
- Watchdogs (first-token / stall / termination)
- Finalized flags to ignore late callbacks
- Stream normalization before JSON parsing
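The stall watchdog from the list above is sketched below; it is my own workaround, not library code. Every `onMessage` chunk "feeds" the dog, and if neither a chunk nor a terminal callback arrives within the window, the supplied cancel hook runs (in practice, whatever teardown the runtime offers, e.g. `cancelProcess()`/`close()`):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Per-request stall watchdog: each feed() pushes the deadline forward;
// stop() (from onDone/onError) disarms it for good.
final class StallWatchdog {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final long stallMillis;
    private final Runnable onStall;
    private ScheduledFuture<?> pending;

    StallWatchdog(long stallMillis, Runnable onStall) {
        this.stallMillis = stallMillis;
        this.onStall = onStall;
    }

    /** Call from onMessage: reset the stall deadline. */
    synchronized void feed() {
        if (pending != null) pending.cancel(false);
        pending = timer.schedule(onStall, stallMillis, TimeUnit.MILLISECONDS);
    }

    /** Call from onDone/onError: no further deadlines may fire. */
    synchronized void stop() {
        if (pending != null) pending.cancel(false);
        timer.shutdownNow();
    }
}
```

This converts a silent hang into an explicit timeout path, but it cannot repair the underlying ambiguity: after a forced cancel there is still no way to know whether the runtime considered the request finished.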
Environment
- LiteRT‑LM: `com.google.ai.edge.litertlm:litertlm-android:0.9.0-alpha04`
  - Version correlation: observed more frequently with `com.google.ai.edge.litertlm:litertlm-android:0.8.0`
- Android / device: Google Pixel 9a (API 36)
- ABI: `arm64-v8a`
- Model / backend / runtime config: same as Issue 1
- Build environment: AGP 9.0.0, Kotlin 2.3.10 (Compose BOM 2026.02.00)