Observation generation stuck in a self-sustaining poison→respawn loop: benign "prose"/"idle" SDK output counts toward the poison threshold, wiping context and dropping all captured work #3032

## Environment

- claude-mem 13.8.0 (observed; appears to be a regression vs 13.6.2, which captured tool work fine)
- Windows 11, bun runtime, worker HTTP on 127.0.0.1:37777
- Generator model: claude-sonnet-4-5

## Summary

Across an entire multi-hour session, the observer/generator captured **only user prompts** and produced essentially **zero observations of actual work** (every generated observation had `files_read: []`). The worker logs show the generator session being **"poisoned" and respawned every ~30–60s** for hours.

Root cause: the parser treats a **benign** generator response (the model saying "nothing to record yet" in plain prose, or returning an empty/idle response) as **invalid output**. After `Y9 = 3` consecutive such outputs the session is "poisoned": `conversationHistory` is reset to `[]` and the SDK session respawns, losing its context (Issue #817, `preservedPending: 0`). Because the wipe destroys the very context the observer needs to accumulate, the next checkpoint again looks like "just a user prompt, so nothing to record", the model answers in prose again, and the loop sustains itself indefinitely. No work is ever captured.

## Code (worker-service.cjs, function `Bc`)

```js
// M9(t): output is "valid" only if it contains <observation>/<summary>/<skip_summary/>
// D9(t): outputClass -> "idle" | "poisoned" (matches Cwe API-failure markers) | "xml" | "prose"
if (!d.valid) {
  let k = D9(t), I = N9(t);
  if (e.consecutiveInvalidOutputs = (e.consecutiveInvalidOutputs ?? 0) + 1,
      g.warn("PARSER", `${a} returned non-XML ${k} response — ignoring queued batch`, {...}),
      k === "poisoned" || e.consecutiveInvalidOutputs >= Y9) {   // <-- BUG
        // poison: respawnPoisonedSession -> conversationHistory=[] wipe
  }
  ...
}
```

The problem is the `|| e.consecutiveInvalidOutputs >= Y9` clause: benign `prose`/`idle` output (the model legitimately having nothing to record) accumulates toward the threshold and triggers a context-destroying poison. The `Cwe` patterns (the genuine poison markers) match real API failures ("context window", "session exhausted", "prompt is too long", …); `prose` and `idle` are not failures.
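For readers who do not want to trace the minified code, the check can be paraphrased roughly as below. This is a sketch, not the actual source: `shouldPoison` and `POISON_THRESHOLD` are hypothetical names (the latter stands in for `Y9`), and only `consecutiveInvalidOutputs`, the output classes, and the threshold value of 3 are taken from the bundle and the logs.

```js
// Hypothetical de-minified paraphrase of the check in `Bc`; not the real source.
const POISON_THRESHOLD = 3; // stands in for Y9 (value taken from the logs)

function shouldPoison(session, outputClass) {
  // Every invalid output bumps the counter, regardless of its class.
  session.consecutiveInvalidOutputs = (session.consecutiveInvalidOutputs ?? 0) + 1;
  // Poison on a genuine failure marker OR on the accumulated count.
  return (
    outputClass === "poisoned" ||
    session.consecutiveInvalidOutputs >= POISON_THRESHOLD
  );
}

// Three benign prose responses in a row are enough to trigger the wipe.
const session = { conversationHistory: ["user prompt"] };
const results = ["prose", "prose", "prose"].map((c) => shouldPoison(session, c));
console.log(results); // [ false, false, true ]
```

The third benign response crosses the threshold even though no API failure ever occurred, which matches the `consecutiveInvalidOutputs=3, threshold=3` pairs in the logs.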

## Evidence (logs)

```
# Every cycle, for hours:
[WARN ] [PARSER] SDK returned non-XML prose response — ignoring queued batch {outputClass=prose, consecutiveInvalidOutputs=3}
[ERROR] [SESSION] SDK session poisoned — killing and respawning {outputClass=prose, consecutiveInvalidOutputs=3, threshold=3}
[WARN ] [SESSION] Respawning poisoned SDK session, preserving pending messages {preservedPending=0}
[WARN ] [SESSION] Discarding stale memory_session_id from previous worker instance (Issue #817)
# prose previews show the model only ever sees the prompt:
"I'm observing the primary session, but I don't see any technical work being performed yet - only the user's request."
# Net result: only prompt-echo observations, all with files_read: []
```

## Suggested fix (one condition)

Only poison on a genuine API-failure marker, never on accumulated benign prose/idle:

```diff
-  k === "poisoned" || e.consecutiveInvalidOutputs >= Y9
+  k === "poisoned"
```

(Or, more conservatively: do not increment `consecutiveInvalidOutputs` for `k === "prose" || k === "idle"`; treat them as a no-op skip, equivalent to `<skip_summary/>`.)
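A minimal sketch of that conservative variant, assuming the class names reported by `D9`. `handleInvalidOutput`, `BENIGN_CLASSES`, and `INVALID_THRESHOLD` are hypothetical names used for illustration only; the real code is minified and structured differently.

```js
// Hypothetical sketch of the conservative variant; only
// consecutiveInvalidOutputs and the class names come from the bundle.
const BENIGN_CLASSES = new Set(["prose", "idle"]);
const INVALID_THRESHOLD = 3; // stands in for Y9

function handleInvalidOutput(sessionState, outputClass) {
  // Benign output is a no-op skip: the counter is left untouched.
  if (BENIGN_CLASSES.has(outputClass)) return "skip";

  sessionState.consecutiveInvalidOutputs =
    (sessionState.consecutiveInvalidOutputs ?? 0) + 1;

  // The threshold still applies, but only to non-benign invalid output.
  if (
    outputClass === "poisoned" ||
    sessionState.consecutiveInvalidOutputs >= INVALID_THRESHOLD
  ) {
    return "poison";
  }
  return "ignore";
}

const demo = {};
const actions = ["prose", "idle", "prose", "idle", "poisoned"].map((c) =>
  handleInvalidOutput(demo, c)
);
console.log(actions); // [ 'skip', 'skip', 'skip', 'skip', 'poison' ]
```

Unlike the one-line fix, this keeps the threshold as a safety net for repeated malformed output while still letting a session survive any number of "nothing to record yet" responses.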

## Verification of the fix (applied locally, before/after on the same session 327)

| | Before patch | After patch |
|---|---|---|
| Poison events | every ~30–60s for hours | **none** (count frozen) |
| `consecutiveInvalidOutputs=3/4` | → poison + context wipe | **ignored, session survives** |
| Observations stored | 1 in hours (prompt echo) | **3 in minutes, real content** |
| `files_read` | `[]` | **populated** (actual files read) |

After the patch, the generator survives the early "idle/prose" checkpoints, accumulates the real tool activity, and emits + stores genuine `<observation>` XML (`[DB] STORING | obsCount=1`).
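To reproduce the "Poison events" row on another machine, a small counter over the worker log is enough. The sketch below assumes the log line format shown in the Evidence section; `countPoisonEvents` is a hypothetical helper, and the sample lines are copied from that excerpt rather than read from a real log file.

```js
// Hypothetical helper: counts poison events in worker log text.
// Assumes each event is logged on one line containing "SDK session poisoned".
function countPoisonEvents(logText) {
  return logText
    .split(/\r?\n/)
    .filter((line) => line.includes("SDK session poisoned")).length;
}

// Sample lines copied from the Evidence section above.
const sampleLog = [
  "[WARN ] [PARSER] SDK returned non-XML prose response — ignoring queued batch {outputClass=prose, consecutiveInvalidOutputs=3}",
  "[ERROR] [SESSION] SDK session poisoned — killing and respawning {outputClass=prose, consecutiveInvalidOutputs=3, threshold=3}",
  "[WARN ] [SESSION] Respawning poisoned SDK session, preserving pending messages {preservedPending=0}",
].join("\n");

console.log(countPoisonEvents(sampleLog)); // 1
```

Running it against the log before and after the patch should show the count stop increasing once the patched worker is in place.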

## Related

Distinct from the worker-recycle/zombie issue (#3031); this one is in the observation-generation pipeline. It also touches Issue #817 (SDK context lost on respawn): the poison loop makes #817 fire continuously.
