fix(observer): window invalid-output respawn so benign idle can't poison quiet sessions #3059
Open
crippledgeek wants to merge 7 commits into
- align DEFAULTS comments for the three respawn knobs
- document why exemptClasses stays a structural ReadonlySet (Object.freeze is ineffective on a Set; intentional per design)
- clarify the drop-unknown-tokens test comment (xml-not-exemptable vs unknown)
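For context on the `ReadonlySet` commit note: `Object.freeze` only freezes an object's own properties, while a `Set` keeps its elements in internal storage, so `add`/`delete` still work on a frozen `Set`. The snippet below is an illustrative check written for this description, not code from the PR.

```typescript
// Object.freeze locks an object's own properties, but a Set stores its
// elements in internal slots, so add()/delete() keep working after freezing.
const frozen = Object.freeze(new Set<string>(["idle"]));
frozen.add("prose"); // no TypeError, even in strict mode
console.log(Object.isFrozen(frozen), frozen.has("prose")); // true true

// ReadonlySet<T> simply omits add/delete/clear from the type, so mutation is
// rejected by the compiler instead of at runtime.
const exemptClasses: ReadonlySet<string> = new Set(["idle"]);
// exemptClasses.add("prose"); // compile error: no 'add' on ReadonlySet<string>
console.log(exemptClasses.has("idle")); // true
```

The trade-off is that `ReadonlySet` is purely a compile-time guarantee: at runtime the value is an ordinary `Set`, so a cast could still mutate it.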
…son quiet sessions Closes thedotmack#3032. Refs thedotmack#2935.
Greptile Summary
This PR changes observer invalid-output recovery to use a time-windowed respawn policy. The main changes are:
Confidence Score: 5/5
The change is narrowly scoped to observer recovery policy and telemetry/settings wiring, with focused tests covering the new windowed behavior and parsing bounds. No correctness issues were identified in the reviewed changes, and the added tests exercise the main behavioral boundaries described by the PR.
Reviews (1): Last reviewed commit: "fix(observer): wire respawn telemetry di..."
What
Replaces the unbounded `consecutiveInvalidOutputs >= 3` respawn counter (which wrongly counted benign `idle`/prose SDK output and poison-looped quiet sessions, wiping context and dropping all captured work) with a time-windowed burst counter, modelled on systemd's `StartLimitBurst`/`StartLimitIntervalSec` and Erlang/OTP supervisor `intensity`/`period`.

Closes #3032
Refs #2935 #3007 #3037 #3022 #2955 #2960
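To make the windowing concrete, here is a minimal sketch of a systemd-style burst counter under assumed semantics: a failure stays counted for `intervalMs`, and the counter trips once `burst` failures are live at the same time. `BurstCounter`, `tripsWindowed`, and `oldCounterTrips` are hypothetical names for illustration, not the PR's `FailureWindow` API, and the event timings are invented.

```typescript
// Hypothetical sliding-window burst counter (systemd StartLimitBurst /
// StartLimitIntervalSec style). Illustrative only; not the PR's FailureWindow.
class BurstCounter {
  private timestamps: number[] = [];

  constructor(
    private readonly burst: number, // failures needed to trip
    private readonly intervalMs: number, // how long a failure stays "live"
  ) {}

  /** Records a failure at time `now` (ms) and returns true if the burst limit is reached. */
  record(now: number): boolean {
    // Forget failures that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.intervalMs);
    this.timestamps.push(now);
    return this.timestamps.length >= this.burst;
  }
}

const MINUTE = 60_000;
const spreadOut = [0, 5 * MINUTE, 10 * MINUTE]; // three bad replies on a quiet session
const burst = [0, 1_000, 2_000]; // three bad replies in quick succession

function tripsWindowed(times: number[]): boolean {
  const counter = new BurstCounter(3, MINUTE);
  return times.some((t) => counter.record(t));
}

// A plain "3 in a row" counter never forgets, so (assuming no valid output
// arrives in between) both sequences would trip it.
const oldCounterTrips = (times: number[]) => times.length >= 3;

console.log(oldCounterTrips(spreadOut), tripsWindowed(spreadOut), tripsWindowed(burst)); // true false true
```

With a threshold of 3 and a 60 s window, three benign replies minutes apart never trip the windowed counter, while the unbounded counter would already have respawned the observer.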
Why
On low-signal sessions the SDK legitimately emits `idle`/empty/prose responses. The old counter treated every one as an "invalid output" and, after 3 in a row, killed and respawned the observer. On a quiet session the fresh observer just produces more idle output, so the respawn itself re-triggers the counter: a self-sustaining poison→respawn loop. The fix only respawns when invalid outputs arrive as a burst within a bounded window, so isolated benign idles can never accumulate to the threshold.

How
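A rough sketch of that windowed decision follows, combining the rules stated in this PR (`poisoned` respawns at once, `idle` is exempt by default, other invalid output respawns only when `threshold` of them fall inside `windowMs`) with a `never`-typed exhaustiveness guard over the output classes. The `OutputClass` union, the `RespawnPolicy` shape, `evaluateRespawnSketch`, and the choice that valid output leaves the window untouched are all assumptions for illustration, not the PR's `evaluateRespawn`.

```typescript
// Hypothetical output classes for illustration; the PR's real set may differ.
type OutputClass = "valid" | "idle" | "prose" | "poisoned";
type Decision = "keep" | "respawn";

interface RespawnPolicy {
  exemptClasses: ReadonlySet<OutputClass>;
  threshold: number; // invalid outputs needed inside the window
  windowMs: number;
}

// Exhaustiveness guard: adding a member to OutputClass without handling it
// in the switch below becomes a compile error here.
function assertNever(x: never): never {
  throw new Error(`unhandled output class: ${String(x)}`);
}

function evaluateRespawnSketch(
  cls: OutputClass,
  now: number,
  failures: readonly number[], // timestamps of recent invalid outputs
  policy: RespawnPolicy,
): { decision: Decision; failures: number[] } {
  const live = failures.filter((t) => now - t < policy.windowMs);
  switch (cls) {
    case "poisoned":
      return { decision: "respawn", failures: [] }; // immediate, never windowed
    case "valid":
      return { decision: "keep", failures: live }; // assumption: leaves the window as-is
    case "idle":
    case "prose":
      if (policy.exemptClasses.has(cls)) return { decision: "keep", failures: live };
      live.push(now);
      return live.length >= policy.threshold
        ? { decision: "respawn", failures: [] }
        : { decision: "keep", failures: live };
    default:
      return assertNever(cls);
  }
}

const policy: RespawnPolicy = {
  exemptClasses: new Set<OutputClass>(["idle"]),
  threshold: 3,
  windowMs: 60_000,
};

const events: Array<[OutputClass, number]> = [
  ["idle", 0], ["idle", 1_000], ["idle", 2_000], // exempt: never counted
  ["prose", 3_000], ["prose", 4_000], ["prose", 5_000], // non-exempt burst
];
let failures: number[] = [];
const decisions: Decision[] = [];
for (const [cls, t] of events) {
  const result = evaluateRespawnSketch(cls, t, failures, policy);
  failures = result.failures;
  decisions.push(result.decision);
}
console.log(decisions.join(",")); // keep,keep,keep,keep,keep,respawn
```

Returning a fresh `failures` array instead of mutating session state keeps the sketch a pure function, which makes threshold and window boundaries easy to test in isolation.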
- `src/services/worker/agents/respawn-policy.ts`: `evaluateRespawn` (windowed decision with an exhaustiveness guard over output classes), `parseRespawnPolicy`, cached settings-backed `getRespawnPolicy`, and a `FailureWindow` class.
- `ActiveSession.consecutiveInvalidOutputs` → `invalidOutputWindow` (timestamps within the window, not a monotonic count).
- `poisoned` output still respawns immediately; `idle` is exempt by default.

New settings (all strings, in `SettingsDefaultsManager`; defaults reproduce prior behavior minus the bug):

| Setting | Default |
| --- | --- |
| `CLAUDE_MEM_INVALID_OUTPUT_EXEMPT_CLASSES` | `idle` |
| `CLAUDE_MEM_INVALID_OUTPUT_RESPAWN_THRESHOLD` | `3` |
| `CLAUDE_MEM_INVALID_OUTPUT_WINDOW_MS` | `60000` |

Tests
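Because all three settings are strings, they need parsing and bounds checks before use, which is what `tests/shared/settings-respawn-policy.test.ts` covers (settings parse + validation bounds). A minimal sketch of such parsing follows; `parsePolicySketch`, `parseBoundedInt`, the `EXEMPTABLE` set, the fall-back-to-default behavior, and the numeric bounds (threshold 1 to 100, window 1 s to 1 h) are all assumptions, not the PR's `parseRespawnPolicy`.

```typescript
// Hypothetical set of classes that may be exempted; "poisoned" is left out on
// the assumption that it must always respawn immediately.
const EXEMPTABLE: ReadonlySet<string> = new Set(["idle", "prose"]);

interface ParsedPolicy {
  exemptClasses: ReadonlySet<string>;
  threshold: number;
  windowMs: number;
}

// Parses a whole-number string; anything malformed or outside [min, max]
// falls back to the default instead of throwing.
function parseBoundedInt(raw: string | undefined, fallback: number, min: number, max: number): number {
  const trimmed = raw?.trim();
  if (trimmed === undefined || !/^\d+$/.test(trimmed)) return fallback;
  const n = Number(trimmed);
  return n >= min && n <= max ? n : fallback;
}

function parsePolicySketch(settings: Record<string, string | undefined>): ParsedPolicy {
  const exempt = (settings["CLAUDE_MEM_INVALID_OUTPUT_EXEMPT_CLASSES"] ?? "idle")
    .split(",")
    .map((token) => token.trim().toLowerCase())
    .filter((token) => EXEMPTABLE.has(token)); // unknown tokens are dropped
  return {
    exemptClasses: new Set(exempt),
    threshold: parseBoundedInt(settings["CLAUDE_MEM_INVALID_OUTPUT_RESPAWN_THRESHOLD"], 3, 1, 100),
    windowMs: parseBoundedInt(settings["CLAUDE_MEM_INVALID_OUTPUT_WINDOW_MS"], 60_000, 1_000, 3_600_000),
  };
}

const parsed = parsePolicySketch({
  CLAUDE_MEM_INVALID_OUTPUT_EXEMPT_CLASSES: "idle, bogus, poisoned",
  CLAUDE_MEM_INVALID_OUTPUT_RESPAWN_THRESHOLD: "0", // out of range, falls back to 3
  CLAUDE_MEM_INVALID_OUTPUT_WINDOW_MS: "30000",
});
console.log(Array.from(parsed.exemptClasses), parsed.threshold, parsed.windowMs);
```

Here the exempt list reduces to `idle`, the out-of-range threshold falls back to `3`, and the window parses as `30000`. Falling back rather than throwing is one reasonable choice for settings read at runtime; rejecting invalid values loudly is the other.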
- `tests/worker/agents/respawn-policy.test.ts` (new, comprehensive: windowing, threshold boundaries, exempt classes, exhaustiveness).
- `tests/shared/settings-respawn-policy.test.ts` (new, settings parse + validation bounds).
- `tests/worker/poison-respawn.test.ts`, `response-processor.test.ts`, `scrub.test.ts`.
- `tsc --noEmit` clean.

Notes
- The issues under Refs (#2935, #3007, #3037, #3022, #2955, #2960) describe the same self-sustaining respawn loop from different triggers:
  - #2935: Generator poison loop: prompt says 'return empty response' to skip, but parser only accepts `<skip_summary/>`; empty/prose replies poison the session
  - #3007: SDK session repeatedly poisoned by non-XML idle responses during heavy agentic sessions
  - #3037: Generation halts in an infinite poison/respawn loop when the subscription account hits its weekly/usage limit (rate-limit prose misclassified as invalid output)
  - #3022: Observer respawns ~1/3 of Haiku sessions on a keyword false-positive ("poisoned"); it should just ignore unparseable output
  - #2955: SDK session poison-loop: generator emits prose "empty observation" instead of `<skip_summary/>`, triggering repeated kill/respawn
  - #2960: tracking: observer poison loop; skip-prose counted as invalid, triggering endless kill/respawn cycle

  Windowing the invalid-output burst is a contributing mitigation for them, so they're linked as related rather than closed. For example, #3022's `poisoned`-keyword false positive is a distinct concern, since output in the `poisoned` class still respawns immediately.
- … `tool@version` form (#2939) #3057 / test(settings): isolate loadFromFile tests from all ambient default env keys #3058 (branched cleanly off `main`, no shared commits); can merge in any order.