
Cost ledger writes identical rows every 30s for idle/ended agents (2,417 duplicates in 25h) #56

@billrehm


What happened?

When an agent's PTY session ends but the agent remains non-archived in registry.json, the breaker beat timer writes an identical cost-ledger entry for that agent every 30 seconds, indefinitely. In one observed instance, agent jim-mq2npyp5 (session ended 2026-06-06, never archived) accumulated 2,417 identical entries in cost-ledger.jsonl over approximately 25 hours. Each entry showed usd: 1053.52, a plausible-looking but completely spurious figure replayed from the frozen transcript. Any cost reporting tool that sums the ledger would produce a wildly inflated number: those rows alone add up to about 2.55 million (2,417 × 1053.52).

Root cause (three contributing factors):

  • runBreakerBeat (src/main/index.ts ~line 392) iterates all non-archived, non-assistant agents and calls appendCostLedger(sample) unconditionally on every 30-second tick, including agents with no active session.
  • transcriptFallback (src/main/telemetry.ts ~lines 389-405) is called when aggregateLive returns null (no active OTel session). It reads the agent's frozen transcript directory and returns a non-null sample with sessionId: '' (the documented sentinel for "session unknown on transcript path"). As long as the transcript directory exists and contains any data, it returns the same frozen cumulative totals on every call.
  • appendCostLedger (src/main/hive.ts ~lines 706-723) does no deduplication. It appends unconditionally.
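
To make the interaction of the three factors concrete, here is a minimal, self-contained model of the beat loop. The function names mirror the ones above, but every body is a hypothetical stand-in written for illustration (an in-memory array replaces cost-ledger.jsonl, and the sample shape is reduced to four fields); it is not the project's actual code.

```typescript
// Simplified sample and agent shapes (assumed for this sketch).
interface CostSample { agentId: string; sessionId: string; usd: number; ts: string; }
interface Agent { id: string; archived: boolean; isAssistant: boolean; }

// In-memory stand-in for cost-ledger.jsonl.
const ledger: CostSample[] = [];

// Stand-in for aggregateLive: the session has ended, so there is no live data.
function aggregateLive(_agent: Agent): CostSample | null {
  return null;
}

// Stand-in for transcriptFallback: always the same frozen totals, sessionId ''.
function transcriptFallback(agent: Agent, ts: string): CostSample | null {
  return { agentId: agent.id, sessionId: "", usd: 1053.52, ts };
}

// Stand-in for appendCostLedger: appends with no deduplication.
function appendCostLedger(sample: CostSample): void {
  ledger.push(sample);
}

// One tick of the beat, following the behaviour described in the issue.
function runBreakerBeat(agents: Agent[], now: Date): void {
  for (const a of agents) {
    if (a.archived || a.isAssistant) continue;
    const sample = aggregateLive(a) ?? transcriptFallback(a, now.toISOString());
    if (sample) appendCostLedger(sample); // unconditional for any non-null sample
  }
}

// Simulate three 30-second ticks for one ended, non-archived agent.
const agents: Agent[] = [{ id: "jim-mq2npyp5", archived: false, isAssistant: false }];
const start = Date.parse("2026-06-07T01:00:30Z");
for (let tick = 0; tick < 3; tick++) {
  runBreakerBeat(agents, new Date(start + tick * 30_000));
}

const distinctUsd = new Set(ledger.map((r) => r.usd)).size;
console.log(ledger.length, distinctUsd); // 3 rows, 1 distinct usd value
```

Every tick adds a row, and only the timestamp differs between rows, which matches the pattern seen in the real ledger.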

Suggested fix (one line):
// src/main/index.ts ~line 392
// Before:
if (sample) hive.appendCostLedger(sample);
// After:
if (sample?.sessionId) hive.appendCostLedger(sample);

This preserves the sample value for the circuit breaker check (inputs.push(...)) while preventing stale transcript replays (which have sessionId: '') from reaching the ledger.
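
The effect of the guard can be sketched in isolation. The helper handleSample, the two arrays, and the "live-agent" / "abc123" values below are hypothetical and exist only for this illustration; the sketch assumes the real loop hands each sample to the breaker inputs and to the ledger separately, as the description above implies.

```typescript
// Reduced sample shape (assumed for this sketch).
interface CostSample { agentId: string; sessionId: string; usd: number; }

const breakerInputs: CostSample[] = []; // what the circuit breaker would see
const ledgerRows: CostSample[] = [];    // what would be written to the ledger

// Hypothetical helper combining the two uses of a sample in the beat loop.
function handleSample(sample: CostSample | null): void {
  if (!sample) return;
  breakerInputs.push(sample); // breaker still receives transcript-based totals
  // '' is falsy, so transcript replays with the sentinel sessionId are skipped.
  if (sample.sessionId) ledgerRows.push(sample);
}

handleSample({ agentId: "jim-mq2npyp5", sessionId: "", usd: 1053.52 }); // stale replay
handleSample({ agentId: "live-agent", sessionId: "abc123", usd: 2.5 }); // live session
handleSample(null);                                                      // no data at all

console.log(breakerInputs.length, ledgerRows.length); // 2 1
```

One consequence worth noting: with this guard, no transcript-only sample ever reaches the ledger, so it relies on sessionId: '' being used exclusively for the fallback path.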

Steps to reproduce

  1. Spawn an agent and let it work for a while (so transcript data accumulates on disk).
  2. End the agent's PTY session (close the terminal) but do NOT archive the agent.
  3. Leave the app running for 30+ minutes.
  4. Inspect cost-ledger.jsonl. Identical rows for the ended agent appear every 30 seconds.
  5. All rows show the same usd value and sessionId: ''.

Logs / screenshots

Example of the duplicate pattern in cost-ledger.jsonl (agent jim-mq2npyp5):
{"agentId":"jim-mq2npyp5","sessionId":"","usd":1053.52,"inputTokens":...,"ts":"2026-06-07T01:00:30Z"}
{"agentId":"jim-mq2npyp5","sessionId":"","usd":1053.52,"inputTokens":...,"ts":"2026-06-07T01:01:00Z"}
{"agentId":"jim-mq2npyp5","sessionId":"","usd":1053.52,"inputTokens":...,"ts":"2026-06-07T01:01:30Z"}
... (2,417 total identical rows over ~25 hours)
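
For anyone checking their own ledger, a rough way to count such repeats is to compare rows with the timestamp removed. This is a sketch under assumptions: countDuplicateRows is a hypothetical helper, the sample rows omit the fields elided above, and it assumes the writer emits keys in a stable order. A live agent whose totals happen not to change between ticks would also be counted, so treat the result as a hint rather than proof.

```typescript
// Counts, per agent, the rows that repeat an earlier row once `ts` is ignored.
function countDuplicateRows(jsonl: string): Map<string, number> {
  const seen = new Set<string>();
  const duplicates = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    const row = JSON.parse(line) as Record<string, unknown>;
    const withoutTs: Record<string, unknown> = { ...row };
    delete withoutTs["ts"]; // ignore the only field that differs between repeats
    const key = JSON.stringify(withoutTs);
    if (seen.has(key)) {
      const agentId = String(row["agentId"]);
      duplicates.set(agentId, (duplicates.get(agentId) ?? 0) + 1);
    } else {
      seen.add(key);
    }
  }
  return duplicates;
}

// Illustrative input: three repeats for the ended agent, one row for a live one.
const sampleLedger = [
  '{"agentId":"jim-mq2npyp5","sessionId":"","usd":1053.52,"ts":"2026-06-07T01:00:30Z"}',
  '{"agentId":"jim-mq2npyp5","sessionId":"","usd":1053.52,"ts":"2026-06-07T01:01:00Z"}',
  '{"agentId":"jim-mq2npyp5","sessionId":"","usd":1053.52,"ts":"2026-06-07T01:01:30Z"}',
  '{"agentId":"live-agent","sessionId":"abc123","usd":2.5,"ts":"2026-06-07T01:01:30Z"}',
].join("\n");

const duplicateCounts = countDuplicateRows(sampleLedger);
console.log(duplicateCounts.get("jim-mq2npyp5")); // 2
```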

Workaround applied: I set archived: true on the affected agent in registry.json. The loop guard if (a.archived || a.isAssistant) continue then skips that agent on the next beat, so the duplicate writes stop immediately. I also purged the existing ghost entries from cost-ledger.jsonl.
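
One possible way to do the purge, shown as a sketch rather than the exact procedure used here: drop every row for the affected agent that carries the sentinel sessionId. The helper purgeGhostRows is hypothetical, it works on the file contents as a string (reading and writing cost-ledger.jsonl is left out), and the sample rows omit the fields elided in the log excerpt. Back up the ledger before trying anything like this.

```typescript
// Removes rows for `agentId` whose sessionId is the '' sentinel.
// Blank lines are dropped as a side effect.
function purgeGhostRows(jsonl: string, agentId: string): string {
  return jsonl
    .split("\n")
    .filter((line) => {
      if (line.trim() === "") return false;
      const row = JSON.parse(line) as Record<string, unknown>;
      return !(row["agentId"] === agentId && row["sessionId"] === "");
    })
    .join("\n");
}

// Illustrative input: two ghost rows and one row from a live session.
const before = [
  '{"agentId":"jim-mq2npyp5","sessionId":"","usd":1053.52,"ts":"2026-06-07T01:00:30Z"}',
  '{"agentId":"jim-mq2npyp5","sessionId":"","usd":1053.52,"ts":"2026-06-07T01:01:00Z"}',
  '{"agentId":"live-agent","sessionId":"abc123","usd":2.5,"ts":"2026-06-07T01:01:00Z"}',
].join("\n");

const after = purgeGhostRows(before, "jim-mq2npyp5");
console.log(after.split("\n").length); // 1
```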

macOS version

Windows 11 (Azure Virtual Desktop, build 10.0.26200.8457)

Node version

v24.13.0

Claude Code version (if relevant)

v2.1.126

Pre-flight

  • I re-ran npm install so node-pty is rebuilt for the current Electron ABI.
  • I searched existing issues and this isn't a duplicate.

Metadata


Labels: bug (Something isn't working)
