[CLAUDE ROUTINE]: Reliability enhancement — make the 50 MB cap on appendToServerQueue race-safe so a burst of concurrent hooks can't slip past it #273

@NiveditJain

Description

Summary

appendToServerQueue keeps ~/.failproofai/cache/server-queue/pending.jsonl from growing without bound by short-circuiting once the file exceeds 50 MB. That check works fine when one hook process writes at a time, which is the common case. But when several hook processes fire at once (multi-CLI users, parallel tool calls in a single session, or a busy CI agent), they can all read the size in the same window, all conclude "still under the cap," and all append. The end result is a queue file that is somewhat larger than MAX_QUEUE_BYTES. Tightening this is a small, friendly hardening that makes the cap a hard ceiling rather than a soft one.

Where

src/relay/queue.ts:118-139

```ts
export function appendToServerQueue(entry: RawEntry): void {
  if (!isLoggedIn()) return;
  ensureDir();

  try {
    if (existsSync(PENDING_FILE) && statSync(PENDING_FILE).size > MAX_QUEUE_BYTES) {
      return;   // ← TOCTOU window: the size check and the append are not atomic
    }
  } catch { /* races are fine; proceed */ }

  const sanitized = sanitize(entry);
  appendFileSync(PENDING_FILE, JSON.stringify(sanitized) + "\n", { mode: 0o600 });
  // ...
}
```

The appendFileSync itself is line-atomic (O_APPEND), so we never get torn lines — that part is already great. What's not atomic is the size-vs-cap decision relative to the append.

Why this matters

```mermaid
sequenceDiagram
    participant H1 as Hook proc A
    participant H2 as Hook proc B
    participant H3 as Hook proc C
    participant FS as pending.jsonl<br/>(49.99 MB)

    par Concurrent hook fires
        H1->>FS: statSync → 49.99 MB ✓ under cap
    and
        H2->>FS: statSync → 49.99 MB ✓ under cap
    and
        H3->>FS: statSync → 49.99 MB ✓ under cap
    end
    par All three append
        H1->>FS: appendFileSync (line N)
        H2->>FS: appendFileSync (line N+1)
        H3->>FS: appendFileSync (line N+2)
    end
    Note over FS: file now > 50 MB —<br/>cap was a soft ceiling, not a hard one
```

Today this is a nuisance more than a bug — the file ends up a few KB / MB larger than the cap, depending on burst size, and the daemon eventually drains it. But:

  • It undermines the reliability promise the constant name implies (MAX_QUEUE_BYTES).
  • For a logged-out-but-active CLI user, the cap is the only thing standing between us and unbounded local disk growth.
  • Multi-CLI is a recently-shipped feature (Codex / Copilot / Cursor), so concurrent hook fires are increasingly common.

Suggested approach

There are a few light-touch options, in increasing order of strictness. The simplest one is enough:

Option A — Re-check after append (cheap, restores the invariant lazily). After appendFileSync, statSync again; if the new size exceeds MAX_QUEUE_BYTES, do a best-effort log line so we have observability that the cap fired under load. This doesn't drop the over-the-cap append, but it surfaces the case for tuning.
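A minimal sketch of Option A. The function name `appendWithPostCheck` and the inlined constant are illustrative assumptions; in the real code this logic would live inside `appendToServerQueue` in src/relay/queue.ts:

```typescript
import { appendFileSync, statSync, mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Assumed value; the real constant lives in src/relay/queue.ts.
const MAX_QUEUE_BYTES = 50 * 1024 * 1024;

function appendWithPostCheck(file: string, line: string): boolean {
  appendFileSync(file, line + "\n", { mode: 0o600 });
  // Re-stat after the append: if concurrent writers pushed the file past
  // the cap, surface it (best-effort) so the overshoot is observable.
  const size = statSync(file).size;
  if (size > MAX_QUEUE_BYTES) {
    console.error(`server-queue: pending file is ${size} bytes, over cap`);
    return false; // over cap — caller could use this to trigger a rotate
  }
  return true;
}

// Demo against a throwaway file
const file = join(mkdtempSync(join(tmpdir(), "queue-")), "pending.jsonl");
const ok = appendWithPostCheck(file, JSON.stringify({ event: "demo" }));
```

Note this keeps the hot path at exactly one extra statSync and adds no locking; the trade-off, as stated above, is that the over-the-cap append still lands.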

Option B — Fold the check into the existing daemon-side rotate (preferred). claimPendingBatch already does an atomic rename of pending.jsonl into a processing-*.jsonl file. Trigger an early claimPendingBatch from the hook side once pending.jsonl crosses, say, MAX_QUEUE_BYTES * 0.9 — the rename leaves a fresh empty pending.jsonl and the new append lands there. No change to the cap, but the cap stops being a soft drop point and becomes a soft rotation trigger.
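A sketch of the Option B shape, under stated assumptions: the `rotateIfNearCap` helper, the `processing-*` naming, and the 0.9 threshold are illustrative; the real rotation is the atomic rename already done by `claimPendingBatch` on the daemon side.

```typescript
import { appendFileSync, statSync, renameSync, existsSync, mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const MAX_QUEUE_BYTES = 50 * 1024 * 1024; // assumed value
const ROTATE_THRESHOLD = MAX_QUEUE_BYTES * 0.9;

function rotateIfNearCap(pendingFile: string, threshold = ROTATE_THRESHOLD): void {
  try {
    if (existsSync(pendingFile) && statSync(pendingFile).size > threshold) {
      // Atomic rename: concurrent appenders either land in the old file
      // (already claimed) or in a fresh, empty pending.jsonl.
      renameSync(pendingFile, pendingFile.replace("pending", `processing-${Date.now()}`));
    }
  } catch {
    /* another process rotated first — that's fine */
  }
}

function appendToServerQueue(pendingFile: string, entry: object): void {
  rotateIfNearCap(pendingFile);
  appendFileSync(pendingFile, JSON.stringify(entry) + "\n", { mode: 0o600 });
}

// Demo against a throwaway directory, with a tiny threshold to force a rotate
const dir = mkdtempSync(join(tmpdir(), "queue-"));
const pending = join(dir, "pending.jsonl");
appendToServerQueue(pending, { event: "demo" });
rotateIfNearCap(pending, 1);
const rotatedAway = !existsSync(pending);
```

Because renameSync is atomic on POSIX filesystems, losing the race is harmless: the rename simply fails or targets a file another process already claimed, and the append still lands in a valid pending.jsonl.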

Option C — Use the existing advisory-lock pattern from hook-activity-store.ts (acquireLock / releaseLock at lines 85-110). Wrap the size-check + append in the same exclusive-create-with-staleness lock so concurrent appenders serialize. This is the strictest fix but introduces a hot-path lock in code that is currently lock-free.
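For reference, the exclusive-create-with-staleness pattern Option C refers to looks roughly like this. This is a re-implementation sketch, not the actual hook-activity-store.ts code; the signatures and the 5 s staleness window are assumptions:

```typescript
import { writeFileSync, unlinkSync, statSync, mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

function acquireLock(lockPath: string, staleMs = 5000): boolean {
  try {
    // "wx" = create exclusively; throws EEXIST if another holder beat us.
    writeFileSync(lockPath, String(process.pid), { flag: "wx" });
    return true;
  } catch {
    // Lock exists — steal it only if it looks stale (holder crashed).
    try {
      if (Date.now() - statSync(lockPath).mtimeMs > staleMs) {
        unlinkSync(lockPath);
        return acquireLock(lockPath, staleMs);
      }
    } catch { /* lock vanished between stat and unlink — let caller retry */ }
    return false;
  }
}

function releaseLock(lockPath: string): void {
  try { unlinkSync(lockPath); } catch { /* already gone */ }
}

// Demo: second acquisition fails while the (fresh) lock is held
const lock = join(mkdtempSync(join(tmpdir(), "lock-")), "queue.lock");
const first = acquireLock(lock);
const second = acquireLock(lock);
releaseLock(lock);
const third = acquireLock(lock);
```

Wrapping the size-check + append in this lock serializes appenders completely, which is why it is the strictest option — and why it adds a filesystem round-trip to every hook fire.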

Option B is probably the friendliest — it keeps the hook fast path lock-free, removes the silent drop entirely, and reuses code we already trust.

Customer impact

  • Today (no change): A burst of N concurrent hooks at the cap can each append one entry, overshooting by up to N entries (a few KB of disk, no functional impact).
  • After fix: The cap behaves like a hard ceiling regardless of how many hooks fire concurrently. Disk usage is genuinely bounded.
  • Negligible impact on the hook hot path — at most one extra statSync per append.

Acceptance criteria

  • Concurrent hook bursts at the cap leave pending.jsonl ≤ MAX_QUEUE_BYTES (modulo one in-flight append).
  • Hook latency unchanged in the common (non-bursting) case.
  • Unit test in __tests__/relay/queue.test.ts (or new file) that simulates concurrent appends at the cap.
  • No new persistent state — same on-disk shape, same recovery behaviour.
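The race can be reproduced deterministically in the test by splitting the check phase from the append phase, mirroring the sequence diagram above. A sketch with a tiny stand-in cap (the 100-byte cap and file layout are test-only assumptions):

```typescript
import { appendFileSync, statSync, mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const CAP = 100; // tiny stand-in for MAX_QUEUE_BYTES
const file = join(mkdtempSync(join(tmpdir(), "queue-")), "pending.jsonl");
appendFileSync(file, "x".repeat(95) + "\n"); // 96 bytes: just under the cap

// Phase 1: every "hook" checks the size first — all see < CAP…
const decisions = [0, 1, 2].map(() => statSync(file).size <= CAP);

// Phase 2: …then every hook appends, so the file overshoots the cap.
decisions.forEach((underCap, i) => {
  if (underCap) appendFileSync(file, `entry-${i}\n`); // 8 bytes each
});

const finalSize = statSync(file).size; // 96 + 3 × 8 = 120 > CAP
```

Under the current implementation `finalSize` lands at 120 bytes, over the cap; a fixed implementation should keep the file at or below the cap modulo one in-flight append.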

https://claude.ai/code/session_01SAwaAnE9bTuLujnvksoCYN
