[CLAUDE ROUTINE]: Reliability enhancement — propagate the 10s custom-hook timeout to the hook itself via AbortSignal so a slow hook stops doing work the moment we stop waiting for it #269

@NiveditJain

Summary

When a custom hook exceeds the 10-second timeout in handler.ts, we resolve the Promise.race, log a warning, and return allow() — which is the right user-facing behaviour. What we don't do is tell the hook that we've moved on, so the hook function keeps running in the background until either the Node process exits or the work it kicked off (HTTP requests, child processes, intervals) completes on its own. For most hooks this is invisible; for hooks that hold network sockets, file descriptors, or LLM API calls, it means we're paying for work whose result we'll never read.

Threading an AbortSignal through the PolicyContext lets well-written hooks bail cleanly the instant we time out, and costs nothing for hooks that ignore the signal. This is a friendly, opt-in upgrade — existing hooks keep working unchanged.

Where

src/hooks/handler.ts:122-145

const fn: PolicyFunction = async (ctx): Promise<PolicyResult> => {
  try {
    const result = await Promise.race([
      hook.fn(ctx),
      new Promise<PolicyResult>((_, reject) =>
        setTimeout(() => reject(new Error("timeout")), 10_000),
      ),
    ]);
    return result;
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    const isTimeout = msg === "timeout";
    hookLogWarn(`${prefix} hook "${hookName}" failed: ${msg}`);
    // ...telemetry...
    return { decision: "allow" };
  }
};

The setTimeout here only races against the hook's promise — it never signals to the hook that the result is now being thrown away.
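To make the gap concrete, here is a minimal, self-contained sketch of the same race shape with a deliberately slow stand-in hook. The names `makeSlowHook`, `raceWithoutSignal`, and `demo` are illustrative assumptions and not part of the codebase; the point is only that the hook's step counter keeps climbing after the race has already settled.

```typescript
type PolicyResult = { decision: "allow" | "deny" };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical slow hook: five 20 ms steps, counting each completed step.
function makeSlowHook() {
  const state = { steps: 0 };
  const fn = async (): Promise<PolicyResult> => {
    for (let i = 0; i < 5; i++) {
      await sleep(20);
      state.steps++;
    }
    return { decision: "allow" };
  };
  return { state, fn };
}

// Same shape as the handler: race the hook against a timer, fall back to allow.
async function raceWithoutSignal(
  hookPromise: Promise<PolicyResult>,
  timeoutMs: number,
): Promise<PolicyResult> {
  try {
    return await Promise.race([
      hookPromise,
      new Promise<PolicyResult>((_, reject) =>
        setTimeout(() => reject(new Error("timeout")), timeoutMs),
      ),
    ]);
  } catch {
    return { decision: "allow" };
  }
}

async function demo() {
  const { state, fn } = makeSlowHook();
  const work = fn();
  const result = await raceWithoutSignal(work, 30); // times out long before the hook is done
  const stepsAtTimeout = state.steps;
  await work; // nothing told the hook to stop, so it runs to completion
  return { result, stepsAtTimeout, stepsAtEnd: state.steps };
}
```

In this sketch `stepsAtEnd` reaches 5 even though the caller received its `allow` decision well before that, which is exactly the discarded work described above.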

Why this matters

sequenceDiagram
    participant User as Claude Code
    participant Handler as handler.ts
    participant Hook as Custom hook fn
    participant API as Slow LLM / HTTP API

    User->>Handler: PreToolUse event
    Handler->>Hook: Invoke fn(ctx)
    Hook->>API: fetch(...) (no signal)
    Note over Handler: 10s timeout fires
    Handler->>Handler: Promise.race rejects with 'timeout'
    Handler-->>User: allow() (correct UX)
    Note over Hook,API: Hook's fetch keeps running ⏳
    API-->>Hook: Response 12s later
    Note over Hook: Result discarded but socket was open<br/>and tokens billed

Concrete cases where this hurts today:

  • Network-calling hooks. A custom policy that calls an internal LLM safety API or a CI status endpoint will keep that HTTP request open after we've moved on. With keepalive + no abort, sockets stay alive and the upstream service still does the work — wasting LLM tokens, hitting rate limits, and skewing latency dashboards because we record "12s" on the upstream side and "10s" on ours.
  • Hooks that spawned children. A hook that ran execFile() to lint something can leave a Bun / Node child process alive after timeout, since nothing kills it on the parent side.
  • Hooks holding intervals. A buggy hook that did setInterval(...) (rare, but real) will keep firing until process exit. For one-shot CLI invocations this is short-lived, but for long-running flows (Claude Agent SDK sessions, the relay daemon spawn lineage), it adds up.
  • Tests are harder to write deterministically. Without a signal, test authors who want to verify "my hook unwinds on timeout" have no contract to assert against.
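For the child-process and interval cases, a hook that receives a signal could tie its resources to it. The sketch below assumes the proposed `ctx.signal` exists; `onAbort` and `startPolling` are hypothetical helper names used only for illustration. For child processes, Node's `execFile` already accepts a `signal` option, so passing the signal through would cover that case without a helper.

```typescript
// Runs `cleanup` once when `signal` aborts (or immediately if it already has).
// Returns a function that unregisters the listener on the normal-completion path.
function onAbort(signal: AbortSignal | undefined, cleanup: () => void): () => void {
  if (!signal) return () => {};
  if (signal.aborted) {
    cleanup();
    return () => {};
  }
  signal.addEventListener("abort", cleanup, { once: true });
  return () => signal.removeEventListener("abort", cleanup);
}

// Hypothetical hook helper: a polling interval whose lifetime is bound to the signal.
// The returned function stops polling early and unregisters the abort listener.
function startPolling(
  signal: AbortSignal | undefined,
  poll: () => void,
  everyMs: number,
): () => void {
  const interval = setInterval(poll, everyMs);
  const stop = () => clearInterval(interval);
  const unregister = onAbort(signal, stop);
  return () => {
    unregister();
    stop();
  };
}
```

A hook would call `startPolling(ctx.signal, ...)` and invoke the returned function when it finishes normally; on timeout the interval is cleared by the abort listener instead.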

This is also a natural pairing with #153 (named CUSTOM_HOOK_TIMEOUT_MS constant + env override): once the timeout is configurable, the hook really needs a way to react to it.

Proposed enhancement

  1. Extend PolicyContext with an optional signal: AbortSignal.
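  2. In handler.ts, create the timeout signal with an AbortController plus setTimeout(..., CUSTOM_HOOK_TIMEOUT_MS) and pass controller.signal into the hook's ctx. (AbortSignal.timeout(CUSTOM_HOOK_TIMEOUT_MS) is a shorter alternative, but it aborts with a TimeoutError DOMException rather than an Error whose message is "timeout", so the existing isTimeout check would need to change.)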
  3. Race the hook against the signal's abort — when the signal fires, the hook (if it cooperates) can short-circuit; we still return allow() regardless.

Sketch:

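const controller = new AbortController();
// Abort with Error("timeout") so the existing `msg === "timeout"` check in the catch block still matches.
const timer = setTimeout(() => controller.abort(new Error("timeout")), CUSTOM_HOOK_TIMEOUT_MS);
try {
  const ctxWithSignal = { ...ctx, signal: controller.signal };
  const result = await Promise.race([
    hook.fn(ctxWithSignal),
    new Promise<PolicyResult>((_, reject) =>
      controller.signal.addEventListener(
        "abort",
        () => reject(controller.signal.reason),
        { once: true }, // drop the listener after it fires
      ),
    ),
  ]);
  return result;
} finally {
  // Runs on success, error, and timeout, so a fast hook never leaves a live timer behind.
  clearTimeout(timer);
}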

For hook authors, this means they can write:

customPolicies.add({
  name: "slow-policy",
  match: { events: ["PreToolUse"] },
  fn: async (ctx) => {
    // policyApiUrl is a placeholder for the hook's own endpoint
    const resp = await fetch(policyApiUrl, { signal: ctx.signal }); // <-- cooperative cancellation
    // ...
  },
});

Hooks that ignore ctx.signal keep working exactly as today — no behaviour change.
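Not every hook goes through fetch. For work that has no built-in signal support, a hook could check the signal between awaited steps. The following is a rough sketch under that assumption; `runChecks` and the trimmed-down `PolicyContext` type are hypothetical and exist only for this example. Note that the check has to sit between `await` points: a purely synchronous loop never yields to the event loop, so the timeout could not fire while it runs.

```typescript
type PolicyResult = { decision: "allow" | "deny" };
type PolicyContext = { signal?: AbortSignal }; // the real context carries more fields

// Hypothetical multi-step policy: runs async checks in order and stops as soon
// as the handler has aborted the signal, instead of finishing work nobody reads.
async function runChecks(
  ctx: PolicyContext,
  checks: Array<() => Promise<boolean>>,
): Promise<PolicyResult> {
  for (const check of checks) {
    if (ctx.signal?.aborted) throw ctx.signal.reason; // handler has moved on
    if (!(await check())) return { decision: "deny" };
  }
  return { decision: "allow" };
}
```

Because `ctx.signal` is read with optional chaining, the same hook also runs unchanged on a handler version that does not provide a signal.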

Acceptance criteria

  • PolicyContext exposes an optional signal?: AbortSignal (typed in src/hooks/policy-types.ts).
  • On timeout, the signal is aborted with an Error whose message is "timeout" (visible to the hook as ctx.signal.reason) before the handler returns.
  • clearTimeout runs on the success path so we don't leak timers when the hook resolves quickly.
  • User-facing decision is unchanged: timeout still produces decision: "allow" and a single hookLogWarn.
  • New unit test in __tests__/hooks/handler.test.ts verifies (a) the signal fires on timeout, (b) clearTimeout is called on success (no dangling timer; process._getActiveHandles() is an undocumented internal, so asserting via fake timers or process.getActiveResourcesInfo() is likely more robust), (c) hooks that ignore the signal still get the same allow() outcome.
  • Docs: short example in README under "Custom policies" showing ctx.signal use with fetch.
  • CHANGELOG entry under ## Unreleased > Features.
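One possible shape for the test scenarios (a) to (c), written without a test framework since the runner is not specified here. Everything in this sketch is an assumption for illustration: `runHookWithSignal` stands in for the wrapper in handler.ts, and the timer functions are injected so the timeout can be fired by hand instead of waiting 10 seconds.

```typescript
type PolicyResult = { decision: "allow" | "deny" };
type PolicyContext = { signal?: AbortSignal }; // the real context carries more fields
type PolicyFunction = (ctx: PolicyContext) => Promise<PolicyResult>;

// Injectable timer functions so a test can fire the timeout by hand.
type Timers = {
  set: (cb: () => void, ms: number) => number;
  clear: (id: number) => void;
};

// Stand-in for the wrapper in handler.ts (hypothetical name and signature).
async function runHookWithSignal(
  hookFn: PolicyFunction,
  ctx: PolicyContext,
  timeoutMs: number,
  timers: Timers,
): Promise<PolicyResult> {
  const controller = new AbortController();
  const timer = timers.set(() => controller.abort(new Error("timeout")), timeoutMs);
  try {
    return await Promise.race([
      hookFn({ ...ctx, signal: controller.signal }),
      new Promise<PolicyResult>((_, reject) =>
        controller.signal.addEventListener("abort", () => reject(controller.signal.reason), { once: true }),
      ),
    ]);
  } catch {
    return { decision: "allow" }; // fail open, as the handler does today
  } finally {
    timers.clear(timer);
  }
}

// Minimal fake timers: callbacks are stored and only run when fireAll() is called.
function makeFakeTimers() {
  const pending = new Map<number, () => void>();
  let nextId = 1;
  const timers: Timers = {
    set: (cb) => {
      const id = nextId++;
      pending.set(id, cb);
      return id;
    },
    clear: (id) => {
      pending.delete(id);
    },
  };
  const fireAll = () => {
    const callbacks = Array.from(pending.values());
    pending.clear();
    callbacks.forEach((cb) => cb());
  };
  return { timers, fireAll, pendingCount: () => pending.size };
}

// (a) + (c): a hook that never settles and never acts on the signal still yields
// allow, and the signal it was handed ends up aborted with Error("timeout").
async function timeoutScenario() {
  const fake = makeFakeTimers();
  const seen: AbortSignal[] = [];
  const hanging: PolicyFunction = (ctx) => {
    if (ctx.signal) seen.push(ctx.signal);
    return new Promise<PolicyResult>(() => {});
  };
  const pendingResult = runHookWithSignal(hanging, {}, 10_000, fake.timers);
  fake.fireAll(); // simulate the timeout elapsing
  const result = await pendingResult;
  return { result, signal: seen[0], timersLeft: fake.pendingCount() };
}

// (b): a fast hook keeps its own decision and leaves no timer behind.
async function fastScenario() {
  const fake = makeFakeTimers();
  const fast: PolicyFunction = async () => ({ decision: "deny" });
  const result = await runHookWithSignal(fast, {}, 10_000, fake.timers);
  return { result, timersLeft: fake.pendingCount() };
}
```

Injecting the timers keeps the scenarios deterministic; a real test could get the same effect from the fake-timer facility of whichever runner the repository uses.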

(Plays well with #153 — once the timeout is configurable, ctx.signal becomes the corresponding hook-side handle.)
