fix: add per-context subprocess timeout to prevent daemon freeze #38
Open
dmitryanchikov wants to merge 2 commits into moazbuilds:master
When a claude subprocess hangs indefinitely (e.g. on a stuck network call), the serial queue blocks all subsequent messages, heartbeats, and jobs. This was observed multiple times: 33-35 min hangs on simple Telegram messages and `claude billing` calls.

Changes:
- `runClaudeOnce()` accepts a `timeoutMs` parameter; on expiry it sends SIGTERM, then SIGKILL after a 5s grace period
- `resolveTimeoutMs(name)` picks the timeout from `settings.timeouts` based on invocation context (telegram / heartbeat / everything else)
- `TimeoutsConfig` added to `Settings` with hot-reload support: editing `settings.json` takes effect within the daemon's existing 30s reload cycle, no restart needed
- Fallback model retry is skipped on timeout (only retries on rate limit)
- Log entry and `.log` file record `[TIMED OUT]` for observability

Default timeouts (all configurable in settings.json):
- telegram: 5 min
- heartbeat: 15 min
- job / other: 30 min

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
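For readers who want to see the shape of the change, here is a minimal sketch of per-context timeout resolution followed by SIGTERM → SIGKILL escalation. It is not the code from this PR: it uses Node's `child_process` rather than whatever spawn API `runner.ts` relies on, the fields of `TimeoutsConfig` and the second parameter of `resolveTimeoutMs` are assumptions, and `runWithTimeout` and `RunResult` are hypothetical names introduced for illustration.

```typescript
import { spawn } from "node:child_process";

// Assumed shape: the PR names `TimeoutsConfig`, but these fields are a guess.
interface TimeoutsConfig {
  telegram: number; // milliseconds
  heartbeat: number;
  job: number;
}

// Defaults quoted in the commit message: 5, 15 and 30 minutes.
const DEFAULT_TIMEOUTS: TimeoutsConfig = {
  telegram: 5 * 60_000,
  heartbeat: 15 * 60_000,
  job: 30 * 60_000,
};

// Pick a timeout by invocation context; any other name falls back to the job value.
function resolveTimeoutMs(name: string, timeouts: TimeoutsConfig = DEFAULT_TIMEOUTS): number {
  if (name === "telegram") return timeouts.telegram;
  if (name === "heartbeat") return timeouts.heartbeat;
  return timeouts.job;
}

// Hypothetical result type for this sketch.
interface RunResult {
  exitCode: number | null;
  signal: string | null;
  timedOut: boolean;
}

// Run a command; on timeout send SIGTERM, then SIGKILL once the grace period elapses.
function runWithTimeout(
  cmd: string,
  args: string[],
  timeoutMs: number,
  graceMs: number = 5_000,
): Promise<RunResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: "ignore" });
    let timedOut = false;
    let killTimer: ReturnType<typeof setTimeout> | undefined;

    const termTimer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGTERM");
      // Escalate only if the process is still alive after the grace period;
      // the "exit" handler below clears this timer otherwise.
      killTimer = setTimeout(() => child.kill("SIGKILL"), graceMs);
    }, timeoutMs);

    const cleanup = (): void => {
      clearTimeout(termTimer);
      clearTimeout(killTimer);
    };

    child.once("error", (err) => {
      cleanup();
      reject(err);
    });
    child.once("exit", (exitCode, signal) => {
      cleanup();
      resolve({ exitCode, signal, timedOut });
    });
  });
}
```

A caller would combine the two, for example `runWithTimeout("claude", args, resolveTimeoutMs("telegram"))`, and check `timedOut` before deciding whether a fallback retry makes sense.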
When a subprocess is killed by SIGTERM (143) or SIGKILL (137), the Telegram handler now sends a human-readable explanation rather than "Error (exit 143): Unknown error".

This complements the runner.ts timeout handling: runner returns exitCode 0 with a friendly message when *our* timeout fires, but telegram.ts now also handles externally-killed processes (e.g. OOM, system signals) with a clear message instead of the raw exit code.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
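The exit-code handling described above could look roughly like the sketch below. Exit statuses of 128 + N conventionally mean "terminated by signal N", which is where 143 (SIGTERM, 15) and 137 (SIGKILL, 9) come from. The function name `describeExit` and the message wording are hypothetical and are not taken from `telegram.ts`.

```typescript
// Friendly messages for the two signal-derived exit codes named in the commit.
// The wording here is illustrative only.
const SIGNAL_EXIT_MESSAGES: Record<number, string> = {
  143: "The process was terminated (SIGTERM), most likely because it exceeded its timeout.",
  137: "The process was killed (SIGKILL), for example by the out-of-memory killer.",
};

// Turn a non-zero exit into the text sent back to the chat.
function describeExit(exitCode: number, stderr: string): string {
  const signalMessage: string | undefined = SIGNAL_EXIT_MESSAGES[exitCode];
  if (signalMessage !== undefined) return signalMessage;
  // Anything else keeps the raw "Error (exit N): ..." format.
  const detail = stderr.trim() || "Unknown error";
  return `Error (exit ${exitCode}): ${detail}`;
}
```

Keeping the mapping in a lookup table makes it easy to add further signal codes later without touching the fallback path.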
Fenrur added a commit to Fenrur/claudeclaw that referenced this pull request on Mar 17, 2026

…oazbuilds#38)
- Configurable timeouts per context (telegram, heartbeat, job)
- SIGTERM → SIGKILL grace period for stuck subprocesses
- Timeout detection in Telegram error messages
- Skip fallback retry on timeout (only on rate limit)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Problem

When a `claude` subprocess hangs indefinitely (e.g. stuck network call, `claude billing` waiting forever), the daemon's serial queue blocks all subsequent messages, heartbeats, and jobs. There is no timeout anywhere in the call chain.

Observed incidents:
- `claude billing` inside a bash chain hung 33+ min

Solution

Add a configurable per-context timeout to `runClaudeOnce()` with a SIGTERM → SIGKILL escalation sequence.

`src/runner.ts`
- `runClaudeOnce()` accepts a `timeoutMs` parameter
- `Response.text()` and `proc.exited` resolve naturally
- Log entry and `.log` file record `[TIMED OUT]` for observability

`src/config.ts`
- `TimeoutsConfig` interface added to `Settings`
- `getSettings()` on every invocation; changes to `settings.json` take effect within the daemon's existing 30s hot-reload cycle, no restart required

Default timeouts
- `telegram`: 5 min
- `heartbeat`: 15 min
- job / other: 30 min

All configurable in `settings.json`: