Problem
archon-loop.sh's run_claude() has no retry mechanism. If claude -p crashes mid-session (API error, rate limit, network timeout), the call silently fails via || true and the iteration continues with missing output. This wastes the work the agent already completed before the crash.
Proposed Solution
Use Claude Code's --resume <session-id> + -p to resume crashed sessions instead of restarting from scratch.
Key design:
- Pre-assign session ID via
uuidgen + --session-id on first attempt
- Detect crash by checking if
session_end event exists in the JSONL log
- Resume with
--resume <session-id> + -p "Continue your task from where you left off" — Claude picks up from the interruption point with full conversation context
- Exponential backoff between retries (default: 30s → 60s → 120s)
- Cap retries at 2 (3 total attempts)
Verified:
--resume <session-id> works with -p mode (tested: session context is preserved)
session_id is available in stream-json output's result event
- Resume prompt + existing context is enough for Claude to continue without restarting
New CLI flags:
--max-retries N Max retries per claude -p call (default: 2)
--retry-backoff N Initial backoff seconds, doubles each retry (default: 30)
What this does NOT retry:
- Normal completion with poor results (agent gave up, wrong approach) — not a crash
--resume itself failing (session expired) — exhausts retry budget and returns error
Impact
All phases benefit (plan, prover, review) since retry is inside run_claude(). Parallel provers retry independently in their subshells.
Problem
archon-loop.sh'srun_claude()has no retry mechanism. Ifclaude -pcrashes mid-session (API error, rate limit, network timeout), the call silently fails via|| trueand the iteration continues with missing output. This wastes the work the agent already completed before the crash.Proposed Solution
Use Claude Code's
--resume <session-id>+-pto resume crashed sessions instead of restarting from scratch.Key design:
uuidgen+--session-idon first attemptsession_endevent exists in the JSONL log--resume <session-id>+-p "Continue your task from where you left off"— Claude picks up from the interruption point with full conversation contextVerified:
--resume <session-id>works with-pmode (tested: session context is preserved)session_idis available instream-jsonoutput'sresulteventNew CLI flags:
What this does NOT retry:
--resumeitself failing (session expired) — exhausts retry budget and returns errorImpact
All phases benefit (plan, prover, review) since retry is inside
run_claude(). Parallel provers retry independently in their subshells.