Closed
40 commits
52c821c
fix(loop): build loop context regardless of session continuity mode (…
timothy-20 Feb 26, 2026
d10d581
Merge fix/session-continuity-loop-context: build loop context regardl…
timothy-20 Feb 26, 2026
78380cf
feat(analyzer): add question pattern detection for headless mode (#190)
timothy-20 Feb 27, 2026
9d75f9e
fix(circuit-breaker): suppress no-progress counter on question loops …
timothy-20 Feb 27, 2026
74e68f3
fix(loop): inject corrective guidance when previous loop asked questi…
timothy-20 Feb 27, 2026
1091450
test(#190): add question detection tests + fix arithmetic in detect_q…
timothy-20 Feb 27, 2026
1a536f0
docs: update CLAUDE.md for question detection feature (#190)
timothy-20 Feb 27, 2026
0ad3886
refactor(analyzer): remove redundant .json_parse_result re-read in qu…
timothy-20 Feb 27, 2026
8c0b52b
Merge branch 'fix/session-continuity-question-detection' into main
timothy-20 Feb 27, 2026
010acaf
fix(loop): separate stderr from live mode JSON stream to fix jq parsi…
timothy-20 Feb 27, 2026
9420e70
Merge branch 'fix/session-continuity-live-output'
timothy-20 Feb 27, 2026
f36cbf3
fix(loop): add version check and auto-update at startup (#190)
timothy-20 Feb 27, 2026
042fed8
refactor(loop): improve semver comparison and add auto-update config …
timothy-20 Feb 27, 2026
385c775
docs(loop): add environment-specific CLAUDE_AUTO_UPDATE guidance (#190)
timothy-20 Feb 27, 2026
30c0e48
refactor(tests): remove structural/duplicate tests and consolidate co…
timothy-20 Feb 27, 2026
561f0a1
Merge branch 'fix/startup-version-check'
timothy-20 Feb 27, 2026
6d98b3d
fix(tests): use bare mktemp -d instead of hardcoded /tmp paths
timothy-20 Feb 27, 2026
b29eb2f
Merge branch 'fix/test-tmpdir-hardcode'
timothy-20 Feb 27, 2026
80eff21
fix(tests): use UTC in get_past_timestamp to fix cooldown tests on no…
timothy-20 Feb 27, 2026
86e1d96
Merge branch 'fix/test-cooldown-timezone'
timothy-20 Feb 27, 2026
cd3d522
fix(loop): harden cleanup handling for signal, error, and crash exit …
timothy-20 Feb 27, 2026
044f94a
Merge branch 'fix/loop-cleanup-handling'
timothy-20 Feb 27, 2026
be6c96a
fix(loop): build loop context regardless of session continuity mode (…
timothy-20 Feb 26, 2026
7fb4ed7
feat(analyzer): add question pattern detection for headless mode (#190)
timothy-20 Feb 27, 2026
ffbc034
fix(circuit-breaker): suppress no-progress counter on question loops …
timothy-20 Feb 27, 2026
d69b652
fix(loop): inject corrective guidance when previous loop asked questi…
timothy-20 Feb 27, 2026
13d6ff3
test(#190): add question detection tests + fix arithmetic in detect_q…
timothy-20 Feb 27, 2026
00c6491
docs: update CLAUDE.md for question detection feature (#190)
timothy-20 Feb 27, 2026
3870665
refactor(analyzer): remove redundant .json_parse_result re-read in qu…
timothy-20 Feb 27, 2026
13b2413
fix(loop): separate stderr from live mode JSON stream to fix jq parsi…
timothy-20 Feb 27, 2026
170c530
fix(loop): add version check and auto-update at startup (#190)
timothy-20 Feb 27, 2026
4e18943
refactor(loop): improve semver comparison and add auto-update config …
timothy-20 Feb 27, 2026
ade55f5
docs(loop): add environment-specific CLAUDE_AUTO_UPDATE guidance (#190)
timothy-20 Feb 27, 2026
0492ea4
refactor(tests): remove structural/duplicate tests and consolidate co…
timothy-20 Feb 27, 2026
eb98ace
fix(loop): clean up empty stderr files after each live mode iteration…
timothy-20 Feb 27, 2026
a20e7b9
fix(analyzer,setup): guard detect_questions against set -e and chmod …
timothy-20 Feb 27, 2026
69f4792
fix(analyzer): relax ? requirement in detect_questions for declarativ…
timothy-20 Feb 27, 2026
85fc742
Merge branch 'fix/session-continuity-190' into main (#190)
timothy-20 Feb 28, 2026
9147fd9
fix(loop): guard bare function calls against set -e script terminatio…
timothy-20 Feb 28, 2026
27fd652
fix(loop,circuit-breaker): guard remaining bare calls against set -e …
timothy-20 Feb 28, 2026
39 changes: 27 additions & 12 deletions CLAUDE.md
@@ -45,13 +45,14 @@ The system uses a modular architecture with reusable components in the `lib/` di
- Analyzes Claude Code output for completion signals
- **JSON output format detection and parsing** (with text fallback)
- Supports both flat JSON format and Claude CLI format (`result`, `sessionId`, `metadata`)
-- Extracts structured fields: status, exit_signal, work_type, files_modified
+- Extracts structured fields: status, exit_signal, work_type, files_modified, asking_questions, question_count
- **Question detection**: `detect_questions()` with `QUESTION_PATTERNS` array — detects when Claude asks questions instead of acting autonomously (Issue #190)
- **Session management**: `store_session_id()`, `get_last_session_id()`, `should_resume_session()`
- Automatic session persistence to `.ralph/.claude_session_id` file with 24-hour expiration
- Session lifecycle: `get_session_id()`, `reset_session()`, `log_session_transition()`, `init_session_tracking()`
- Session history tracked in `.ralph/.ralph_session_history` (last 50 transitions)
- Session auto-reset on: circuit breaker open, manual interrupt, project completion
-- Detects test-only loops and stuck error patterns
+- Detects test-only loops, stuck error patterns, and question-only loops
- Two-stage error filtering to eliminate false positives
- Multi-line error matching for accurate stuck loop detection
- Confidence scoring for exit decisions
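As a rough illustration of the question detection described in the list above, the sketch below counts lines that match a few question phrases. The function name `count_question_lines` and the shortened `QUESTION_PATTERNS_DEMO` array are made up for this example; the actual `detect_questions()` in `lib/response_analyzer.sh` may differ in its details.

```bash
#!/usr/bin/env bash
# Hypothetical subset of question phrases, for illustration only.
QUESTION_PATTERNS_DEMO=("should I" "do you want" "which approach")

# Prints the number of pattern/line matches found in the given text.
count_question_lines() {
    local content="$1" total=0 pattern matches
    for pattern in "${QUESTION_PATTERNS_DEMO[@]}"; do
        # grep -c prints the count even when it is 0 (and then exits 1),
        # so `|| true` only protects against set -e; no fallback echo is needed.
        matches=$(grep -ciw -- "$pattern" <<< "$content" || true)
        total=$((total + ${matches:-0}))
    done
    echo "$total"
}

count_question_lines $'Should I refactor this?\nDone.'   # prints 1
```

A line that matches two different patterns is counted twice in this sketch, so the result is a signal strength rather than an exact number of questions.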
@@ -176,7 +177,7 @@ tmux attach -t <session-name>

### Running Tests
```bash
-# Run all tests (568 tests)
+# Run all tests (554 tests)

⚠️ Potential issue | 🟡 Minor

Test-count values are inconsistent in this doc.

Line 180 says 554, Line 537 says 559, and the table rows currently sum to a different value. Please normalize this to a single source of truth (or remove hard-coded totals).

📝 Suggested docs-stability tweak
-# Run all tests (554 tests)
+# Run all tests
 npm test
@@
-### Test Files (559 tests total)
+### Test Files

Also applies to: 537-559

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` at line 180, The document contains inconsistent hard-coded test
counts around the "Run all tests" header and the test-results table; reconcile
them by choosing a single source of truth (either compute the total from the
table rows or remove the hard-coded total), then update the "Run all tests"
header value and every other occurrence of the total so they match the computed
sum of the table rows (or remove all totals so only individual row counts
remain); make sure the table rows themselves accurately sum to the chosen total
and update any related sentence that references the total.

npm test

# Run specific test suites
@@ -221,13 +222,23 @@ CLAUDE_OUTPUT_FORMAT="json" # Output format: json (default) or text
CLAUDE_ALLOWED_TOOLS="Write,Read,Edit,Bash(git add *),Bash(git commit *),...,Bash(npm *),Bash(pytest)" # Allowed tool permissions (see File Protection)
CLAUDE_USE_CONTINUE=true # Enable session continuity
CLAUDE_MIN_VERSION="2.0.76" # Minimum Claude CLI version
CLAUDE_AUTO_UPDATE=true # Auto-update Claude CLI at startup (set false for air-gapped environments)
```

**Auto-Update Configuration:**
- `CLAUDE_AUTO_UPDATE` controls whether Ralph checks npm registry and attempts `npm update -g` at startup
- **Local workstation / home server**: Keep `true` (default) — CLI updates include bug fixes and new features that improve Ralph's effectiveness. The 200-500ms startup overhead is negligible for loops that run hours
- **Docker container**: Set `false` in `.ralphrc` — container is ephemeral and version is pinned at image build time. The npm registry query and potential update are pure overhead
- **Air-gapped environment**: Set `false` — npm registry is unreachable, the check will timeout and log a warning
- Update failure is non-blocking: Ralph logs a warning and continues the loop normally
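A minimal sketch of how a startup script could honor this setting, assuming the variable defaults to `true` when unset. The helper name `auto_update_enabled` is hypothetical and is not taken from `ralph_loop.sh`.

```bash
#!/usr/bin/env bash
# Example .ralphrc line for a container image or an air-gapped host:
#   CLAUDE_AUTO_UPDATE=false

# Hypothetical gate: succeeds only when auto-update is enabled.
# An unset variable is treated as "true" (assumed default).
auto_update_enabled() {
    [[ "${CLAUDE_AUTO_UPDATE:-true}" == "true" ]]
}

if auto_update_enabled; then
    echo "update check enabled"
else
    echo "update check skipped"
fi
```

Keeping the gate in one small function means the registry query and the update attempt can both be skipped with a single check.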

**Claude Code CLI Command (Issue #97):**
- `CLAUDE_CODE_CMD` defaults to `"claude"` (global install)
- Configurable via `.ralphrc` for alternative installations (e.g., `"npx @anthropic-ai/claude-code"`)
- Auto-detected during `ralph-enable` and `ralph-setup` (prefers `claude` if available, falls back to npx)
- Validated at startup with `validate_claude_command()` — displays clear error with installation instructions if not found
- After validation, `check_claude_version()` verifies minimum version compatibility and `check_claude_updates()` queries npm registry for latest version with auto-update attempt (Issue #190)
- Both functions use `compare_semver()` for proper major→minor→patch sequential comparison (safe for any patch number, unlike integer arithmetic)
- Environment variable `CLAUDE_CODE_CMD` takes precedence over `.ralphrc`
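To illustrate the sequential major, minor, patch comparison mentioned above, here is a sketch under stated assumptions: the name `semver_cmp_sketch` and its output convention (`-1`, `0`, `1`) are invented for this example and may not match the real `compare_semver()`. It assumes plain numeric `X.Y.Z` versions without pre-release suffixes.

```bash
#!/usr/bin/env bash
# Prints -1 if $1 < $2, 0 if equal, 1 if $1 > $2.
semver_cmp_sketch() {
    local -a a b
    local i x y
    IFS='.' read -r -a a <<< "$1"
    IFS='.' read -r -a b <<< "$2"
    for i in 0 1 2; do
        x="${a[i]:-0}"
        y="${b[i]:-0}"
        # 10# forces base 10 so a component such as "08" is not read as octal.
        if (( 10#$x > 10#$y )); then echo 1; return 0; fi
        if (( 10#$x < 10#$y )); then echo -1; return 0; fi
    done
    echo 0
}

semver_cmp_sketch "2.0.100" "2.1.0"   # prints -1
```

Packing a version into one integer (for example `major*10000 + minor*100 + patch`) would map both `2.0.100` and `2.1.0` to 20100, which is the kind of collision a component-by-component comparison avoids.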

**CLI Options:**
@@ -241,6 +252,7 @@ Each loop iteration injects context via `build_loop_context()`:
- Remaining tasks from fix_plan.md
- Circuit breaker state (if not CLOSED)
- Previous loop work summary
- Corrective guidance if previous loop detected questions (Issue #190)

**Session Continuity:**
- Sessions are preserved in `.ralph/.claude_session_id`
@@ -408,6 +420,8 @@ When Claude Code exceeds `CLAUDE_TIMEOUT_MINUTES`, `portable_timeout` terminates

In both modes, a timeout results in `exit_code=124`, which flows into the standard error handling path (logged, circuit breaker updated, loop continues to next iteration).

**Stderr separation (Issue #190):** In live mode, Claude CLI stderr (e.g., Node.js UNDICI warnings) is redirected to a separate file (`claude_stderr_<timestamp>.log`) instead of being merged into the stdout JSON stream via `2>&1`. This prevents non-JSON stderr content from corrupting the `jq` pipeline, which previously caused jq exit code 4, empty `live.log`, and no terminal output. Background mode is unaffected (stderr goes directly to the log file). If stderr content is detected, a WARN is logged referencing the stderr file.
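The idea can be sketched as follows. The wrapper `run_with_separate_stderr` and the stand-in `fake_cli` are hypothetical names for this example; in the real loop the stdout stream would feed the `jq` pipeline instead of being printed.

```bash
#!/usr/bin/env bash
# Runs a command, leaving stdout untouched and sending stderr to its own file.
# Usage: run_with_separate_stderr <stderr_file> <command> [args...]
run_with_separate_stderr() {
    local stderr_file="$1"
    shift
    "$@" 2> "$stderr_file"
    local status=$?
    if [[ -s "$stderr_file" ]]; then
        # Something was written to stderr: point the operator at the file.
        echo "WARN: stderr captured in $stderr_file" >&2
    else
        # Nothing captured: do not leave an empty file behind.
        rm -f "$stderr_file"
    fi
    return "$status"
}

# Stand-in for the CLI: one JSON line on stdout, one warning on stderr.
fake_cli() {
    echo '{"type":"result"}'
    echo "(node) UNDICI warning" >&2
}
```

With `2>&1` the warning line would be interleaved with the JSON lines; with the separate file, a downstream JSON parser only ever sees stdout.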

### API Limit Detection (Issue #183)

The API 5-hour limit detection uses a three-layer approach to avoid false positives. In stream-json mode, output files contain echoed file content from tool results (`"type":"user"` lines). If project files mention "5-hour limit", naive grep patterns match those echoed strings, incorrectly triggering the API limit recovery flow.
@@ -428,6 +442,7 @@ Only searches `tail -30` of the output file, filtering out `"type":"user"`, `"to
- `CB_SAME_ERROR_THRESHOLD=5` - Open circuit after 5 loops with repeated errors
- `CB_OUTPUT_DECLINE_THRESHOLD=70%` - Open circuit if output declines by >70%
- `CB_PERMISSION_DENIAL_THRESHOLD=2` - Open circuit after 2 loops with permission denials (Issue #101)
- **Question loop suppression** (Issue #190): When `asking_questions=true`, the `consecutive_no_progress` counter is held steady (not incremented). This prevents the circuit breaker from opening prematurely when Claude asks questions in headless mode. A corrective message is injected via `build_loop_context()` in the next loop iteration.
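The counter behavior described in the last bullet can be sketched as a small pure function. `next_no_progress_count` is a hypothetical helper written for this example; the real logic lives inline in `record_loop_result()`.

```bash
#!/usr/bin/env bash
# Prints the next value of the no-progress counter.
# Args: $1 = current counter, $2 = made_progress (true/false),
#       $3 = asking_questions (true/false)
next_no_progress_count() {
    local current="$1" made_progress="$2" asking_questions="$3"
    if [[ "$made_progress" == "true" ]]; then
        echo 0                      # progress resets the counter
    elif [[ "$asking_questions" == "true" ]]; then
        echo "$current"             # question loop: hold steady
    else
        echo $((current + 1))       # genuine stagnation: increment
    fi
}

next_no_progress_count 2 false true   # prints 2
```

Holding the counter instead of resetting it means a question loop neither trips the breaker nor erases earlier evidence of stagnation.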

### Circuit Breaker Auto-Recovery (Issue #160)

@@ -519,17 +534,17 @@ Ralph uses a multi-layered strategy to prevent Claude from accidentally deleting

## Test Suite

-### Test Files (568 tests total)
+### Test Files (559 tests total)

| File | Tests | Description |
|------|-------|-------------|
-| `test_circuit_breaker_recovery.bats` | 19 | Cooldown timer, auto-reset, parse_iso_to_epoch, CLI flag (Issue #160) |
+| `test_circuit_breaker_recovery.bats` | 20 | Cooldown timer, auto-reset, parse_iso_to_epoch, CLI flag (Issue #160) + log_circuit_transition jq resilience |
| `test_cli_parsing.bats` | 35 | CLI argument parsing for all flags + monitor parameter forwarding |
-| `test_cli_modern.bats` | 68 | Modern CLI commands (Phase 1.1) + build_claude_command fix + live mode text format fix (#164) + errexit pipeline guard (#175) + ALLOWED_TOOLS tightening (#149) + API limit false positive detection (#183) + Claude CLI command validation (#97) + stale call counter fix (#196) |
-| `test_json_parsing.bats` | 52 | JSON output format parsing + Claude CLI format + session management + array format |
-| `test_session_continuity.bats` | 44 | Session lifecycle management + expiration + circuit breaker integration + issue #91 fix |
-| `test_exit_detection.bats` | 53 | Exit signal detection + EXIT_SIGNAL-based completion indicators + progress detection |
-| `test_rate_limiting.bats` | 15 | Rate limiting behavior |
+| `test_cli_modern.bats` | 92 | Modern CLI commands (Phase 1.1) + build_claude_command fix + live mode text format fix (#164) + errexit pipeline guard (#175) + ALLOWED_TOOLS tightening (#149) + API limit false positive detection (#183) + Claude CLI command validation (#97) + stale call counter fix (#196) + question detection corrective message (#190) + stderr separation (#190) + version check and auto-update at startup (#190) + semver comparison (#190) + set -e bare call safety (#190) |
+| `test_json_parsing.bats` | 50 | JSON output format parsing + Claude CLI format + array format + question detection (#190) |
+| `test_session_continuity.bats` | 26 | Session lifecycle management + expiration + circuit breaker integration + issue #91 fix |
+| `test_exit_detection.bats` | 50 | Exit signal detection + EXIT_SIGNAL-based completion indicators + progress detection + question detection integration (#190) |
+| `test_rate_limiting.bats` | 11 | Rate limiting behavior |
| `test_loop_execution.bats` | 20 | Integration tests |
| `test_edge_cases.bats` | 25 | Edge case handling |
| `test_installation.bats` | 15 | Global installation/uninstall workflows + dotfile template copying (#174) |
@@ -539,8 +554,8 @@ Ralph uses a multi-layered strategy to prevent Claude from accidentally deleting
| `test_task_sources.bats` | 23 | Task sources (beads, GitHub, PRD extraction, normalization) |
| `test_ralph_enable.bats` | 24 | Ralph enable integration tests (wizard, CI version, JSON output, .ralphrc validation #149) |
| `test_wizard_utils.bats` | 20 | Wizard utility functions (stdout/stderr separation, prompt functions) |
-| `test_file_protection.bats` | 22 | File integrity validation (RALPH_REQUIRED_PATHS, validate_ralph_integrity, get_integrity_report) (Issue #149) |
-| `test_integrity_check.bats` | 12 | Pre-loop integrity check in ralph_loop.sh (startup + in-loop validation) (Issue #149) |
+| `test_file_protection.bats` | 15 | File integrity validation (RALPH_REQUIRED_PATHS, validate_ralph_integrity, get_integrity_report) (Issue #149) |
+| `test_integrity_check.bats` | 10 | Pre-loop integrity check in ralph_loop.sh (startup + in-loop validation) (Issue #149) |

### Running Tests
```bash
22 changes: 20 additions & 2 deletions lib/circuit_breaker.sh
@@ -206,6 +206,12 @@ record_loop_result() {
consecutive_permission_denials=0
fi

# Check if Claude is asking questions (Issue #190 Bug 2)
local asking_questions="false"
if [[ -f "$response_analysis_file" ]]; then
asking_questions=$(jq -r '.analysis.asking_questions // false' "$response_analysis_file" 2>/dev/null || echo "false")
fi

# Determine if progress was made
if [[ $files_changed -gt 0 ]]; then
# Git shows uncommitted changes - clear progress
@@ -223,6 +229,10 @@ record_loop_result() {
has_progress=true
consecutive_no_progress=0
last_progress_loop=$loop_number
elif [[ "$asking_questions" == "true" ]]; then
# Claude is asking questions — not progress, but not stagnation either.
# Suppress no-progress counter; corrective context will redirect next loop.
has_progress=false
else
consecutive_no_progress=$((consecutive_no_progress + 1))
fi
@@ -340,8 +350,16 @@ log_circuit_transition() {
\"reason\": \"$reason\"
}"

-history=$(echo "$history" | jq ". += [$transition]")
-echo "$history" > "$CB_HISTORY_FILE"
+local updated_history
+updated_history=$(echo "$history" | jq ". += [$transition]" 2>/dev/null)
+local jq_status=$?
+
+if [[ $jq_status -eq 0 && -n "$updated_history" ]]; then
+echo "$updated_history" > "$CB_HISTORY_FILE"
+else
+# Fallback: preserve current transition only (history file was corrupted)
+echo "[$transition]" > "$CB_HISTORY_FILE"
+fi

# Console log with colors
case $to_state in
3 changes: 3 additions & 0 deletions lib/enable_core.sh
@@ -728,6 +728,9 @@ BEADS_FILTER="status:open"
CB_NO_PROGRESS_THRESHOLD=3
CB_SAME_ERROR_THRESHOLD=5
CB_OUTPUT_DECLINE_THRESHOLD=70

# Auto-update Claude CLI at startup
CLAUDE_AUTO_UPDATE=true
RALPHRCEOF
}

55 changes: 53 additions & 2 deletions lib/response_analyzer.sh
@@ -22,6 +22,33 @@ RALPH_DIR="${RALPH_DIR:-.ralph}"
COMPLETION_KEYWORDS=("done" "complete" "finished" "all tasks complete" "project complete" "ready for review")
TEST_ONLY_PATTERNS=("npm test" "bats" "pytest" "jest" "cargo test" "go test" "running tests")
NO_WORK_PATTERNS=("nothing to do" "no changes" "already implemented" "up to date")
QUESTION_PATTERNS=("should I" "would you" "do you want" "which approach" "which option" "how should" "what should" "shall I" "do you prefer" "can you clarify" "could you" "what do you think" "please confirm" "need clarification" "awaiting.*input" "waiting.*response" "your preference")

# Detect if Claude is asking questions instead of acting autonomously
# Args: $1 = text content to analyze
# Returns: 0 if questions detected, 1 otherwise
# Outputs: question count on stdout
detect_questions() {
local content="$1"
local question_count=0

if [[ -z "$content" ]]; then
echo "0"
return 1
fi

# Count lines matching question patterns (case-insensitive)
for pattern in "${QUESTION_PATTERNS[@]}"; do
local matches
matches=$(echo "$content" | grep -ciw "$pattern" 2>/dev/null || echo "0")
matches=$(echo "$matches" | tr -d '[:space:]')
matches=${matches:-0}
question_count=$((question_count + matches))
done

echo "$question_count"
[[ $question_count -gt 0 ]] && return 0 || return 1
}

# =============================================================================
# JSON OUTPUT FORMAT DETECTION AND PARSING
@@ -359,6 +386,13 @@ analyze_response() {
confidence_score=$((json_confidence + 50))
fi

# Detect questions in JSON response text (Issue #190 Bug 2)
local asking_questions=false
local question_count=0
if question_count=$(detect_questions "$work_summary"); then
asking_questions=true
fi

# Check for file changes via git (supplements JSON data)
# Fix #141: Detect both uncommitted changes AND committed changes
if command -v git &>/dev/null && git rev-parse --git-dir >/dev/null 2>&1; then
@@ -415,6 +449,8 @@ analyze_response() {
--argjson has_permission_denials "$has_permission_denials" \
--argjson permission_denial_count "$permission_denial_count" \
--argjson denied_commands "$denied_commands_json" \
--argjson asking_questions "$asking_questions" \
--argjson question_count "$question_count" \
'{
loop_number: $loop_number,
timestamp: $timestamp,
@@ -432,7 +468,9 @@ analyze_response() {
output_length: $output_length,
has_permission_denials: $has_permission_denials,
permission_denial_count: $permission_denial_count,
-denied_commands: $denied_commands
+denied_commands: $denied_commands,
+asking_questions: $asking_questions,
+question_count: $question_count
}
}' > "$analysis_result_file"
rm -f "$RALPH_DIR/.json_parse_result"
@@ -530,6 +568,14 @@ analyze_response() {
fi
done

# 5.5. Detect question patterns (Claude asking instead of acting) (Issue #190 Bug 2)
local asking_questions=false
local question_count=0
if question_count=$(detect_questions "$output_content"); then
asking_questions=true
work_summary="Claude is asking questions instead of acting autonomously"
fi

# 6. Check for file changes (git integration)
# Fix #141: Detect both uncommitted changes AND committed changes
if command -v git &>/dev/null && git rev-parse --git-dir >/dev/null 2>&1; then
@@ -613,6 +659,8 @@ analyze_response() {
--argjson exit_signal "$exit_signal" \
--arg work_summary "$work_summary" \
--argjson output_length "$output_length" \
--argjson asking_questions "$asking_questions" \
--argjson question_count "$question_count" \
'{
loop_number: $loop_number,
timestamp: $timestamp,
@@ -630,7 +678,9 @@ analyze_response() {
output_length: $output_length,
has_permission_denials: false,
permission_denial_count: 0,
-denied_commands: []
+denied_commands: [],
+asking_questions: $asking_questions,
+question_count: $question_count
}
}' > "$analysis_result_file"

@@ -879,6 +929,7 @@ export -f analyze_response
export -f update_exit_signals
export -f log_analysis_summary
export -f detect_stuck_loop
export -f detect_questions
export -f store_session_id
export -f get_last_session_id
export -f should_resume_session