Merge remote-tracking branch 'origin/main' into worktree-20260323-193735
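
The bash runner added in this merge discovers test scripts with a single `find` call and derives each test ID by stripping the test directory prefix from the script's absolute path. A minimal standalone sketch of those two steps follows, assuming hypothetical helper names `discover_tests` and `relative_test_id` (the runner's own functions are `_bash_discover_files` and inline code in `_bash_runner_run`); it is an illustration, not a copy of the runner:

```bash
#!/usr/bin/env bash
# Illustrative sketch only. Both function names below are hypothetical and
# do not exist in bash-runner.sh.

# discover_tests <dir>
# Prints executable test-*.sh and run-*-tests.sh files directly under <dir>,
# one per line in sorted order. Returns non-zero when none are found.
discover_tests() {
    local dir="$1" status=1 f
    while IFS= read -r f; do
        if [ -f "$f" ] && [ -x "$f" ]; then
            printf '%s\n' "$f"
            status=0
        fi
    done < <(find "$dir" -maxdepth 1 \( -name 'test-*.sh' -o -name 'run-*-tests.sh' \) -print 2>/dev/null | sort)
    return "$status"
}

# relative_test_id <file> <dir>
# Prints <file> relative to <dir> by stripping the absolute <dir> prefix.
relative_test_id() {
    local abs_file abs_dir
    abs_file="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
    abs_dir="$(cd "$2" && pwd)"
    printf '%s\n' "${abs_file#"${abs_dir}/"}"
}
```

Non-executable scripts are skipped, so a test file missing its execute bit would not be picked up by a discovery rule of this shape.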

JoeOakhartNava · JoeOakhartNava · commit 41b903433232 · 2026-03-24T13:20:01.000-07:00
diff --git a/.test-index b/.test-index
@@ -40,6 +40,7 @@ plugins/dso/scripts/ticket-show.sh:tests/scripts/test-ticket-show.sh,tests/scrip
 plugins/dso/scripts/ticket-create.sh:tests/scripts/test-ticket-create.sh [test_create_with_closed_parent_blocked]
 plugins/dso/scripts/ticket-link.sh:tests/scripts/test-ticket-link.sh [test_link_depends_on_closed_target_blocked]
 plugins/dso/scripts/ticket-transition.sh:tests/scripts/test-ticket-transition.sh [test_transition_bug_close_requires_reason]
+plugins/dso/scripts/runners/bash-runner.sh:tests/scripts/test-test-batched.sh
 plugins/dso/scripts/validate-phase.sh:tests/test-validate-phase-portability.sh
 plugins/dso/scripts/validate.sh:tests/plugin/test-validate-work-portability.sh,tests/hooks/test-validate-review-output.sh,tests/hooks/test-validate-crash-detection.sh,tests/scripts/test-validate-test-batched-integration.sh,tests/scripts/test-validate-flock-timeout.sh,tests/scripts/test-validate-background.sh,tests/scripts/test-validate-skip-ci-flag.sh,tests/scripts/test-validate-issues.sh,tests/scripts/test-validate-config.sh,tests/scripts/test-validate-script-writes-integration.sh,tests/scripts/test-validate-config-driven.sh,tests/scripts/test-validate-state-lifecycle.sh,tests/test-validate-phase-portability.sh
 plugins/dso/scripts/worktree-cleanup.sh:tests/scripts/test_worktree_cleanup_startup_config.py
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -125,7 +125,7 @@ These rules protect core structural boundaries. Violating them causes subtle bug
 9. **When fixing a bug, search for the same anti-pattern elsewhere.** After fixing a bug, search the codebase for other code that follows the same anti-pattern you just fixed. Create a bug ticket (`.claude/scripts/dso ticket create bug "<title>"`) for each occurrence found so they can be tracked and fixed systematically.
 10. **Write a failing test to verify your CI/staging bug hypothesis before fixing.** When diagnosing a CI or staging failure, write a unit or integration test that reproduces the suspected root cause FIRST. Run it to confirm it fails (RED). Only then implement the fix and verify the test passes (GREEN). This prevents fixing symptoms instead of causes and guards against the fix being wrong.
 11. **Always set `timeout: 600000` on Bash tool calls for commands expected to exceed 30 seconds, AND on all Bash calls during commit/review workflows.** Claude Code's hard timeout ceiling is ~73s even with max timeout. Without `timeout: 600000`, the ceiling drops to ~48s. Commands known to exceed 30s: `validate.sh --ci`, `make test`, `.claude/scripts/dso ticket sync`, `tk` write commands in worktrees with many tickets. Additionally, set `timeout: 600000` on ALL Bash tool calls during COMMIT-WORKFLOW.md and REVIEW-WORKFLOW.md execution — even fast commands like `ruff check` can receive SIGURG (exit 144) from tool-call cancellation during internal event processing (see INC-016 scenario 4).
-12. **Use `test-batched.sh` for test commands expected to exceed 60 seconds.** Example: `$(git rev-parse --show-toplevel)/plugins/dso/scripts/test-batched.sh --timeout=50 "plugins/dso/scripts/validate.sh --ci"`. The script runs the command in a time-bounded loop, saves progress to a state file, and prints a `NEXT:` resume command when the time limit is reached. Run the printed `NEXT:` command in subsequent Bash tool calls until the summary appears. Do NOT use `while` polling loops — they get killed by the ~73s tool timeout ceiling, producing spurious exit 144. For non-test long-running commands (e.g., `.claude/scripts/dso ticket sync`), see INC-016 in KNOWN-ISSUES.md for the managed launch/poll script pattern.
+12. **Use `test-batched.sh` for test commands expected to exceed 60 seconds.** The script supports runner drivers that decompose test suites into individual items for per-test resume. **Prefer `--runner=bash --test-dir=<dir>` for bash test suites** — this discovers `test-*.sh` and `run-*-tests.sh` files and runs each as a separate item, enabling per-script resume on timeout. Example: `$(git rev-parse --show-toplevel)/plugins/dso/scripts/test-batched.sh --timeout=50 --runner=bash --test-dir=tests/scripts`. The generic fallback (`--timeout=50 "command"`) treats the entire command as a single item — use it only when no runner driver applies. Available runners: `bash` (test-*.sh files), `node` (*.test.js files), `pytest` (pytest collection). Run the printed `RUN:` command in subsequent Bash tool calls until the summary appears. Do NOT use `while` polling loops — they get killed by the ~73s tool timeout ceiling, producing spurious exit 144. For non-test long-running commands (e.g., `.claude/scripts/dso ticket sync`), see INC-016 in KNOWN-ISSUES.md for the managed launch/poll script pattern.
 
 ## Task Start Workflow
 
diff --git a/plugins/dso/docs/workflows/COMMIT-WORKFLOW.md b/plugins/dso/docs/workflows/COMMIT-WORKFLOW.md
@@ -106,7 +106,13 @@ TEST_CMD="$(".claude/scripts/dso read-config.sh" commands.test_unit 2>/dev/null
 cd app && $TEST_CMD 2>&1 | tail -5
 ```
 
-**If the test command is expected to exceed 60s** (e.g., `bash tests/run-all.sh`), wrap with `test-batched.sh`:
+**If the test command is expected to exceed 60s** (e.g., `bash tests/run-all.sh`), use `test-batched.sh` with a runner driver for per-test resume. **Prefer `--runner=bash --test-dir=<dir>` for bash test suites** — this discovers `test-*.sh` and `run-*-tests.sh` files and runs each as a separate item, so completed tests are skipped on resume:
+
+```bash
+bash "$REPO_ROOT/plugins/dso/scripts/test-batched.sh" --timeout=50 --runner=bash --test-dir="$REPO_ROOT/tests/scripts"
+```
+
+If no runner driver applies (the test command is not a directory of scripts), fall back to the generic runner, which wraps the entire command as a single item (no sub-test resume):
 
 ```bash
 bash "$REPO_ROOT/plugins/dso/scripts/test-batched.sh" --timeout=50 "$TEST_CMD"
diff --git a/plugins/dso/scripts/runners/bash-runner.sh b/plugins/dso/scripts/runners/bash-runner.sh
@@ -0,0 +1,199 @@
+#!/usr/bin/env bash
+# scripts/runners/bash-runner.sh — Bash test script runner driver
+#
+# Sourced by test-batched.sh to provide bash test script discovery and
+# execution. Discovers test-*.sh and run-*-tests.sh files under --test-dir and
+# runs each as a separate test item, enabling per-script resume on timeout.
+#
+# Requires these variables from the caller (test-batched.sh):
+#   RUNNER        — "bash" for explicit, "" for auto-detect
+#   TEST_DIR      — directory to search for test-*.sh and run-*-tests.sh files
+#   COMPLETED_LIST — array of already-completed test IDs (for resume)
+#   RESULTS_JSON  — JSON object of results so far
+#   STATE_FILE    — path to the JSON state file
+#   TIMEOUT       — timeout in seconds
+#   DEFAULT_TIMEOUT — default timeout value (for resume command construction)
+#   CMD           — fallback command (optional for bash runner)
+#
+# After sourcing, the caller checks USE_BASH_RUNNER and, if set, calls
+# _bash_runner_run to execute the bash runner path.
+#
+# Exports (set by this file):
+#   USE_BASH_RUNNER  — 1 if bash runner is active, 0 otherwise
+#   BASH_FILES       — array of discovered test script paths
+
+# _bash_discover_files <dir>
+# Prints one path per line for each executable test-*.sh or run-*-tests.sh file in <dir>; returns non-zero if none are found.
+_bash_discover_files() {
+    local dir="$1"
+    local found=0
+    # Read sorted, newline-delimited find output in a while loop for portability (no find -print0)
+    while IFS= read -r f; do
+        [ -f "$f" ] && [ -x "$f" ] && { echo "$f"; found=1; }
+    done < <(find "$dir" -maxdepth 1 \( -name 'test-*.sh' -o -name 'run-*-tests.sh' \) -print 2>/dev/null | sort)
+    [ "$found" -eq 1 ]
+}
+
+# Determine effective runner ──────────────────────────────────────────────────
+USE_BASH_RUNNER=0
+BASH_FILES=()
+
+if [ "$RUNNER" = "bash" ]; then
+    # Explicit --runner=bash: attempt bash driver; fall back on failures
+    if [ -z "$TEST_DIR" ]; then
+        echo "WARNING: --runner=bash requested but --test-dir not set; falling back to generic runner." >&2
+    else
+        while IFS= read -r f; do
+            BASH_FILES+=("$f")
+        done < <(_bash_discover_files "$TEST_DIR" 2>/dev/null || true)
+
+        if [ "${#BASH_FILES[@]}" -eq 0 ]; then
+            echo "WARNING: --runner=bash: no test-*.sh or run-*-tests.sh files found under $TEST_DIR; falling back to generic runner." >&2
+        else
+            USE_BASH_RUNNER=1
+        fi
+    fi
+elif [ -z "$RUNNER" ] && [ -n "$TEST_DIR" ]; then
+    # Auto-detect: activate bash driver when test-*.sh or run-*-tests.sh files exist under TEST_DIR
+    # Only auto-detect if node and pytest didn't already claim the runner
+    while IFS= read -r f; do
+        BASH_FILES+=("$f")
+    done < <(_bash_discover_files "$TEST_DIR" 2>/dev/null || true)
+
+    if [ "${#BASH_FILES[@]}" -gt 0 ]; then
+        USE_BASH_RUNNER=1
+        RUNNER="bash"
+    fi
+fi
+
+# _bash_runner_run
+# Executes the bash runner path. Called by test-batched.sh when USE_BASH_RUNNER=1.
+# Uses all shared state variables from the caller.
+_bash_runner_run() {
+    local TOTAL=${#BASH_FILES[@]}
+    local START_TIME
+    START_TIME=$(date +%s)
+    # Preserve created_at from existing state (if resuming), otherwise use now.
+    local SESSION_CREATED_AT="${_state_created_at:-$START_TIME}"
+    _elapsed() { echo $(( $(date +%s) - START_TIME )); }
+    local _bash_tmpdir
+    _bash_tmpdir=$(mktemp -d /tmp/test-batched-bash-XXXXXX)
+    local _existing_exit_trap
+    _existing_exit_trap=$(trap -p EXIT | sed "s/^trap -- '//;s/' EXIT$//")
+    if [ -n "$_existing_exit_trap" ]; then
+        trap 'rm -rf "$_bash_tmpdir"; '"$_existing_exit_trap" EXIT
+    else
+        trap 'rm -rf "$_bash_tmpdir"' EXIT
+    fi
+
+    _save_state_and_resume_bash() {
+        local completed_json results_json
+        completed_json=$(_completed_to_json)
+        results_json="$RESULTS_JSON"
+        _state_write "$STATE_FILE" "bash:${TEST_DIR}" "$completed_json" "$results_json" "" "$SESSION_CREATED_AT" 2>/dev/null || {
+            echo "WARNING: Could not write state file: $STATE_FILE" >&2
+        }
+        local done_count=${#COMPLETED_LIST[@]}
+        local resume_runner_arg="--runner=bash"
+        local resume_dir_arg="--test-dir=${TEST_DIR}"
+        local resume_timeout_arg=""
+        [ "$TIMEOUT" -ne "$DEFAULT_TIMEOUT" ] && resume_timeout_arg="--timeout=$TIMEOUT "
+        local resume_cmd="TEST_BATCHED_STATE_FILE=$STATE_FILE bash $0 ${resume_runner_arg} ${resume_dir_arg} ${resume_timeout_arg}${CMD:+"'$CMD'"}"
+        echo ""
+        echo "$done_count/$TOTAL tests completed."
+        echo ""
+        echo "════════════════════════════════════════════════════════════"
+        echo "  ⚠  ACTION REQUIRED — TESTS NOT COMPLETE  ⚠"
+        echo "════════════════════════════════════════════════════════════"
+        echo "RUN: $resume_cmd"
+        echo "DO NOT PROCEED until the command above prints a final summary."
+        echo "════════════════════════════════════════════════════════════"
+        exit 0
+    }
+
+    for bash_file in "${BASH_FILES[@]}"; do
+        # Use a test ID relative to TEST_DIR so IDs stay unique even if two files
+        # share a basename. (-maxdepth 1 currently rules that out, but a stable
+        # relative path makes the invariant explicit.)
+        # Portable: strip the TEST_DIR prefix from the absolute path.
+        local test_id
+        local _abs_bash_file _abs_test_dir
+        _abs_bash_file="$(cd "$(dirname "$bash_file")" && pwd)/$(basename "$bash_file")"
+        _abs_test_dir="$(cd "$TEST_DIR" && pwd)"
+        test_id="${_abs_bash_file#"${_abs_test_dir}/"}"
+        # Fall back to basename if prefix stripping produced an empty or unchanged result
+        if [ -z "$test_id" ] || [ "$test_id" = "$_abs_bash_file" ]; then test_id="$(basename "$bash_file")"; fi
+
+        if _is_completed "$test_id"; then
+            echo "Skipping (already completed): $test_id"
+            continue
+        fi
+
+        # Check timeout before running this file
+        if [ "$(_elapsed)" -ge "$TIMEOUT" ]; then
+            _save_state_and_resume_bash
+        fi
+
+        echo "Running: bash $bash_file"
+
+        # Launch the test script as a direct background child.  Exit-code capture
+        # uses `wait <pid>` — which is synchronous and race-free — instead of the
+        # previous approach of writing "$?" to a file from inside a subshell and then
+        # reading it from the parent (which could race with file-system buffering on
+        # busy or network-mounted filesystems).
+        local bash_exit=0
+        bash "$bash_file" &
+        local _test_bg_pid=$!
+
+        # Monitor: poll until the test finishes or the time budget runs out.
+        while kill -0 "$_test_bg_pid" 2>/dev/null; do
+            if [ "$(_elapsed)" -ge "$TIMEOUT" ]; then
+                kill "$_test_bg_pid" 2>/dev/null || true
+                wait "$_test_bg_pid" 2>/dev/null || true
+                COMPLETED_LIST+=("$test_id")
+                RESULTS_JSON=$(_results_add "$RESULTS_JSON" "$test_id" "interrupted")
+                _save_state_and_resume_bash
+            fi
+            sleep 0.1 2>/dev/null || sleep 1
+        done
+
+        # `wait` on a direct child always returns the child's actual exit code —
+        # no file-write race is possible here.
+        wait "$_test_bg_pid" 2>/dev/null; bash_exit=$?
+
+        local bash_outcome
+        if [ "$bash_exit" -eq 0 ]; then
+            bash_outcome="pass"
+        else
+            bash_outcome="fail"
+        fi
+
+        COMPLETED_LIST+=("$test_id")
+        RESULTS_JSON=$(_results_add "$RESULTS_JSON" "$test_id" "$bash_outcome")
+
+        local done_count=${#COMPLETED_LIST[@]}
+        echo "$done_count/$TOTAL tests completed."
+    done
+
+    # All bash files processed — print summary
+    local pass_count fail_count interrupted_count total_done
+    pass_count=$(_results_count "$RESULTS_JSON" "pass")
+    fail_count=$(_results_count "$RESULTS_JSON" "fail")
+    interrupted_count=$(_results_count "$RESULTS_JSON" "interrupted")
+    total_done=${#COMPLETED_LIST[@]}
+
+    echo ""
+    echo "All tests done. $total_done/$TOTAL tests completed. $pass_count passed, $fail_count failed, $interrupted_count interrupted."
+
+    if [ "$fail_count" -gt 0 ]; then
+        echo ""
+        echo "Failures:"
+        _results_failures "$RESULTS_JSON" | while IFS= read -r fid; do
+            echo "  FAIL: $fid"
+        done
+    fi
+
+    rm -f "$STATE_FILE"
+    # Interrupted tests count as non-passing: exit non-zero if any test failed or was interrupted
+    if [ "$fail_count" -gt 0 ] || [ "$interrupted_count" -gt 0 ]; then exit 1; else exit 0; fi
+}
diff --git a/plugins/dso/scripts/test-batched.sh b/plugins/dso/scripts/test-batched.sh
@@ -28,10 +28,15 @@ set -uo pipefail
 #             files exist under --test-dir.
 #             Falls back to generic when: pytest not installed, no test files found,
 #             collection fails, or collection yields no test IDs.
+#   bash      Discovers test-*.sh and run-*-tests.sh files under --test-dir
+#             and runs each via: bash <file>
+#             Auto-detected when: test-*.sh or run-*-tests.sh files exist
+#             under --test-dir (after node and pytest auto-detect).
+#             Falls back to generic when: no matching files found.
 #   generic   (default) Runs <command> as a single test item.
 #
 # The <command> positional argument is required for the generic runner.
-# For the node runner, <command> is optional (used as fallback command).
+# For the node, pytest, and bash runners, <command> is optional (used as fallback).
 #
 # Output format:
 #   Between batches: progress line + Structured Action-Required Block (ACTION REQUIRED / RUN: / DO NOT PROCEED)
@@ -187,7 +192,7 @@ done
 
 # ── Validate required argument ─────────────────────────────────────────────────
 # CMD is required for generic runner; node and pytest runners can operate without it.
-if [ -z "$CMD" ] && [ "$RUNNER" != "node" ] && [ "$RUNNER" != "pytest" ]; then
+if [ -z "$CMD" ] && [ "$RUNNER" != "node" ] && [ "$RUNNER" != "pytest" ] && [ "$RUNNER" != "bash" ]; then
     echo "ERROR: Missing required argument: <command>" >&2
     echo ""
     sed -n '2,/^$/s/^# \{0,1\}//p' "$0" | head -60 >&2
@@ -402,6 +407,16 @@ if [ "$USE_PYTEST_RUNNER" -eq 1 ]; then
     _pytest_runner_run
 fi
 
+# ── Bash runner driver (sourced from runners/bash-runner.sh) ─────────────────
+# Sets USE_BASH_RUNNER and BASH_FILES; provides _bash_runner_run function.
+# shellcheck source=runners/bash-runner.sh
+source "$(dirname "$0")/runners/bash-runner.sh"
+
+# ── Bash runner execution path ───────────────────────────────────────────────
+if [ "$USE_BASH_RUNNER" -eq 1 ]; then
+    _bash_runner_run
+fi
+
 # ── Generic fallback runner ───────────────────────────────────────────────────
 # Runs CMD as a single test item with an auto-generated ID.
 # This is the default mode — a generic harness for any command.
diff --git a/plugins/dso/scripts/ticket-reducer.py b/plugins/dso/scripts/ticket-reducer.py
@@ -170,7 +170,7 @@ def reduce_ticket(
     """
     if strategy is None:
         strategy = LastTimestampWinsStrategy()
-    ticket_dir = str(ticket_dir_path)
+    ticket_dir = os.path.normpath(str(ticket_dir_path))
     ticket_id = os.path.basename(ticket_dir)
 
     # Compute content hash for caching (filename + file size to detect in-place
diff --git a/tests/scripts/test-test-batched.sh b/tests/scripts/test-test-batched.sh
diff --git a/tests/scripts/test_ticket_reducer.py b/tests/scripts/test_ticket_reducer.py