fix: exclude run-*-tests.sh aggregators from bash runner discovery (c4d6-61f7)

JoeOakhartNava · claude · JoeOakhartNava · commit 448c3a68e33e · 2026-03-26T08:35:24.000-07:00
The bash runner in test-batched.sh discovered run-*-tests.sh suite
aggregators alongside individual test-*.sh files. Since aggregators
internally run all test-*.sh files, the batched runner treated the entire
suite as a single test item that exceeded the time budget on every
invocation, preventing per-file resume from working.
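The toy model below illustrates why a single aggregator item never makes progress. It is hypothetical and does not reproduce how test-batched.sh works internally. Each invocation gets a fixed budget. An item that fits in the remaining budget is recorded as done and skipped on later invocations. An item that does not fit ends the invocation. The 50s budget and the per-file costs are made-up numbers.

```shell
#!/usr/bin/env bash
# Toy model of per-item resume under a time budget (hypothetical; not the
# actual test-batched.sh logic). Costs and budget are made-up numbers.
set -euo pipefail

BUDGET=50  # seconds of work allowed per invocation (assumed)

# simulate <invocations> <name:cost>...
# Prints how many items are complete after each invocation.
simulate() {
    local invocations=$1
    shift
    local state=" " completed=0 i item name cost remaining
    for (( i = 1; i <= invocations; i++ )); do
        remaining=$BUDGET
        for item in "$@"; do
            name=${item%%:*}
            cost=${item##*:}
            # Per-item resume: skip anything recorded as done earlier.
            if [[ $state == *" $name "* ]]; then
                continue
            fi
            if (( cost <= remaining )); then
                remaining=$(( remaining - cost ))
                state+="$name "
                completed=$(( completed + 1 ))
            else
                # Does not fit in the remaining budget: treat it as killed
                # and end this invocation.
                break
            fi
        done
        echo "invocation $i: $completed/$# items complete"
    done
}

echo "Per-file items:"
simulate 4 test-a.sh:30 test-b.sh:30 test-c.sh:30 test-d.sh:30
echo "Single aggregator item:"
simulate 4 run-all-tests.sh:120
```

With these numbers, the per-file run reports 1/4, 2/4, 3/4 and 4/4 over four invocations. The 120s aggregator reports 0/1 every time.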

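Remove run-*-tests.sh from the find pattern in bash-runner.sh and from
its no-files-found warning, and update the runner description in
test-batched.sh to match. Add tests/scripts/test-bash-runner-discovery.sh
and register it in .test-index. Update docs (CLAUDE.md,
COMMIT-WORKFLOW.md) to remove the >60s qualifier for test-batched.sh
usage: it should be the default test runner, not a fallback for long
suites.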
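For a quick check of the pattern change, the sketch below runs the old and new `find` expressions from `_bash_discover_files` against a scratch directory. The file names are made up, and the sketch deliberately leaves out the function's `-x` executable check.

```shell
#!/usr/bin/env bash
# Compare the old and new discovery patterns from bash-runner.sh on a scratch
# directory. File names are made up; the real function also requires -x.
set -euo pipefail

# Old pattern: test files OR suite aggregators.
discover_old() {
    find "$1" -maxdepth 1 \( -name 'test-*.sh' -o -name 'run-*-tests.sh' \) \
        -exec basename {} \; | LC_ALL=C sort
}

# New pattern: individual test files only.
discover_new() {
    find "$1" -maxdepth 1 -name 'test-*.sh' -exec basename {} \; | LC_ALL=C sort
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/test-alpha.sh" "$dir/test-beta.sh" "$dir/run-all-tests.sh"

echo "old pattern:"; discover_old "$dir"
echo "new pattern:"; discover_new "$dir"
```

Only the old pattern prints run-all-tests.sh. Both patterns print test-alpha.sh and test-beta.sh.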

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/.test-index b/.test-index
@@ -157,3 +157,5 @@ plugins/dso/scripts/acli-version-resolver.sh: tests/scripts/test-acli-version-re
 plugins/dso/scripts/gh-availability-check.sh: tests/scripts/test-gh-availability-check.sh
 plugins/dso/scripts/purge-non-project-tickets.sh: tests/scripts/test-purge-non-project-tickets.sh
 plugins/dso/scripts/sprint-next-batch.sh: tests/scripts/test-sprint-next-batch.sh
+plugins/dso/scripts/runners/bash-runner.sh: tests/scripts/test-bash-runner-discovery.sh
+plugins/dso/scripts/test-batched.sh: tests/scripts/test-batched-state-integrity.sh, tests/scripts/test-bash-runner-discovery.sh
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -127,7 +127,7 @@ These rules protect core structural boundaries. Violating them causes subtle bug
 9. **When fixing a bug, search for the same anti-pattern elsewhere.** After fixing a bug, search the codebase for other code that follows the same anti-pattern you just fixed. Create a bug ticket (`.claude/scripts/dso ticket create bug "<title>"`) for each occurrence found so they can be tracked and fixed systematically.
 10. **Write a failing test to verify your CI/staging bug hypothesis before fixing.** When diagnosing a CI or staging failure, write a unit or integration test that reproduces the suspected root cause FIRST. Run it to confirm it fails (RED). Only then implement the fix and verify the test passes (GREEN). This prevents fixing symptoms instead of causes and guards against the fix being wrong.
 11. **Always set `timeout: 600000` on Bash tool calls for commands expected to exceed 30 seconds, AND on all Bash calls during commit/review workflows.** Claude Code's hard timeout ceiling is ~73s even with max timeout. Without `timeout: 600000`, the ceiling drops to ~48s. Commands known to exceed 30s: `validate.sh --ci`, `make test`, `.claude/scripts/dso ticket sync`, ticket write commands in worktrees with many tickets. Additionally, set `timeout: 600000` on ALL Bash tool calls during COMMIT-WORKFLOW.md and REVIEW-WORKFLOW.md execution — even fast commands like `ruff check` can receive SIGURG (exit 144) from tool-call cancellation during internal event processing (see INC-016 scenario 4).
-12. **Use `test-batched.sh` for test commands expected to exceed 60 seconds.** The script supports runner drivers that decompose test suites into individual items for per-test resume. **Prefer `--runner=bash --test-dir=<dir>` for bash test suites** — this discovers `test-*.sh` and `run-*-tests.sh` files and runs each as a separate item, enabling per-script resume on timeout. Example: `$(git rev-parse --show-toplevel)/plugins/dso/scripts/test-batched.sh --timeout=50 --runner=bash --test-dir=tests/scripts`. The generic fallback (`--timeout=50 "command"`) treats the entire command as a single item — use it only when no runner driver applies. Available runners: `bash` (test-*.sh files), `node` (*.test.js files), `pytest` (pytest collection). Run the printed `RUN:` command in subsequent Bash tool calls until the summary appears. Do NOT use `while` polling loops — they get killed by the ~73s tool timeout ceiling, producing spurious exit 144. For non-test long-running commands (e.g., `.claude/scripts/dso ticket sync`), see INC-016 in KNOWN-ISSUES.md for the managed launch/poll script pattern.
+12. **Use `test-batched.sh` for running tests.** The script supports runner drivers that decompose test suites into individual items for per-test resume. **Prefer `--runner=bash --test-dir=<dir>` for bash test suites** — this discovers `test-*.sh` files and runs each as a separate item, enabling per-script resume on timeout. Example: `$(git rev-parse --show-toplevel)/plugins/dso/scripts/test-batched.sh --timeout=50 --runner=bash --test-dir=tests/scripts`. The generic fallback (`--timeout=50 "command"`) treats the entire command as a single item — use it only when no runner driver applies. Available runners: `bash` (test-*.sh files), `node` (*.test.js files), `pytest` (pytest collection). Run the printed `RUN:` command in subsequent Bash tool calls until the summary appears. Do NOT use `while` polling loops — they get killed by the ~73s tool timeout ceiling, producing spurious exit 144. For non-test long-running commands (e.g., `.claude/scripts/dso ticket sync`), see INC-016 in KNOWN-ISSUES.md for the managed launch/poll script pattern.
 
 ## Task Start Workflow
 
diff --git a/plugins/dso/docs/workflows/COMMIT-WORKFLOW.md b/plugins/dso/docs/workflows/COMMIT-WORKFLOW.md
@@ -91,30 +91,18 @@ Run unit tests to catch breakage before investing in review.
 
 > **Timeout rule**: Always set `timeout: 600000` on ALL Bash tool calls in this workflow — including fast commands like `ruff check`. Claude Code's hard ceiling is ~73s without the explicit timeout parameter (drops to ~48s), and even short commands can receive SIGURG (exit 144) during internal event processing. See CLAUDE.md rule 11 (Always Do These).
 
-> **Long-running test suites (>60s)**: If the project's test command is expected to exceed 60 seconds (e.g., `bash tests/run-all.sh`), wrap it with `test-batched.sh` instead of invoking it bare. A bare invocation will be killed by the ~73s tool timeout ceiling (exit 144), producing spurious failures. See CLAUDE.md rule 12 (Always Do These).
-
-Resolve the test command from config, then run it:
+Run unit tests with `test-batched.sh` for per-test resume on timeout. **Prefer `--runner=bash --test-dir=<dir>` for bash test suites.** It discovers `test-*.sh` files and runs each as a separate item, so completed tests are skipped on resume:
 
 ```bash
 REPO_ROOT=$(git rev-parse --show-toplevel)
-TEST_CMD="$(".claude/scripts/dso read-config.sh" commands.test_unit 2>/dev/null || echo "make test-unit-only")"
-```
-
-**If the test command is expected to complete in under 60s**, run it directly (with `timeout: 600000` on the Bash tool call):
-
-```bash
-cd app && $TEST_CMD 2>&1 | tail -5
-```
-
-**If the test command is expected to exceed 60s** (e.g., `bash tests/run-all.sh`), use `test-batched.sh` with a runner driver for per-test resume. **Prefer `--runner=bash --test-dir=<dir>` for bash test suites** — this discovers `test-*.sh` and `run-*-tests.sh` files and runs each as a separate item, so completed tests are skipped on resume:
-
-```bash
 .claude/scripts/dso test-batched.sh --timeout=50 --runner=bash --test-dir="$REPO_ROOT/tests/scripts"
 ```
 
-If no runner driver applies (the test command is not a directory of scripts), fall back to the generic runner which wraps the entire command as a single item (no sub-test resume):
+If no runner driver applies (the test command is not a directory of scripts), resolve the test command from config and fall back to the generic runner, which wraps the entire command as a single item (no sub-test resume):
 
 ```bash
+REPO_ROOT=$(git rev-parse --show-toplevel)
+TEST_CMD="$(".claude/scripts/dso read-config.sh" commands.test_unit 2>/dev/null || echo "make test-unit-only")"
 .claude/scripts/dso test-batched.sh --timeout=50 "$TEST_CMD"
 ```
 
diff --git a/plugins/dso/scripts/runners/bash-runner.sh b/plugins/dso/scripts/runners/bash-runner.sh
@@ -24,13 +24,17 @@
 
 # _bash_discover_files <dir>
 # Prints one file path per line for test-*.sh files; returns non-zero if none found.
+# Excludes run-*-tests.sh aggregator scripts — these are suite orchestrators that
+# run all test-*.sh files internally. Including them causes the batched runner to
+# treat the entire suite as a single test item, which gets killed by the time budget
+# and prevents per-file resume from working.
 _bash_discover_files() {
     local dir="$1"
     local found=0
     # Use a while loop with sorted glob expansion for portability (no find -print0)
     while IFS= read -r f; do
         [ -f "$f" ] && [ -x "$f" ] && { echo "$f"; found=1; }
-    done < <(find "$dir" -maxdepth 1 \( -name 'test-*.sh' -o -name 'run-*-tests.sh' \) -print 2>/dev/null | sort)
+    done < <(find "$dir" -maxdepth 1 -name 'test-*.sh' -print 2>/dev/null | sort)
     [ "$found" -eq 1 ]
 }
 
@@ -48,7 +52,7 @@ if [ "$RUNNER" = "bash" ]; then
         done < <(_bash_discover_files "$TEST_DIR" 2>/dev/null || true)
 
         if [ "${#BASH_FILES[@]}" -eq 0 ]; then
-            echo "WARNING: --runner=bash: no test-*.sh or run-*-tests.sh files found under $TEST_DIR; falling back to generic runner." >&2
+            echo "WARNING: --runner=bash: no test-*.sh files found under $TEST_DIR; falling back to generic runner." >&2
         else
             USE_BASH_RUNNER=1
         fi
diff --git a/plugins/dso/scripts/test-batched.sh b/plugins/dso/scripts/test-batched.sh
@@ -28,9 +28,9 @@ set -uo pipefail
 #             files exist under --test-dir.
 #             Falls back to generic when: pytest not installed, no test files found,
 #             collection fails, or collection yields no test IDs.
-#   bash      Discovers test-*.sh and run-*-tests.sh files under --test-dir
+#   bash      Discovers test-*.sh files under --test-dir
 #             and runs each via: bash <file>
-#             Auto-detected when: test-*.sh or run-*-tests.sh files exist
+#             Auto-detected when: test-*.sh files exist
 #             under --test-dir (after node and pytest auto-detect).
 #             Falls back to generic when: no matching files found.
 #   generic   (default) Runs <command> as a single test item.
diff --git a/tests/scripts/test-bash-runner-discovery.sh b/tests/scripts/test-bash-runner-discovery.sh
@@ -0,0 +1,151 @@
+#!/usr/bin/env bash
+# tests/scripts/test-bash-runner-discovery.sh
+# Tests for plugins/dso/scripts/runners/bash-runner.sh discovery behavior.
+#
+# Verifies that the bash runner discovers test-*.sh files but excludes
+# run-*-tests.sh aggregator scripts (which are suite orchestrators, not
+# individual test items).
+#
+# Usage: bash tests/scripts/test-bash-runner-discovery.sh
+# Returns: exit 0 if all tests pass, exit 1 if any fail
+
+set -uo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PLUGIN_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
+DSO_PLUGIN_DIR="$PLUGIN_ROOT/plugins/dso"
+REPO_ROOT="$(cd "$SCRIPT_DIR" && git rev-parse --show-toplevel)"
+BASH_RUNNER="$DSO_PLUGIN_DIR/scripts/runners/bash-runner.sh"
+TEST_BATCHED="$DSO_PLUGIN_DIR/scripts/test-batched.sh"
+
+source "$SCRIPT_DIR/../lib/run_test.sh"
+
+echo "=== test-bash-runner-discovery.sh ==="
+
+# ── Helpers ──────────────────────────────────────────────────────────────────
+
+_CLEANUP_DIRS=()
+_cleanup() { for d in "${_CLEANUP_DIRS[@]}"; do rm -rf "$d"; done; }
+trap _cleanup EXIT
+
+# ── Test 1: bash-runner.sh exists and is readable ────────────────────────────
+echo "Test 1: bash-runner.sh exists and is readable"
+if [ -f "$BASH_RUNNER" ] && [ -r "$BASH_RUNNER" ]; then
+    echo "  PASS: bash-runner.sh exists"
+    (( PASS++ ))
+else
+    echo "  FAIL: bash-runner.sh not found at $BASH_RUNNER" >&2
+    (( FAIL++ ))
+fi
+
+# ── Test 2: Discovery does NOT include run-*-tests.sh files ──────────────────
+echo "Test 2: test_discovery_excludes_aggregators — run-*-tests.sh not discovered"
+test_discovery_excludes_aggregators() {
+    local tmpdir
+    tmpdir=$(mktemp -d)
+    _CLEANUP_DIRS+=("$tmpdir")
+
+    # Create test-*.sh files (should be discovered)
+    cat > "$tmpdir/test-alpha.sh" << 'EOF'
+#!/usr/bin/env bash
+echo "PASSED: 1  FAILED: 0"
+exit 0
+EOF
+    chmod +x "$tmpdir/test-alpha.sh"
+
+    cat > "$tmpdir/test-beta.sh" << 'EOF'
+#!/usr/bin/env bash
+echo "PASSED: 1  FAILED: 0"
+exit 0
+EOF
+    chmod +x "$tmpdir/test-beta.sh"
+
+    # Create run-*-tests.sh aggregator (should NOT be discovered)
+    cat > "$tmpdir/run-all-tests.sh" << 'EOF'
+#!/usr/bin/env bash
+echo "I am an aggregator — I should not be run as an individual test item"
+exit 0
+EOF
+    chmod +x "$tmpdir/run-all-tests.sh"
+
+    # Run batched runner and check output for the aggregator filename
+    local output exit_code=0
+    output=$(TEST_BATCHED_STATE_FILE="$tmpdir/state.json" \
+        bash "$TEST_BATCHED" --timeout=30 --runner=bash --test-dir="$tmpdir" 2>&1) || exit_code=$?
+
+    # The aggregator must NOT appear in the run output
+    if echo "$output" | grep -q "run-all-tests.sh"; then
+        echo "  DEBUG: aggregator was discovered and run" >&2
+        return 1
+    fi
+
+    # test-alpha.sh and test-beta.sh must appear
+    echo "$output" | grep -q "test-alpha.sh" || return 1
+    echo "$output" | grep -q "test-beta.sh" || return 1
+}
+if test_discovery_excludes_aggregators; then
+    echo "  PASS: run-*-tests.sh excluded from discovery"
+    (( PASS++ ))
+else
+    echo "  FAIL: run-*-tests.sh was discovered as a test item" >&2
+    (( FAIL++ ))
+fi
+
+# ── Test 3: Discovery DOES include test-*.sh files ───────────────────────────
+echo "Test 3: test_discovery_includes_test_files — test-*.sh files discovered"
+test_discovery_includes_test_files() {
+    local tmpdir
+    tmpdir=$(mktemp -d)
+    _CLEANUP_DIRS+=("$tmpdir")
+
+    cat > "$tmpdir/test-gamma.sh" << 'EOF'
+#!/usr/bin/env bash
+echo "PASSED: 1  FAILED: 0"
+exit 0
+EOF
+    chmod +x "$tmpdir/test-gamma.sh"
+
+    local output exit_code=0
+    output=$(TEST_BATCHED_STATE_FILE="$tmpdir/state.json" \
+        bash "$TEST_BATCHED" --timeout=30 --runner=bash --test-dir="$tmpdir" 2>&1) || exit_code=$?
+
+    echo "$output" | grep -q "test-gamma.sh"
+}
+if test_discovery_includes_test_files; then
+    echo "  PASS: test-*.sh files discovered"
+    (( PASS++ ))
+else
+    echo "  FAIL: test-*.sh files not discovered" >&2
+    (( FAIL++ ))
+fi
+
+# ── Test 4: Warning message does not mention run-*-tests.sh ──────────────────
+echo "Test 4: test_warning_message_no_aggregator_mention — fallback warning updated"
+test_warning_message_no_aggregator_mention() {
+    # When no test files exist, the warning should only mention test-*.sh
+    local tmpdir
+    tmpdir=$(mktemp -d)
+    _CLEANUP_DIRS+=("$tmpdir")
+    mkdir -p "$tmpdir/empty-dir"
+
+    local output exit_code=0
+    output=$(TEST_BATCHED_STATE_FILE="$tmpdir/state.json" \
+        bash "$TEST_BATCHED" --timeout=30 --runner=bash --test-dir="$tmpdir/empty-dir" 2>&1) || exit_code=$?
+
+    # Warning should NOT mention run-*-tests.sh
+    if echo "$output" | grep -q "run-\*-tests.sh"; then
+        return 1
+    fi
+    return 0
+}
+if test_warning_message_no_aggregator_mention; then
+    echo "  PASS: warning message does not mention run-*-tests.sh"
+    (( PASS++ ))
+else
+    echo "  FAIL: warning message still mentions run-*-tests.sh" >&2
+    (( FAIL++ ))
+fi
+
+echo ""
+echo "Results: $PASS passed, $FAIL failed"
+[ "$FAIL" -eq 0 ]