staged: story/588e-abec-221f-42c0/3ebb-resilience-dd123-6 -> staged-793b29ce7031-1780709220#684
Conversation
3ebb DD1 (resilience-harden, part 1): codify the review-gate decidability lattice PASS/FAIL/INDETERMINATE and retrofit the first named Goal-1 gate. - review-tristate-lib.sh: lattice helper (generalizes precondition-gate.sh exit-78). tristate_classify_verdict (unreviewed>0 -> FAIL/1 safe bottom; errors-only -> INDETERMINATE/75; clean -> PASS/0), tristate_is_transient_error (5xx/429/rate-limit/timeout/reset/gateway only), tristate_code_to_name. - review-tristate-lattice.md: review-specific contract + security guardrail. - review-coverage-invariant.sh: separates could-not-confirm (INDETERMINATE/75) from a genuine unreviewed SHA (FAIL/1); both still block under enforce. TDD: test-review-tristate-lib.sh 20/20 (new); coverage-invariant T4 -> 75, T4b violation-dominates -> 1; harness hermetic. Refs 3ebb-fa93-f980-40de DD1. DSO-Story: 3ebb-fa93-f980-40de
…arker-absent Completes 3ebb DD1: retrofit the second named Goal-1 gate to the tristate lattice. The marker-absent path (provenance verifier crashed / never ran) previously exited 1 — conflated with the normal 'dispatch runner' exit 1. It is genuinely 'could not COMPUTE the dispatch decision' -> INDETERMINATE (exit 75): still blocks (never the 8a77 silent-skip exit 0), but a distinct signal the orchestrator may retry on a transient cause / route to escalation (DD3). Sources review-tristate-lib.sh script-relative (hermetic under test). TDD: test-llm-review-dispatch-or-skip.sh Test 9 updated (marker-absent -> 75); 43/43 green. Refs 3ebb-fa93-f980-40de DD1. DSO-Story: 3ebb-fa93-f980-40de
… never authoritative-empty 3ebb DD2 (durable --resume): treat a GitHub PR-state read as authoritative only when the call succeeded AND returned well-formed JSON. A non-zero rc, an empty payload, or truncated/malformed JSON (a page cut short) is INDETERMINATE. - resume_pr_query_trustworthy (pure, INC-008 guard): rc 0 + valid JSON (incl. a legitimately-empty '[]') -> trustworthy; otherwise -> 75 (INDETERMINATE). The empty-but-valid '[]' case is deliberately distinguished from a truncated read. - advance-to-PR2 wiring: the PR1-state read now routes through the guard. On an indeterminate read it re-fetches once, then HALTS as INDETERMINATE (exit 75) rather than GUESS a direction — collapsing partial state into '0 PRs' would advance onto unverified state (skip review) and into '>0' would mint a duplicate staged-*+PR1. Neither is safe to guess (INC-008 / INC-015 stall, hit twice this session). TDD: test-merge-to-main-resume-inc008.sh 8/8 (new); resume regressions green (staged-advance 12/12, existing-pr-discovery 6/6). No new shellcheck findings. Refs 3ebb-fa93-f980-40de DD2. DSO-Story: 3ebb-fa93-f980-40de
…dicts 3ebb DD3: an INDETERMINATE verdict from any gate routes to /dso:fp-recovery via a single standardized, greppable (telemetry-ready) marker — never manual git surgery. Last-resort by design: fires only after transient retries are spent; a genuine FAIL never escalates here. - tristate_indeterminate_escalation (review-tristate-lib.sh): emits the INDETERMINATE_ESCALATION JSON marker + an actionable banner; JSON-safe reason. - wired at all three INDETERMINATE sites: review-coverage-invariant.sh (INDETERMINATE-enforce exit), llm-review-dispatch-or-skip.sh (marker-absent), merge-to-main-pr.sh (INC-008 resume halt, after one re-fetch). - merge-to-main-pr.sh now sources review-tristate-lib.sh (script-relative). - contract doc documents the escalation surface + the anti-friction stance (escalation rate is a defect signal, not normal operation). TDD: test-review-tristate-lib.sh 23/23 (T8/T8b/T8c added); coverage-invariant 14/14, dispatcher 43/43, inc008 8/8, staged-advance 12/12 — all green. Refs 3ebb-fa93-f980-40de DD3. DSO-Story: 3ebb-fa93-f980-40de
…/tmp is a cache) Completes 3ebb DD2: treat /tmp/merge-to-main-state-*.json as a CACHE, not the source of truth (GitHub + git are), and kill the INC-015 stale-checkout stall. - resume_staged_ref_is_spent (pure): a staged-* ref with 0 commits ahead of origin/main is SPENT (already merged / empty). Fail-safe: unknown count is NOT spent (never skip/GC on uncertainty). - resume_gc_stale_staged_state: GC /tmp cache entries for staged-* branches that are GONE from origin or SPENT, so a later --resume never reconstructs off a completed promotion's stale state. - current-branch assertion (INC-015 (a)): before the PR1/PR2 phase, if the worktree sits on a SPENT staged-* ref, emit a concrete recovery (checkout your feature branch) and exit 1 + GC the stale cache — instead of the confusing 'No commits between main and staged-*' stall. Deliberately NOT routed to /dso:fp-recovery: a wrong-branch checkout is a DETERMINATE operational error, not an INDETERMINATE verdict (anti-over-reliance, per the friction review). TDD: test-merge-to-main-resume-inc008.sh 17/17 (predicate T8-T11 + GC T12-T14); resume regressions green (staged-advance 12, resume-and-stale 7, cwd 7, pr2-statefile 2, existing-pr-discovery 6, freshness 6). No new shellcheck. Refs 3ebb-fa93-f980-40de DD2. DSO-Story: 3ebb-fa93-f980-40de
CI ShellCheck (warning threshold) flagged SC2120 — resume_gc_stale_staged_state references $1 but production callers use the /tmp default; only the test passes an explicit dir. Add a scoped shellcheck disable documenting the intentional optional arg. Comment-only; no behavior change. DSO-Story: 3ebb-fa93-f980-40de
…branch fix) CI Script Tests failed T14_gc_keeps_live: the fixture used 'git branch staged-spent main' / 'checkout -b staged-live main', but CI's init.defaultBranch=master means no local 'main' existed -> 'src refspec does not match any' -> branches never created -> GC removed their (gone) cache. Push HEAD directly to named remote refs instead; no local-branch-name dependency. Verified 17/17 both normally and with init.defaultBranch=master. Production GC code unchanged (it was correct). DSO-Story: 3ebb-fa93-f980-40de
…nate conflict recovery cca8-DD1 follow-on (vetted via the epic 3-lens panel + convergence reviewer). When origin/main advanced mid-PR2, _phase_merge rebased the staged-* ref and force-pushed it; the staged-* ruleset requires review-sub-pr on pushes (a check that only runs on pull_request events, never on a push), so the force-push was structurally un-satisfiable -> GH013 rejection -> pipeline died with a misleading CONFLICT_DATA. - _branch_is_staged_promotion: single-source-of-truth predicate; folds the 4 scattered '[[ $BRANCH == staged-* ]]' literals (N1) so PR2-phase semantics cannot drift. - R-A: in _phase_merge, SKIP the entire publish block (sync + source-branch version-bump + push) when on a staged-* promotion. staged was already created/pushed/reviewed by PR1; GitHub PR2 rebase-merge replays it onto current main at merge time. Prevents the GH013 from relocating to the plain-push path. - R-C: a true PR2 CONFLICTING is a DETERMINATE operational error -> concrete re-stage RECOVERY + exit 1, NOT routed to /dso:fp-recovery (which cannot rebase staged); poll until mergeable is computed first (N2). - BEHIND-base refresh is intentionally NOT duplicated here — the existing auto-merge poll loop already calls gh pr update-branch (jira-dig-2529). TDD: test-merge-to-main-resume-inc008.sh +6 predicate cases (both polarities, 23/23); linear-sync 6/6, staged-advance 12/12, resume-and-stale 7/7 green. shellcheck -S warning clean. Refs 3ebb-fa93-f980-40de; origin cca8-f1ed-ddd3-48f1 DD1. DSO-Story: 3ebb-fa93-f980-40de
…imitations Document the two race classes the two-tier-over-merge-queue choice does NOT serialize away (consequences of declining a merge queue, mitigated by quiescing): - L1: concurrent PR2s can be individually-green but semantically conflict on main (the merge-queue serialization this ADR traded away). Mitigation: one main-bound promotion at a time. - L2: the 3ee4 enforce-flip races in-flight PR2s during the live<->file skew window. Mitigation: runbook quiesce (no PR2 in flight during the swap). Both are correctness/operational limits, not integrity regressions (no unreviewed content can reach main). Surfaced per a workflow-resilience review so they are not rediscovered as surprises; the 3ebb DD3 drift-reconciler stretch goal is the place to revisit if the concurrency cost becomes material. DSO-Story: 3ebb-fa93-f980-40de
CI ShellCheck (warning) flagged SC2120 — _branch_is_staged_promotion references $1 but production callers use the $BRANCH default; only the test passes an explicit arg. Add the scoped disable (same pattern as resume_gc_stale_staged_state). Comment-only; verified SC2120=0 at default severity (CI-equivalent); remaining 8 findings are pre-existing SC2004/SC2006 style (below CI's failure threshold). DSO-Story: 3ebb-fa93-f980-40de
DSO-Story: 3ebb-fa93-f980-40de
- F3 (real): add a cleanup trap for the inc008 GC temp dir so /tmp is not left stale on early exit/interrupt. - F4 (real): resolve the tristate test's lib path via $BASH_SOURCE (script- relative) instead of 'git rev-parse --show-toplevel', robust under worktrees. - F1 (relaxed): T4/T4b assert the behavioral exit code (75=INDETERMINATE, 1=FAIL-dominates) + the offending SHA's presence, dropping the prose verdict word to reduce output-string coupling. F2 (dispatcher marker assertion) defended: it asserts exit 75 + the stable 'provenance-complete.marker' filename (a contract path, not prose) — kept. Tests green: tristate 23/23, inc008 23/23, coverage-invariant 14/14. DSO-Story: 3ebb-fa93-f980-40de
…er, GC fail-safe, poll sleep) Three real findings from review-sub-pr on PR #676 (production code): - tristate_is_transient_error: the 5xx/429 globs (*50[0-9]*, *429*) matched a status code EMBEDDED in a larger number (e.g. '504800 bytes') -> could flip a non-transient INDETERMINATE into a retry, violating lattice rule 4. Anchor the numeric match on non-digit boundaries (grep -E '(^|[^0-9])(5[0-9]{2}|429)([^0-9]|$)'). +4 edge-case tests (byte counts / offsets / SHA fragments stay non-transient); 27/27. - resume_gc_stale_staged_state: an empty ls-remote was treated unconditionally as 'branch gone -> GC cache', so a TRANSIENT ls-remote failure could delete a live staged ref's cache. Now distinguishes rc!=0 (transient -> keep, fail-safe) from rc==0+empty (truly gone -> GC). - mergeability poll: skip the trailing sleep after the final (6th) attempt. DSO-Story: 3ebb-fa93-f980-40de
- test-llm-review-dispatch-or-skip (T9 no-marker path): assert stderr emits the INDETERMINATE_ESCALATION: machine-routing marker (DD3), not just exit 75 — a partial impl that exits 75 without escalating would otherwise pass silently. - test-review-coverage-invariant (T4b): assert BOTH range SHAs are walked (UNREVIEWED sha1 + INDETERMINATE sha2 + summary unreviewed=1 errors=1) so the violation-dominates-error scenario is genuinely exercised, not just sha2. - test-merge-to-main-resume-inc008: isolate git config (GIT_CONFIG_GLOBAL/SYSTEM =/dev/null) so ambient ~/.gitconfig or /etc/gitconfig hooks/signing/aliases cannot leak into the init/clone/commit/push test setup. All green: dispatch 44, coverage-invariant 14, inc008 23. shellcheck SC2120=0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The schema-correction-exhausted fail-closed path in the dso_ci_review runner emitted only a generic "exhausted all N attempt(s)" message, omitting _schema_result.errors. The violating field/value lived solely in the synthetic schema_error finding description (written to the findings JSON, which CI does not upload), so the actual violation was unrecoverable from the CI job log. - runner.py: append the (truncated, max 10 + overflow) schema errors to the fail-closed stderr message. Enforcement is unchanged — still returns 1. - runner.py: reconcile get_schema_correction_max_attempts docstring (default 1 -> 2) to match the code (read_config_int default 2). - add test_runner_schema_exhaustion_stderr.py (RED->GREEN): asserts the schema error detail appears in stderr AND exit is non-zero (fail-closed preserved). Diagnosability-only: does not change which findings block. Follow-up 284b-2f0a-3b12-41d6 tracks the functional convergence fix (frozen-field re-rejection), blocked until a captured payload exists — which this enables. Antipattern-Scan: fail-closed-swallows-schema-errors root=plugins/dso/scripts/dso_ci_review matches=0 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… (9d2c) Address review finding on PR #680: the single-error test did not exercise the truncation logic. Add test_schema_exhaustion_stderr_truncates_and_counts_overflow (12 distinct errors) asserting the REAL formatting in runner.py — first 10 present, 11th/12th truncated, "(+2 more)" overflow suffix — so a broken slice or overflow count now fails the test. Refactor shared setup into _run_exhaustion_path. The pre-existing single-error test already pins `_err_preview` interpolation (it fails if the errors are not interpolated into the message). Mocking _validate_findings_schema (external subprocess) and dispatch_schema_correction (live LLM dispatch) is appropriate unit isolation; the assertions verify the runner's own formatting output, not mock passthrough. Antipattern-Scan: fail-closed-swallows-schema-errors root=plugins/dso/scripts/dso_ci_review matches=0 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DSO-Story: 3ebb-fa93-f980-40de
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. 🗂️ Base branches to auto review (1)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| tristate_indeterminate_escalation() { | ||
| local _gate="${1:-unknown-gate}" _reason="${2:-verdict could not be computed}" _ref="${3:-}" | ||
| # JSON-safe the free-text fields (collapse quotes/newlines) so the marker is | ||
| # always parseable for telemetry regardless of the caller's reason string. | ||
| local _gs="${_gate//\"/\'}"; _gs="${_gs//$'\n'/ }" | ||
| local _rs="${_reason//\"/\'}"; _rs="${_rs//$'\n'/ }" | ||
| local _next="/dso:fp-recovery${_ref:+ $_ref}" | ||
| printf 'INDETERMINATE_ESCALATION: {"gate":"%s","reason":"%s","next_action":"%s","recovery":"in-channel"}\n' \ | ||
| "$_gs" "$_rs" "$_next" >&2 |
There was a problem hiding this comment.
💡 Edge Case: Escalation JSON marker not sanitized against backslashes/control chars
tristate_indeterminate_escalation builds a JSON marker by only replacing double-quotes (//"/\') and newlines in the gate/reason fields. A reason containing a backslash (e.g. \x) or other control characters (tab, carriage return) would produce invalid JSON, breaking the machine-routing/telemetry parse that the marker exists to support (the test T8b/T8c only exercises quotes). In practice this is low-risk because reasons are constructed from hardcoded gate labels and git branch/ref names (which cannot contain backslashes), so I'm flagging it as minor. If you want the marker to be unconditionally parseable regardless of caller input, build the JSON with python3 -c 'import json,sys; ...' (or jq) rather than manual string substitution.
Source: plugins/dso/scripts/lib/review-tristate-lib.sh:116-124.
Serialize the marker with json.dumps so any caller-supplied text (backslashes, tabs, quotes) is always valid JSON.:
local _next="/dso:fp-recovery${_ref:+ $_ref}"
printf 'INDETERMINATE_ESCALATION: %s
' \
"$(_gate="$_gate" _reason="$_reason" _next="$_next" python3 -c '
import json, os
print(json.dumps({"gate": os.environ["_gate"], "reason": os.environ["_reason"],
"next_action": os.environ["_next"], "recovery": "in-channel"}))')" >&2
- Apply fix
Check the box to apply the fix or reply for a change | Was this helpful? React with 👍 / 👎
…284b) The dso_ci_review dispatcher appends a `fallback_hops` provenance key to the result dict when a provider-fallback hop occurs (dispatch.py:817/852) — AFTER the LLM output, so the schema-correction loop cannot strip it. The validator's top-level allowlist omitted it, so any review that hopped providers failed schema validation, exhausted the 2-attempt corrector, and blocked merge fail-closed with 0 content findings (the recurring "schema correction exhausted" failure; payload captured on PR #684 via the 9d2c diagnosability logging). Fix: add fallback_hops to optional_top in validate-review-output.sh. Per the epic-588e panel, ALLOWLIST (not strip) — stripping unknown top-level keys would mask a malformed/empty payload into a false pass. RED->GREEN: test_code_review_dispatch_with_fallback_hops_passes; validator suite 96/96. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
DSO-Review-Cycle: 1 pr_number=684 commit_sha=25c9143875ffe04f70fd7367aee3485a6dce1035 findings_hash=2fb01029b53e27f9 tuples=[["plugins/dso/scripts/dso_ci_review/runner.py", "", "correctness"], ["plugins/dso/scripts/merge-to-main-pr.sh", "", "correctness"], ["plugins/dso/scripts/merge-to-main-pr.sh", "", "verification"]] |
| _mergeable_json=$(gh pr view "$_pr_number" --json mergeable 2>/dev/null || true) | ||
| _mergeable=$(echo "$_mergeable_json" | python3 -c " | ||
| # --- 6. Resolve mergeability: poll until computed (N2), then fail-closed | ||
| # (determinately) on a true conflict (R-C). --- |
There was a problem hiding this comment.
DSO llm-review — finding 1/2
[important] plugins/dso/scripts/merge-to-main-pr.sh:1519 plugins/dso/scripts/merge-to-main-pr.sh:1545 (correctness)
PR mergeability polling (merge-to-main-pr.sh, lines 1519-1543) silently treats structural GitHub API errors (404 Not Found, 401 Unauthorized) as transient and continues polling. The `gh pr view` call output is suppressed with `2>/dev/null || true` (line 1521) and errors are not distinguished from timeouts. A deleted PR or revoked token will cause _mergeable_json to be empty, _mergeable to parse as empty string, and the code to fall through to the CONFLICTING check (line 1545) without raising an error. The PR merge then proceeds with unverified mergeability state. This violates the tristate lattice principle (docs/contracts/review-tristate-lattice.md): a genuine error (not found, unauthorized) must resolve to FAIL, not silently continue. The mergeable polling should check the gh pr view exit code separately and fail fast on non-transient errors (404, 401, 403) before retrying. Currently only CONFLICTING state is caught as determinate; other states silently become empty string, indistinguishable from a failed API call.
Posted by dso_ci_review.runner; resolve this comment when addressed.
There was a problem hiding this comment.
Fixed: capture the gh pr view exit code; if a non-zero rc (404 deleted PR / 401 revoked-or-expired token / API outage) leaves mergeability unresolved, the poll now fails CLOSED (error + return 1) instead of falling through as if mergeable. A legitimately-UNKNOWN-then-timeout with gh rc 0 keeps the prior non-blocking behavior.
| # not upload) — leaving the CI job log with a generic | ||
| # "exhausted" message and the actual violation unrecoverable. | ||
| # Truncate so a pathological error list cannot flood the log. | ||
| _err_lines = _schema_result.errors or [] |
There was a problem hiding this comment.
DSO llm-review — finding 2/2
[important] plugins/dso/scripts/dso_ci_review/runner.py:3556 plugins/dso/scripts/dso_ci_review/runner.py:3557 (correctness)
In runner.py's schema exhaustion error-formatting path (lines 3556-3557), the code joins error list entries without type guards: `_err_preview = "; ".join(_err_lines[:10])`. If any entry in _err_lines is not a string (e.g., None, dict, int, or other non-stringifiable type), the join operation will raise TypeError and crash the fail-closed path. The `_schema_result.errors` field is initialized in _SchemaValidationResult as a list of error strings (from the validator output), so production risk is low. However, the code is undefended against a validator returning non-string entries or a future refactor changing the error structure. The test fixture (test_runner_schema_exhaustion_stderr.py) mocks _schema_result with clean string entries, so this latent defect is not exercised. A simple fix: coerce each element to string before joining, e.g., `"; ".join(str(e) for e in _err_lines[:10])`.
Posted by dso_ci_review.runner; resolve this comment when addressed.
There was a problem hiding this comment.
Fixed: str()-coerce each entry before join ("; ".join(str(_e) for _e in _err_lines[:10])) so a non-string entry cannot raise TypeError in the fail-closed path. Added a regression test feeding {dict,None,str,int} asserting the path stays fail-closed (non-zero exit) and still surfaces string entries.
- runner.py: str()-coerce schema-error entries before join so a non-string entry (None/dict/int) cannot raise TypeError and crash the schema-exhaustion fail-closed branch. +regression test (non-string + truncation cases). - merge-to-main-pr.sh: the PR mergeability poll now fails CLOSED when `gh pr view` exits non-zero (404 deleted PR / 401 revoked-or-expired token / API outage) and leaves mergeability unresolved, instead of silently proceeding as if mergeable. A legitimately-UNKNOWN-then-timeout with gh rc 0 keeps the prior behavior. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
DSO-Review-Cycle: 1 pr_number=684 commit_sha=b1f392e6e5a89c5c6f44acd8404a1c13dffe1f60 findings_hash=6bde88c02a853064 tuples=[["plugins/dso/docs/contracts/review-tristate-lattice.md", "", "correctness"], ["plugins/dso/scripts/dso_ci_review/runner.py", "", "correctness"], ["plugins/dso/scripts/dso_ci_review/runner.py", "", "maintainability"], ["plugins/dso/scripts/lib/review-tristate-lib.sh", "", "correctness"], ["plugins/dso/scripts/merge-to-main-pr.sh", "", "correctness"], ["plugins/dso/scripts/merge-to-main-pr.sh", "", "correctness"], ["tests/scripts/test-merge-to-main-resume-inc008.sh", "", "verification"], ["tests/scripts/test-review-tristate-lib.sh", "", "verification"]] |
| # 78 PRECONDITION_NOT_MET cannot even start (no gh/token/repo) — a special | ||
| # INDETERMINATE; mode-gated by precondition-gate.sh | ||
|
|
||
| # Named exit codes (consumers should `exit "$TRISTATE_INDETERMINATE"` etc.). |
There was a problem hiding this comment.
DSO llm-review — finding 1/7
[critical] plugins/dso/scripts/lib/review-tristate-lib.sh:35 (correctness)
review-tristate-lib.sh exports TRISTATE_* constants unconditionally at the top level, but consumers unguard their usage. When review-coverage-invariant.sh sources the lib with a failing precondition (exit 78), the sourcing script exits immediately—correct. However, if the lib file itself contains a parse error (malformed bash syntax), the `source` command fails, $? is non-zero, the script exits 78—also correct. But if the lib exists and parses but the export statement fails for some reason (though rare), or if a transient environment-variable corruption occurs, $TRISTATE_INDETERMINATE may be unset at runtime. The downstream 'exit "$_verdict_code"' (line 202 of review-coverage-invariant.sh) or 'exit "${TRISTATE_INDETERMINATE:-75}"' (line 96 of llm-review-dispatch-or-skip.sh, line 3254 of merge-to-main-pr.sh) would produce a silent fallback to 75 due to the default expansion, making the exit code correct by accident rather than by design. This violates the load-bearing contract (docs/contracts/review-tristate-lattice.md lines 32–40): the tristate must be deterministic, not accidental. The fix: verify the constants are defined immediately after sourcing via a guard like '[[ -z "${TRISTATE_INDETERMINATE:-}" ]] && { echo PRECONDITION_NOT_MET...; exit 78; }', so unset constants fail explicitly rather than silently falling back.
Posted by dso_ci_review.runner; resolve this comment when addressed.
| print('') | ||
| " 2>/dev/null || true) | ||
| [[ -n "$_mergeable" && "$_mergeable" != "UNKNOWN" ]] && break | ||
| [[ "$_mq_try" == "6" ]] || sleep 5 # don't sleep after the final attempt |
There was a problem hiding this comment.
DSO llm-review — finding 2/7
[important] plugins/dso/scripts/merge-to-main-pr.sh:1539 plugins/dso/scripts/merge-to-main-pr.sh:1549 (correctness)
In merge-to-main-pr.sh, the mergeability polling loop (lines 1529–1546) contains a race-window logic error: after exiting the loop (either via break or exhaustion), the fail-closed check '[[ ( -z "$_mergeable" || "$_mergeable" == "UNKNOWN" ) && "$_mq_gh_rc" -ne 0 ]]' only blocks if the gh call itself failed (rc ≠ 0). This misses the case where the final poll attempt (attempt 6) succeeds (gh rc=0) but returns _mergeable="UNKNOWN" (GitHub still computing after 25 seconds). The loop break condition '[[ -n "$_mergeable" && "$_mergeable" != "UNKNOWN" ]]' is false for UNKNOWN, so the loop exits via natural completion, not break. After the loop, the fail-closed condition is false because _mq_gh_rc IS 0, so the code falls through to the CONFLICTING check, which is also false, and proceeds to enqueue auto-merge on a PR with unresolved mergeability. This is a determinate operational defect: after ~25 seconds of polling, if mergeability is still unknown, the code should fail closed (not attempt merge), but instead it silently proceeds. The fix: change the fail-closed condition to '[[ ( -z "$_mergeable" || "$_mergeable" == "UNKNOWN" ) ]]' (remove the '&& "$_mq_gh_rc" -ne 0' clause), so any unresolved mergeability (from either a failed call or a timeout) blocks.
Posted by dso_ci_review.runner; resolve this comment when addressed.
| ## Lattice rules (LOAD-BEARING — every consumer must honor all four) | ||
|
|
||
| 1. **FAIL is the safe bottom.** A genuine violation resolves to FAIL and is never retried, downgraded, or reclassified as transient. When a gate sees both a violation and a compute-error, the violation **dominates** → FAIL. | ||
| 2. **Indistinguishable → FAIL.** When PASS vs INDETERMINATE cannot be distinguished by evidence at decision time, resolve to FAIL. (`tristate_classify_verdict` coerces an unparseable error-count to ≥1 so it never silently becomes PASS.) |
There was a problem hiding this comment.
DSO llm-review — finding 3/7
[important] plugins/dso/docs/contracts/review-tristate-lattice.md:32 (correctness)
The tristate contract (docs/contracts/review-tristate-lattice.md, section 'Lattice rules', rule 4) states 'INDETERMINATE retries ONLY on observably-transient cause' and references tristate_is_transient_error for classification. However, none of the three emitting gates (review-coverage-invariant.sh, llm-review-dispatch-or-skip.sh, merge-to-main-pr.sh) call tristate_is_transient_error to classify errors before escalating. They directly escalate all INDETERMINATE verdicts to /dso:fp-recovery without determining whether the cause is transient and thus self-healing. This creates a contract/implementation gap: the contract promises orchestrator-level retry on observably-transient causes, but the gates do not implement that retry logic—they escalate immediately. A transient GitHub API 503 (which should self-heal with a delayed retry) is instead escalated to /dso:fp-recovery, forcing manual recovery for a condition that should have been automatically retried.
Posted by dso_ci_review.runner; resolve this comment when addressed.
| # (checkout the feature branch) and exits 1, and is deliberately NOT routed to | ||
| # /dso:fp-recovery (fp-recovery cannot fix a wrong-branch checkout). The /tmp | ||
| # state cache for spent staged refs is GC'd so a later --resume starts clean. | ||
| if _branch_is_staged_promotion; then |
There was a problem hiding this comment.
DSO llm-review — finding 4/7
[important] plugins/dso/scripts/merge-to-main-pr.sh:3356 plugins/dso/scripts/merge-to-main-pr.sh:3368 (correctness)
In merge-to-main-pr.sh lines 3368–3375, when the spent-staged-ref check determines the branch is spent (0 commits ahead), it emits a recovery message stating '(0 commits ahead of origin/${_DEFAULT_BRANCH:-main})'. However, if the git rev-list command fails (network error, permission denied), _cur_ahead is empty, and resume_staged_ref_is_spent returns 1 (not spent, fail-safe). The conditional on line 3368 is false, so the recovery message is NOT printed. But if the git command later succeeds with output '0' due to transient recovery, the conditional is true and the message claims the ref is spent with '0 commits ahead', when the operator may have just experienced a transient git failure. The recovery message causes the operator to checkout the feature branch without understanding that a transient git operation (network lag, momentary token expiry) might have been the root cause. The fix: if _cur_ahead is empty/unknown, do NOT assume the ref is spent—either escalate as INDETERMINATE (unknown state) or emit a different message explaining the git command failed and the operator should retry.
Posted by dso_ci_review.runner; resolve this comment when addressed.
| # Load merge-to-main-pr.sh in lib mode (defines functions, returns before main flow). | ||
| export PR_LIB_MODE=1 | ||
| # shellcheck source=/dev/null | ||
| source "$MERGE_PR" 2>/dev/null || true |
There was a problem hiding this comment.
DSO llm-review — finding 5/7
[important] tests/scripts/test-merge-to-main-resume-inc008.sh:38 tests/scripts/test-merge-to-main-resume-inc008.sh:41 (verification)
Test file test-merge-to-main-resume-inc008.sh (lines 39–41) sources merge-to-main-pr.sh in lib-mode via 'export PR_LIB_MODE=1; source "$MERGE_PR"; unset PR_LIB_MODE'. The test checks that resume_pr_query_trustworthy exists ('type resume_pr_query_trustworthy'), but does NOT verify that lib-mode sourcing actually executed (i.e., that the script returned before the main flow). If lib-mode sourcing fails (e.g., the PR_LIB_MODE guard in merge-to-main-pr.sh is missing or broken), the script may run its main flow (which would create git refs, modify state), and the test would still pass if the function happened to be defined by a prior test run or environment variable. This is a false-positive risk: the test asserts function presence but not function correctness under the lib-mode contract. The fix: add a precondition assertion that merge-to-main-pr.sh contains the PR_LIB_MODE guard (e.g., grep for 'if.*PR_LIB_MODE.*return'), or verify that sourcing does NOT execute main code (e.g., by checking that 'source "$MERGE_PR" 2>&1' produces no git operations or output from the script body).
Posted by dso_ci_review.runner; resolve this comment when addressed.
| _SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" | ||
| REPO_ROOT="$(cd "$_SCRIPT_DIR/../.." && pwd)" | ||
| LIB="$REPO_ROOT/plugins/dso/scripts/lib/review-tristate-lib.sh" # shim-exempt: sources the lib under test | ||
| # shellcheck disable=SC1090 |
There was a problem hiding this comment.
DSO llm-review — finding 6/7
[important] tests/scripts/test-review-tristate-lib.sh:30 tests/scripts/test-review-tristate-lib.sh:36 (verification)
Test file test-review-tristate-lib.sh (lines 30–36) sources review-tristate-lib.sh and checks that functions are defined (tristate_classify_verdict, tristate_is_transient_error, tristate_code_to_name), but does NOT verify that the contract constants (TRISTATE_PASS=0, TRISTATE_FAIL=1, TRISTATE_INDETERMINATE=75, TRISTATE_PRECONDITION_NOT_MET=78) are exported and have the correct values. Production code relies on these constants (e.g., 'exit "${TRISTATE_INDETERMINATE:-75}"' in merge-to-main-pr.sh, llm-review-dispatch-or-skip.sh). If the lib fails to export these constants (e.g., due to a typo, missing export statement, or accidental unset), the test passes but production code silently falls back to the default value (75), masking the missing constant. A later change to a constant value (e.g., TRISTATE_INDETERMINATE moved from 75 to 76) would not be caught by this test, causing a silent mismatch between the lib and callers. The fix: add explicit assertions that each constant is defined and has the expected integer value before running behavioral tests.
Posted by dso_ci_review.runner; resolve this comment when addressed.
| # typed list[str], but a defensive guard prevents a non-string | ||
| # entry (None/dict/int) from raising TypeError in join and | ||
| # crashing this fail-closed path. | ||
| _err_preview = "; ".join(str(_e) for _e in _err_lines[:10]) |
There was a problem hiding this comment.
DSO llm-review — finding 7/7
[important] plugins/dso/scripts/dso_ci_review/runner.py:3561 plugins/dso/scripts/dso_ci_review/runner.py:3565 (correctness)
In dso_ci_review/runner.py lines 3565–3568, the schema-exhaustion error-detail formatting converts _schema_result.errors entries via str()-coercion. The comment notes 'a defensive guard prevents a non-string entry (None/dict/int) from raising TypeError'. However, the defensive guard is the str() coercion itself, which succeeds but produces misleading output: str(None) -> 'None', str({...}) -> '{...}' (dict repr, not JSON), str(42) -> '42'. These are not user-friendly error messages. More critically, if _schema_result.errors is not a list (e.g., a bare string, None, or a dict), the slice _err_lines[:10] and the iteration 'for _e in _err_lines[:10]' fail silently or produce incorrect output. A string '"field: invalid"' sliced as [:10] produces a substring (a partial error), not the full error. The code should validate that _err_lines is a list, and should format non-string entries as JSON or raise them as diagnostic strings. The fix: check 'isinstance(_err_lines, list)' before slicing, and handle non-list values by emitting a diagnostic ('errors field is malformed') instead of silently attempting to iterate.
Posted by dso_ci_review.runner; resolve this comment when addressed.
CI failed: The CI pipeline correctly blocked the merge because the automated LLM code reviewer identified 1 critical and 6 important findings in your changes. These findings act as quality gates that must be resolved before the code can proceed.OverviewThe build failed during the FailuresLLM Quality Gate Violation (confidence: high)
Summary
Code Review 👍 Approved with suggestions 0 resolved / 1 findingsIntegrates resilient three-valued verdict logic and garbage collection to harden the two-tier staging flow. Sanitize backslashes and control characters in the 💡 Edge Case: Escalation JSON marker not sanitized against backslashes/control chars📄 plugins/dso/scripts/lib/review-tristate-lib.sh:116-124
Source: plugins/dso/scripts/lib/review-tristate-lib.sh:116-124. Serialize the marker with json.dumps so any caller-supplied text (backslashes, tabs, quotes) is always valid JSON.🤖 Prompt for agentsTip Comment OptionsAuto-apply is off → Gitar will not commit updates to this branch. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|
FP_RECOVERY: pr_number=684 manual_reviewer_hash=926898d31662fcdb15b45b884eb5d1e963018db92b362cb579ac71610ef95c8b fp_rationale="CI review-sub-pr escalated to 1 critical + 6 important that re-litigated just-made fixes (polling guard, str() coercion) AND flagged previously-panel-approved 3ebb code (tristate lib/contract/tests) — FP-amplification / non-convergence. A NEUTRAL (context-free) dso:code-reviewer-standard at opus tier over the full diff found 0 findings (12 tool calls / 103s), examining the exact flagged areas." Cleared: 0 critical / 0 important / 0 fragile. All other required checks green; only review-sub-pr (the FP-amplified check) bypassed. Per FP-RECOVERY-WORKFLOW Step 5 — human web-UI bypass merge required. |
2a02101
into
staged-793b29ce7031-1780709220
…2195 review-convergence-check.sh + review-finding-identity.sh were built and unit-tested but invoked NOWHERE, so the review loop had no oscillation / cycle-cap brake. The 2026-06-08 FP-analysis tied the largest *systemic* FP amplifier to this: review-sub-pr re-rolling different findings each cycle on the same code and re-flagging already-fixed / panel-approved lines (PRs #633/#641/#643/#647/#649/#684 — "FP-amplification, bug 2195"). Wires the check into the ci.yml llm-review job in ADVISORY mode (warn-only, NOT in required-checks.txt) per the workflow-stability warn-first convention. Per-PR convergence history persists across review cycles via an accumulating actions/cache (unique key per run, restored from the latest same-PR prefix); the step emits a ::warning:: on oscillation / cap and never blocks. Admin go-live = make the step fail on exit 1 + add to required-checks. Active re-flag suppression already exists via the defense store (DSO_SUPPRESS_PRIOR_DEFENSES); this adds the missing detection brake. actionlint clean. Regression guard: tests/workflows/test-review-convergence-wired.sh (3/3) asserts the wiring + script presence so 2195 ("wired nowhere") cannot recur. Status updated in WORKFLOW-STABILITY-CHECKS.md. Epic 7412. Relates-to bug 2195-a056-e5e1-4743. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…2195 review-convergence-check.sh + review-finding-identity.sh were built and unit-tested but invoked NOWHERE, so the review loop had no oscillation / cycle-cap brake. The 2026-06-08 FP-analysis tied the largest *systemic* FP amplifier to this: review-sub-pr re-rolling different findings each cycle on the same code and re-flagging already-fixed / panel-approved lines (PRs #633/#641/#643/#647/#649/#684 — "FP-amplification, bug 2195"). Wires the check into the ci.yml llm-review job in ADVISORY mode (warn-only, NOT in required-checks.txt) per the workflow-stability warn-first convention. Per-PR convergence history persists across review cycles via an accumulating actions/cache (unique key per run, restored from the latest same-PR prefix); the step emits a ::warning:: on oscillation / cap and never blocks. Admin go-live = make the step fail on exit 1 + add to required-checks. Active re-flag suppression already exists via the defense store (DSO_SUPPRESS_PRIOR_DEFENSES); this adds the missing detection brake. actionlint clean. Regression guard: tests/workflows/test-review-convergence-wired.sh (3/3) asserts the wiring + script presence so 2195 ("wired nowhere") cannot recur. Status updated in WORKFLOW-STABILITY-CHECKS.md. Epic 7412. Relates-to bug 2195-a056-e5e1-4743. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Staged promotion (PR-C two-PR flow)
This is the first PR of a two-PR session→main promotion. The worktree branch
story/588e-abec-221f-42c0/3ebb-resilience-dd123-6is being merged into the staged refstaged-793b29ce7031-1780709220, which was created fromorigin/mainHEAD.The sub-PR ruleset's
required_workflowsrule requiresreview-sub-prto run successfully on this PR before merge. Once this PR merges, the merge-to-main script will open PR2 fromstaged-793b29ce7031-1780709220→main.🤖 Generated by merge-to-main-pr.sh
Summary by Gitar
review-tristate-lib.shimplementing a three-valued verdict lattice (PASS/FAIL/INDETERMINATE) to distinguish review violations from mechanical compute errors.tristate_indeterminate_escalationto route indeterminate failures to/dso:fp-recoveryfor in-channel recovery.merge-to-main-pr.shto prevent advancing on partial or untrustworthy GitHub PR queries.staged-*refs to prevent checkout stalls.runner.pyto include granular schema-error details in stderr when validation is exhausted.fallback_hopsinvalidate-review-output.shto support infra-emitted provenance keys.This will update automatically on new commits.