Skip to content

staged: story/588e-abec-221f-42c0/3ebb-resilience-dd123-6 -> staged-793b29ce7031-1780709220#684

Merged
JoeOakhartNava merged 19 commits into
staged-793b29ce7031-1780709220from
story/588e-abec-221f-42c0/3ebb-resilience-dd123-6
Jun 6, 2026
Merged

staged: story/588e-abec-221f-42c0/3ebb-resilience-dd123-6 -> staged-793b29ce7031-1780709220#684
JoeOakhartNava merged 19 commits into
staged-793b29ce7031-1780709220from
story/588e-abec-221f-42c0/3ebb-resilience-dd123-6

Conversation

@joeoakhart

@joeoakhart joeoakhart commented Jun 6, 2026

Copy link
Copy Markdown
Collaborator

Staged promotion (PR-C two-PR flow)

This is the first PR of a two-PR session→main promotion. The worktree branch story/588e-abec-221f-42c0/3ebb-resilience-dd123-6 is being merged into the staged ref staged-793b29ce7031-1780709220, which was created from origin/main HEAD.

The sub-PR ruleset's required_workflows rule requires review-sub-pr to run successfully on this PR before merge. Once this PR merges, the merge-to-main script will open PR2 from staged-793b29ce7031-1780709220main.

🤖 Generated by merge-to-main-pr.sh


Summary by Gitar

  • Resilience infrastructure:
    • Added review-tristate-lib.sh implementing a three-valued verdict lattice (PASS/FAIL/INDETERMINATE) to distinguish review violations from mechanical compute errors.
    • Added tristate_indeterminate_escalation to route indeterminate failures to /dso:fp-recovery for in-channel recovery.
  • Orchestrator safety:
    • Added INC-008 guards in merge-to-main-pr.sh to prevent advancing on partial or untrustworthy GitHub PR queries.
    • Added INC-015 garbage collection for stale/spent staged-* refs to prevent checkout stalls.
  • Schema validation:
    • Updated runner.py to include granular schema-error details in stderr when validation is exhausted.
    • Allowlisted fallback_hops in validate-review-output.sh to support infra-emitted provenance keys.
  • Documentation:
    • Added ADR-0020 documenting known concurrency limitations of the two-tier flow compared to a merge queue.

This will update automatically on new commits.

Test and others added 17 commits June 5, 2026 18:22
3ebb DD1 (resilience-harden, part 1): codify the review-gate decidability
lattice PASS/FAIL/INDETERMINATE and retrofit the first named Goal-1 gate.

- review-tristate-lib.sh: lattice helper (generalizes precondition-gate.sh
  exit-78). tristate_classify_verdict (unreviewed>0 -> FAIL/1 safe bottom;
  errors-only -> INDETERMINATE/75; clean -> PASS/0), tristate_is_transient_error
  (5xx/429/rate-limit/timeout/reset/gateway only), tristate_code_to_name.
- review-tristate-lattice.md: review-specific contract + security guardrail.
- review-coverage-invariant.sh: separates could-not-confirm (INDETERMINATE/75)
  from a genuine unreviewed SHA (FAIL/1); both still block under enforce.

TDD: test-review-tristate-lib.sh 20/20 (new); coverage-invariant T4 -> 75,
T4b violation-dominates -> 1; harness hermetic.

Refs 3ebb-fa93-f980-40de DD1.
DSO-Story: 3ebb-fa93-f980-40de
…arker-absent

Completes 3ebb DD1: retrofit the second named Goal-1 gate to the tristate lattice.

The marker-absent path (provenance verifier crashed / never ran) previously
exited 1 — conflated with the normal 'dispatch runner' exit 1. It is genuinely
'could not COMPUTE the dispatch decision' -> INDETERMINATE (exit 75): still
blocks (never the 8a77 silent-skip exit 0), but a distinct signal the
orchestrator may retry on a transient cause / route to escalation (DD3). Sources
review-tristate-lib.sh script-relative (hermetic under test).

TDD: test-llm-review-dispatch-or-skip.sh Test 9 updated (marker-absent -> 75);
43/43 green.

Refs 3ebb-fa93-f980-40de DD1.
DSO-Story: 3ebb-fa93-f980-40de
… never authoritative-empty

3ebb DD2 (durable --resume): treat a GitHub PR-state read as authoritative only
when the call succeeded AND returned well-formed JSON. A non-zero rc, an empty
payload, or truncated/malformed JSON (a page cut short) is INDETERMINATE.

- resume_pr_query_trustworthy (pure, INC-008 guard): rc 0 + valid JSON (incl. a
  legitimately-empty '[]') -> trustworthy; otherwise -> 75 (INDETERMINATE). The
  empty-but-valid '[]' case is deliberately distinguished from a truncated read.
- advance-to-PR2 wiring: the PR1-state read now routes through the guard. On an
  indeterminate read it re-fetches once, then HALTS as INDETERMINATE (exit 75)
  rather than GUESS a direction — collapsing partial state into '0 PRs' would
  advance onto unverified state (skip review) and into '>0' would mint a
  duplicate staged-*+PR1. Neither is safe to guess (INC-008 / INC-015 stall,
  hit twice this session).

TDD: test-merge-to-main-resume-inc008.sh 8/8 (new); resume regressions green
(staged-advance 12/12, existing-pr-discovery 6/6). No new shellcheck findings.

Refs 3ebb-fa93-f980-40de DD2.
DSO-Story: 3ebb-fa93-f980-40de
…dicts

3ebb DD3: an INDETERMINATE verdict from any gate routes to /dso:fp-recovery via
a single standardized, greppable (telemetry-ready) marker — never manual git
surgery. Last-resort by design: fires only after transient retries are spent; a
genuine FAIL never escalates here.

- tristate_indeterminate_escalation (review-tristate-lib.sh): emits the
  INDETERMINATE_ESCALATION JSON marker + an actionable banner; JSON-safe reason.
- wired at all three INDETERMINATE sites: review-coverage-invariant.sh
  (INDETERMINATE-enforce exit), llm-review-dispatch-or-skip.sh (marker-absent),
  merge-to-main-pr.sh (INC-008 resume halt, after one re-fetch).
- merge-to-main-pr.sh now sources review-tristate-lib.sh (script-relative).
- contract doc documents the escalation surface + the anti-friction stance
  (escalation rate is a defect signal, not normal operation).

TDD: test-review-tristate-lib.sh 23/23 (T8/T8b/T8c added); coverage-invariant
14/14, dispatcher 43/43, inc008 8/8, staged-advance 12/12 — all green.

Refs 3ebb-fa93-f980-40de DD3.
DSO-Story: 3ebb-fa93-f980-40de
…/tmp is a cache)

Completes 3ebb DD2: treat /tmp/merge-to-main-state-*.json as a CACHE, not the
source of truth (GitHub + git are), and kill the INC-015 stale-checkout stall.

- resume_staged_ref_is_spent (pure): a staged-* ref with 0 commits ahead of
  origin/main is SPENT (already merged / empty). Fail-safe: unknown count is NOT
  spent (never skip/GC on uncertainty).
- resume_gc_stale_staged_state: GC /tmp cache entries for staged-* branches that
  are GONE from origin or SPENT, so a later --resume never reconstructs off a
  completed promotion's stale state.
- current-branch assertion (INC-015 (a)): before the PR1/PR2 phase, if the
  worktree sits on a SPENT staged-* ref, emit a concrete recovery (checkout your
  feature branch) and exit 1 + GC the stale cache — instead of the confusing
  'No commits between main and staged-*' stall. Deliberately NOT routed to
  /dso:fp-recovery: a wrong-branch checkout is a DETERMINATE operational error,
  not an INDETERMINATE verdict (anti-over-reliance, per the friction review).

TDD: test-merge-to-main-resume-inc008.sh 17/17 (predicate T8-T11 + GC T12-T14);
resume regressions green (staged-advance 12, resume-and-stale 7, cwd 7,
pr2-statefile 2, existing-pr-discovery 6, freshness 6). No new shellcheck.

Refs 3ebb-fa93-f980-40de DD2.
DSO-Story: 3ebb-fa93-f980-40de
CI ShellCheck (warning threshold) flagged SC2120 — resume_gc_stale_staged_state
references $1 but production callers use the /tmp default; only the test passes
an explicit dir. Add a scoped shellcheck disable documenting the intentional
optional arg. Comment-only; no behavior change.

DSO-Story: 3ebb-fa93-f980-40de
…branch fix)

CI Script Tests failed T14_gc_keeps_live: the fixture used 'git branch
staged-spent main' / 'checkout -b staged-live main', but CI's
init.defaultBranch=master means no local 'main' existed -> 'src refspec does
not match any' -> branches never created -> GC removed their (gone) cache.
Push HEAD directly to named remote refs instead; no local-branch-name
dependency. Verified 17/17 both normally and with init.defaultBranch=master.
Production GC code unchanged (it was correct).

DSO-Story: 3ebb-fa93-f980-40de
…nate conflict recovery

cca8-DD1 follow-on (vetted via the epic 3-lens panel + convergence reviewer).
When origin/main advanced mid-PR2, _phase_merge rebased the staged-* ref and
force-pushed it; the staged-* ruleset requires review-sub-pr on pushes (a check
that only runs on pull_request events, never on a push), so the force-push was
structurally un-satisfiable -> GH013 rejection -> pipeline died with a misleading
CONFLICT_DATA.

- _branch_is_staged_promotion: single-source-of-truth predicate; folds the 4
  scattered '[[ $BRANCH == staged-* ]]' literals (N1) so PR2-phase semantics
  cannot drift.
- R-A: in _phase_merge, SKIP the entire publish block (sync + source-branch
  version-bump + push) when on a staged-* promotion. staged was already
  created/pushed/reviewed by PR1; GitHub PR2 rebase-merge replays it onto current
  main at merge time. Prevents the GH013 from relocating to the plain-push path.
- R-C: a true PR2 CONFLICTING is a DETERMINATE operational error -> concrete
  re-stage RECOVERY + exit 1, NOT routed to /dso:fp-recovery (which cannot rebase
  staged); poll until mergeable is computed first (N2).
- BEHIND-base refresh is intentionally NOT duplicated here — the existing
  auto-merge poll loop already calls gh pr update-branch (jira-dig-2529).

TDD: test-merge-to-main-resume-inc008.sh +6 predicate cases (both polarities,
23/23); linear-sync 6/6, staged-advance 12/12, resume-and-stale 7/7 green.
shellcheck -S warning clean.

Refs 3ebb-fa93-f980-40de; origin cca8-f1ed-ddd3-48f1 DD1.
DSO-Story: 3ebb-fa93-f980-40de
…imitations

Document the two race classes the two-tier-over-merge-queue choice does NOT
serialize away (consequences of declining a merge queue, mitigated by quiescing):
- L1: concurrent PR2s can be individually-green but semantically conflict on main
  (the merge-queue serialization this ADR traded away). Mitigation: one main-bound
  promotion at a time.
- L2: the 3ee4 enforce-flip races in-flight PR2s during the live<->file skew
  window. Mitigation: runbook quiesce (no PR2 in flight during the swap).
Both are correctness/operational limits, not integrity regressions (no unreviewed
content can reach main). Surfaced per a workflow-resilience review so they are not
rediscovered as surprises; the 3ebb DD3 drift-reconciler stretch goal is the place
to revisit if the concurrency cost becomes material.

DSO-Story: 3ebb-fa93-f980-40de
CI ShellCheck (warning) flagged SC2120 — _branch_is_staged_promotion references
$1 but production callers use the $BRANCH default; only the test passes an
explicit arg. Add the scoped disable (same pattern as resume_gc_stale_staged_state).
Comment-only; verified SC2120=0 at default severity (CI-equivalent); remaining 8
findings are pre-existing SC2004/SC2006 style (below CI's failure threshold).

DSO-Story: 3ebb-fa93-f980-40de
DSO-Story: 3ebb-fa93-f980-40de
- F3 (real): add a cleanup trap for the inc008 GC temp dir so /tmp is not left
  stale on early exit/interrupt.
- F4 (real): resolve the tristate test's lib path via $BASH_SOURCE (script-
  relative) instead of 'git rev-parse --show-toplevel', robust under worktrees.
- F1 (relaxed): T4/T4b assert the behavioral exit code (75=INDETERMINATE,
  1=FAIL-dominates) + the offending SHA's presence, dropping the prose verdict
  word to reduce output-string coupling.
F2 (dispatcher marker assertion) defended: it asserts exit 75 + the stable
'provenance-complete.marker' filename (a contract path, not prose) — kept.

Tests green: tristate 23/23, inc008 23/23, coverage-invariant 14/14.
DSO-Story: 3ebb-fa93-f980-40de
…er, GC fail-safe, poll sleep)

Three real findings from review-sub-pr on PR #676 (production code):
- tristate_is_transient_error: the 5xx/429 globs (*50[0-9]*, *429*) matched a
  status code EMBEDDED in a larger number (e.g. '504800 bytes') -> could flip a
  non-transient INDETERMINATE into a retry, violating lattice rule 4. Anchor the
  numeric match on non-digit boundaries (grep -E '(^|[^0-9])(5[0-9]{2}|429)([^0-9]|$)').
  +4 edge-case tests (byte counts / offsets / SHA fragments stay non-transient); 27/27.
- resume_gc_stale_staged_state: an empty ls-remote was treated unconditionally as
  'branch gone -> GC cache', so a TRANSIENT ls-remote failure could delete a live
  staged ref's cache. Now distinguishes rc!=0 (transient -> keep, fail-safe) from
  rc==0+empty (truly gone -> GC).
- mergeability poll: skip the trailing sleep after the final (6th) attempt.

DSO-Story: 3ebb-fa93-f980-40de
- test-llm-review-dispatch-or-skip (T9 no-marker path): assert stderr emits the
  INDETERMINATE_ESCALATION: machine-routing marker (DD3), not just exit 75 — a
  partial impl that exits 75 without escalating would otherwise pass silently.
- test-review-coverage-invariant (T4b): assert BOTH range SHAs are walked
  (UNREVIEWED sha1 + INDETERMINATE sha2 + summary unreviewed=1 errors=1) so the
  violation-dominates-error scenario is genuinely exercised, not just sha2.
- test-merge-to-main-resume-inc008: isolate git config (GIT_CONFIG_GLOBAL/SYSTEM
  =/dev/null) so ambient ~/.gitconfig or /etc/gitconfig hooks/signing/aliases
  cannot leak into the init/clone/commit/push test setup.

All green: dispatch 44, coverage-invariant 14, inc008 23. shellcheck SC2120=0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The schema-correction-exhausted fail-closed path in the dso_ci_review runner
emitted only a generic "exhausted all N attempt(s)" message, omitting
_schema_result.errors. The violating field/value lived solely in the synthetic
schema_error finding description (written to the findings JSON, which CI does
not upload), so the actual violation was unrecoverable from the CI job log.

- runner.py: append the (truncated, max 10 + overflow) schema errors to the
  fail-closed stderr message. Enforcement is unchanged — still returns 1.
- runner.py: reconcile get_schema_correction_max_attempts docstring
  (default 1 -> 2) to match the code (read_config_int default 2).
- add test_runner_schema_exhaustion_stderr.py (RED->GREEN): asserts the schema
  error detail appears in stderr AND exit is non-zero (fail-closed preserved).

Diagnosability-only: does not change which findings block. Follow-up
284b-2f0a-3b12-41d6 tracks the functional convergence fix (frozen-field
re-rejection), blocked until a captured payload exists — which this enables.

Antipattern-Scan: fail-closed-swallows-schema-errors root=plugins/dso/scripts/dso_ci_review matches=0

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… (9d2c)

Address review finding on PR #680: the single-error test did not exercise the
truncation logic. Add test_schema_exhaustion_stderr_truncates_and_counts_overflow
(12 distinct errors) asserting the REAL formatting in runner.py — first 10
present, 11th/12th truncated, "(+2 more)" overflow suffix — so a broken slice or
overflow count now fails the test. Refactor shared setup into _run_exhaustion_path.

The pre-existing single-error test already pins `_err_preview` interpolation (it
fails if the errors are not interpolated into the message). Mocking
_validate_findings_schema (external subprocess) and dispatch_schema_correction
(live LLM dispatch) is appropriate unit isolation; the assertions verify the
runner's own formatting output, not mock passthrough.

Antipattern-Scan: fail-closed-swallows-schema-errors root=plugins/dso/scripts/dso_ci_review matches=0

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DSO-Story: 3ebb-fa93-f980-40de
@joeoakhart joeoakhart enabled auto-merge (rebase) June 6, 2026 01:27
@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (1)
  • main

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c1b97f63-301f-424a-b330-1a3ba71e1c34

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch story/588e-abec-221f-42c0/3ebb-resilience-dd123-6

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment on lines +116 to +124
tristate_indeterminate_escalation() {
local _gate="${1:-unknown-gate}" _reason="${2:-verdict could not be computed}" _ref="${3:-}"
# JSON-safe the free-text fields (collapse quotes/newlines) so the marker is
# always parseable for telemetry regardless of the caller's reason string.
local _gs="${_gate//\"/\'}"; _gs="${_gs//$'\n'/ }"
local _rs="${_reason//\"/\'}"; _rs="${_rs//$'\n'/ }"
local _next="/dso:fp-recovery${_ref:+ $_ref}"
printf 'INDETERMINATE_ESCALATION: {"gate":"%s","reason":"%s","next_action":"%s","recovery":"in-channel"}\n' \
"$_gs" "$_rs" "$_next" >&2

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Edge Case: Escalation JSON marker not sanitized against backslashes/control chars

tristate_indeterminate_escalation builds a JSON marker by only replacing double-quotes (//"/\') and newlines in the gate/reason fields. A reason containing a backslash (e.g. \x) or other control characters (tab, carriage return) would produce invalid JSON, breaking the machine-routing/telemetry parse that the marker exists to support (the test T8b/T8c only exercises quotes). In practice this is low-risk because reasons are constructed from hardcoded gate labels and git branch/ref names (which cannot contain backslashes), so I'm flagging it as minor. If you want the marker to be unconditionally parseable regardless of caller input, build the JSON with python3 -c 'import json,sys; ...' (or jq) rather than manual string substitution.

Source: plugins/dso/scripts/lib/review-tristate-lib.sh:116-124.

Serialize the marker with json.dumps so any caller-supplied text (backslashes, tabs, quotes) is always valid JSON.:

    local _next="/dso:fp-recovery${_ref:+ $_ref}"
    printf 'INDETERMINATE_ESCALATION: %s
' \
        "$(_gate="$_gate" _reason="$_reason" _next="$_next" python3 -c '
import json, os
print(json.dumps({"gate": os.environ["_gate"], "reason": os.environ["_reason"],
                 "next_action": os.environ["_next"], "recovery": "in-channel"}))')" >&2
  • Apply fix

Check the box to apply the fix or reply for a change | Was this helpful? React with 👍 / 👎

…284b)

The dso_ci_review dispatcher appends a `fallback_hops` provenance key to the
result dict when a provider-fallback hop occurs (dispatch.py:817/852) — AFTER the
LLM output, so the schema-correction loop cannot strip it. The validator's
top-level allowlist omitted it, so any review that hopped providers failed schema
validation, exhausted the 2-attempt corrector, and blocked merge fail-closed with
0 content findings (the recurring "schema correction exhausted" failure; payload
captured on PR #684 via the 9d2c diagnosability logging).

Fix: add fallback_hops to optional_top in validate-review-output.sh. Per the
epic-588e panel, ALLOWLIST (not strip) — stripping unknown top-level keys would
mask a malformed/empty payload into a false pass.

RED->GREEN: test_code_review_dispatch_with_fallback_hops_passes; validator suite 96/96.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown

DSO-Review-Cycle: 1 pr_number=684 commit_sha=25c9143875ffe04f70fd7367aee3485a6dce1035 findings_hash=2fb01029b53e27f9 tuples=[["plugins/dso/scripts/dso_ci_review/runner.py", "", "correctness"], ["plugins/dso/scripts/merge-to-main-pr.sh", "", "correctness"], ["plugins/dso/scripts/merge-to-main-pr.sh", "", "verification"]]

_mergeable_json=$(gh pr view "$_pr_number" --json mergeable 2>/dev/null || true)
_mergeable=$(echo "$_mergeable_json" | python3 -c "
# --- 6. Resolve mergeability: poll until computed (N2), then fail-closed
# (determinately) on a true conflict (R-C). ---

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DSO llm-review — finding 1/2

[important] plugins/dso/scripts/merge-to-main-pr.sh:1519 plugins/dso/scripts/merge-to-main-pr.sh:1545 (correctness)

PR mergeability polling (merge-to-main-pr.sh, lines 1519-1543) silently treats structural GitHub API errors (404 Not Found, 401 Unauthorized) as transient and continues polling. The `gh pr view` call output is suppressed with `2>/dev/null || true` (line 1521) and errors are not distinguished from timeouts. A deleted PR or revoked token will cause _mergeable_json to be empty, _mergeable to parse as empty string, and the code to fall through to the CONFLICTING check (line 1545) without raising an error. The PR merge then proceeds with unverified mergeability state. This violates the tristate lattice principle (docs/contracts/review-tristate-lattice.md): a genuine error (not found, unauthorized) must resolve to FAIL, not silently continue. The mergeable polling should check the gh pr view exit code separately and fail fast on non-transient errors (404, 401, 403) before retrying. Currently only CONFLICTING state is caught as determinate; other states silently become empty string, indistinguishable from a failed API call.

Posted by dso_ci_review.runner; resolve this comment when addressed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: capture the gh pr view exit code; if a non-zero rc (404 deleted PR / 401 revoked-or-expired token / API outage) leaves mergeability unresolved, the poll now fails CLOSED (error + return 1) instead of falling through as if mergeable. A legitimately-UNKNOWN-then-timeout with gh rc 0 keeps the prior non-blocking behavior.

# not upload) — leaving the CI job log with a generic
# "exhausted" message and the actual violation unrecoverable.
# Truncate so a pathological error list cannot flood the log.
_err_lines = _schema_result.errors or []

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DSO llm-review — finding 2/2

[important] plugins/dso/scripts/dso_ci_review/runner.py:3556 plugins/dso/scripts/dso_ci_review/runner.py:3557 (correctness)

In runner.py's schema exhaustion error-formatting path (lines 3556-3557), the code joins error list entries without type guards: `_err_preview = "; ".join(_err_lines[:10])`. If any entry in _err_lines is not a string (e.g., None, dict, int, or other non-stringifiable type), the join operation will raise TypeError and crash the fail-closed path. The `_schema_result.errors` field is initialized in _SchemaValidationResult as a list of error strings (from the validator output), so production risk is low. However, the code is undefended against a validator returning non-string entries or a future refactor changing the error structure. The test fixture (test_runner_schema_exhaustion_stderr.py) mocks _schema_result with clean string entries, so this latent defect is not exercised. A simple fix: coerce each element to string before joining, e.g., `"; ".join(str(e) for e in _err_lines[:10])`.

Posted by dso_ci_review.runner; resolve this comment when addressed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: str()-coerce each entry before join ("; ".join(str(_e) for _e in _err_lines[:10])) so a non-string entry cannot raise TypeError in the fail-closed path. Added a regression test feeding {dict,None,str,int} asserting the path stays fail-closed (non-zero exit) and still surfaces string entries.

- runner.py: str()-coerce schema-error entries before join so a non-string entry
  (None/dict/int) cannot raise TypeError and crash the schema-exhaustion
  fail-closed branch. +regression test (non-string + truncation cases).
- merge-to-main-pr.sh: the PR mergeability poll now fails CLOSED when `gh pr view`
  exits non-zero (404 deleted PR / 401 revoked-or-expired token / API outage) and
  leaves mergeability unresolved, instead of silently proceeding as if mergeable.
  A legitimately-UNKNOWN-then-timeout with gh rc 0 keeps the prior behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown

DSO-Review-Cycle: 1 pr_number=684 commit_sha=b1f392e6e5a89c5c6f44acd8404a1c13dffe1f60 findings_hash=6bde88c02a853064 tuples=[["plugins/dso/docs/contracts/review-tristate-lattice.md", "", "correctness"], ["plugins/dso/scripts/dso_ci_review/runner.py", "", "correctness"], ["plugins/dso/scripts/dso_ci_review/runner.py", "", "maintainability"], ["plugins/dso/scripts/lib/review-tristate-lib.sh", "", "correctness"], ["plugins/dso/scripts/merge-to-main-pr.sh", "", "correctness"], ["plugins/dso/scripts/merge-to-main-pr.sh", "", "correctness"], ["tests/scripts/test-merge-to-main-resume-inc008.sh", "", "verification"], ["tests/scripts/test-review-tristate-lib.sh", "", "verification"]]

# 78 PRECONDITION_NOT_MET cannot even start (no gh/token/repo) — a special
# INDETERMINATE; mode-gated by precondition-gate.sh

# Named exit codes (consumers should `exit "$TRISTATE_INDETERMINATE"` etc.).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DSO llm-review — finding 1/7

[critical] plugins/dso/scripts/lib/review-tristate-lib.sh:35 (correctness)

review-tristate-lib.sh exports TRISTATE_* constants unconditionally at the top level, but consumers unguard their usage. When review-coverage-invariant.sh sources the lib with a failing precondition (exit 78), the sourcing script exits immediately—correct. However, if the lib file itself contains a parse error (malformed bash syntax), the `source` command fails, $? is non-zero, the script exits 78—also correct. But if the lib exists and parses but the export statement fails for some reason (though rare), or if a transient environment-variable corruption occurs, $TRISTATE_INDETERMINATE may be unset at runtime. The downstream 'exit "$_verdict_code"' (line 202 of review-coverage-invariant.sh) or 'exit "${TRISTATE_INDETERMINATE:-75}"' (line 96 of llm-review-dispatch-or-skip.sh, line 3254 of merge-to-main-pr.sh) would produce a silent fallback to 75 due to the default expansion, making the exit code correct by accident rather than by design. This violates the load-bearing contract (docs/contracts/review-tristate-lattice.md lines 32–40): the tristate must be deterministic, not accidental. The fix: verify the constants are defined immediately after sourcing via a guard like '[[ -z "${TRISTATE_INDETERMINATE:-}" ]] && { echo PRECONDITION_NOT_MET...; exit 78; }', so unset constants fail explicitly rather than silently falling back.

Posted by dso_ci_review.runner; resolve this comment when addressed.

print('')
" 2>/dev/null || true)
[[ -n "$_mergeable" && "$_mergeable" != "UNKNOWN" ]] && break
[[ "$_mq_try" == "6" ]] || sleep 5 # don't sleep after the final attempt

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DSO llm-review — finding 2/7

[important] plugins/dso/scripts/merge-to-main-pr.sh:1539 plugins/dso/scripts/merge-to-main-pr.sh:1549 (correctness)

In merge-to-main-pr.sh, the mergeability polling loop (lines 1529–1546) contains a race-window logic error: after exiting the loop (either via break or exhaustion), the fail-closed check '[[ ( -z "$_mergeable" || "$_mergeable" == "UNKNOWN" ) && "$_mq_gh_rc" -ne 0 ]]' only blocks if the gh call itself failed (rc ≠ 0). This misses the case where the final poll attempt (attempt 6) succeeds (gh rc=0) but returns _mergeable="UNKNOWN" (GitHub still computing after 25 seconds). The loop break condition '[[ -n "$_mergeable" && "$_mergeable" != "UNKNOWN" ]]' is false for UNKNOWN, so the loop exits via natural completion, not break. After the loop, the fail-closed condition is false because _mq_gh_rc IS 0, so the code falls through to the CONFLICTING check, which is also false, and proceeds to enqueue auto-merge on a PR with unresolved mergeability. This is a determinate operational defect: after ~25 seconds of polling, if mergeability is still unknown, the code should fail closed (not attempt merge), but instead it silently proceeds. The fix: change the fail-closed condition to '[[ ( -z "$_mergeable" || "$_mergeable" == "UNKNOWN" ) ]]' (remove the '&& "$_mq_gh_rc" -ne 0' clause), so any unresolved mergeability (from either a failed call or a timeout) blocks.

Posted by dso_ci_review.runner; resolve this comment when addressed.

## Lattice rules (LOAD-BEARING — every consumer must honor all four)

1. **FAIL is the safe bottom.** A genuine violation resolves to FAIL and is never retried, downgraded, or reclassified as transient. When a gate sees both a violation and a compute-error, the violation **dominates** → FAIL.
2. **Indistinguishable → FAIL.** When PASS vs INDETERMINATE cannot be distinguished by evidence at decision time, resolve to FAIL. (`tristate_classify_verdict` coerces an unparseable error-count to ≥1 so it never silently becomes PASS.)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DSO llm-review — finding 3/7

[important] plugins/dso/docs/contracts/review-tristate-lattice.md:32 (correctness)

The tristate contract (docs/contracts/review-tristate-lattice.md, section 'Lattice rules', rule 4) states 'INDETERMINATE retries ONLY on observably-transient cause' and references tristate_is_transient_error for classification. However, none of the three emitting gates (review-coverage-invariant.sh, llm-review-dispatch-or-skip.sh, merge-to-main-pr.sh) call tristate_is_transient_error to classify errors before escalating. They directly escalate all INDETERMINATE verdicts to /dso:fp-recovery without determining whether the cause is transient and thus self-healing. This creates a contract/implementation gap: the contract promises orchestrator-level retry on observably-transient causes, but the gates do not implement that retry logic—they escalate immediately. A transient GitHub API 503 (which should self-heal with a delayed retry) is instead escalated to /dso:fp-recovery, forcing manual recovery for a condition that should have been automatically retried.

Posted by dso_ci_review.runner; resolve this comment when addressed.

# (checkout the feature branch) and exits 1, and is deliberately NOT routed to
# /dso:fp-recovery (fp-recovery cannot fix a wrong-branch checkout). The /tmp
# state cache for spent staged refs is GC'd so a later --resume starts clean.
if _branch_is_staged_promotion; then

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DSO llm-review — finding 4/7

[important] plugins/dso/scripts/merge-to-main-pr.sh:3356 plugins/dso/scripts/merge-to-main-pr.sh:3368 (correctness)

In merge-to-main-pr.sh lines 3368–3375, when the spent-staged-ref check determines the branch is spent (0 commits ahead), it emits a recovery message stating '(0 commits ahead of origin/${_DEFAULT_BRANCH:-main})'. However, if the git rev-list command fails (network error, permission denied), _cur_ahead is empty, and resume_staged_ref_is_spent returns 1 (not spent, fail-safe). The conditional on line 3368 is false, so the recovery message is NOT printed. But if the git command later succeeds with output '0' due to transient recovery, the conditional is true and the message claims the ref is spent with '0 commits ahead', when the operator may have just experienced a transient git failure. The recovery message causes the operator to checkout the feature branch without understanding that a transient git operation (network lag, momentary token expiry) might have been the root cause. The fix: if _cur_ahead is empty/unknown, do NOT assume the ref is spent—either escalate as INDETERMINATE (unknown state) or emit a different message explaining the git command failed and the operator should retry.

Posted by dso_ci_review.runner; resolve this comment when addressed.

# Load merge-to-main-pr.sh in lib mode (defines functions, returns before main flow).
export PR_LIB_MODE=1
# shellcheck source=/dev/null
source "$MERGE_PR" 2>/dev/null || true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DSO llm-review — finding 5/7

[important] tests/scripts/test-merge-to-main-resume-inc008.sh:38 tests/scripts/test-merge-to-main-resume-inc008.sh:41 (verification)

Test file test-merge-to-main-resume-inc008.sh (lines 39–41) sources merge-to-main-pr.sh in lib-mode via 'export PR_LIB_MODE=1; source "$MERGE_PR"; unset PR_LIB_MODE'. The test checks that resume_pr_query_trustworthy exists ('type resume_pr_query_trustworthy'), but does NOT verify that lib-mode sourcing actually executed (i.e., that the script returned before the main flow). If lib-mode sourcing fails (e.g., the PR_LIB_MODE guard in merge-to-main-pr.sh is missing or broken), the script may run its main flow (which would create git refs, modify state), and the test would still pass if the function happened to be defined by a prior test run or environment variable. This is a false-positive risk: the test asserts function presence but not function correctness under the lib-mode contract. The fix: add a precondition assertion that merge-to-main-pr.sh contains the PR_LIB_MODE guard (e.g., grep for 'if.*PR_LIB_MODE.*return'), or verify that sourcing does NOT execute main code (e.g., by checking that 'source "$MERGE_PR" 2>&1' produces no git operations or output from the script body).

Posted by dso_ci_review.runner; resolve this comment when addressed.

_SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$_SCRIPT_DIR/../.." && pwd)"
LIB="$REPO_ROOT/plugins/dso/scripts/lib/review-tristate-lib.sh" # shim-exempt: sources the lib under test
# shellcheck disable=SC1090

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DSO llm-review — finding 6/7

[important] tests/scripts/test-review-tristate-lib.sh:30 tests/scripts/test-review-tristate-lib.sh:36 (verification)

Test file test-review-tristate-lib.sh (lines 30–36) sources review-tristate-lib.sh and checks that functions are defined (tristate_classify_verdict, tristate_is_transient_error, tristate_code_to_name), but does NOT verify that the contract constants (TRISTATE_PASS=0, TRISTATE_FAIL=1, TRISTATE_INDETERMINATE=75, TRISTATE_PRECONDITION_NOT_MET=78) are exported and have the correct values. Production code relies on these constants (e.g., 'exit "${TRISTATE_INDETERMINATE:-75}"' in merge-to-main-pr.sh, llm-review-dispatch-or-skip.sh). If the lib fails to export these constants (e.g., due to a typo, missing export statement, or accidental unset), the test passes but production code silently falls back to the default value (75), masking the missing constant. A later change to a constant value (e.g., TRISTATE_INDETERMINATE moved from 75 to 76) would not be caught by this test, causing a silent mismatch between the lib and callers. The fix: add explicit assertions that each constant is defined and has the expected integer value before running behavioral tests.

Posted by dso_ci_review.runner; resolve this comment when addressed.

# typed list[str], but a defensive guard prevents a non-string
# entry (None/dict/int) from raising TypeError in join and
# crashing this fail-closed path.
_err_preview = "; ".join(str(_e) for _e in _err_lines[:10])

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DSO llm-review — finding 7/7

[important] plugins/dso/scripts/dso_ci_review/runner.py:3561 plugins/dso/scripts/dso_ci_review/runner.py:3565 (correctness)

In dso_ci_review/runner.py lines 3565–3568, the schema-exhaustion error-detail formatting converts _schema_result.errors entries via str()-coercion. The comment notes 'a defensive guard prevents a non-string entry (None/dict/int) from raising TypeError'. However, the defensive guard is the str() coercion itself, which succeeds but produces misleading output: str(None) -> 'None', str({...}) -> '{...}' (dict repr, not JSON), str(42) -> '42'. These are not user-friendly error messages. More critically, if _schema_result.errors is not a list (e.g., a bare string, None, or a dict), the slice _err_lines[:10] and the iteration 'for _e in _err_lines[:10]' fail silently or produce incorrect output. A string '"field: invalid"' sliced as [:10] produces a substring (a partial error), not the full error. The code should validate that _err_lines is a list, and should format non-string entries as JSON or raise them as diagnostic strings. The fix: check 'isinstance(_err_lines, list)' before slicing, and handle non-list values by emitting a diagnostic ('errors field is malformed') instead of silently attempting to iterate.

Posted by dso_ci_review.runner; resolve this comment when addressed.

@gitar-bot

gitar-bot Bot commented Jun 6, 2026

Copy link
Copy Markdown
CI failed: The CI pipeline correctly blocked the merge because the automated LLM code reviewer identified 1 critical and 6 important findings in your changes. These findings act as quality gates that must be resolved before the code can proceed.

Overview

The build failed during the llm-review stage. The automated reviewer identified multiple blocking issues (1 critical, 6 important) in the code, which triggered a deliberate exit code 1 to enforce quality standards.

Failures

LLM Quality Gate Violation (confidence: high)

  • Type: test
  • Affected jobs: 79845448379
  • Related to change: yes
  • Root cause: The PR contains code patterns that do not meet the project's quality or security thresholds as defined by the automated CI review tool.
  • Suggested fix: Review the specific issues highlighted in the findings.json artifact and the 7 PR comments posted by the review bot. Address the identified critical and important violations and push a new commit.

Summary

  • Change-related failures: 1 (automated quality gate failure).
  • Infrastructure/flaky failures: 0.
  • Recommended action: Consult the PR review comments generated by the CI bot and the findings.json file to identify the specific lines of code requiring remediation.
Code Review 👍 Approved with suggestions 0 resolved / 1 findings

Integrates resilient three-valued verdict logic and garbage collection to harden the two-tier staging flow. Sanitize backslashes and control characters in the tristate_indeterminate_escalation JSON marker to prevent malformed output.

💡 Edge Case: Escalation JSON marker not sanitized against backslashes/control chars

📄 plugins/dso/scripts/lib/review-tristate-lib.sh:116-124

tristate_indeterminate_escalation builds a JSON marker by only replacing double-quotes (//"/\') and newlines in the gate/reason fields. A reason containing a backslash (e.g. \x) or other control characters (tab, carriage return) would produce invalid JSON, breaking the machine-routing/telemetry parse that the marker exists to support (the test T8b/T8c only exercises quotes). In practice this is low-risk because reasons are constructed from hardcoded gate labels and git branch/ref names (which cannot contain backslashes), so I'm flagging it as minor. If you want the marker to be unconditionally parseable regardless of caller input, build the JSON with python3 -c 'import json,sys; ...' (or jq) rather than manual string substitution.

Source: plugins/dso/scripts/lib/review-tristate-lib.sh:116-124.

Serialize the marker with json.dumps so any caller-supplied text (backslashes, tabs, quotes) is always valid JSON.
    local _next="/dso:fp-recovery${_ref:+ $_ref}"
    printf 'INDETERMINATE_ESCALATION: %s
' \
        "$(_gate="$_gate" _reason="$_reason" _next="$_next" python3 -c '
import json, os
print(json.dumps({"gate": os.environ["_gate"], "reason": os.environ["_reason"],
                 "next_action": os.environ["_next"], "recovery": "in-channel"}))')" >&2
🤖 Prompt for agents
Code Review: Integrates resilient three-valued verdict logic and garbage collection to harden the two-tier staging flow. Sanitize backslashes and control characters in the `tristate_indeterminate_escalation` JSON marker to prevent malformed output.

1. 💡 Edge Case: Escalation JSON marker not sanitized against backslashes/control chars
   Files: plugins/dso/scripts/lib/review-tristate-lib.sh:116-124

   `tristate_indeterminate_escalation` builds a JSON marker by only replacing double-quotes (`//"/\'`) and newlines in the gate/reason fields. A reason containing a backslash (e.g. `\x`) or other control characters (tab, carriage return) would produce invalid JSON, breaking the machine-routing/telemetry parse that the marker exists to support (the test `T8b`/`T8c` only exercises quotes). In practice this is low-risk because reasons are constructed from hardcoded gate labels and git branch/ref names (which cannot contain backslashes), so I'm flagging it as minor. If you want the marker to be unconditionally parseable regardless of caller input, build the JSON with `python3 -c 'import json,sys; ...'` (or `jq`) rather than manual string substitution.
   
   Source: plugins/dso/scripts/lib/review-tristate-lib.sh:116-124.

   Fix (Serialize the marker with json.dumps so any caller-supplied text (backslashes, tabs, quotes) is always valid JSON.):
       local _next="/dso:fp-recovery${_ref:+ $_ref}"
       printf 'INDETERMINATE_ESCALATION: %s
   ' \
           "$(_gate="$_gate" _reason="$_reason" _next="$_next" python3 -c '
   import json, os
   print(json.dumps({"gate": os.environ["_gate"], "reason": os.environ["_reason"],
                    "next_action": os.environ["_next"], "recovery": "in-channel"}))')" >&2

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@joeoakhart joeoakhart added the force-merged-fp-recovery PR was force-merged via /dso:fp-recovery workflow label Jun 6, 2026
@joeoakhart

Copy link
Copy Markdown
Collaborator Author

FP_RECOVERY: pr_number=684 manual_reviewer_hash=926898d31662fcdb15b45b884eb5d1e963018db92b362cb579ac71610ef95c8b fp_rationale="CI review-sub-pr escalated to 1 critical + 6 important that re-litigated just-made fixes (polling guard, str() coercion) AND flagged previously-panel-approved 3ebb code (tristate lib/contract/tests) — FP-amplification / non-convergence. A NEUTRAL (context-free) dso:code-reviewer-standard at opus tier over the full diff found 0 findings (12 tool calls / 103s), examining the exact flagged areas."

Cleared: 0 critical / 0 important / 0 fragile. All other required checks green; only review-sub-pr (the FP-amplified check) bypassed. Per FP-RECOVERY-WORKFLOW Step 5 — human web-UI bypass merge required.

@JoeOakhartNava JoeOakhartNava disabled auto-merge June 6, 2026 03:06
@JoeOakhartNava JoeOakhartNava merged commit 2a02101 into staged-793b29ce7031-1780709220 Jun 6, 2026
18 of 19 checks passed
@joeoakhart joeoakhart deleted the story/588e-abec-221f-42c0/3ebb-resilience-dd123-6 branch June 6, 2026 04:26
joeoakhart pushed a commit that referenced this pull request Jun 8, 2026
…2195

review-convergence-check.sh + review-finding-identity.sh were built and unit-tested
but invoked NOWHERE, so the review loop had no oscillation / cycle-cap brake. The
2026-06-08 FP-analysis tied the largest *systemic* FP amplifier to this: review-sub-pr
re-rolling different findings each cycle on the same code and re-flagging already-fixed
/ panel-approved lines (PRs #633/#641/#643/#647/#649/#684 — "FP-amplification, bug 2195").

Wires the check into the ci.yml llm-review job in ADVISORY mode (warn-only, NOT in
required-checks.txt) per the workflow-stability warn-first convention. Per-PR convergence
history persists across review cycles via an accumulating actions/cache (unique key per
run, restored from the latest same-PR prefix); the step emits a ::warning:: on
oscillation / cap and never blocks. Admin go-live = make the step fail on exit 1 + add to
required-checks.

Active re-flag suppression already exists via the defense store
(DSO_SUPPRESS_PRIOR_DEFENSES); this adds the missing detection brake. actionlint clean.
Regression guard: tests/workflows/test-review-convergence-wired.sh (3/3) asserts the
wiring + script presence so 2195 ("wired nowhere") cannot recur. Status updated in
WORKFLOW-STABILITY-CHECKS.md.

Epic 7412. Relates-to bug 2195-a056-e5e1-4743.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
joeoakhart pushed a commit that referenced this pull request Jun 8, 2026
…2195

review-convergence-check.sh + review-finding-identity.sh were built and unit-tested
but invoked NOWHERE, so the review loop had no oscillation / cycle-cap brake. The
2026-06-08 FP-analysis tied the largest *systemic* FP amplifier to this: review-sub-pr
re-rolling different findings each cycle on the same code and re-flagging already-fixed
/ panel-approved lines (PRs #633/#641/#643/#647/#649/#684 — "FP-amplification, bug 2195").

Wires the check into the ci.yml llm-review job in ADVISORY mode (warn-only, NOT in
required-checks.txt) per the workflow-stability warn-first convention. Per-PR convergence
history persists across review cycles via an accumulating actions/cache (unique key per
run, restored from the latest same-PR prefix); the step emits a ::warning:: on
oscillation / cap and never blocks. Admin go-live = make the step fail on exit 1 + add to
required-checks.

Active re-flag suppression already exists via the defense store
(DSO_SUPPRESS_PRIOR_DEFENSES); this adds the missing detection brake. actionlint clean.
Regression guard: tests/workflows/test-review-convergence-wired.sh (3/3) asserts the
wiring + script presence so 2195 ("wired nowhere") cannot recur. Status updated in
WORKFLOW-STABILITY-CHECKS.md.

Epic 7412. Relates-to bug 2195-a056-e5e1-4743.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

force-merged-fp-recovery PR was force-merged via /dso:fp-recovery workflow

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants