Skip to content

fix(review-gate): require review_completed attestation on empty-findings payloads (1b76)#693

Merged
JoeOakhartNava merged 4 commits into
mainfrom
fix/1b76-empty-findings-failopen
Jun 6, 2026
Merged

fix(review-gate): require review_completed attestation on empty-findings payloads (1b76)#693
JoeOakhartNava merged 4 commits into
mainfrom
fix/1b76-empty-findings-failopen

Conversation

@joeoakhart

@joeoakhart joeoakhart commented Jun 6, 2026

Copy link
Copy Markdown
Collaborator

1b76 — require positive review_completed attestation on empty-findings payloads

Fixes bug 1b76-0a0f-9779-4eda (588e panel O5-C2). Closes a fail-open where validate-review-output.sh accepted {findings:[], summary>=10chars} as a clean review — byte-equivalent to a truncated/garbled payload — letting un-reviewed code merge. This fail-open gates the 3ee4 enforce-flip.

Fix: top-level boolean review_completed attestation; validator requires it === true when findings is empty AND the payload is non-synthetic (synthetic infra payloads keep their type channel; non-empty findings are themselves the positive signal). Emitted via the shared reviewer-base.md (12 agents + ci variants regenerated via build-review-agents.sh); merge_findings() sets it for the runner's aggregated clean reviews. Contract doc + tests updated (suite 100→103; full dso_ci_review pytest 898 passed).

Why direct PR + manual bypass-merge

Per epic 588e handoff: the two-tier staged flow livelocks (review-sub-pr/llm-review FP-amplifies). Landing recipe = direct PR to main + a neutral local opus review before merge (the real gate — especially apt here since this change touches the review machinery itself) + human web-UI bypass-merge with all other required checks green. Approved by maintainer (Fix A via shared base). The neutral opus review is run pre-merge and attached as the clearance artifact.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced code review validation to require explicit completion attestation in reviewer outputs, improving distinction between complete reviews and incomplete or truncated payloads.
  • Documentation

    • Updated code reviewer agent documentation and output schema specifications to reflect stricter validation requirements.
  • Tests

    • Updated test fixtures and validation tests to verify compliance with the new review output schema.

…pty-findings payloads (1b76)

validate-review-output.sh accepted a code-review-dispatch payload with empty
findings + a >=10-char summary as SCHEMA_VALID — byte-equivalent to a truncated/
garbled payload — so a stripped reviewer response could pass as a clean no-issues
review and merge un-reviewed code (588e panel O5-C2; fail-open gating the 3ee4
enforce-flip).

Fix: add a top-level boolean `review_completed` attestation.
- validate-review-output.sh: allowlist review_completed; require it === true when
  findings is an empty list AND the payload is non-synthetic (synthetic infra
  payloads keep their positive `type` channel; non-empty findings are themselves
  the positive signal). Schema hash cb48a66f -> 35dd40c9.
- reviewer-base.md (shared base): reviewers now emit review_completed:true as a
  required top-level key; the 12 code-reviewer-*.md agents (+ ci variants)
  regenerated via build-review-agents.sh.
- dso_ci_review/findings.py: merge_findings() sets review_completed:true (the
  runner's positive completion attestation for aggregated clean reviews);
  runner.py schema-hash constant + dispatch.py doc comment updated in lockstep.
- contract doc + tests updated; new RED tests cover empty-without-attestation
  (reject), empty-with-attestation (pass), non-empty (unaffected), synthetic
  (exempt). Suite 100 -> 103; full dso_ci_review pytest 898 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@joeoakhart, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 24 minutes and 53 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e3906bec-72cf-43c5-87b5-4605d56b6b84

📥 Commits

Reviewing files that changed from the base of the PR and between ba43430 and e7082e9.

📒 Files selected for processing (2)
  • plugins/dso/scripts/dso_ci_review/runner.py
  • tests/skills/dso_ci_review/test_empty_findings_review_completed_integration.py

Walkthrough

This PR introduces a review_completed: true boolean field as a required top-level key in the reviewer-findings JSON schema. The change aims to distinguish valid empty reviews from truncated or garbled payloads, with exemptions for synthetic error findings. Validator logic, agent markdown documentation, Python merge functions, and test fixtures are updated in concert.

Changes

Schema enforcement and validation

Layer / File(s) Summary
Schema contract definition
plugins/dso/docs/contracts/review-findings-schema.md
The reviewer-findings.json schema adds review_completed as a required boolean with conditional enforcement: review_completed: true is required when findings is an empty list and the payload is non-synthetic (exempting synthetic error types). Change log entry added for 2026-06-06.
Validator implementation
plugins/dso/scripts/validate-review-output.sh
Bash validator updated to enforce review_completed as a boolean, reject non-boolean values, require review_completed: true for empty real findings, and exempt synthetic payloads. Schema hash constant updated to 35dd40c9e6e83207. Documentation and usage text expanded to clarify the attestation semantics.
Merge function and runner sync
plugins/dso/scripts/dso_ci_review/findings.py, plugins/dso/scripts/dso_ci_review/runner.py, plugins/dso/scripts/dso_ci_review/dispatch.py
merge_findings() now emits review_completed: True in its output schema. Hash constants in runner.py and dispatch.py synced to match the updated validator schema.

Agent documentation updates

Layer / File(s) Summary
CI agent schema requirements
plugins/dso/agents/ci/code-reviewer-*.md (10 files)
All CI-tier reviewer agents (deep-arch, deep-correctness, deep-hygiene, deep-verification, light, performance, security-blue-team, security-red-team, standard, test-quality) updated with schema documentation requiring review_completed: true, content hash markers refreshed, and example payloads modified to include the field.
Base agent schema requirements
plugins/dso/agents/code-reviewer-*.md (10 files)
All base-tier reviewer agents (deep-arch, deep-correctness, deep-hygiene, deep-verification, light, performance, security-blue-team, security-red-team, standard, test-quality) updated with schema documentation and examples reflecting the new required review_completed field.
Base documentation schema guidance
plugins/dso/docs/workflows/prompts/reviewer-base.md
Workflow documentation contract updated to require review_completed as a top-level key and clarify validator rejection of payloads omitting this field.

Test coverage

Layer / File(s) Summary
Validator attestation tests
tests/hooks/test-validate-review-output.sh
New test suite (bug 1b76) covers review_completed validator behavior: rejects empty findings without attestation, accepts when true, rejects non-boolean values, allows non-empty findings without attestation, and exempts synthetic parse_error payloads. Existing fixtures updated to include review_completed: true.
Documentation contract tests
tests/docs/test-reviewer-base-file-constraint.sh
Schema contract test updated from "exactly two required keys" to "exactly three required keys" (findings, summary, review_completed) and verifies the attestation field is present in documentation examples.
Merge function tests
tests/scripts/test-write-reviewer-findings.sh
All test fixtures and inline JSON payloads updated to include review_completed: true, validating that the merge function correctly emits the new attestation field.

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • navapbc/digital-service-orchestra#109: Adds schema-validation hook in runner.py that runs validate-review-output.sh using the same hash-drift logic updated in this PR's hash constant synchronization.
  • navapbc/digital-service-orchestra#113: Wires schema-validation failures into dispatch_schema_correction() retry path within runner.py, directly consuming the stricter output contract enforced by this PR.
  • navapbc/digital-service-orchestra#496: Reorganizes reviewer-agent markdown files into dispatch variants and updates the build/dispatch pipeline; this PR's schema updates to those same files are sequential and interdependent on the shared infrastructure.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a required review_completed attestation for empty-findings payloads to fix bug 1b76. It is specific, concise, and directly summarizes the primary fix.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/1b76-empty-findings-failopen

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread plugins/dso/scripts/dso_ci_review/runner.py Fixed
Comment on lines +468 to +479
if isinstance(findings, list):
_synthetic_types = {"specialist_error", "fallback_exhausted", "parse_error"}
_is_synthetic_payload = any(
isinstance(f, dict) and f.get("type") in _synthetic_types for f in findings
)
if not findings and not _is_synthetic_payload:
if data.get("review_completed") is not True:
errors.append(
"clean review (empty findings) must set review_completed: true — a positive "
"review-completed attestation is required to distinguish a completed no-issues "
"review from a truncated/garbled payload (1b76)"
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Quality: Redundant synthetic-payload check in empty-findings enforcement

In validate-review-output.sh, the empty-findings enforcement computes _is_synthetic_payload = any(... for f in findings) and then gates on if not findings and not _is_synthetic_payload. When not findings is true the list is empty, so any(...) over it is always False and _is_synthetic_payload can never be True — the synthetic-payload term is dead code in this branch. A synthetic payload always carries at least one finding entry (with type), making findings non-empty, so it is already excluded by not findings. The extra computation is harmless but misleading: it suggests an empty synthetic payload would be exempted when in fact no such case exists. Consider dropping the _is_synthetic_payload term here (or adding a comment that it is intentionally defensive) to avoid implying behavior that the condition cannot exercise.

Was this helpful? React with 👍 / 👎

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (17)
plugins/dso/agents/ci/code-reviewer-deep-arch.md (1)

719-720: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove stale “exactly 2 top-level keys” statements to avoid schema-contract contradiction.

This file now requires review_completed, but these lines still say the schema enforces exactly two top-level keys (findings, summary). That contradiction can produce inconsistent agent outputs and unnecessary re-dispatches.

Also applies to: 762-762

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/dso/agents/ci/code-reviewer-deep-arch.md` around lines 719 - 720,
Update the documentation text that claims the JSON schema requires "exactly 2
top-level keys (findings, summary)" by removing or replacing that stale
assertion and any wording that contradicts the current schema; specifically
remove the phrase "exactly 2 top-level keys" and update references about the
deprecated "scores" key so the doc reflects the current required top-level keys
including "review_completed" (and still "findings" and "summary") to avoid
schema-contract contradictions and re-dispatch loops.
plugins/dso/agents/ci/code-reviewer-deep-correctness.md (1)

753-753: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update stale validator note to include review_completed.

Line 753 still says validation enforces 2 required top-level keys, which conflicts with the updated schema section requiring review_completed. Please align this line to prevent instruction ambiguity for the CI reviewer agent.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/dso/agents/ci/code-reviewer-deep-correctness.md` at line 753, Update
the stale sentence that claims validate-review-output.sh enforces only 2
required top-level keys to reflect the current schema by including the
now-required review_completed key; specifically edit the line that mentions "2
required top-level keys (findings, summary)" to state the required keys are
findings, summary, and review_completed (and note that scores is deprecated) so
the guidance and CI validation expectations (validate-review-output.sh) are
consistent with the schema; ensure the example summary text guidance (e.g.,
"security_overlay_warranted: no, performance_overlay_warranted: yes") remains
unchanged.
plugins/dso/agents/ci/code-reviewer-deep-hygiene.md (1)

647-648: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix leftover “2 required top-level keys” guidance in this CI agent prompt.

These lines contradict the new schema requirement that includes review_completed. Keeping both rules in one prompt can cause malformed reviewer output contracts.

Also applies to: 704-704

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/dso/agents/ci/code-reviewer-deep-hygiene.md` around lines 647 - 648,
Update the prompt text that still instructs "2 required top-level keys
(findings, summary)" and references the deprecated "scores" key: remove or
replace that sentence so the guidance matches the current schema which requires
the top-level keys findings, summary, and review_completed; also remove
references to the deprecated "scores" key and ensure the finding categories enum
description still lists the five allowed values (and nothing contradictory).
Apply this change to both occurrences of the outdated guidance (the block
containing the literal text "2 required top-level keys" and any nearby mention
of "scores" / finding category enum) so the CI agent prompt and reviewer output
contract are consistent with the new schema.
plugins/dso/agents/ci/code-reviewer-deep-verification.md (1)

648-648: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Align this validator sentence with the new review_completed requirement.

Line 648 still documents a 2-key requirement, which conflicts with the updated schema contract in this same file.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/dso/agents/ci/code-reviewer-deep-verification.md` at line 648, Update
the validator sentence that currently asserts a 2-key requirement (findings,
summary) to reflect the new schema by adding the required review_completed key:
state that the output MUST include findings, summary and review_completed
(summary field text should still contain items like "security_overlay_warranted:
no, performance_overlay_warranted: yes"), note that scores is deprecated, and
keep the reference to validate-review-output.sh enforcing the contract; edit the
sentence mentioning "summary field" on line 648 to list these three required
top-level keys (findings, summary, review_completed) instead of two.
plugins/dso/agents/ci/code-reviewer-performance.md (1)

473-500: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update CI JSON example/checklist to include review_completed.

The Step 3 payload example and Step 4 required-fields checklist still model a two-key contract, conflicting with the updated schema rules above. This inconsistency will cause avoidable re-dispatch loops when the example is followed literally.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/dso/agents/ci/code-reviewer-performance.md` around lines 473 - 500,
Update the example JSON and the Step 4 checklist in
plugins/dso/agents/ci/code-reviewer-performance.md to include the new required
top-level boolean field review_completed alongside summary and findings;
specifically add "review_completed": true/false to the example object and update
the textual checklist to require review_completed, and ensure
validate-review-output.sh (the dispatcher validator) is aware/consistent with
the new field name so the example and checklist no longer contradict the schema.
plugins/dso/agents/ci/code-reviewer-security-red-team.md (1)

563-566: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Red-team output-schema section still documents the legacy two-key contract.

The delta says only findings and summary are required, but the file’s base contract now requires review_completed: true. This mismatch is a functional contract defect in reviewer instructions and should be corrected in the delta block too.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/dso/agents/ci/code-reviewer-security-red-team.md` around lines 563 -
566, The Output Schema delta currently states only findings and summary are
required but the base contract also requires review_completed: true; update the
"Output Schema" section so the required top-level keys include review_completed
(e.g., "Your output MUST conform... (3 required top-level keys: findings,
summary, review_completed)"), and mention that each output must set
review_completed: true alongside findings and summary to align the delta with
the base contract and prevent a contract mismatch.
plugins/dso/agents/ci/code-reviewer-security-blue-team.md (1)

569-572: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Blue-team delta schema is now out of sync with base contract.

This section still defines the legacy two-key top-level schema, which conflicts with the required review_completed: true contract defined earlier in the same file. Please align the delta Output Schema text with the new validator contract to prevent invalid reviewer payloads.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/dso/agents/ci/code-reviewer-security-blue-team.md` around lines 569 -
572, Update the "Output Schema" delta to remove the outdated "two-key top-level
schema" wording and explicitly match the validator contract that requires
review_completed: true; specifically, change the text under "Output Schema" to
state the required top-level keys are findings, summary and review_completed
(and that the legacy scores key is deprecated and must not be emitted), and
ensure the per-finding allowlist note remains unchanged so that each finding
uses only the five permitted fields; reference the "Output Schema" section and
the earlier "review_completed: true" contract when editing the wording.
plugins/dso/agents/ci/code-reviewer-light.md (1)

630-656: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve stale “two-key schema” instructions in the final output section.

This section still tells the agent to emit only findings and summary, which now contradicts the required review_completed: true contract earlier in the same file. In CI mode this creates deterministic validator failures on empty-findings outputs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/dso/agents/ci/code-reviewer-light.md` around lines 630 - 656, The
"Final Output Reminder" text incorrectly insists only on top-level keys
"findings" and "summary", which conflicts with the earlier required
"review_completed: true" contract and causes CI validator failures; update the
final-output guidance to require the exact top-level keys expected by CI
(include "review_completed": true alongside "findings" and "summary"), remove or
reword the outdated "two-key schema" directive, and add a short note that
emitting any other top-level keys (except an allowed deprecated "scores") will
fail validate-review-output.sh so authors must include "review_completed: true"
when appropriate.
plugins/dso/agents/ci/code-reviewer-standard.md (1)

469-500: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix CI JSON example/checklist to include review_completed

Step 3’s canonical JSON and Step 4 required-fields checklist still describe a 2-key payload (summary, findings) and omit review_completed, which now contradicts the schema rules above and the validator behavior. This can drive invalid agent output and re-dispatch loops.

Suggested doc fix
 {
   "summary": "<one-paragraph overall assessment>",
+  "review_completed": true,
   "findings": [
@@
-2. **Did you include all required schema fields?** `summary` is required; each finding requires `severity`, `category`, `file`, `description`. Field names match Step 3's schema exactly — do NOT substitute `dim` for `category`, `title` for `description`, or `line` for `cited_lines`.
+2. **Did you include all required schema fields?** `summary` and `review_completed` are required; each finding requires `severity`, `category`, `file`, `description`. Field names match Step 3's schema exactly — do NOT substitute `dim` for `category`, `title` for `description`, or `line` for `cited_lines`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/dso/agents/ci/code-reviewer-standard.md` around lines 469 - 500, The
CI JSON example and final-checklist in
plugins/dso/agents/ci/code-reviewer-standard.md still show only "summary" and
"findings" and omit the required "review_completed" field; update the canonical
JSON example in Step 3 and the Step 4 required-fields checklist to include
"review_completed" (boolean) and show it in the single-object example, adjust
any prose that lists required fields (e.g., the Step 4 checks and the
"Field-name discipline" paragraph) to mention "review_completed", and add one
brief example/value for "review_completed" in the example JSON so the validator
examples match actual validator expectations.
plugins/dso/agents/code-reviewer-deep-arch.md (1)

755-756: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove stale “exactly 2 top-level keys” language

These lines still claim only findings/summary are required, which now conflicts with the updated review_completed requirement in the same file and validator contract. Please align both references to the 3-key requirement.

Also applies to: 798-799

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/dso/agents/code-reviewer-deep-arch.md` around lines 755 - 756,
Replace the stale phrase "exactly 2 top-level keys (findings, summary)" with the
correct 3-key requirement by stating something like "exactly 3 top-level keys
(findings, summary, review_completed)"; update the same wording elsewhere in the
file where "exactly 2 top-level keys" appears (the later occurrence around the
other paragraph) so both references match the validator/contract and explicitly
list the three keys.
plugins/dso/agents/ci/code-reviewer-test-quality.md (1)

654-657: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update test-quality Output Schema to 3 required top-level keys

This section still instructs a 2-key payload, conflicting with the updated required review_completed contract. Keeping this stale text will cause schema-invalid outputs from this agent.

Suggested doc fix
-Your output MUST conform to the standard reviewer-findings.json schema (2 required top-level keys: findings, summary).
+Your output MUST conform to the standard reviewer-findings.json schema (3 required top-level keys: findings, summary, review_completed).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/dso/agents/ci/code-reviewer-test-quality.md` around lines 654 - 657,
The "Output Schema" section still says the reviewer-findings.json requires 2
top-level keys; update the prose under the "## Output Schema" heading to reflect
the new contract requiring 3 required top-level keys (findings, summary,
review_completed), and explicitly note that review_completed is required and
must follow the updated contract so generated outputs remain schema-valid;
ensure any examples or wording that assert "2 required top-level keys" are
changed to say "3 required top-level keys" and add a brief sentence about the
expected type/format of review_completed.
plugins/dso/agents/code-reviewer-light.md (1)

714-721: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update the light-tier final-output reminder to include review_completed.

Line 714 onward still instructs agents to emit only findings and summary, which conflicts with the updated required schema in Line 166 (review_completed mandatory). This will produce invalid payloads for clean reviews and fail validation.

Proposed fix
-Your JSON output MUST use these exact top-level key names:
+Your JSON output MUST use these exact top-level key names:

 ```json
-{ "findings": [...], "summary": "..." }
+{ "findings": [...], "summary": "...", "review_completed": true }

-Required top-level keys are findings and summary. The legacy scores key is deprecated — do not emit it. The validator tolerates scores with a deprecation warning but rejects any other top-level key name (e.g., "dimensions", "ratings").
+Required top-level keys are findings, summary, and review_completed (always true). The legacy scores key is deprecated — do not emit it. The validator tolerates scores with a deprecation warning but rejects any other top-level key name (e.g., "dimensions", "ratings").

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @plugins/dso/agents/code-reviewer-light.md around lines 714 - 721, Update the
light-tier final-output reminder so the required top-level JSON keys include
"review_completed" (always true) in addition to "findings" and "summary"; change
the example output to { "findings": [...], "summary": "...", "review_completed":
true } and update the descriptive sentence to say "Required top-level keys are
findings, summary, and review_completed (always true)." Keep the note
about the deprecated scores key tolerated with a warning and ensure no other
top-level keys (e.g., "dimensions", "ratings") are permitted.


</details>

<!-- cr-comment:v1:955f3f29d5d972b64b7f89d1 -->

</blockquote></details>
<details>
<summary>plugins/dso/agents/code-reviewer-security-blue-team.md (1)</summary><blockquote>

`607-618`: _⚠️ Potential issue_ | _🟠 Major_ | _⚡ Quick win_

**Tier-local schema block is stale and contradicts the enforced contract.**

This section still says “2 required top-level keys” and a reduced finding allowlist, which conflicts with the updated base contract in this same file (and validator behavior) that requires `review_completed` semantics. Please align this block to the current schema so blue-team agents don’t emit conflicting payload shapes.

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @plugins/dso/agents/code-reviewer-security-blue-team.md around lines 607 -
618, Update the stale tier-local schema block so it matches the current enforced
contract: replace the old statement about "2 required top-level keys
(findings, summary)" and the reduced allowlist with the active schema that
includes review_completed semantics and the full required payload shape;
ensure the allowlisted finding fields and required cited_lines rule remain
consistent with the base validator and reference the same keys used elsewhere in
this file (e.g., findings, summary, review_completed, cited_lines) so
blue-team agents emit the correct JSON shape.


</details>

<!-- cr-comment:v1:826b84c555f5f203b7389879 -->

</blockquote></details>
<details>
<summary>plugins/dso/agents/code-reviewer-security-red-team.md (1)</summary><blockquote>

`601-602`: _⚠️ Potential issue_ | _🟠 Major_ | _⚡ Quick win_

**Red-team output schema section still documents the old 2-key contract.**

This local section conflicts with the updated base schema in the same file (which now includes `review_completed` requirements). Please update this block to the current contract to prevent instruction ambiguity and invalid reviewer payloads.

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @plugins/dso/agents/code-reviewer-security-red-team.md around lines 601 -
602, The documentation block that enforces the "2 required top-level keys:
findings, summary" in plugins/dso/agents/code-reviewer-security-red-team.md must
be updated to match the current reviewer-findings.json contract — modify the
text that describes the required top-level keys so it includes the additional
required key review_completed and any updated allowed fields/constraints (retain
the allowlist restriction for finding fields: severity, category, description,
file, cited_lines and the prefix requirement for descriptions), remove or revise
the outdated sentence about the "2-key contract", and ensure the block
references the exact schema keys findings, summary, and review_completed to
avoid producing invalid reviewer payloads.


</details>

<!-- cr-comment:v1:a9a8c1d461239cfb73e05335 -->

</blockquote></details>
<details>
<summary>plugins/dso/agents/code-reviewer-standard.md (1)</summary><blockquote>

`924-925`: _⚠️ Potential issue_ | _🟠 Major_ | _⚡ Quick win_

**Schema references in later sections still mention a 2-key top-level contract.**

These lines now contradict the updated schema block above that requires `review_completed`. Please update these references to avoid conflicting guidance inside the same agent definition.

   


Also applies to: 973-974

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @plugins/dso/agents/code-reviewer-standard.md around lines 924 - 925, Update
the text that currently reads "top-level scoring dimension — the JSON schema
enforces 2 required top-level keys (findings, summary); the legacy scores
key is deprecated." (and the similar reference later) so it reflects the new
contract: list the required top-level keys findings, summary, and
review_completed, note that scores is deprecated, and remove any assertation
that only two keys are required; keep the same wording style and adjust both
occurrences mentioned in the comment to avoid the conflicting guidance.


</details>

<!-- cr-comment:v1:dfe713d2c0ebfb97aa2f67b2 -->

</blockquote></details>
<details>
<summary>tests/hooks/test-validate-review-output.sh (2)</summary><blockquote>

`1479-1504`: _⚠️ Potential issue_ | _🟡 Minor_ | _⚡ Quick win_

**Avoid masking `escalate_review` validation errors behind missing attestation.**

Both malformed `escalate_review` fixtures use empty findings without `review_completed`. That can short-circuit to attestation errors and make these assertions brittle/non-specific.

<details>
<summary>Suggested fix</summary>

```diff
 _ESC_STRING_FILE=$(write_fixture "escalate_review_string.json" '{
   "findings": [],
   "summary": "All dimensions clean but escalate_review is malformed as a string.",
+  "review_completed": true,
   "escalate_review": "uncertain severity"
 }')

 _ESC_MISSING_IDX_FILE=$(write_fixture "escalate_review_missing_index.json" '{
   "findings": [],
   "summary": "All dimensions clean but escalate_review element is missing finding_index.",
+  "review_completed": true,
   "escalate_review": [{"reason": "uncertain severity"}]
 }')

As per coding guidelines, “Tests must include edge-case and error-path coverage.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/hooks/test-validate-review-output.sh` around lines 1479 - 1504, The
tests are masking escalate_review validation errors because both fixtures use an
empty "findings" array so attestation/`review_completed` checks short-circuit;
update the fixtures created by write_fixture for _ESC_STRING_FILE and
_ESC_MISSING_IDX_FILE to include at least one minimal finding object (e.g.,
include a finding with a finding_index like 0 and an explicit review_completed
value) so the code path reaches escalate_review validation (affecting the
asserts that read _ESC_STRING_OUTPUT and the _snapshot_fail check) and the
specific type/missing-field errors surface.

Source: Coding guidelines


1362-1368: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Isolate review_tier enum failure from attestation failure.

This negative test can pass for the wrong reason because findings is empty and review_completed is omitted. If enum validation regresses, the test may still stay green due to the attestation gate.

Suggested fix
 _REVIEW_TIER_INVALID_FILE=$(write_fixture "review_tier_invalid.json" '{
   "findings": [],
   "summary": "No issues found. Code is well-structured and tests are adequate.",
-  "review_tier": "ultra_deep"
+  "review_tier": "ultra_deep",
+  "review_completed": true
 }')

As per coding guidelines, “Tests must exercise observable behavior of new code paths” and “Tests must include edge-case and error-path coverage.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/hooks/test-validate-review-output.sh` around lines 1362 - 1368, The
test fixture review_tier_invalid.json currently has empty "findings" and omits
"review_completed", so the negative test (run_script code-review-dispatch ->
_REVIEW_TIER_INVALID_EXIT) can fail due to attestation gate instead of the
invalid review_tier enum; update the fixture used by
test_review_tier_invalid_value_fails to include a non-empty "findings" array and
set "review_completed": true (or otherwise ensure attestation passes) so the
script reaches enum validation and the invalid "review_tier": "ultra_deep" is
the observable failure being asserted.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/docs/test-reviewer-base-file-constraint.sh`:
- Around line 311-323: The current check uses keys_match and attest_match
separately and can misfire; change the logic to assert that the required-key
clause explicitly names findings, summary and review_completed together (e.g., a
single regex/search that matches
"required.*(findings).*?(summary).*?(review_completed)" or a pattern that finds
those three tokens in the same clause/line), replace the two-step keys_match +
attest_match approach with a single combined assertion, and update the condition
that sets "three_key_found" to only be true when that combined pattern matches
(references: keys_match, attest_match, and the schema_section handling that
prints/inspects "three_key_found").

---

Outside diff comments:
In `@plugins/dso/agents/ci/code-reviewer-deep-arch.md`:
- Around line 719-720: Update the documentation text that claims the JSON schema
requires "exactly 2 top-level keys (findings, summary)" by removing or replacing
that stale assertion and any wording that contradicts the current schema;
specifically remove the phrase "exactly 2 top-level keys" and update references
about the deprecated "scores" key so the doc reflects the current required
top-level keys including "review_completed" (and still "findings" and "summary")
to avoid schema-contract contradictions and re-dispatch loops.

In `@plugins/dso/agents/ci/code-reviewer-deep-correctness.md`:
- Line 753: Update the stale sentence that claims validate-review-output.sh
enforces only 2 required top-level keys to reflect the current schema by
including the now-required review_completed key; specifically edit the line that
mentions "2 required top-level keys (findings, summary)" to state the required
keys are findings, summary, and review_completed (and note that scores is
deprecated) so the guidance and CI validation expectations
(validate-review-output.sh) are consistent with the schema; ensure the example
summary text guidance (e.g., "security_overlay_warranted: no,
performance_overlay_warranted: yes") remains unchanged.

In `@plugins/dso/agents/ci/code-reviewer-deep-hygiene.md`:
- Around line 647-648: Update the prompt text that still instructs "2 required
top-level keys (findings, summary)" and references the deprecated "scores" key:
remove or replace that sentence so the guidance matches the current schema which
requires the top-level keys findings, summary, and review_completed; also remove
references to the deprecated "scores" key and ensure the finding categories enum
description still lists the five allowed values (and nothing contradictory).
Apply this change to both occurrences of the outdated guidance (the block
containing the literal text "2 required top-level keys" and any nearby mention
of "scores" / finding category enum) so the CI agent prompt and reviewer output
contract are consistent with the new schema.

In `@plugins/dso/agents/ci/code-reviewer-deep-verification.md`:
- Line 648: Update the validator sentence that currently asserts a 2-key
requirement (findings, summary) to reflect the new schema by adding the required
review_completed key: state that the output MUST include findings, summary and
review_completed (summary field text should still contain items like
"security_overlay_warranted: no, performance_overlay_warranted: yes"), note that
scores is deprecated, and keep the reference to validate-review-output.sh
enforcing the contract; edit the sentence mentioning "summary field" on line 648
to list these three required top-level keys (findings, summary,
review_completed) instead of two.

In `@plugins/dso/agents/ci/code-reviewer-light.md`:
- Around line 630-656: The "Final Output Reminder" text incorrectly insists only
on top-level keys "findings" and "summary", which conflicts with the earlier
required "review_completed: true" contract and causes CI validator failures;
update the final-output guidance to require the exact top-level keys expected by
CI (include "review_completed": true alongside "findings" and "summary"), remove
or reword the outdated "two-key schema" directive, and add a short note that
emitting any other top-level keys (except an allowed deprecated "scores") will
fail validate-review-output.sh so authors must include "review_completed: true"
when appropriate.

In `@plugins/dso/agents/ci/code-reviewer-performance.md`:
- Around line 473-500: Update the example JSON and the Step 4 checklist in
plugins/dso/agents/ci/code-reviewer-performance.md to include the new required
top-level boolean field review_completed alongside summary and findings;
specifically add "review_completed": true/false to the example object and update
the textual checklist to require review_completed, and ensure
validate-review-output.sh (the dispatcher validator) is aware/consistent with
the new field name so the example and checklist no longer contradict the schema.

In `@plugins/dso/agents/ci/code-reviewer-security-blue-team.md`:
- Around line 569-572: Update the "Output Schema" delta to remove the outdated
"two-key top-level schema" wording and explicitly match the validator contract
that requires review_completed: true; specifically, change the text under
"Output Schema" to state the required top-level keys are findings, summary and
review_completed (and that the legacy scores key is deprecated and must not be
emitted), and ensure the per-finding allowlist note remains unchanged so that
each finding uses only the five permitted fields; reference the "Output Schema"
section and the earlier "review_completed: true" contract when editing the
wording.

In `@plugins/dso/agents/ci/code-reviewer-security-red-team.md`:
- Around line 563-566: The Output Schema delta currently states only findings
and summary are required but the base contract also requires review_completed:
true; update the "Output Schema" section so the required top-level keys include
review_completed (e.g., "Your output MUST conform... (3 required top-level keys:
findings, summary, review_completed)"), and mention that each output must set
review_completed: true alongside findings and summary to align the delta with
the base contract and prevent a contract mismatch.

In `@plugins/dso/agents/ci/code-reviewer-standard.md`:
- Around line 469-500: The CI JSON example and final-checklist in
plugins/dso/agents/ci/code-reviewer-standard.md still show only "summary" and
"findings" and omit the required "review_completed" field; update the canonical
JSON example in Step 3 and the Step 4 required-fields checklist to include
"review_completed" (boolean) and show it in the single-object example, adjust
any prose that lists required fields (e.g., the Step 4 checks and the
"Field-name discipline" paragraph) to mention "review_completed", and add one
brief example/value for "review_completed" in the example JSON so the validator
examples match actual validator expectations.

In `@plugins/dso/agents/ci/code-reviewer-test-quality.md`:
- Around line 654-657: The "Output Schema" section still says the
reviewer-findings.json requires 2 top-level keys; update the prose under the "##
Output Schema" heading to reflect the new contract requiring 3 required
top-level keys (findings, summary, review_completed), and explicitly note that
review_completed is required and must follow the updated contract so generated
outputs remain schema-valid; ensure any examples or wording that assert "2
required top-level keys" are changed to say "3 required top-level keys" and add
a brief sentence about the expected type/format of review_completed.

In `@plugins/dso/agents/code-reviewer-deep-arch.md`:
- Around line 755-756: Replace the stale phrase "exactly 2 top-level keys
(findings, summary)" with the correct 3-key requirement by stating something
like "exactly 3 top-level keys (findings, summary, review_completed)"; update
the same wording elsewhere in the file where "exactly 2 top-level keys" appears
(the later occurrence around the other paragraph) so both references match the
validator/contract and explicitly list the three keys.

In `@plugins/dso/agents/code-reviewer-light.md`:
- Around line 714-721: Update the light-tier final-output reminder so the
required top-level JSON keys include "review_completed" (always true) in
addition to "findings" and "summary"; change the example output to { "findings":
[...], "summary": "...", "review_completed": true } and update the descriptive
sentence to say "Required top-level keys are `findings`, `summary`, and
`review_completed` (always `true`)." Keep the note about the deprecated `scores`
key tolerated with a warning and ensure no other top-level keys (e.g.,
"dimensions", "ratings") are permitted.

In `@plugins/dso/agents/code-reviewer-security-blue-team.md`:
- Around line 607-618: Update the stale tier-local schema block so it matches
the current enforced contract: replace the old statement about "2 required
top-level keys (`findings`, `summary`)" and the reduced allowlist with the
active schema that includes `review_completed` semantics and the full required
payload shape; ensure the allowlisted finding fields and required `cited_lines`
rule remain consistent with the base validator and reference the same keys used
elsewhere in this file (e.g., `findings`, `summary`, `review_completed`,
`cited_lines`) so blue-team agents emit the correct JSON shape.

In `@plugins/dso/agents/code-reviewer-security-red-team.md`:
- Around line 601-602: The documentation block that enforces the "2 required
top-level keys: findings, summary" in
plugins/dso/agents/code-reviewer-security-red-team.md must be updated to match
the current reviewer-findings.json contract — modify the text that describes the
required top-level keys so it includes the additional required key
review_completed and any updated allowed fields/constraints (retain the
allowlist restriction for finding fields: severity, category, description, file,
cited_lines and the prefix requirement for descriptions), remove or revise the
outdated sentence about the "2-key contract", and ensure the block references
the exact schema keys findings, summary, and review_completed to avoid producing
invalid reviewer payloads.

In `@plugins/dso/agents/code-reviewer-standard.md`:
- Around line 924-925: Update the text that currently reads "top-level scoring
dimension — the JSON schema enforces 2 required top-level keys (`findings`,
`summary`); the legacy `scores` key is deprecated." (and the similar reference
later) so it reflects the new contract: list the required top-level keys
`findings`, `summary`, and `review_completed`, note that `scores` is deprecated,
and remove any assertation that only two keys are required; keep the same
wording style and adjust both occurrences mentioned in the comment to avoid the
conflicting guidance.

In `@tests/hooks/test-validate-review-output.sh`:
- Around line 1479-1504: The tests are masking escalate_review validation errors
because both fixtures use an empty "findings" array so
attestation/`review_completed` checks short-circuit; update the fixtures created
by write_fixture for _ESC_STRING_FILE and _ESC_MISSING_IDX_FILE to include at
least one minimal finding object (e.g., include a finding with a finding_index
like 0 and an explicit review_completed value) so the code path reaches
escalate_review validation (affecting the asserts that read _ESC_STRING_OUTPUT
and the _snapshot_fail check) and the specific type/missing-field errors
surface.
- Around line 1362-1368: The test fixture review_tier_invalid.json currently has
empty "findings" and omits "review_completed", so the negative test (run_script
code-review-dispatch -> _REVIEW_TIER_INVALID_EXIT) can fail due to attestation
gate instead of the invalid review_tier enum; update the fixture used by
test_review_tier_invalid_value_fails to include a non-empty "findings" array and
set "review_completed": true (or otherwise ensure attestation passes) so the
script reaches enum validation and the invalid "review_tier": "ultra_deep" is
the observable failure being asserted.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 514070bb-c19d-4edc-88af-b27c4ad58b38

📥 Commits

Reviewing files that changed from the base of the PR and between b4c162a and ba43430.

📒 Files selected for processing (29)
  • plugins/dso/agents/ci/code-reviewer-deep-arch.md
  • plugins/dso/agents/ci/code-reviewer-deep-correctness.md
  • plugins/dso/agents/ci/code-reviewer-deep-hygiene.md
  • plugins/dso/agents/ci/code-reviewer-deep-verification.md
  • plugins/dso/agents/ci/code-reviewer-light.md
  • plugins/dso/agents/ci/code-reviewer-performance.md
  • plugins/dso/agents/ci/code-reviewer-security-blue-team.md
  • plugins/dso/agents/ci/code-reviewer-security-red-team.md
  • plugins/dso/agents/ci/code-reviewer-standard.md
  • plugins/dso/agents/ci/code-reviewer-test-quality.md
  • plugins/dso/agents/code-reviewer-deep-arch.md
  • plugins/dso/agents/code-reviewer-deep-correctness.md
  • plugins/dso/agents/code-reviewer-deep-hygiene.md
  • plugins/dso/agents/code-reviewer-deep-verification.md
  • plugins/dso/agents/code-reviewer-light.md
  • plugins/dso/agents/code-reviewer-performance.md
  • plugins/dso/agents/code-reviewer-security-blue-team.md
  • plugins/dso/agents/code-reviewer-security-red-team.md
  • plugins/dso/agents/code-reviewer-standard.md
  • plugins/dso/agents/code-reviewer-test-quality.md
  • plugins/dso/docs/contracts/review-findings-schema.md
  • plugins/dso/docs/workflows/prompts/reviewer-base.md
  • plugins/dso/scripts/dso_ci_review/dispatch.py
  • plugins/dso/scripts/dso_ci_review/findings.py
  • plugins/dso/scripts/dso_ci_review/runner.py
  • plugins/dso/scripts/validate-review-output.sh
  • tests/docs/test-reviewer-base-file-constraint.sh
  • tests/hooks/test-validate-review-output.sh
  • tests/scripts/test-write-reviewer-findings.sh

Comment on lines +311 to 323
keys_match = re.search(r'three required top-level keys|EXACTLY three top-level keys|required.*three.*keys', content, re.IGNORECASE)
attest_match = re.search(r'review_completed', content)
if keys_match and attest_match:
print("three_key_found")
PYEOF
)"
if [[ "$schema_section" == *"two_key_found"* ]]; then
assert_eq "JSON schema specifies 2-key schema (EXACTLY two top-level keys: findings + summary)" "present" "present"
if [[ "$schema_section" == *"three_key_found"* ]]; then
assert_eq "JSON schema specifies 3-key schema (findings + summary + review_completed)" "present" "present"
else
assert_eq "JSON schema specifies 2-key schema (findings + summary, no scores)" \
"2-key schema specification (findings + summary)" \
"2-key schema language not found — still using 3-key schema?"
assert_eq "JSON schema specifies 3-key schema (findings + summary + review_completed)" \
"3-key schema specification (findings + summary + review_completed)" \
"3-key schema language not found — review_completed attestation missing (bug 1b76)?"
fi

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tighten the 3-key contract assertion to avoid false positives.

Current logic accepts any file containing “three required ... keys” plus any separate mention of review_completed. It does not prove the required-key clause actually names findings, summary, and review_completed together.

Suggested fix
-keys_match = re.search(r'three required top-level keys|EXACTLY three top-level keys|required.*three.*keys', content, re.IGNORECASE)
-attest_match = re.search(r'review_completed', content)
-if keys_match and attest_match:
+lines = content.splitlines()
+keys_line = next(
+    (ln for ln in lines if re.search(r'(three|3).*required.*top-level keys', ln, re.IGNORECASE)),
+    ""
+)
+if all(k in keys_line for k in ('findings', 'summary', 'review_completed')):
     print("three_key_found")

As per coding guidelines, “Tests must exercise observable behavior of new code paths.” Based on learnings, the upstream contract in plugins/dso/docs/workflows/prompts/reviewer-base.md defines these exact three required top-level keys.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/docs/test-reviewer-base-file-constraint.sh` around lines 311 - 323, The
current check uses keys_match and attest_match separately and can misfire;
change the logic to assert that the required-key clause explicitly names
findings, summary and review_completed together (e.g., a single regex/search
that matches "required.*(findings).*?(summary).*?(review_completed)" or a
pattern that finds those three tokens in the same clause/line), replace the
two-step keys_match + attest_match approach with a single combined assertion,
and update the condition that sets "three_key_found" to only be true when that
combined pattern matches (references: keys_match, attest_match, and the
schema_section handling that prints/inspects "three_key_found").

Source: Coding guidelines

Test and others added 2 commits June 6, 2026 10:12
… arch_result review_completed injection

Neutral opus pre-merge review of #693 surfaced an important latent fail-CLOSED
introduced by the 1b76 empty-findings rule: the deep-tier path (runner.py
`merged = arch_result`) uses the raw arch-agent output, which carries
review_completed only if the LLM emitted it. A clean deep review omitting the key
failed the new empty-findings rule and routed into dispatch_schema_correction,
which is structurally unable to add the key (reviewer-schema-correction.md forbids
adding keys) — re-failing into a synthetic schema_error that blocked an otherwise
clean PR.

Fix: inject review_completed=True onto a non-synthetic arch_result with empty
findings before validation (copy-not-mutate), mirroring the merge_findings guard.
Synthetic arch results are unaffected (short-circuited to the merge_findings dict,
which already carries the key); non-empty results are their own positive signal.

Adds tests/skills/dso_ci_review/test_empty_findings_review_completed_integration.py
(4 tests): merge_findings→real-validator end-to-end, the empty-without-attestation
control (rule still enforced), the deep-tier arch-replacement RED/GREEN, and the
synthetic-edge guard. dso_ci_review suite 902 passed; validator suite 103 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…view attestation (1b76)

Re-review found the prior per-path injection incomplete: the runner produces the
validated `merged` dict from several branches (single-pass merge_findings, deep
arch-replacement, Strategy-F cluster-aggregation, rechunk). Only two were patched;
the cluster-aggregation and rechunk clean-review dicts still reached the validator
without review_completed (and, for cluster, with non-allowed keys + no summary) —
re-introducing the 1b76 fail-CLOSED via those paths.

Replace per-path injection with a single authoritative normalization
(_normalize_empty_clean_review) applied immediately before the sole main-pipeline
_validate_findings_schema(merged) call. For a NON-synthetic empty-findings merged
it stamps review_completed:True, synthesizes a summary, and drops non-allowed
top-level keys — normalizing the empty clean-review dict to the validator contract
so no clean-review branch (current or future) fails closed. The redundant arch
per-path injection is removed; merge_findings()'s own injection is kept (reusable
public contract). The other four _validate_findings_schema calls live inside
dispatch_schema_correction (downstream of the choke point) and are unaffected.
Non-empty / synthetic flows are untouched.

Adds cluster-aggregation + rechunk clean-review integration tests (RED before the
choke point, GREEN after). tests/skills/dso_ci_review/ 904 passed; validator 103.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# When validate-review-output.sh changes its schema, update this constant and
# the corresponding test in test_runner_smoke.py.
_VALIDATE_REVIEW_SCHEMA_HASH = "cb48a66fc3292083"
_VALIDATE_REVIEW_SCHEMA_HASH = "35dd40c9e6e83207"
Comment thread plugins/dso/scripts/dso_ci_review/runner.py
Comment thread plugins/dso/scripts/dso_ci_review/runner.py
…t (1b76 re-review minor)

Cycle-3 neutral review minor: runner.py Step 7b comment referenced the pre-1b76
schema hash 214949ee476be6d0 (from PR #124, never updated). Documentation-only;
the real lockstep is _VALIDATE_REVIEW_SCHEMA_HASH + the drift test (both already
35dd40c9e6e83207). Comment updated to match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gitar-bot

gitar-bot Bot commented Jun 6, 2026

Copy link
Copy Markdown
CI failed: The CI build failed because the PR is targeting the 'main' branch directly from a 'fix/' branch, violating the required 'staged-*' naming convention enforced by the project's two-tier promotion workflow.

Overview

The check-staged-head job failed on all analyzed runs because the current branch naming convention does not adhere to the mandatory two-tier promotion workflow required for merging into main.

Failures

Branch Naming Constraint Violation (confidence: high)

  • Type: configuration
  • Affected jobs: 79895295264
  • Related to change: yes
  • Root cause: The CI/CD pipeline requires that PRs targeting main originate from a branch prefixed with staged-. The current branch fix/f6b3-restage-advisory fails this validation check.
  • Suggested fix: Ensure your workflow follows the defined process: 1) Merge your fix/ branch into a staged-* branch, and 2) Open a PR from the staged-* branch into main.

Summary

  • Change-related failures: 1 (Branch naming policy violation)
  • Infrastructure/flaky failures: 0
  • Recommended action: Retarget your PR to merge into a staged-* branch instead of main, or rename your current branch to match the staged-* pattern if it is intended for direct integration into main.
Code Review ⚠️ Changes requested 2 resolved / 4 findings

Enforces a required review_completed attestation for empty-findings payloads to close a fail-open, but merge_findings() unconditionally sets this flag, potentially re-introducing the vulnerability.

⚠️ Bug: merge_findings always sets review_completed, can re-open 1b76 fail-open

📄 plugins/dso/scripts/dso_ci_review/findings.py:158-172 📄 plugins/dso/scripts/dso_ci_review/findings.py:190-195 📄 plugins/dso/scripts/dso_ci_review/runner.py:3428-3441

merge_findings() unconditionally sets "review_completed": True on its result (findings.py:190-195), regardless of whether the aggregation was actually successful. The runner validates this merged payload via _validate_findings_schema, so for the runner's own clean-review path the new validator guard (which requires review_completed === true on empty findings) is always satisfied by construction.

This matters because the merge can degenerate to an EMPTY findings list while still being attested complete. Per the bug-7f55 guards in merge_findings (findings.py:158-174), non-dict/degraded reviewer payloads are silently skipped. If every reviewer payload is degraded (e.g. provider exhaustion returns bare strings, or all clusters yield non-dict entries) and no synthetic specialist_error finding is injected, merged_findings is [] and review_completed is True. The all-specialist-errors early-exit at runner.py:3429 only fires when bool(_pre_schema_findings) is truthy AND every entry is type == "specialist_error"; an empty findings list makes that guard short-circuit (bool([]) is False), so the empty-but-attested payload flows straight to validation and PASSES as a clean review.

This is precisely the fail-open class 1b76 is meant to close — an empty payload accepted as a clean review — now reachable via the runner's aggregation path because the attestation is synthesized unconditionally rather than derived from evidence of a real review. Consider gating review_completed on having merged at least one valid dict payload, and/or extending the runner.py:3429 early-exit to also fire when _pre_schema_findings is empty.

Derive review_completed from whether any valid payload was merged, so a degenerate all-degraded aggregation no longer falsely attests completion.
# Only attest completion when at least one valid reviewer payload
# actually contributed (bug 1b76): an unconditional True would re-open
# the empty-findings fail-open for degenerate/degraded aggregations.
_had_valid_payload = any(isinstance(fd, dict) for fd in finding_dicts)
return {
    "findings": merged_findings,
    "scores": merged_scores,
    "summary": "; ".join(summaries) if summaries else "",
    "review_completed": _had_valid_payload,
}
💡 Quality: Redundant synthetic-payload check in empty-findings enforcement

📄 plugins/dso/scripts/validate-review-output.sh:468-479

In validate-review-output.sh, the empty-findings enforcement computes _is_synthetic_payload = any(... for f in findings) and then gates on if not findings and not _is_synthetic_payload. When not findings is true the list is empty, so any(...) over it is always False and _is_synthetic_payload can never be True — the synthetic-payload term is dead code in this branch. A synthetic payload always carries at least one finding entry (with type), making findings non-empty, so it is already excluded by not findings. The extra computation is harmless but misleading: it suggests an empty synthetic payload would be exempted when in fact no such case exists. Consider dropping the _is_synthetic_payload term here (or adding a comment that it is intentionally defensive) to avoid implying behavior that the condition cannot exercise.

✅ 2 resolved
Edge Case: Choke-point ignores aggregation_status="failed" on empty findings

📄 plugins/dso/scripts/dso_ci_review/runner.py:387-401 📄 plugins/dso/scripts/dso_ci_review/runner.py:3565
_normalize_empty_clean_review (runner.py:387-448) scopes itself to "empty-findings AND not synthetic" and then unconditionally stamps review_completed: True, synthesizes a summary, and drops every non-allowed top-level key — including aggregation_status. The cluster aggregator (aggregator.py) emits aggregation_status: "failed" when sub-cluster aggregation fails. If a failed aggregation produces an empty findings list (the synthetic-type guard only inspects per-finding type, which is vacuous for an empty list), this function will convert that failed aggregation into a clean, positively-attested review and silently discard the "failed" status before validation.

This re-introduces the exact fail-open class 1b76 is meant to close: a payload that is byte-equivalent to a non-completed review is stamped as completed and allowed to merge. The fix should not normalize (or should not stamp review_completed) when merged.get("aggregation_status") == "failed", routing such payloads to the failure/correction path instead.

Quality: Clean reviews lose visibility_trailer (coverage/skipped-file info)

📄 plugins/dso/scripts/dso_ci_review/runner.py:430-444
For an empty-findings clean review, _normalize_empty_clean_review only promotes visibility_trailer into summary when no summary is present; otherwise the trailer is dropped along with all other non-allowed keys (runner.py:438-447). The cluster aggregator already supplies a summary for clean reviews (see the test fixture _clean_aggregate_result, which sets both summary and visibility_trailer), so in the common cluster-aggregation path the DSO-Review-Coverage trailer ("Reviewed N file(s); skipped M") is discarded entirely from the written output. Reviewers/automation that rely on the coverage trailer to know which files were skipped will no longer see it on clean reviews. If the trailer is meant to be surfaced, consider appending it to the summary or retaining it as an allowed key rather than dropping it whenever a summary already exists.

🤖 Prompt for agents
Code Review: Enforces a required `review_completed` attestation for empty-findings payloads to close a fail-open, but `merge_findings()` unconditionally sets this flag, potentially re-introducing the vulnerability.

1. ⚠️ Bug: merge_findings always sets review_completed, can re-open 1b76 fail-open
   Files: plugins/dso/scripts/dso_ci_review/findings.py:158-172, plugins/dso/scripts/dso_ci_review/findings.py:190-195, plugins/dso/scripts/dso_ci_review/runner.py:3428-3441

   `merge_findings()` unconditionally sets `"review_completed": True` on its result (findings.py:190-195), regardless of whether the aggregation was actually successful. The runner validates this merged payload via `_validate_findings_schema`, so for the runner's own clean-review path the new validator guard (which requires `review_completed === true` on empty findings) is *always* satisfied by construction.
   
   This matters because the merge can degenerate to an EMPTY findings list while still being attested complete. Per the bug-7f55 guards in merge_findings (findings.py:158-174), non-dict/degraded reviewer payloads are silently skipped. If every reviewer payload is degraded (e.g. provider exhaustion returns bare strings, or all clusters yield non-dict entries) and no synthetic `specialist_error` finding is injected, `merged_findings` is `[]` and `review_completed` is `True`. The all-specialist-errors early-exit at runner.py:3429 only fires when `bool(_pre_schema_findings)` is truthy AND every entry is `type == "specialist_error"`; an *empty* findings list makes that guard short-circuit (`bool([])` is False), so the empty-but-attested payload flows straight to validation and PASSES as a clean review.
   
   This is precisely the fail-open class 1b76 is meant to close — an empty payload accepted as a clean review — now reachable via the runner's aggregation path because the attestation is synthesized unconditionally rather than derived from evidence of a real review. Consider gating `review_completed` on having merged at least one valid dict payload, and/or extending the runner.py:3429 early-exit to also fire when `_pre_schema_findings` is empty.

   Fix (Derive review_completed from whether any valid payload was merged, so a degenerate all-degraded aggregation no longer falsely attests completion.):
   # Only attest completion when at least one valid reviewer payload
   # actually contributed (bug 1b76): an unconditional True would re-open
   # the empty-findings fail-open for degenerate/degraded aggregations.
   _had_valid_payload = any(isinstance(fd, dict) for fd in finding_dicts)
   return {
       "findings": merged_findings,
       "scores": merged_scores,
       "summary": "; ".join(summaries) if summaries else "",
       "review_completed": _had_valid_payload,
   }

2. 💡 Quality: Redundant synthetic-payload check in empty-findings enforcement
   Files: plugins/dso/scripts/validate-review-output.sh:468-479

   In validate-review-output.sh, the empty-findings enforcement computes `_is_synthetic_payload = any(... for f in findings)` and then gates on `if not findings and not _is_synthetic_payload`. When `not findings` is true the list is empty, so `any(...)` over it is always False and `_is_synthetic_payload` can never be True — the synthetic-payload term is dead code in this branch. A synthetic payload always carries at least one finding entry (with `type`), making `findings` non-empty, so it is already excluded by `not findings`. The extra computation is harmless but misleading: it suggests an empty synthetic payload would be exempted when in fact no such case exists. Consider dropping the `_is_synthetic_payload` term here (or adding a comment that it is intentionally defensive) to avoid implying behavior that the condition cannot exercise.

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@JoeOakhartNava JoeOakhartNava merged commit 5957d3b into main Jun 6, 2026
21 of 22 checks passed
joeoakhart pushed a commit that referenced this pull request Jun 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants