fix(6d83-b949): deterministic score-severity rules for review system (merge worktree-20260327-075020)

JoeOakhartNava · JoeOakhartNava · commit 9e973b206bfe · 2026-03-27T09:59:33.000-07:00
diff --git a/plugins/dso/agents/code-reviewer-deep-arch.md b/plugins/dso/agents/code-reviewer-deep-arch.md
@@ -3,7 +3,7 @@ name: code-reviewer-deep-arch
 model: opus
 description: Deep-tier architectural reviewer (Opus): synthesizes specialist findings, assesses systemic risk, produces unified verdict across all dimensions.
 ---
-<!-- content-hash: daba5429d50de93a29d6e40c1e28d6053c85c6bc07be3422c35851ac8b976e5b -->
+<!-- content-hash: 5726710d4d9c90e5b79810a1b16bbdb047fc53bc2aaa65780598cffff23ed9a6 -->
 <!-- generated by build-review-agents.sh — do not edit manually -->
 
 # Code Reviewer — Universal Base Guidance
@@ -126,24 +126,18 @@ will be rejected by the validator and force a re-dispatch.
 
 ## Scoring Rules
 
-All scores use the 1–5 scale — maximum is 5, not 10:
+Scores are integers 1–5 (not 0–10), driven by findings. `write-reviewer-findings.sh`
+validates consistency and rejects mismatches.
 
-- Critical finding → score 1-2 (always fails)
-- Important finding → score 3-4 (judgment: 3 if significant, 4 if minor impact)
-- Minor only or no findings → score 4-5
-- Dimension not relevant → "N/A"
-- Multiple severities in same dimension → worst wins
-
-**Minor-only enforcement**: If ALL findings in a dimension are severity=`minor`, that
-dimension's score MUST be 4 or 5. Score 3 is reserved for `important` findings.
-`write-reviewer-findings.sh` will reject your JSON with a validation error if you score
-a minor-only dimension below 4.
+| Worst finding in dimension | Score |
+|---------------------------|-------|
+| No findings | 5 |
+| Minor only | 4 |
+| Important (not critical) | 3 |
+| Critical | 1–2 |
 
-**No-findings enforcement**: If a dimension has NO findings at all, its score MUST be 4
-or 5. A low score without a supporting finding is a scoring error — every score below 4
-must be backed by at least one `important` or `critical` finding in that dimension.
-`write-reviewer-findings.sh` will reject your JSON if any no-findings dimension scores
-below 4.
+- Multiple severities in same dimension → worst wins
+- Dimension not relevant → "N/A"
 
 ---
 
diff --git a/plugins/dso/agents/code-reviewer-deep-correctness.md b/plugins/dso/agents/code-reviewer-deep-correctness.md
@@ -3,7 +3,7 @@ name: code-reviewer-deep-correctness
 model: sonnet
 description: Deep-tier correctness specialist (Sonnet A): focused exclusively on correctness — edge cases, error handling, security, efficiency.
 ---
-<!-- content-hash: 0aa3d7727f9d6fad141c7bd2a57052e3764816b7450112f75b615b146aa8466e -->
+<!-- content-hash: d42a3d7b417a4e0699de9eda503d6b9ec64de71a5b320dedff5901ae671580e1 -->
 <!-- generated by build-review-agents.sh — do not edit manually -->
 
 # Code Reviewer — Universal Base Guidance
@@ -126,24 +126,18 @@ will be rejected by the validator and force a re-dispatch.
 
 ## Scoring Rules
 
-All scores use the 1–5 scale — maximum is 5, not 10:
+Scores are integers 1–5 (not 0–10), driven by findings. `write-reviewer-findings.sh`
+validates consistency and rejects mismatches.
 
-- Critical finding → score 1-2 (always fails)
-- Important finding → score 3-4 (judgment: 3 if significant, 4 if minor impact)
-- Minor only or no findings → score 4-5
-- Dimension not relevant → "N/A"
-- Multiple severities in same dimension → worst wins
-
-**Minor-only enforcement**: If ALL findings in a dimension are severity=`minor`, that
-dimension's score MUST be 4 or 5. Score 3 is reserved for `important` findings.
-`write-reviewer-findings.sh` will reject your JSON with a validation error if you score
-a minor-only dimension below 4.
+| Worst finding in dimension | Score |
+|---------------------------|-------|
+| No findings | 5 |
+| Minor only | 4 |
+| Important (not critical) | 3 |
+| Critical | 1–2 |
 
-**No-findings enforcement**: If a dimension has NO findings at all, its score MUST be 4
-or 5. A low score without a supporting finding is a scoring error — every score below 4
-must be backed by at least one `important` or `critical` finding in that dimension.
-`write-reviewer-findings.sh` will reject your JSON if any no-findings dimension scores
-below 4.
+- Multiple severities in same dimension → worst wins
+- Dimension not relevant → "N/A"
 
 ---
 
diff --git a/plugins/dso/agents/code-reviewer-deep-hygiene.md b/plugins/dso/agents/code-reviewer-deep-hygiene.md
@@ -3,7 +3,7 @@ name: code-reviewer-deep-hygiene
 model: sonnet
 description: Deep-tier hygiene/design specialist (Sonnet C): focused on hygiene, design, and maintainability.
 ---
-<!-- content-hash: 098a8a63b9482eecf85a8557b644ae504809214b21e072e6565f3d773b8d9d13 -->
+<!-- content-hash: ff89bbdf4a3ffd69f24e1e87a394c3a5bea7444d9fb17202094107ff6cfdfe8c -->
 <!-- generated by build-review-agents.sh — do not edit manually -->
 
 # Code Reviewer — Universal Base Guidance
@@ -126,24 +126,18 @@ will be rejected by the validator and force a re-dispatch.
 
 ## Scoring Rules
 
-All scores use the 1–5 scale — maximum is 5, not 10:
+Scores are integers 1–5 (not 0–10), driven by findings. `write-reviewer-findings.sh`
+validates consistency and rejects mismatches.
 
-- Critical finding → score 1-2 (always fails)
-- Important finding → score 3-4 (judgment: 3 if significant, 4 if minor impact)
-- Minor only or no findings → score 4-5
-- Dimension not relevant → "N/A"
-- Multiple severities in same dimension → worst wins
-
-**Minor-only enforcement**: If ALL findings in a dimension are severity=`minor`, that
-dimension's score MUST be 4 or 5. Score 3 is reserved for `important` findings.
-`write-reviewer-findings.sh` will reject your JSON with a validation error if you score
-a minor-only dimension below 4.
+| Worst finding in dimension | Score |
+|---------------------------|-------|
+| No findings | 5 |
+| Minor only | 4 |
+| Important (not critical) | 3 |
+| Critical | 1–2 |
 
-**No-findings enforcement**: If a dimension has NO findings at all, its score MUST be 4
-or 5. A low score without a supporting finding is a scoring error — every score below 4
-must be backed by at least one `important` or `critical` finding in that dimension.
-`write-reviewer-findings.sh` will reject your JSON if any no-findings dimension scores
-below 4.
+- Multiple severities in same dimension → worst wins
+- Dimension not relevant → "N/A"
 
 ---
 
diff --git a/plugins/dso/agents/code-reviewer-deep-verification.md b/plugins/dso/agents/code-reviewer-deep-verification.md
@@ -3,7 +3,7 @@ name: code-reviewer-deep-verification
 model: sonnet
 description: Deep-tier verification specialist (Sonnet B): focused exclusively on verification — test presence, quality, edge case coverage, mock correctness.
 ---
-<!-- content-hash: 7b696db753cceda549c4638f836d63b1cd8fa69905eee73e290a1ff448b30c99 -->
+<!-- content-hash: d814c1f3e94c5d3b14ebdcb18b0f4251fc0da7811ac48f42d26a09ea7e8a3e46 -->
 <!-- generated by build-review-agents.sh — do not edit manually -->
 
 # Code Reviewer — Universal Base Guidance
@@ -126,24 +126,18 @@ will be rejected by the validator and force a re-dispatch.
 
 ## Scoring Rules
 
-All scores use the 1–5 scale — maximum is 5, not 10:
+Scores are integers 1–5 (not 0–10), driven by findings. `write-reviewer-findings.sh`
+validates consistency and rejects mismatches.
 
-- Critical finding → score 1-2 (always fails)
-- Important finding → score 3-4 (judgment: 3 if significant, 4 if minor impact)
-- Minor only or no findings → score 4-5
-- Dimension not relevant → "N/A"
-- Multiple severities in same dimension → worst wins
-
-**Minor-only enforcement**: If ALL findings in a dimension are severity=`minor`, that
-dimension's score MUST be 4 or 5. Score 3 is reserved for `important` findings.
-`write-reviewer-findings.sh` will reject your JSON with a validation error if you score
-a minor-only dimension below 4.
+| Worst finding in dimension | Score |
+|---------------------------|-------|
+| No findings | 5 |
+| Minor only | 4 |
+| Important (not critical) | 3 |
+| Critical | 1–2 |
 
-**No-findings enforcement**: If a dimension has NO findings at all, its score MUST be 4
-or 5. A low score without a supporting finding is a scoring error — every score below 4
-must be backed by at least one `important` or `critical` finding in that dimension.
-`write-reviewer-findings.sh` will reject your JSON if any no-findings dimension scores
-below 4.
+- Multiple severities in same dimension → worst wins
+- Dimension not relevant → "N/A"
 
 ---
 
diff --git a/plugins/dso/agents/code-reviewer-light.md b/plugins/dso/agents/code-reviewer-light.md
@@ -3,7 +3,7 @@ name: code-reviewer-light
 model: haiku
 description: Light-tier code reviewer: single-pass, highest-signal checklist for fast feedback on low-to-medium-risk changes.
 ---
-<!-- content-hash: cf204431b33bdfa9b5c9651b618846407d7fa4e52c5f08406e8cad27bf330a92 -->
+<!-- content-hash: 5673f0c1e9c4b3a2b26ead04e6d15316679f75c7e03de602dc4eabcc01a40463 -->
 <!-- generated by build-review-agents.sh — do not edit manually -->
 
 # Code Reviewer — Universal Base Guidance
@@ -126,24 +126,18 @@ will be rejected by the validator and force a re-dispatch.
 
 ## Scoring Rules
 
-All scores use the 1–5 scale — maximum is 5, not 10:
+Scores are integers 1–5 (not 0–10), driven by findings. `write-reviewer-findings.sh`
+validates consistency and rejects mismatches.
 
-- Critical finding → score 1-2 (always fails)
-- Important finding → score 3-4 (judgment: 3 if significant, 4 if minor impact)
-- Minor only or no findings → score 4-5
-- Dimension not relevant → "N/A"
-- Multiple severities in same dimension → worst wins
-
-**Minor-only enforcement**: If ALL findings in a dimension are severity=`minor`, that
-dimension's score MUST be 4 or 5. Score 3 is reserved for `important` findings.
-`write-reviewer-findings.sh` will reject your JSON with a validation error if you score
-a minor-only dimension below 4.
+| Worst finding in dimension | Score |
+|---------------------------|-------|
+| No findings | 5 |
+| Minor only | 4 |
+| Important (not critical) | 3 |
+| Critical | 1–2 |
 
-**No-findings enforcement**: If a dimension has NO findings at all, its score MUST be 4
-or 5. A low score without a supporting finding is a scoring error — every score below 4
-must be backed by at least one `important` or `critical` finding in that dimension.
-`write-reviewer-findings.sh` will reject your JSON if any no-findings dimension scores
-below 4.
+- Multiple severities in same dimension → worst wins
+- Dimension not relevant → "N/A"
 
 ---
 
diff --git a/plugins/dso/agents/code-reviewer-standard.md b/plugins/dso/agents/code-reviewer-standard.md
@@ -3,7 +3,7 @@ name: code-reviewer-standard
 model: sonnet
 description: Standard-tier code reviewer: comprehensive review across all five scoring dimensions for moderate-to-high-risk changes.
 ---
-<!-- content-hash: 98a5b33d77bf6bc7302dc87c4c04e96263c5c4092a8a3ebc7f9a9bd2ace255b1 -->
+<!-- content-hash: 00da22af428ebe07e30ae88b6f3af311c37719a343780d3ab8a5fe4c15b07972 -->
 <!-- generated by build-review-agents.sh — do not edit manually -->
 
 # Code Reviewer — Universal Base Guidance
@@ -126,24 +126,18 @@ will be rejected by the validator and force a re-dispatch.
 
 ## Scoring Rules
 
-All scores use the 1–5 scale — maximum is 5, not 10:
+Scores are integers 1–5 (not 0–10), driven by findings. `write-reviewer-findings.sh`
+validates consistency and rejects mismatches.
 
-- Critical finding → score 1-2 (always fails)
-- Important finding → score 3-4 (judgment: 3 if significant, 4 if minor impact)
-- Minor only or no findings → score 4-5
-- Dimension not relevant → "N/A"
-- Multiple severities in same dimension → worst wins
-
-**Minor-only enforcement**: If ALL findings in a dimension are severity=`minor`, that
-dimension's score MUST be 4 or 5. Score 3 is reserved for `important` findings.
-`write-reviewer-findings.sh` will reject your JSON with a validation error if you score
-a minor-only dimension below 4.
+| Worst finding in dimension | Score |
+|---------------------------|-------|
+| No findings | 5 |
+| Minor only | 4 |
+| Important (not critical) | 3 |
+| Critical | 1–2 |
 
-**No-findings enforcement**: If a dimension has NO findings at all, its score MUST be 4
-or 5. A low score without a supporting finding is a scoring error — every score below 4
-must be backed by at least one `important` or `critical` finding in that dimension.
-`write-reviewer-findings.sh` will reject your JSON if any no-findings dimension scores
-below 4.
+- Multiple severities in same dimension → worst wins
+- Dimension not relevant → "N/A"
 
 ---
 
diff --git a/plugins/dso/docs/workflows/prompts/reviewer-base.md b/plugins/dso/docs/workflows/prompts/reviewer-base.md
@@ -118,24 +118,18 @@ will be rejected by the validator and force a re-dispatch.
 
 ## Scoring Rules
 
-All scores use the 1–5 scale — maximum is 5, not 10:
+Scores are integers 1–5 (not 0–10), driven by findings. `write-reviewer-findings.sh`
+validates consistency and rejects mismatches.
 
-- Critical finding → score 1-2 (always fails)
-- Important finding → score 3-4 (judgment: 3 if significant, 4 if minor impact)
-- Minor only or no findings → score 4-5
-- Dimension not relevant → "N/A"
-- Multiple severities in same dimension → worst wins
-
-**Minor-only enforcement**: If ALL findings in a dimension are severity=`minor`, that
-dimension's score MUST be 4 or 5. Score 3 is reserved for `important` findings.
-`write-reviewer-findings.sh` will reject your JSON with a validation error if you score
-a minor-only dimension below 4.
+| Worst finding in dimension | Score |
+|---------------------------|-------|
+| No findings | 5 |
+| Minor only | 4 |
+| Important (not critical) | 3 |
+| Critical | 1–2 |
 
-**No-findings enforcement**: If a dimension has NO findings at all, its score MUST be 4
-or 5. A low score without a supporting finding is a scoring error — every score below 4
-must be backed by at least one `important` or `critical` finding in that dimension.
-`write-reviewer-findings.sh` will reject your JSON if any no-findings dimension scores
-below 4.
+- Multiple severities in same dimension → worst wins
+- Dimension not relevant → "N/A"
 
 ---
 
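Here is a minimal Python sketch of the scoring table in `reviewer-base.md`. The worst severity in a dimension selects the allowed score or scores, so "worst wins" becomes a max over a severity ranking. Only the severity strings come from the rules. `SEVERITY_RANK`, `ALLOWED_SCORES`, `worst_severity`, and `allowed_scores` are hypothetical names. Raising `KeyError` on an unknown severity is an assumption of this sketch, not documented validator behavior.

```python
# Sketch of the score table: the worst severity in a dimension decides the score.
# Names below are hypothetical; only the severity strings come from the rules.
from typing import Iterable, Optional, Tuple

# Higher rank means worse severity, so "worst wins" is a max over ranks.
SEVERITY_RANK = {"minor": 1, "important": 2, "critical": 3}

# Allowed score(s) keyed by worst severity; None stands for "no findings".
ALLOWED_SCORES = {
    None: (5,),
    "minor": (4,),
    "important": (3,),
    "critical": (1, 2),
}


def worst_severity(severities: Iterable[str]) -> Optional[str]:
    """Return the worst severity present, or None when there are no findings.

    Unknown severity strings raise KeyError (an assumption of this sketch).
    """
    sevs = list(severities)
    if not sevs:
        return None
    return max(sevs, key=lambda s: SEVERITY_RANK[s])


def allowed_scores(severities: Iterable[str]) -> Tuple[int, ...]:
    return ALLOWED_SCORES[worst_severity(severities)]


print(allowed_scores([]))                         # (5,)
print(allowed_scores(["minor", "minor"]))         # (4,)
print(allowed_scores(["minor", "important"]))     # (3,)
print(allowed_scores(["important", "critical"]))  # (1, 2)
```

Every non-critical case now maps to exactly one score. The old judgment call between 3 and 4 for important findings is gone, and only critical keeps a two-value range. A dimension scored "N/A" never passes through this mapping.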
diff --git a/plugins/dso/scripts/validate-review-output.sh b/plugins/dso/scripts/validate-review-output.sh
@@ -270,9 +270,8 @@ else:
             if isinstance(score, (int, float)) and score > 2:
                 errors.append(f"{prefix}: severity='critical' but scores[{cat}]={score} (critical requires score 1-2)")
 
-# Validate score-severity consistency: if all findings in a dimension are
-# severity=minor, that dimension's score must be 4 or 5 (scoring rules state
-# 'Minor only or no findings -> score 4-5'; score 3 is reserved for important).
+# Score-severity consistency: scores are driven by the worst finding severity.
+# No findings → 5, minor only → 4, important → 3, critical → 1-2.
 if isinstance(findings, list) and isinstance(scores, dict):
     from collections import defaultdict
     dim_severities: dict = defaultdict(set)
@@ -281,22 +280,16 @@ if isinstance(findings, list) and isinstance(scores, dict):
         sev = finding.get("severity")
         if cat and sev:
             dim_severities[cat].add(sev)
-    for dim, sevs in dim_severities.items():
-        score = scores.get(dim)
-        if isinstance(score, (int, float)) and score < 4 and sevs == {"minor"}:
-            errors.append(
-                f"score '{dim}'={score} violates scoring rules: all findings in "
-                f"this dimension are severity='minor', which requires score 4-5 "
-                f"(score 3 is reserved for important findings)"
-            )
-    # No-findings low-score check (6d83-b949): if a dimension has no findings
-    # at all, its score must be 4 or 5 per scoring rules ("no findings → score 4-5").
     for dim, score in scores.items():
-        if dim not in dim_severities and isinstance(score, (int, float)) and score < 4:
-            errors.append(
-                f"score '{dim}'={score} violates scoring rules: this dimension has "
-                f"no findings, which requires score 4-5"
-            )
+        if not isinstance(score, (int, float)):
+            continue
+        sevs = dim_severities.get(dim, set())
+        if not sevs and score != 5:
+            errors.append(f"score '{dim}'={score}: no findings requires score 5")
+        elif sevs == {"minor"} and score != 4:
+            errors.append(f"score '{dim}'={score}: a dimension with minor-only findings requires score 4")
+        elif "important" in sevs and "critical" not in sevs and score != 3:
+            errors.append(f"score '{dim}'={score}: a dimension with important (no critical) findings requires score 3")
 
 # Validate summary
 summary = data.get("summary")
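The consistency check in this hunk can be run outside the shell script as a standalone Python sketch. The sketch assumes each finding records its dimension under a `"category"` key. The line that reads the dimension falls between hunks and is not shown. `score_severity_errors` and its shortened messages are hypothetical. The sample payload follows the pre-commit fixture convention of scoring a finding-free dimension 4, which the new rules reject.

```python
# Standalone sketch of the score-severity check from validate-review-output.sh.
# Assumption: each finding stores its dimension under "category".
from collections import defaultdict


def score_severity_errors(data: dict) -> list:
    """Return rule violations for a reviewer-findings payload (sketch)."""
    findings = data.get("findings")
    scores = data.get("scores")
    errors = []
    if not (isinstance(findings, list) and isinstance(scores, dict)):
        return errors

    # Collect the set of severities reported per dimension.
    dim_severities = defaultdict(set)
    for finding in findings:
        if not isinstance(finding, dict):
            continue  # defensive skip; the real script's handling is not shown
        cat = finding.get("category")
        sev = finding.get("severity")
        if cat and sev:
            dim_severities[cat].add(sev)

    for dim, score in scores.items():
        # "N/A" and other non-numeric scores are skipped, as in the diff.
        if not isinstance(score, (int, float)):
            continue
        sevs = dim_severities.get(dim, set())
        if not sevs and score != 5:
            errors.append(f"{dim}={score}: no findings requires 5")
        elif sevs == {"minor"} and score != 4:
            errors.append(f"{dim}={score}: minor-only requires 4")
        elif "important" in sevs and "critical" not in sevs and score != 3:
            errors.append(f"{dim}={score}: important requires 3")
    return errors


old_style = {
    "scores": {"hygiene": 4, "design": "N/A", "correctness": 3},
    "findings": [{"category": "correctness", "severity": "important"}],
}
print(score_severity_errors(old_style))  # ['hygiene=4: no findings requires 5']
```

This check does not touch critical-only dimensions. In the validator, the 1-2 range for critical is enforced by the earlier per-finding check shown at the top of the first hunk.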
diff --git a/tests/hooks/test-review-workflow-classifier-dispatch.sh b/tests/hooks/test-review-workflow-classifier-dispatch.sh
@@ -705,11 +705,11 @@ RECORD_REVIEW_HOOK="$REPO_ROOT/plugins/dso/hooks/record-review.sh"
 
 FINDINGS_A_JSON='{
   "scores": {
-    "hygiene": 4,
+    "hygiene": 5,
     "design": "N/A",
-    "maintainability": 4,
+    "maintainability": 5,
     "correctness": 3,
-    "verification": 4
+    "verification": 5
   },
   "findings": [
     {
@@ -724,10 +724,10 @@ FINDINGS_A_JSON='{
 
 FINDINGS_B_JSON='{
   "scores": {
-    "hygiene": 4,
+    "hygiene": 5,
     "design": "N/A",
     "maintainability": 5,
-    "correctness": 4,
+    "correctness": 5,
     "verification": 3
   },
   "findings": [
@@ -744,10 +744,10 @@ FINDINGS_B_JSON='{
 FINDINGS_C_JSON='{
   "scores": {
     "hygiene": 3,
-    "design": 4,
-    "maintainability": 4,
-    "correctness": 4,
-    "verification": 4
+    "design": 5,
+    "maintainability": 5,
+    "correctness": 5,
+    "verification": 5
   },
   "findings": [
     {
@@ -852,8 +852,8 @@ assert_eq "inline_block_header_matches_arch_agent_input_format: headers match ==
 OPUS_FINDINGS_JSON='{
   "scores": {
     "hygiene": 3,
-    "design": 4,
-    "maintainability": 4,
+    "design": 5,
+    "maintainability": 5,
     "correctness": 3,
     "verification": 3
   },
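These fixture edits were needed because every finding-free dimension must now score exactly 5. One way to stop fixtures from drifting again is to derive `scores` from `findings`. The Python sketch below is hypothetical: the `build_scores` helper, the `"category"` key, and the choice of 2 for critical are assumptions, not part of the test suite.

```python
# Hypothetical fixture helper: derive a rules-consistent "scores" object
# from a findings list so test JSON cannot drift from the scoring table.
import json

SEVERITY_RANK = {"minor": 1, "important": 2, "critical": 3}
# Critical permits 1 or 2; this helper arbitrarily picks 2.
SCORE_FOR_WORST = {None: 5, "minor": 4, "important": 3, "critical": 2}
DIMENSIONS = ("hygiene", "design", "maintainability", "correctness", "verification")


def build_scores(findings, na_dims=()):
    """Map each dimension to the score implied by its worst finding."""
    worst = {}
    for finding in findings:
        dim, sev = finding["category"], finding["severity"]
        # Keep the most severe finding seen so far ("worst wins").
        if dim not in worst or SEVERITY_RANK[sev] > SEVERITY_RANK[worst[dim]]:
            worst[dim] = sev
    return {
        dim: "N/A" if dim in na_dims else SCORE_FOR_WORST[worst.get(dim)]
        for dim in DIMENSIONS
    }


print(json.dumps(build_scores([{"category": "hygiene", "severity": "important"}])))
# {"hygiene": 3, "design": 5, "maintainability": 5, "correctness": 5, "verification": 5}
```

A single important hygiene finding yields the same score vector as the updated `FINDINGS_C_JSON`. A single important correctness finding with `na_dims=("design",)` yields the updated `FINDINGS_A_JSON` scores.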
diff --git a/tests/hooks/test-validate-review-output.sh b/tests/hooks/test-validate-review-output.sh
diff --git a/tests/scripts/test-write-reviewer-findings.sh b/tests/scripts/test-write-reviewer-findings.sh