Commit 9e973b2

fix(6d83-b949): deterministic score-severity rules for review system (merge worktree-20260327-075020)
2 parents 2fe05bf + d09d222 commit 9e973b2

11 files changed: +214 -184 lines changed

plugins/dso/agents/code-reviewer-deep-arch.md

Lines changed: 11 additions & 17 deletions
@@ -3,7 +3,7 @@ name: code-reviewer-deep-arch
 model: opus
 description: Deep-tier architectural reviewer (Opus): synthesizes specialist findings, assesses systemic risk, produces unified verdict across all dimensions.
 ---
-<!-- content-hash: daba5429d50de93a29d6e40c1e28d6053c85c6bc07be3422c35851ac8b976e5b -->
+<!-- content-hash: 5726710d4d9c90e5b79810a1b16bbdb047fc53bc2aaa65780598cffff23ed9a6 -->
 <!-- generated by build-review-agents.sh — do not edit manually -->
 
 # Code Reviewer — Universal Base Guidance
@@ -126,24 +126,18 @@ will be rejected by the validator and force a re-dispatch.
 
 ## Scoring Rules
 
-All scores use the 1–5 scale — maximum is 5, not 10:
+Scores are integers 1–5 (not 0–10), driven by findings. `write-reviewer-findings.sh`
+validates consistency and rejects mismatches.
 
-- Critical finding → score 1-2 (always fails)
-- Important finding → score 3-4 (judgment: 3 if significant, 4 if minor impact)
-- Minor only or no findings → score 4-5
-- Dimension not relevant → "N/A"
-- Multiple severities in same dimension → worst wins
-
-**Minor-only enforcement**: If ALL findings in a dimension are severity=`minor`, that
-dimension's score MUST be 4 or 5. Score 3 is reserved for `important` findings.
-`write-reviewer-findings.sh` will reject your JSON with a validation error if you score
-a minor-only dimension below 4.
+| Worst finding in dimension | Score |
+|---------------------------|-------|
+| No findings | 5 |
+| Minor only | 4 |
+| Important (not critical) | 3 |
+| Critical | 1–2 |
 
-**No-findings enforcement**: If a dimension has NO findings at all, its score MUST be 4
-or 5. A low score without a supporting finding is a scoring error — every score below 4
-must be backed by at least one `important` or `critical` finding in that dimension.
-`write-reviewer-findings.sh` will reject your JSON if any no-findings dimension scores
-below 4.
+- Multiple severities in same dimension → worst wins
+- Dimension not relevant → "N/A"
 
 ---
 

plugins/dso/agents/code-reviewer-deep-correctness.md

Lines changed: 11 additions & 17 deletions
@@ -3,7 +3,7 @@ name: code-reviewer-deep-correctness
 model: sonnet
 description: Deep-tier correctness specialist (Sonnet A): focused exclusively on correctness — edge cases, error handling, security, efficiency.
 ---
-<!-- content-hash: 0aa3d7727f9d6fad141c7bd2a57052e3764816b7450112f75b615b146aa8466e -->
+<!-- content-hash: d42a3d7b417a4e0699de9eda503d6b9ec64de71a5b320dedff5901ae671580e1 -->
 <!-- generated by build-review-agents.sh — do not edit manually -->
 
 # Code Reviewer — Universal Base Guidance
@@ -126,24 +126,18 @@ will be rejected by the validator and force a re-dispatch.
 
 ## Scoring Rules
 
-All scores use the 1–5 scale — maximum is 5, not 10:
+Scores are integers 1–5 (not 0–10), driven by findings. `write-reviewer-findings.sh`
+validates consistency and rejects mismatches.
 
-- Critical finding → score 1-2 (always fails)
-- Important finding → score 3-4 (judgment: 3 if significant, 4 if minor impact)
-- Minor only or no findings → score 4-5
-- Dimension not relevant → "N/A"
-- Multiple severities in same dimension → worst wins
-
-**Minor-only enforcement**: If ALL findings in a dimension are severity=`minor`, that
-dimension's score MUST be 4 or 5. Score 3 is reserved for `important` findings.
-`write-reviewer-findings.sh` will reject your JSON with a validation error if you score
-a minor-only dimension below 4.
+| Worst finding in dimension | Score |
+|---------------------------|-------|
+| No findings | 5 |
+| Minor only | 4 |
+| Important (not critical) | 3 |
+| Critical | 1–2 |
 
-**No-findings enforcement**: If a dimension has NO findings at all, its score MUST be 4
-or 5. A low score without a supporting finding is a scoring error — every score below 4
-must be backed by at least one `important` or `critical` finding in that dimension.
-`write-reviewer-findings.sh` will reject your JSON if any no-findings dimension scores
-below 4.
+- Multiple severities in same dimension → worst wins
+- Dimension not relevant → "N/A"
 
 ---
 

plugins/dso/agents/code-reviewer-deep-hygiene.md

Lines changed: 11 additions & 17 deletions
@@ -3,7 +3,7 @@ name: code-reviewer-deep-hygiene
 model: sonnet
 description: Deep-tier hygiene/design specialist (Sonnet C): focused on hygiene, design, and maintainability.
 ---
-<!-- content-hash: 098a8a63b9482eecf85a8557b644ae504809214b21e072e6565f3d773b8d9d13 -->
+<!-- content-hash: ff89bbdf4a3ffd69f24e1e87a394c3a5bea7444d9fb17202094107ff6cfdfe8c -->
 <!-- generated by build-review-agents.sh — do not edit manually -->
 
 # Code Reviewer — Universal Base Guidance
@@ -126,24 +126,18 @@ will be rejected by the validator and force a re-dispatch.
 
 ## Scoring Rules
 
-All scores use the 1–5 scale — maximum is 5, not 10:
+Scores are integers 1–5 (not 0–10), driven by findings. `write-reviewer-findings.sh`
+validates consistency and rejects mismatches.
 
-- Critical finding → score 1-2 (always fails)
-- Important finding → score 3-4 (judgment: 3 if significant, 4 if minor impact)
-- Minor only or no findings → score 4-5
-- Dimension not relevant → "N/A"
-- Multiple severities in same dimension → worst wins
-
-**Minor-only enforcement**: If ALL findings in a dimension are severity=`minor`, that
-dimension's score MUST be 4 or 5. Score 3 is reserved for `important` findings.
-`write-reviewer-findings.sh` will reject your JSON with a validation error if you score
-a minor-only dimension below 4.
+| Worst finding in dimension | Score |
+|---------------------------|-------|
+| No findings | 5 |
+| Minor only | 4 |
+| Important (not critical) | 3 |
+| Critical | 1–2 |
 
-**No-findings enforcement**: If a dimension has NO findings at all, its score MUST be 4
-or 5. A low score without a supporting finding is a scoring error — every score below 4
-must be backed by at least one `important` or `critical` finding in that dimension.
-`write-reviewer-findings.sh` will reject your JSON if any no-findings dimension scores
-below 4.
+- Multiple severities in same dimension → worst wins
+- Dimension not relevant → "N/A"
 
 ---
 

plugins/dso/agents/code-reviewer-deep-verification.md

Lines changed: 11 additions & 17 deletions
@@ -3,7 +3,7 @@ name: code-reviewer-deep-verification
 model: sonnet
 description: Deep-tier verification specialist (Sonnet B): focused exclusively on verification — test presence, quality, edge case coverage, mock correctness.
 ---
-<!-- content-hash: 7b696db753cceda549c4638f836d63b1cd8fa69905eee73e290a1ff448b30c99 -->
+<!-- content-hash: d814c1f3e94c5d3b14ebdcb18b0f4251fc0da7811ac48f42d26a09ea7e8a3e46 -->
 <!-- generated by build-review-agents.sh — do not edit manually -->
 
 # Code Reviewer — Universal Base Guidance
@@ -126,24 +126,18 @@ will be rejected by the validator and force a re-dispatch.
 
 ## Scoring Rules
 
-All scores use the 1–5 scale — maximum is 5, not 10:
+Scores are integers 1–5 (not 0–10), driven by findings. `write-reviewer-findings.sh`
+validates consistency and rejects mismatches.
 
-- Critical finding → score 1-2 (always fails)
-- Important finding → score 3-4 (judgment: 3 if significant, 4 if minor impact)
-- Minor only or no findings → score 4-5
-- Dimension not relevant → "N/A"
-- Multiple severities in same dimension → worst wins
-
-**Minor-only enforcement**: If ALL findings in a dimension are severity=`minor`, that
-dimension's score MUST be 4 or 5. Score 3 is reserved for `important` findings.
-`write-reviewer-findings.sh` will reject your JSON with a validation error if you score
-a minor-only dimension below 4.
+| Worst finding in dimension | Score |
+|---------------------------|-------|
+| No findings | 5 |
+| Minor only | 4 |
+| Important (not critical) | 3 |
+| Critical | 1–2 |
 
-**No-findings enforcement**: If a dimension has NO findings at all, its score MUST be 4
-or 5. A low score without a supporting finding is a scoring error — every score below 4
-must be backed by at least one `important` or `critical` finding in that dimension.
-`write-reviewer-findings.sh` will reject your JSON if any no-findings dimension scores
-below 4.
+- Multiple severities in same dimension → worst wins
+- Dimension not relevant → "N/A"
 
 ---
 

plugins/dso/agents/code-reviewer-light.md

Lines changed: 11 additions & 17 deletions
@@ -3,7 +3,7 @@ name: code-reviewer-light
 model: haiku
 description: Light-tier code reviewer: single-pass, highest-signal checklist for fast feedback on low-to-medium-risk changes.
 ---
-<!-- content-hash: cf204431b33bdfa9b5c9651b618846407d7fa4e52c5f08406e8cad27bf330a92 -->
+<!-- content-hash: 5673f0c1e9c4b3a2b26ead04e6d15316679f75c7e03de602dc4eabcc01a40463 -->
 <!-- generated by build-review-agents.sh — do not edit manually -->
 
 # Code Reviewer — Universal Base Guidance
@@ -126,24 +126,18 @@ will be rejected by the validator and force a re-dispatch.
 
 ## Scoring Rules
 
-All scores use the 1–5 scale — maximum is 5, not 10:
+Scores are integers 1–5 (not 0–10), driven by findings. `write-reviewer-findings.sh`
+validates consistency and rejects mismatches.
 
-- Critical finding → score 1-2 (always fails)
-- Important finding → score 3-4 (judgment: 3 if significant, 4 if minor impact)
-- Minor only or no findings → score 4-5
-- Dimension not relevant → "N/A"
-- Multiple severities in same dimension → worst wins
-
-**Minor-only enforcement**: If ALL findings in a dimension are severity=`minor`, that
-dimension's score MUST be 4 or 5. Score 3 is reserved for `important` findings.
-`write-reviewer-findings.sh` will reject your JSON with a validation error if you score
-a minor-only dimension below 4.
+| Worst finding in dimension | Score |
+|---------------------------|-------|
+| No findings | 5 |
+| Minor only | 4 |
+| Important (not critical) | 3 |
+| Critical | 1–2 |
 
-**No-findings enforcement**: If a dimension has NO findings at all, its score MUST be 4
-or 5. A low score without a supporting finding is a scoring error — every score below 4
-must be backed by at least one `important` or `critical` finding in that dimension.
-`write-reviewer-findings.sh` will reject your JSON if any no-findings dimension scores
-below 4.
+- Multiple severities in same dimension → worst wins
+- Dimension not relevant → "N/A"
 
 ---
 

plugins/dso/agents/code-reviewer-standard.md

Lines changed: 11 additions & 17 deletions
@@ -3,7 +3,7 @@ name: code-reviewer-standard
 model: sonnet
 description: Standard-tier code reviewer: comprehensive review across all five scoring dimensions for moderate-to-high-risk changes.
 ---
-<!-- content-hash: 98a5b33d77bf6bc7302dc87c4c04e96263c5c4092a8a3ebc7f9a9bd2ace255b1 -->
+<!-- content-hash: 00da22af428ebe07e30ae88b6f3af311c37719a343780d3ab8a5fe4c15b07972 -->
 <!-- generated by build-review-agents.sh — do not edit manually -->
 
 # Code Reviewer — Universal Base Guidance
@@ -126,24 +126,18 @@ will be rejected by the validator and force a re-dispatch.
 
 ## Scoring Rules
 
-All scores use the 1–5 scale — maximum is 5, not 10:
+Scores are integers 1–5 (not 0–10), driven by findings. `write-reviewer-findings.sh`
+validates consistency and rejects mismatches.
 
-- Critical finding → score 1-2 (always fails)
-- Important finding → score 3-4 (judgment: 3 if significant, 4 if minor impact)
-- Minor only or no findings → score 4-5
-- Dimension not relevant → "N/A"
-- Multiple severities in same dimension → worst wins
-
-**Minor-only enforcement**: If ALL findings in a dimension are severity=`minor`, that
-dimension's score MUST be 4 or 5. Score 3 is reserved for `important` findings.
-`write-reviewer-findings.sh` will reject your JSON with a validation error if you score
-a minor-only dimension below 4.
+| Worst finding in dimension | Score |
+|---------------------------|-------|
+| No findings | 5 |
+| Minor only | 4 |
+| Important (not critical) | 3 |
+| Critical | 1–2 |
 
-**No-findings enforcement**: If a dimension has NO findings at all, its score MUST be 4
-or 5. A low score without a supporting finding is a scoring error — every score below 4
-must be backed by at least one `important` or `critical` finding in that dimension.
-`write-reviewer-findings.sh` will reject your JSON if any no-findings dimension scores
-below 4.
+- Multiple severities in same dimension → worst wins
+- Dimension not relevant → "N/A"
 
 ---
 

plugins/dso/docs/workflows/prompts/reviewer-base.md

Lines changed: 10 additions & 16 deletions
@@ -118,24 +118,18 @@ will be rejected by the validator and force a re-dispatch.
 
 ## Scoring Rules
 
-All scores use the 1–5 scale — maximum is 5, not 10:
+Scores are integers 1–5 (not 0–10), driven by findings. `write-reviewer-findings.sh`
+validates consistency and rejects mismatches.
 
-- Critical finding → score 1-2 (always fails)
-- Important finding → score 3-4 (judgment: 3 if significant, 4 if minor impact)
-- Minor only or no findings → score 4-5
-- Dimension not relevant → "N/A"
-- Multiple severities in same dimension → worst wins
-
-**Minor-only enforcement**: If ALL findings in a dimension are severity=`minor`, that
-dimension's score MUST be 4 or 5. Score 3 is reserved for `important` findings.
-`write-reviewer-findings.sh` will reject your JSON with a validation error if you score
-a minor-only dimension below 4.
+| Worst finding in dimension | Score |
+|---------------------------|-------|
+| No findings | 5 |
+| Minor only | 4 |
+| Important (not critical) | 3 |
+| Critical | 1–2 |
 
-**No-findings enforcement**: If a dimension has NO findings at all, its score MUST be 4
-or 5. A low score without a supporting finding is a scoring error — every score below 4
-must be backed by at least one `important` or `critical` finding in that dimension.
-`write-reviewer-findings.sh` will reject your JSON if any no-findings dimension scores
-below 4.
+- Multiple severities in same dimension → worst wins
+- Dimension not relevant → "N/A"
 
 ---
 
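
The new Scoring Rules table maps the worst severity found in a dimension to a score. A minimal sketch of that mapping is given here, assuming severities arrive as plain strings; the names `SEVERITY_RANK`, `SCORES_BY_WORST`, and `allowed_scores` are illustrative and do not appear in this commit. The "N/A" case for a dimension that is not relevant is left out.

```python
# Illustrative only: these names are not part of the commit.
SEVERITY_RANK = {"minor": 1, "important": 2, "critical": 3}

# Scores the table permits for each worst-case severity.
SCORES_BY_WORST = {"minor": {4}, "important": {3}, "critical": {1, 2}}


def allowed_scores(severities):
    """Return the set of scores the table permits for one dimension."""
    if not severities:
        return {5}  # "No findings" row
    # "Multiple severities in same dimension -> worst wins"
    worst = max(severities, key=SEVERITY_RANK.__getitem__)
    return SCORES_BY_WORST[worst]


print(allowed_scores([]))                        # {5}
print(allowed_scores(["minor", "important"]))    # {3}
```

Only the critical row leaves the reviewer a choice between two scores; every other row has exactly one valid score, which is what makes the rules deterministic.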

plugins/dso/scripts/validate-review-output.sh

Lines changed: 11 additions & 18 deletions
@@ -270,9 +270,8 @@ else:
         if isinstance(score, (int, float)) and score > 2:
             errors.append(f"{prefix}: severity='critical' but scores[{cat}]={score} (critical requires score 1-2)")
 
-# Validate score-severity consistency: if all findings in a dimension are
-# severity=minor, that dimension's score must be 4 or 5 (scoring rules state
-# 'Minor only or no findings -> score 4-5'; score 3 is reserved for important).
+# Score-severity consistency: scores are driven by the worst finding severity.
+# No findings → 5, minor only → 4, important → 3, critical → 1-2.
 if isinstance(findings, list) and isinstance(scores, dict):
     from collections import defaultdict
     dim_severities: dict = defaultdict(set)
@@ -281,22 +280,16 @@ if isinstance(findings, list) and isinstance(scores, dict):
         sev = finding.get("severity")
         if cat and sev:
             dim_severities[cat].add(sev)
-    for dim, sevs in dim_severities.items():
-        score = scores.get(dim)
-        if isinstance(score, (int, float)) and score < 4 and sevs == {"minor"}:
-            errors.append(
-                f"score '{dim}'={score} violates scoring rules: all findings in "
-                f"this dimension are severity='minor', which requires score 4-5 "
-                f"(score 3 is reserved for important findings)"
-            )
-    # No-findings low-score check (6d83-b949): if a dimension has no findings
-    # at all, its score must be 4 or 5 per scoring rules ("no findings → score 4-5").
     for dim, score in scores.items():
-        if dim not in dim_severities and isinstance(score, (int, float)) and score < 4:
-            errors.append(
-                f"score '{dim}'={score} violates scoring rules: this dimension has "
-                f"no findings, which requires score 4-5"
-            )
+        if not isinstance(score, (int, float)):
+            continue
+        sevs = dim_severities.get(dim, set())
+        if not sevs and score != 5:
+            errors.append(f"score '{dim}'={score}: no findings requires score 5")
+        elif sevs == {"minor"} and score != 4:
+            errors.append(f"score '{dim}'={score}: minor-only findings requires score 4")
+        elif "important" in sevs and "critical" not in sevs and score != 3:
+            errors.append(f"score '{dim}'={score}: important (no critical) findings requires score 3")
 
 # Validate summary
 summary = data.get("summary")
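
To see what the validator change means in practice, the sketch below restates the removed and added checks as two standalone functions and runs both on one payload. It is an approximation under stated assumptions: the `"category"` key is a guess (the diff only shows the local variable `cat`), the function names and the example payload are hypothetical, and the separate per-finding critical check from the first hunk is omitted.

```python
from collections import defaultdict


def group_severities(findings):
    """Collect the set of severities reported for each dimension."""
    dims = defaultdict(set)
    for finding in findings:
        # "category" is an assumed key name; the diff only shows `cat`.
        cat, sev = finding.get("category"), finding.get("severity")
        if cat and sev:
            dims[cat].add(sev)
    return dims


def old_rule_errors(scores, findings):
    """Removed rule: minor-only or no-findings dimensions must score 4 or 5."""
    dims = group_severities(findings)
    flagged = []
    for dim, score in scores.items():
        if not isinstance(score, (int, float)):
            continue  # non-numeric scores such as "N/A" are skipped
        sevs = dims.get(dim, set())
        if score < 4 and (not sevs or sevs == {"minor"}):
            flagged.append(dim)
    return flagged


def new_rule_errors(scores, findings):
    """Added rule: none -> 5, minor only -> 4, important (no critical) -> 3."""
    dims = group_severities(findings)
    flagged = []
    for dim, score in scores.items():
        if not isinstance(score, (int, float)):
            continue  # non-numeric scores such as "N/A" are skipped
        sevs = dims.get(dim, set())
        if not sevs and score != 5:
            flagged.append(dim)
        elif sevs == {"minor"} and score != 4:
            flagged.append(dim)
        elif "important" in sevs and "critical" not in sevs and score != 3:
            flagged.append(dim)
    return flagged


# Hypothetical payload: one important finding, one untouched dimension scored 4.
example_scores = {"hygiene": 4, "design": "N/A", "correctness": 3}
example_findings = [{"category": "correctness", "severity": "important"}]

print(old_rule_errors(example_scores, example_findings))  # []
print(new_rule_errors(example_scores, example_findings))  # ['hygiene']
```

A dimension with no findings scored 4 passed the old range check but is rejected by the exact-match rule, which is consistent with the test fixtures in this commit moving such scores from 4 to 5.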

tests/hooks/test-review-workflow-classifier-dispatch.sh

Lines changed: 11 additions & 11 deletions
@@ -705,11 +705,11 @@ RECORD_REVIEW_HOOK="$REPO_ROOT/plugins/dso/hooks/record-review.sh"
 
 FINDINGS_A_JSON='{
   "scores": {
-    "hygiene": 4,
+    "hygiene": 5,
     "design": "N/A",
-    "maintainability": 4,
+    "maintainability": 5,
     "correctness": 3,
-    "verification": 4
+    "verification": 5
   },
   "findings": [
     {
@@ -724,10 +724,10 @@ FINDINGS_A_JSON='{
 
 FINDINGS_B_JSON='{
   "scores": {
-    "hygiene": 4,
+    "hygiene": 5,
     "design": "N/A",
     "maintainability": 5,
-    "correctness": 4,
+    "correctness": 5,
     "verification": 3
   },
   "findings": [
@@ -744,10 +744,10 @@ FINDINGS_B_JSON='{
 FINDINGS_C_JSON='{
   "scores": {
     "hygiene": 3,
-    "design": 4,
-    "maintainability": 4,
-    "correctness": 4,
-    "verification": 4
+    "design": 5,
+    "maintainability": 5,
+    "correctness": 5,
+    "verification": 5
   },
   "findings": [
     {
@@ -852,8 +852,8 @@ assert_eq "inline_block_header_matches_arch_agent_input_format: headers match ==
 OPUS_FINDINGS_JSON='{
   "scores": {
     "hygiene": 3,
-    "design": 4,
-    "maintainability": 4,
+    "design": 5,
+    "maintainability": 5,
     "correctness": 3,
     "verification": 3
   },
