Commit 044f591

hyhmrright and claude committed
chore: harden consistency — risk-code binding, deterministic health score, validation gates
Content:
- decay-risks: bind R1–R6 codes into the canonical risk headers (were bare "Risk N") so a code is verifiable without counting headers
- common.md: add a Code column to the decay-risk navigation index
- health-guide: make the composite score reproducible; floor each dimension before weighting, never skip a no-finding dimension, round half-up

Tooling:
- bump-version: propagate the version badge to README.zh-CN.md too
- validate-repo: assert the zh-CN README badge matches package.json
- run-evals: add duplicate-id, files-array, mode↔risk-compatibility, and reverse-coverage (every code has a positive scenario) structural checks

All gates green: npm run validate, npm test (55), npm run evals (49).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 056c3f3 commit 044f591

6 files changed

Lines changed: 86 additions & 24 deletions

scripts/bump-version.mjs

Lines changed: 6 additions & 4 deletions
@@ -28,9 +28,11 @@ for (const { rel, update } of manifests) {
   console.log(` ✓ ${rel}`);
 }
 
-let readme = readFileSync(path.join(root, "README.md"), "utf8");
-readme = readme.replace(/version-[\d.]+?-blue\.svg/g, `version-${version}-blue.svg`);
-writeFileSync(path.join(root, "README.md"), readme, "utf8");
-console.log(" ✓ README.md badge");
+for (const readmeRel of ["README.md", "README.zh-CN.md"]) {
+  let readme = readFileSync(path.join(root, readmeRel), "utf8");
+  readme = readme.replace(/version-[\d.]+?-blue\.svg/g, `version-${version}-blue.svg`);
+  writeFileSync(path.join(root, readmeRel), readme, "utf8");
+  console.log(` ✓ ${readmeRel} badge`);
+}
 
 console.log(`\nAll manifests updated to ${version}. Run npm run validate to confirm.`);
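The README loop above can be exercised without touching the filesystem. A minimal sketch, assuming the README contents are already loaded into memory; `bumpBadges` and the sample badge text are illustrative and not part of the repository:

```javascript
// Same badge pattern that scripts/bump-version.mjs applies to each README.
const BADGE_RE = /version-[\d.]+?-blue\.svg/g;

// Hypothetical helper: returns a new { filename: text } map with every badge
// rewritten to `version`; the input map is left untouched.
function bumpBadges(readmes, version) {
  const updated = {};
  for (const [name, text] of Object.entries(readmes)) {
    updated[name] = text.replace(BADGE_RE, `version-${version}-blue.svg`);
  }
  return updated;
}

// The zh-CN badge has drifted; one bump brings both files to the same version.
const before = {
  "README.md": "![version](version-1.2.3-blue.svg)",
  "README.zh-CN.md": "![version](version-1.2.0-blue.svg)",
};
const after = bumpBadges(before, "1.3.0");
console.log(after["README.zh-CN.md"]); // ![version](version-1.3.0-blue.svg)
```

Because `[\d.]` admits only digits and dots, a badge that already carries a pre-release suffix (for example `version-1.3.0-beta.1-blue.svg`) is not matched, so a later bump would leave it unchanged.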

scripts/run-evals.mjs

Lines changed: 54 additions & 6 deletions
@@ -48,6 +48,14 @@ for (let i = 0; i < evals.length; i++) {
   }
 }
 
+// Explicit duplicate-id guard (the sequential check only catches dups that also
+// break the running count; a deliberate re-use of the same id would not).
+const idCounts = new Map();
+for (const ev of evals) idCounts.set(ev.id, (idCounts.get(ev.id) ?? 0) + 1);
+for (const [id, count] of idCounts) {
+  if (count > 1) errors.push(`Duplicate eval id ${JSON.stringify(id)} appears ${count} times`);
+}
+
 // ── Per-eval field and content checks ─────────────────────────────────────
 
 for (const ev of evals) {
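The duplicate-id guard is a plain frequency count. A standalone sketch of the same logic wrapped in a function; the `duplicateIdErrors` name and the sample ids are illustrative:

```javascript
// Hypothetical wrapper around the duplicate-id guard: count every id, then
// report each id seen more than once, together with how often it appears.
function duplicateIdErrors(evals) {
  const idCounts = new Map();
  for (const ev of evals) idCounts.set(ev.id, (idCounts.get(ev.id) ?? 0) + 1);
  const errors = [];
  for (const [id, count] of idCounts) {
    if (count > 1) errors.push(`Duplicate eval id ${JSON.stringify(id)} appears ${count} times`);
  }
  return errors;
}

const sample = [{ id: 1 }, { id: 2 }, { id: 2 }, { id: 3 }, { id: 2 }];
console.log(JSON.stringify(duplicateIdErrors(sample))); // ["Duplicate eval id 2 appears 3 times"]
```

Each duplicated id yields a single message no matter how many copies exist, and because a `Map` compares keys with SameValueZero, the number `1` and the string `"1"` are counted separately rather than conflated.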
@@ -71,13 +79,31 @@ for (const ev of evals) {
     errors.push(`${label}: 'mode' must be one of ${VALID_MODES.join(", ")} (got '${ev.mode}')`);
   }
 
+  if ("files" in ev && !Array.isArray(ev.files)) {
+    errors.push(`${label}: 'files' must be an array when present (got ${typeof ev.files})`);
+  }
+
   // expected_output should reference at least one risk code so reviewers know
   // which risk the scenario is testing
   if (typeof ev.expected_output === "string") {
     const referencedCodes = RISK_CODES.filter((code) => ev.expected_output.includes(code));
     if (referencedCodes.length === 0) {
       warnings.push(`${label}: expected_output does not reference any risk code (${RISK_CODES.join(", ")})`);
     }
+
+    // mode ↔ risk-code compatibility: assemble-prompt.mjs only loads the risk
+    // definitions for that mode (test→T-codes, review/audit/debt→R-codes,
+    // health/sweep→both). A code outside the loaded set is a dead reference —
+    // the model is never given its definition, so the scenario cannot pass live.
+    // RISK_CODES is R/T-prefixed by construction, so c[0] fully partitions it.
+    const refsR = referencedCodes.filter((c) => c[0] === "R");
+    const refsT = referencedCodes.filter((c) => c[0] === "T");
+    if (ev.mode === "test" && refsR.length > 0) {
+      errors.push(`${label}: mode 'test' loads only T-codes but expected_output references ${refsR.join(", ")}`);
+    }
+    if (["review", "audit", "debt"].includes(ev.mode) && refsT.length > 0) {
+      errors.push(`${label}: mode '${ev.mode}' loads only R-codes but expected_output references ${refsT.join(", ")}`);
+    }
   }
 
   // no_risk_codes and no_health_score are optional flags that put the live
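The mode ↔ risk-code rule can also be stated as data rather than as `if` branches. A sketch, assuming the mode-to-prefix mapping given in the comment in this hunk (test → T, review/audit/debt → R, health/sweep → both); `LOADED_PREFIXES`, `deadReferences`, and the T-code names are hypothetical placeholders, not the repository's code:

```javascript
// Hypothetical table: which code prefixes each mode's assembled prompt defines,
// mirroring the comment in run-evals.mjs.
const LOADED_PREFIXES = {
  test: ["T"],
  review: ["R"],
  audit: ["R"],
  debt: ["R"],
  health: ["R", "T"],
  sweep: ["R", "T"],
};

// Return the codes an expected_output mentions whose definitions the mode
// never loads (dead references). An unknown mode loads nothing.
function deadReferences(mode, expectedOutput, riskCodes) {
  const loaded = LOADED_PREFIXES[mode] ?? [];
  return riskCodes.filter((code) => expectedOutput.includes(code) && !loaded.includes(code[0]));
}

// R1–R6 come from decay-risks.md; T1 and T2 are placeholder test-risk codes.
const RISK_CODES = ["R1", "R2", "R3", "R4", "R5", "R6", "T1", "T2"];
const output = "Flags T1 and R2";
console.log(JSON.stringify(deadReferences("test", output, RISK_CODES))); // ["R2"]
console.log(JSON.stringify(deadReferences("review", output, RISK_CODES))); // ["T1"]
console.log(JSON.stringify(deadReferences("health", output, RISK_CODES))); // []
```

With the mapping in one table, a new mode needs one entry instead of another branch. In the committed script, unknown modes are already rejected by the separate `VALID_MODES` check.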
@@ -95,18 +121,40 @@ for (const ev of evals) {
   }
 }
 
+// ── Reverse coverage ───────────────────────────────────────────────────────
+// Every risk code must have at least one positive happy-path scenario. Skip the
+// false-positive (no_risk_codes) and health-score-suppression (no_health_score)
+// boundary scenarios — neither is a clean positive demonstration of a code.
+// CLAUDE.md requires "every new risk code gets paired coverage"; this enforces it
+// so a new code can never ship without a happy-path eval.
+
+const coveredCodes = new Set();
+for (const ev of evals) {
+  if (ev.no_risk_codes || ev.no_health_score) continue;
+  if (typeof ev.expected_output !== "string") continue;
+  for (const code of RISK_CODES) {
+    if (ev.expected_output.includes(code)) coveredCodes.add(code);
+  }
+}
+const uncoveredCodes = RISK_CODES.filter((code) => !coveredCodes.has(code));
+if (uncoveredCodes.length > 0) {
+  errors.push(`Risk codes with no positive eval scenario: ${uncoveredCodes.join(", ")}`);
+}
+
 // ── Report ─────────────────────────────────────────────────────────────────
 
-const idCheckPass = !errors.some((e) => e.includes("expected id"));
-const fieldCheckPass = !errors.some((e) => e.includes("missing required field") || e.includes("is empty"));
+const idCheckPass = !errors.some((e) => e.includes("expected id") || e.includes("Duplicate eval id"));
+const fieldCheckPass = !errors.some((e) => e.includes("missing required field") || e.includes("is empty") || e.includes("'files' must"));
+const coherencePass = !errors.some((e) => e.includes("loads only") || e.includes("no positive eval scenario"));
 const riskCodePass = warnings.length === 0;
 
 console.log("\nEval Suite Structural Validation");
 console.log("=================================");
-console.log(`Total scenarios : ${evals.length}`);
-console.log(`Sequential IDs : ${idCheckPass ? "PASS" : "FAIL"}`);
-console.log(`Required fields : ${fieldCheckPass ? "PASS" : "FAIL"}`);
-console.log(`Risk code refs : ${riskCodePass ? "PASS" : `${warnings.length} warning(s)`}`);
+console.log(`Total scenarios : ${evals.length}`);
+console.log(`Sequential IDs : ${idCheckPass ? "PASS" : "FAIL"}`);
+console.log(`Required fields : ${fieldCheckPass ? "PASS" : "FAIL"}`);
+console.log(`Mode/risk & cover : ${coherencePass ? "PASS" : "FAIL"}`);
+console.log(`Risk code refs : ${riskCodePass ? "PASS" : `${warnings.length} warning(s)`}`);
 
 if (errors.length > 0) {
   console.error("\nErrors:");
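The reverse-coverage pass inverts the per-eval check: instead of asking which codes a scenario mentions, it asks which codes no positive scenario mentions. A self-contained sketch of that set difference; the function name and sample evals are illustrative, and `T1` is a placeholder code:

```javascript
// Hypothetical standalone version of the reverse-coverage check: collect every
// code mentioned by a positive scenario, then return the codes never collected.
function uncoveredCodes(evals, riskCodes) {
  const covered = new Set();
  for (const ev of evals) {
    // Boundary scenarios (false-positive or score-suppression) do not count.
    if (ev.no_risk_codes || ev.no_health_score) continue;
    if (typeof ev.expected_output !== "string") continue;
    for (const code of riskCodes) {
      if (ev.expected_output.includes(code)) covered.add(code);
    }
  }
  return riskCodes.filter((code) => !covered.has(code));
}

const evals = [
  { id: 1, expected_output: "Reports R1 and R2" },
  { id: 2, expected_output: "Must not report R3", no_risk_codes: true },
  { id: 3, expected_output: "Cites T1 without a score", no_health_score: true },
];
console.log(JSON.stringify(uncoveredCodes(evals, ["R1", "R2", "R3", "T1"]))); // ["R3","T1"]
```

R3 and T1 are mentioned, but only by boundary scenarios, so a code demonstrated solely by such a scenario still fails the gate.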

scripts/validate-repo.mjs

Lines changed: 2 additions & 0 deletions
@@ -99,6 +99,8 @@ const CANONICAL_INSTALL_CMD = "/plugin marketplace add hyhmrright/brooks-lint";
 function checkReadmeIntegrity() {
   const readme = readText("README.md");
   check(readme.includes(`version-${version}-blue.svg`), `README.md badge does not reference version ${version}`);
+  const readmeZh = readText("README.zh-CN.md");
+  check(readmeZh.includes(`version-${version}-blue.svg`), `README.zh-CN.md badge does not reference version ${version} (run npm run bump)`);
   check(readme.includes(CANONICAL_INSTALL_CMD), `README.md should contain canonical install command`);
   check(
     readme.includes(`grounded in ${sourceWord} classic engineering books`),
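The new assertion repeats the README.md check for a second file, and the same test generalizes to any number of localized READMEs. A sketch, assuming the contents are already read into a `{ filename: text }` map; `staleBadges` is a hypothetical helper, not part of validate-repo.mjs:

```javascript
// Hypothetical helper: list the README files whose text does not contain the
// badge for `version` (the same substring test validate-repo.mjs performs).
function staleBadges(readmes, version) {
  const badge = `version-${version}-blue.svg`;
  return Object.entries(readmes)
    .filter(([, text]) => !text.includes(badge))
    .map(([name]) => name);
}

const readmes = {
  "README.md": "![version](version-1.3.0-blue.svg)",
  "README.zh-CN.md": "![version](version-1.2.0-blue.svg)",
};
console.log(JSON.stringify(staleBadges(readmes, "1.3.0"))); // ["README.zh-CN.md"]
```

Because the test is a substring match, it confirms that the current badge is present but would not notice a second, stale badge elsewhere in the same file.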

skills/_shared/common.md

Lines changed: 8 additions & 8 deletions
@@ -101,14 +101,14 @@ to Flag" guards) live in `decay-risks.md`. Do not duplicate or edit diagnostic q
 update `decay-risks.md` directly. Book-level coverage, exceptions, and tradeoffs are in
 `source-coverage.md`.
 
-| Risk | Diagnostic Question |
-|------|---------------------|
-| Cognitive Overload | How much mental effort to understand this? |
-| Change Propagation | How many unrelated things break on one change? |
-| Knowledge Duplication | Is the same decision expressed in multiple places? |
-| Accidental Complexity | Is the code more complex than the problem? |
-| Dependency Disorder | Do dependencies flow in a consistent direction? |
-| Domain Model Distortion | Does the code faithfully represent the domain? |
+| Code | Risk | Diagnostic Question |
+|------|------|---------------------|
+| R1 | Cognitive Overload | How much mental effort to understand this? |
+| R2 | Change Propagation | How many unrelated things break on one change? |
+| R3 | Knowledge Duplication | Is the same decision expressed in multiple places? |
+| R4 | Accidental Complexity | Is the code more complex than the problem? |
+| R5 | Dependency Disorder | Do dependencies flow in a consistent direction? |
+| R6 | Domain Model Distortion | Does the code faithfully represent the domain? |
 
 ---
 
skills/_shared/decay-risks.md

Lines changed: 6 additions & 6 deletions
@@ -4,7 +4,7 @@ Six patterns that cause software to degrade. Apply the Iron Law to each finding.
 
 ---
 
-## Risk 1: Cognitive Overload
+## Risk 1: Cognitive Overload (R1)
 
 **Diagnostic question:** How much mental effort does a human need to understand this?
 
@@ -57,7 +57,7 @@ Cognitive load beyond working memory causes mistakes, avoidance, and blocks the
 
 ---
 
-## Risk 2: Change Propagation
+## Risk 2: Change Propagation (R2)
 
 **Diagnostic question:** How many unrelated things break when you change one thing?
 
@@ -110,7 +110,7 @@ Each change ripples to unrelated modules, slowing velocity and multiplying regre
 
 ---
 
-## Risk 3: Knowledge Duplication
+## Risk 3: Knowledge Duplication (R3)
 
 **Diagnostic question:** Is the same decision expressed in more than one place?
 
@@ -150,7 +150,7 @@ Multiple copies drift apart silently. DRY is about decisions, not code lines.
 
 ---
 
-## Risk 4: Accidental Complexity
+## Risk 4: Accidental Complexity (R4)
 
 **Diagnostic question:** Is the code more complex than the problem it solves?
 
@@ -197,7 +197,7 @@ Accidental complexity accumulates addition by addition until developers fight sc
 
 ---
 
-## Risk 5: Dependency Disorder
+## Risk 5: Dependency Disorder (R5)
 
 **Diagnostic question:** Do dependencies flow in a consistent, predictable direction?
 
@@ -248,7 +248,7 @@ When business logic depends on infrastructure, infrastructure changes cascade in
 
 ---
 
-## Risk 6: Domain Model Distortion
+## Risk 6: Domain Model Distortion (R6)
 
 **Diagnostic question:** Does the code faithfully represent the problem it is solving?
 
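With the code bound into each header, agreement between decay-risks.md and the common.md index becomes mechanically checkable. The commit does not add such a check; the following is a hypothetical sketch (the regexes and `riskCodeMismatches` are assumptions) that verifies each `## Risk N: Name (RN)` header carries the matching code and that every index row names the same risk as its header:

```javascript
// Matches headers of the form "## Risk 1: Cognitive Overload (R1)".
const HEADER_RE = /^## Risk (\d+): (.+) \((R\d+)\)$/gm;
// Matches index rows of the form "| R1 | Cognitive Overload | ... |".
const ROW_RE = /^\| (R\d+) \| ([^|]+?) \|/gm;

// Hypothetical cross-check between decay-risks.md and the common.md index.
function riskCodeMismatches(decayRisksMd, commonMd) {
  const nameByCode = new Map();
  const errors = [];
  for (const [, n, name, code] of decayRisksMd.matchAll(HEADER_RE)) {
    if (code !== `R${n}`) errors.push(`Risk ${n} header carries ${code}`);
    nameByCode.set(code, name);
  }
  for (const [, code, name] of commonMd.matchAll(ROW_RE)) {
    if (nameByCode.get(code) !== name) errors.push(`Index row ${code} (${name}) has no matching header`);
  }
  return errors;
}

// A header with a mis-typed code (R3 instead of R2) breaks both directions.
const decay = "## Risk 1: Cognitive Overload (R1)\n\n## Risk 2: Change Propagation (R3)\n";
const index = [
  "| Code | Risk | Diagnostic Question |",
  "|------|------|---------------------|",
  "| R1 | Cognitive Overload | How much mental effort to understand this? |",
  "| R2 | Change Propagation | How many unrelated things break on one change? |",
].join("\n");
const problems = riskCodeMismatches(decay, index);
// problems: ["Risk 2 header carries R3", "Index row R2 (Change Propagation) has no matching header"]
console.log(problems.length); // 2
```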
skills/brooks-health/health-guide.md

Lines changed: 10 additions & 0 deletions
@@ -55,6 +55,16 @@ across the remaining three dimensions by dividing each remaining weight by
 | Debt | 0.25 | 0.25 / 0.75 = 0.33 |
 | Test | 0.20 | 0.20 / 0.75 = 0.27 |
 
+**Score rules (must be deterministic — two runs on the same codebase must agree):**
+
+- Each dimension's score is computed from the **capped** finding set shown in the dashboard
+  (the cap at Step 1 bounds both what is displayed and what is deducted — do not deduct for
+  findings beyond the cap).
+- Floor each dimension score at 0 **before** weighting.
+- A dimension with no findings scores **100** — it is never skipped. The **only** dimension
+  ever omitted is PR, and only when no diff exists (its weight is then redistributed above).
+- Round the weighted composite to the nearest integer (half-up).
+
 ### Step 3: Output Dashboard
 
 Use the dashboard report template below instead of the standard common.md template.
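The score rules are written for the model that produces the dashboard, but the arithmetic they fix can be sketched as a pure function to show why the result is reproducible. This is a sketch under stated assumptions, not the skill's implementation: the cap (from Step 1) and integer per-finding deductions are inputs; PR's weight of 0.25 is inferred from the 0.75 divisor; and the fourth dimension, whose name this hunk does not show, is called `other` here and given the remaining 0.30.

```javascript
// Assumed weights, in hundredths so all arithmetic stays in integers until the
// final division. PR (25) is inferred from the 0.75 divisor, Debt (25) and Test
// (20) come from the table, and `other` (30) is a placeholder name for the
// fourth dimension, which this hunk does not show.
const WEIGHTS = { pr: 25, other: 30, debt: 25, test: 20 };

// Hypothetical per-dimension score: 100 minus integer deductions for the
// capped findings only, floored at 0 before any weighting.
function dimensionScore(deductions, cap) {
  const counted = deductions.slice(0, cap); // findings beyond the cap are ignored
  const raw = 100 - counted.reduce((sum, d) => sum + d, 0);
  return Math.max(0, raw); // no findings leaves raw at 100
}

function compositeScore(findingsByDimension, { hasDiff, cap }) {
  // PR is the only dimension ever omitted, and only when there is no diff.
  const dims = Object.keys(WEIGHTS).filter((d) => d !== "pr" || hasDiff);
  // Dividing by the included weights redistributes PR's share when it is omitted.
  const totalWeight = dims.reduce((sum, d) => sum + WEIGHTS[d], 0);
  let weighted = 0;
  for (const d of dims) {
    // A dimension with no findings still contributes, at 100.
    weighted += dimensionScore(findingsByDimension[d] ?? [], cap) * WEIGHTS[d];
  }
  // Integer numerator and denominator make .5 ties exact, and Math.round
  // rounds those up, so this is half-up for the non-negative composite.
  return Math.round(weighted / totalWeight);
}

const findings = { debt: [60, 60] }; // debt alone would be -20 without the floor
console.log(compositeScore(findings, { hasDiff: true, cap: 5 })); // 75
console.log(compositeScore(findings, { hasDiff: false, cap: 5 })); // 67
```

Without the floor, debt would enter the sum at -20 and the first call would return 70 instead of 75, so one bad dimension would cost more than its full weight.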