fix(#1839): add technical doc accuracy to correctness sub-agent

fullsend-code · fullsend-code · commit 542ab8c498a4 · 2026-06-03T14:40:23.000Z
The correctness sub-agent was declaring "zero correctness surface area" on documentation-only PRs, even when those documents contained implementation plans with verifiable technical claims (algorithm descriptions, pseudocode, CLI flag semantics, API behavior claims). Human reviewers on PR #1804 found 9 confirmed technical accuracy issues that the bot missed. Changes: - Updated the correctness sub-agent definition to own technical accuracy in implementation plans and design documents, with specific evaluation guidance for algorithm logic, API/library behavior claims, design document alignment, internal consistency, and edge case correctness. - Updated SKILL.md section 3b to classify docs/plans/ files and technical documentation as having correctness surface area, ensuring the correctness sub-agent is dispatched for such PRs. - Added an implementation plan example to the dispatch table. Note: make lint could not run (sandbox Go toolchain permission error unrelated to these markdown-only changes). Pre-commit encountered the same infrastructure error (exit code 3). The post-script runs authoritative pre-commit on the runner. Closes #1839 Signed-off-by: fullsend-code <fullsend-code@users.noreply.github.com>
diff --git a/internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md b/internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md
@@ -212,6 +212,10 @@ dimensions are relevant:
 
 - Any logic changes in production code, or test files are modified, or
   production changes lack corresponding test changes → `correctness`
+- Technical documentation with correctness surface area — files under
+  `docs/plans/`, or documents containing algorithm descriptions,
+  pseudocode, data structure definitions, CLI flag specifications, or
+  API behavior claims → `correctness`
 - Changes touch auth, RBAC, permissions, secrets, data handling,
   string literals, config files, embedded text, or metadata →
   `security`
@@ -249,6 +253,7 @@ complex PR that triggers all conditions legitimately needs all 6.
 
 | PR type                        | Agents dispatched                                                                |
 |--------------------------------|----------------------------------------------------------------------------------|
+| Implementation plan in docs/   | correctness, style-conventions, intent-coherence, docs-currency                  |
 | Typo fix in README             | correctness, style-conventions                                                   |
 | Bug fix in auth middleware     | correctness, security, style-conventions, intent-coherence                       |
 | New API endpoint with tests    | correctness, security, style-conventions, cross-repo-contracts                   |
diff --git a/internal/scaffold/fullsend-repo/skills/pr-review/sub-agents/correctness.md b/internal/scaffold/fullsend-repo/skills/pr-review/sub-agents/correctness.md
@@ -10,11 +10,39 @@ You are a senior software engineer reviewing for correctness.
 
 **Own:** Logic errors, nil/null handling, off-by-one, edge cases, race
 conditions, API contract violations, error handling gaps, test adequacy
-(are the right behaviors tested?), and test integrity (are existing tests
-being weakened or poisoned alongside production changes?).
+(are the right behaviors tested?), test integrity (are existing tests
+being weakened or poisoned alongside production changes?), and technical
+accuracy in implementation plans and design documents.
 
 **Do not own:** Naming style, doc staleness, PR scope, injection defense.
 
 When evaluating tests, check git history of modified test files for
 assertion loosening or coverage reduction that coincides with production
 changes — this is a security-adjacent concern (split-payload pattern).
+
+### Technical documentation with correctness surface area
+
+Not all documentation is prose. Files under `docs/plans/`, and any
+document containing algorithm descriptions, pseudocode, data structure
+definitions, type specifications, CLI flag semantics, or API behavior
+claims, have **correctness surface area** — even when no production code
+is changed. Do NOT short-circuit with "zero correctness surface area"
+when the diff contains such content.
+
+When reviewing technical documentation, verify:
+
+- **Algorithm logic consistency** — Are described algorithms internally
+  consistent? Do they correctly handle edge cases they claim to handle
+  (e.g., DAG diamond patterns vs cycles, empty inputs, boundary values)?
+- **API and library behavior claims** — Are statements about how
+  libraries, APIs, or language features behave actually correct?
+  Cross-check against known behavior.
+- **Design document alignment** — If the plan references a design
+  document or ADR, are the claims consistent with the referenced source?
+  Flag contradictions.
+- **Internal consistency** — Does the document contradict itself? For
+  example, does one section define a sentinel value as "unlimited" while
+  another treats it as "disabled"?
+- **Edge case correctness** — Are described edge cases (depth/breadth
+  limits, zero values, error conditions) handled correctly in the
+  described logic?