Commit 542ab8c

Author: fullsend-code
fix(#1839): add technical doc accuracy to correctness sub-agent
The correctness sub-agent was declaring "zero correctness surface area" on documentation-only PRs, even when those documents contained implementation plans with verifiable technical claims (algorithm descriptions, pseudocode, CLI flag semantics, API behavior claims). Human reviewers on PR #1804 found 9 confirmed technical accuracy issues that the bot missed.

Changes:

- Updated the correctness sub-agent definition to own technical accuracy in implementation plans and design documents, with specific evaluation guidance for algorithm logic, API/library behavior claims, design document alignment, internal consistency, and edge case correctness.
- Updated SKILL.md section 3b to classify docs/plans/ files and technical documentation as having correctness surface area, ensuring the correctness sub-agent is dispatched for such PRs.
- Added an implementation plan example to the dispatch table.

Note: make lint could not run (sandbox Go toolchain permission error unrelated to these markdown-only changes). Pre-commit encountered the same infrastructure error (exit code 3). The post-script runs authoritative pre-commit on the runner.

Closes #1839

Signed-off-by: fullsend-code <fullsend-code@users.noreply.github.com>
1 parent 1088f9b commit 542ab8c

2 files changed

Lines changed: 35 additions & 2 deletions

internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md

Lines changed: 5 additions & 0 deletions
@@ -212,6 +212,10 @@ dimensions are relevant:
 
 - Any logic changes in production code, or test files are modified, or
 production changes lack corresponding test changes → `correctness`
+- Technical documentation with correctness surface area — files under
+`docs/plans/`, or documents containing algorithm descriptions,
+pseudocode, data structure definitions, CLI flag specifications, or
+API behavior claims → `correctness`
 - Changes touch auth, RBAC, permissions, secrets, data handling,
 string literals, config files, embedded text, or metadata →
 `security`
@@ -249,6 +253,7 @@ complex PR that triggers all conditions legitimately needs all 6.
 
 | PR type | Agents dispatched |
 |--------------------------------|----------------------------------------------------------------------------------|
+| Implementation plan in docs/ | correctness, style-conventions, intent-coherence, docs-currency |
 | Typo fix in README | correctness, style-conventions |
 | Bug fix in auth middleware | correctness, security, style-conventions, intent-coherence |
 | New API endpoint with tests | correctness, security, style-conventions, cross-repo-contracts |
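
The dispatch rule added to SKILL.md can be read as a simple predicate over changed files. The Python sketch below is illustrative only: the skill is prose guidance for a reviewing agent, not code, and `has_correctness_surface`, `PLAN_DIR`, and the `makes_technical_claims` flag are hypothetical names introduced here. It also assumes paths are relative to the repository root, and it leaves the content judgment (does the document contain pseudocode, CLI flag specifications, and so on) to the caller.

```python
from pathlib import PurePosixPath

# Assumption: "files under docs/plans/" means relative to the repository root.
PLAN_DIR = PurePosixPath("docs/plans")


def has_correctness_surface(path: str, makes_technical_claims: bool = False) -> bool:
    """Model only the rule added in this commit (hypothetical helper).

    A changed file counts as having correctness surface area when it sits
    under docs/plans/, or when the reviewer has judged that it contains
    algorithm descriptions, pseudocode, data structure definitions,
    CLI flag specifications, or API behavior claims.
    """
    under_plans = PLAN_DIR in PurePosixPath(path).parents
    return under_plans or makes_technical_claims


if __name__ == "__main__":
    for changed in ("docs/plans/cache-eviction.md", "README.md"):
        print(changed, has_correctness_surface(changed))
```

The path test is deliberately mechanical while the content test stays a human or agent judgment, which mirrors how the rule is worded: location alone is enough for `docs/plans/`, and everything else depends on what the document claims.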

internal/scaffold/fullsend-repo/skills/pr-review/sub-agents/correctness.md

Lines changed: 30 additions & 2 deletions
@@ -10,11 +10,39 @@ You are a senior software engineer reviewing for correctness.
 
 **Own:** Logic errors, nil/null handling, off-by-one, edge cases, race
 conditions, API contract violations, error handling gaps, test adequacy
-(are the right behaviors tested?), and test integrity (are existing tests
-being weakened or poisoned alongside production changes?).
+(are the right behaviors tested?), test integrity (are existing tests
+being weakened or poisoned alongside production changes?), and technical
+accuracy in implementation plans and design documents.
 
 **Do not own:** Naming style, doc staleness, PR scope, injection defense.
 
 When evaluating tests, check git history of modified test files for
 assertion loosening or coverage reduction that coincides with production
 changes — this is a security-adjacent concern (split-payload pattern).
+
+### Technical documentation with correctness surface area
+
+Not all documentation is prose. Files under `docs/plans/`, and any
+document containing algorithm descriptions, pseudocode, data structure
+definitions, type specifications, CLI flag semantics, or API behavior
+claims, have **correctness surface area** — even when no production code
+is changed. Do NOT short-circuit with "zero correctness surface area"
+when the diff contains such content.
+
+When reviewing technical documentation, verify:
+
+- **Algorithm logic consistency** — Are described algorithms internally
+consistent? Do they correctly handle edge cases they claim to handle
+(e.g., DAG diamond patterns vs cycles, empty inputs, boundary values)?
+- **API and library behavior claims** — Are statements about how
+libraries, APIs, or language features behave actually correct?
+Cross-check against known behavior.
+- **Design document alignment** — If the plan references a design
+document or ADR, are the claims consistent with the referenced source?
+Flag contradictions.
+- **Internal consistency** — Does the document contradict itself? For
+example, does one section define a sentinel value as "unlimited" while
+another treats it as "disabled"?
+- **Edge case correctness** — Are described edge cases (depth/breadth
+limits, zero values, error conditions) handled correctly in the
+described logic?
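
The "DAG diamond patterns vs cycles" edge case named in the checklist above is a common source of errors in plan pseudocode, so a small illustration may help reviewers recognize it. The Python sketch below is not taken from this repository or from PR #1804; the function names and sample graphs are hypothetical. It contrasts a visited-set check, which wrongly reports a diamond as a cycle, with a three-color depth-first search that only reports a cycle when it reaches a node still on the current path.

```python
Graph = dict[str, list[str]]


def naive_has_cycle(graph: Graph) -> bool:
    """Flawed check: treats any revisited node as proof of a cycle."""
    seen: set[str] = set()

    def visit(node: str) -> bool:
        if node in seen:
            return True  # also fires for diamonds, which are acyclic
        seen.add(node)
        return any(visit(nxt) for nxt in graph.get(node, []))

    return any(node not in seen and visit(node) for node in graph)


def has_cycle(graph: Graph) -> bool:
    """Three-color DFS: a cycle exists only if we reach an in-progress node."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True  # back edge to a node on the current path
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


# Hypothetical sample graphs: a diamond (a -> b, c -> d) and a 3-cycle.
DIAMOND: Graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
CYCLE: Graph = {"a": ["b"], "b": ["c"], "c": ["a"]}

if __name__ == "__main__":
    print("naive on diamond:", naive_has_cycle(DIAMOND))
    print("3-color on diamond:", has_cycle(DIAMOND))
    print("3-color on cycle:", has_cycle(CYCLE))
```

A plan whose pseudocode matches the first function while claiming to support shared dependencies would be a correctness finding of the kind this section asks the sub-agent to raise, even though no production code changed.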
