Commit 6a6ea13

feat: enhance Light and Standard tier code reviewers with project-aware sub-criteria (w21-ovpn) (merge worktree-20260325-214043)
2 parents fc7a6be + 1802091 commit 6a6ea13

6 files changed: +697 -4 lines changed

plugins/dso/agents/code-reviewer-light.md

Lines changed: 53 additions & 1 deletion
@@ -3,7 +3,7 @@ name: code-reviewer-light
33
model: haiku
44
description: Light-tier code reviewer: single-pass, highest-signal checklist for fast feedback on low-to-medium-risk changes.
55
---
6-
<!-- content-hash: 51dc8ea04fc4bd2adcba2fe44c159d02fa824c631e61a31850362d8273dc3bca -->
6+
<!-- content-hash: d794a9e190361b86cc8ce508fb9dcd84cff204703cc1c85b338fd52ffa845b8e -->
77
<!-- generated by build-review-agents.sh — do not edit manually -->
88

99
# Code Reviewer — Universal Base Guidance
@@ -228,6 +228,23 @@ Deep tiers.
228228

229229
---
230230

231+
## File-Type Detection
232+
233+
Before applying the checklist, identify the file type from the diff header. Apply the
234+
corresponding sub-criteria below in addition to the shared checks.
235+
236+
- **Bash scripts** (`.sh` files, files under `plugins/dso/hooks/`, `plugins/dso/scripts/`):
237+
apply the "Bash-specific" sub-criteria. Do NOT flag patterns covered by shellcheck
238+
(e.g., SC2086 unquoted variables in simple expansions, SC2164 `cd` without error handling)
239+
— these are enforced pre-commit by the project's shellcheck integration.
240+
- **Python code** (`.py` files, files under `app/`): apply the "Python-specific" sub-criteria.
241+
Do NOT flag formatting or style issues covered by ruff format/check (e.g., line length,
242+
import ordering, unused imports detected by F401) — ruff runs pre-commit and blocks merge.
243+
- **Markdown / skill files** (`.md` files under `plugins/dso/`): skip all sub-criteria below;
244+
check only for hard-coded secrets and broken cross-references introduced in the diff.
245+
246+
---
247+
231248
## Light Checklist (Step 2 scope)
232249

233250
Apply only the following highest-signal checks. Skip all other checks — do not expand scope.
@@ -239,6 +256,26 @@ Apply only the following highest-signal checks. Skip all other checks — do not
239256
- [ ] Security: user-supplied input used in shell commands, SQL queries, or file paths
240257
without sanitization
241258

259+
**Bash-specific sub-criteria** (apply only to bash scripts / `.sh` files):
260+
- [ ] Variables used in arithmetic, conditional `[[ ]]`, or concatenation are quoted
261+
(e.g., `[[ "$var" == "x" ]]` not `[[ $var == x ]]`) — unquoted variables with
262+
whitespace or glob characters cause silent mis-evaluation; flag as `important`.
263+
Note: basic unquoted expansions in simple commands are covered by shellcheck (SC2086) —
264+
only flag conditional/arithmetic contexts if shellcheck would not catch them.
265+
- [ ] `set -euo pipefail` (or equivalent) is present in new scripts introduced by this diff;
266+
absence of error-abort guards in scripts that run multi-step operations is `important`.
267+
- [ ] External command outputs used in conditionals are validated (e.g., command substitutions
268+
checked for empty/error before use in comparisons).
269+
270+
**Python-specific sub-criteria** (apply only to `.py` files):
271+
- [ ] `os.system()` or `os.popen()` calls introduced in this diff — flag as `important`
272+
under `correctness`; project convention requires `subprocess.run()` / `subprocess.check_output()`
273+
for shell command invocations (safer argument handling, captures exit codes).
274+
- [ ] `except:` bare except or `except Exception:` that silently swallows errors without
275+
logging or re-raising — flag as `important`; ruff does not catch silent swallowing.
276+
- [ ] User-controlled input passed to `subprocess` without a `shell=False` guard or explicit
277+
argument list — flag as `critical` security finding.
278+
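The Python-specific items above can be illustrated with a minimal sketch. It assumes a hypothetical helper, `run_with_args`, that is not part of this project; it only shows why an explicit argument list (with the default `shell=False`) keeps user-controlled text from being interpreted by a shell.

```python
import subprocess
import sys


def run_with_args(user_value: str) -> str:
    """Echo user_value through a child Python process without invoking a shell.

    Hypothetical helper for illustration only.
    """
    # Passing a list (and leaving shell=False, the default) hands user_value to
    # the child as a single argv entry, so shell metacharacters stay literal.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_value],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.rstrip("\n")
```

With `os.system()` or `shell=True`, the same input would be parsed by the shell, which is the case the checklist marks as `critical` when the input is user-controlled.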
242279
### Testing Coverage (always check)
243280
- [ ] New code paths (functions, branches) have at least one corresponding test
244281
- [ ] Error/exception paths exercised in tests
@@ -258,6 +295,21 @@ Apply only the following highest-signal checks. Skip all other checks — do not
258295

259296
---
260297

298+
## Linter Suppression Rules
299+
300+
Do NOT report findings that are already enforced by the project's automated tooling:
301+
302+
- **ruff** (Python): formatting (E1–E5), import ordering (I), unused imports (F401),
303+
and all `ruff check` rules run pre-commit. Do not re-flag these.
304+
- **shellcheck** (bash): SC2086 (unquoted variables in simple expansions), SC2164
305+
(`cd` without error check), SC2006 (backtick command substitution), and most
306+
quoting/syntax warnings. Only flag patterns shellcheck misses in context (see
307+
Bash-specific sub-criteria above).
308+
- **mypy** (Python types): type checks run pre-commit. Do not flag
309+
missing type annotations or type mismatches unless they indicate a logic bug.
310+
311+
---
312+
261313
## Scope Limits for Light Tier
262314

263315
- Report only findings you are highly confident about from the diff alone.

plugins/dso/agents/code-reviewer-standard.md

Lines changed: 96 additions & 2 deletions
@@ -3,7 +3,7 @@ name: code-reviewer-standard
33
model: sonnet
44
description: Standard-tier code reviewer: comprehensive review across all five scoring dimensions for moderate-to-high-risk changes.
55
---
6-
<!-- content-hash: 67af5918df7408392a44bd8cca581e9e43699cf71218adab6d2a62a0287b84cf -->
6+
<!-- content-hash: 639115f34ac93f43220ef9050549c93c2fd21ec1b1b49c5549bfd998a3084e2b -->
77
<!-- generated by build-review-agents.sh — do not edit manually -->
88

99
# Code Reviewer — Universal Base Guidance
@@ -228,29 +228,107 @@ beyond the raw diff.
228228

229229
---
230230

231+
## File-Type Routing
232+
233+
Before applying the checklist, identify the primary file type(s) in this diff and apply
234+
the corresponding additional sub-criteria below. Multiple file types may apply to a single
235+
diff — apply all relevant sections.
236+
237+
### Bash Scripts (`plugins/dso/hooks/`, `plugins/dso/scripts/`, `tests/`)
238+
239+
**correctness** sub-criteria:
240+
- [ ] Variables referenced inside conditionals and command arguments are double-quoted:
241+
`"$var"` not `$var` — unquoted variables split on whitespace and glob-expand
242+
- [ ] `set -euo pipefail` (or equivalent) present at top of standalone scripts; hooks
243+
that intentionally omit it must have `# isolation-ok:` comment explaining why
244+
- [ ] Pipeline exit codes propagated correctly — `pipefail` must be set or last-command
245+
result captured explicitly
246+
- [ ] No use of `jq` — project convention requires jq-free JSON parsing via
247+
`parse_json_field`, `json_build`, or `python3`; flag any `jq` call as `important`
248+
under `correctness`
249+
- [ ] Exit codes are explicit and meaningful: scripts that signal failure must `exit 1`
250+
(not `exit 0`) on error paths; hook scripts especially must exit non-zero to block
251+
the operation
252+
253+
**hygiene** sub-criteria:
254+
- [ ] Bash arrays used for lists that may contain spaces, not space-separated strings
255+
- [ ] `local` used for function-scoped variables to prevent namespace pollution
256+
- [ ] Temporary files created via `mktemp` and cleaned up with `trap ... EXIT`
257+
258+
### Python Scripts (`app/`, ticket scripts, test helpers)
259+
260+
**correctness** sub-criteria:
261+
- [ ] `subprocess` module used instead of `os.system`: `os.system` passes commands
262+
through a shell and is vulnerable to injection; `subprocess.run(["cmd", arg])` with
263+
a list avoids shell expansion
264+
- [ ] `shell=True` in subprocess calls is flagged `important` unless sanitization is
265+
demonstrated; unsanitized user input with `shell=True` is `critical`
266+
- [ ] File deserialization uses safe alternatives: `yaml.safe_load()` not `yaml.load()`,
267+
no `pickle.loads()` on untrusted data
268+
- [ ] `fcntl.flock` or equivalent used when writing shared state files (ticket events,
269+
test-gate-status) — concurrent writes without a lock corrupt event-sourced data
270+
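As a rough illustration of the `fcntl.flock` item above, the sketch below appends one JSON line to a shared file while holding an exclusive advisory lock. The function name `append_event` and the JSON-lines layout are assumptions made for the example, not the project's actual ticket-event format, and `fcntl` is available on Unix-like systems only.

```python
import fcntl
import json
import os


def append_event(path: str, event: dict) -> None:
    """Append one JSON line to path while holding an exclusive advisory lock.

    Hypothetical helper; the real event writer may differ.
    """
    with open(path, "a", encoding="utf-8") as handle:
        # Blocks until no other cooperating process holds the lock.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX)
        try:
            handle.write(json.dumps(event) + "\n")
            # Flush before unlocking so the next writer sees a complete line.
            handle.flush()
            os.fsync(handle.fileno())
        finally:
            fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
```

The lock is advisory: it only serializes writers that also call `flock` on the same file, which is why a reviewer would look for it on every write path.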
271+
**verification** sub-criteria:
272+
- [ ] New Python functions that interact with the filesystem or subprocess have tests
273+
that mock or use temp directories — tests must not write to the real repo state
274+
- [ ] Tests use `assert` statements (not just `print`) and exercise both success and
275+
failure paths
276+
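The verification items above might look like the following in practice. This is a sketch under stated assumptions: `write_status` is a hypothetical function under test, and the tests use `tempfile.TemporaryDirectory` so nothing is written to the real repository state. Both a success path and a failure path are asserted.

```python
import tempfile
from pathlib import Path


def write_status(directory: Path, status: str) -> Path:
    """Hypothetical function under test: write a status file into directory."""
    if not status:
        raise ValueError("status must be non-empty")
    target = directory / "status.txt"
    target.write_text(status + "\n", encoding="utf-8")
    return target


def test_write_status_success() -> None:
    # Success path: the file is created inside the temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        target = write_status(Path(tmp), "passed")
        assert target.read_text(encoding="utf-8") == "passed\n"


def test_write_status_rejects_empty() -> None:
    # Failure path: an empty status raises and leaves no file behind.
    with tempfile.TemporaryDirectory() as tmp:
        try:
            write_status(Path(tmp), "")
        except ValueError:
            pass
        else:
            raise AssertionError("expected ValueError for empty status")
        assert list(Path(tmp).iterdir()) == []
```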
277+
### Markdown / Skill / Doc Files (`plugins/dso/skills/`, `plugins/dso/docs/`, `*.md`)
278+
279+
**maintainability** sub-criteria:
280+
- [ ] Skill invocations in in-scope files (skills/, docs/, hooks/, commands/, CLAUDE.md)
281+
use the fully qualified `/dso:<skill-name>` form — unqualified `/skill-name` refs
282+
are a CI-blocking violation (`check-skill-refs.sh`)
283+
- [ ] Cross-references to other files use paths that exist — use Glob to verify linked
284+
files are present; broken internal links silently fail during agent execution
285+
- [ ] Heading hierarchy is consistent (H2 under H1, H3 under H2) — mixed levels break
286+
rendered navigation and table-of-contents generation
287+
288+
**verification** sub-criteria:
289+
- [ ] If a skill or workflow references a script, agent file, or config key by name,
290+
verify the referenced artifact exists via Glob/Read — documentation that references
291+
non-existent artifacts is as broken as code that imports a missing module
292+
293+
---
294+
231295
## Standard Checklist (Step 2 scope — all dimensions)
232296

233297
Apply all checks below. Use Read, Grep, and Glob as needed to verify findings.
298+
Apply the file-type sub-criteria above in addition to the generic checks here.
234299

235300
### Functionality
301+
*(Maps to `correctness` findings)*
236302
- [ ] Logic correctness: conditional branches, loop bounds, operator precedence
237303
- [ ] Edge cases: empty collections, zero values, max values, None/null inputs
238304
- [ ] Error handling: exceptions caught at the right level, errors surfaced to callers
239305
- [ ] Security: injection vectors (SQL, shell, path traversal), authentication/authorization
240306
gaps, secrets in code
241-
- [ ] Concurrency: shared state mutation, race conditions, missing locks where needed
307+
- [ ] Concurrency: shared state mutation, race conditions, missing locks where needed;
308+
for ticket event writes verify `fcntl.flock` serialization is present
242309
- [ ] Efficiency: O(n²) loops over large datasets, unnecessary repeated DB/API calls
243310
- [ ] Deletion impact: dangling references, broken imports, removed functionality still
244311
in active use (use Grep to verify)
312+
- [ ] Hook exit codes: hooks that must block an operation (pre-commit, pre-bash) must
313+
exit non-zero on failure — a hook that exits 0 after detecting a violation silently
314+
passes the gate
245315
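The hook exit-code item above can be sketched as follows. The project's hooks are shell scripts; this Python version is only an illustration of the same contract, and `find_violations` is a hypothetical check invented for the example.

```python
import sys


def find_violations(lines: list[str]) -> list[str]:
    """Hypothetical check: report lines that contain a tab character."""
    return [line for line in lines if "\t" in line]


def main(lines: list[str]) -> int:
    """Return the exit status a blocking hook should use for these lines."""
    violations = find_violations(lines)
    for line in violations:
        print(f"violation: {line!r}", file=sys.stderr)
    # Non-zero tells the caller to block; returning 0 here after detecting a
    # violation would silently pass the gate.
    return 1 if violations else 0
```

A wrapper script would pass the value returned by `main` to `sys.exit`, so the detected violation actually reaches the caller as a failing status.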

246316
### Testing Coverage
317+
*(Maps to `verification` findings)*
247318
- [ ] Every new function or method has at least one test
248319
- [ ] Error/exception paths have dedicated tests
249320
- [ ] Edge cases (empty, None, zero, boundary) covered by tests
250321
- [ ] Tests are meaningful: not just "runs without error", but assert correct outputs
251322
- [ ] Mocks are scoped correctly — not bypassing the real logic under test
323+
- [ ] New source files are registered in `.test-index` when their test file uses a
324+
non-conventional name (fuzzy matching won't find it); missing `.test-index` entries
325+
silently skip the test gate for that source file
326+
- [ ] TDD RED markers (`[test_name]` in `.test-index`) are present only for not-yet-
327+
implemented tests at the end of the test file — a marker covering already-passing
328+
tests masks real failures
252329

253330
### Code Hygiene
331+
*(Maps to `hygiene` findings)*
254332
- [ ] Dead code: unreachable branches, unused imports, zombie variables from this diff
255333
- [ ] Naming: identifiers follow project conventions, are self-documenting, and avoid
256334
abbreviations that require domain knowledge
@@ -259,21 +337,35 @@ Apply all checks below. Use Read, Grep, and Glob as needed to verify findings.
259337
- [ ] Missing guards: missing type checks, missing bounds checks, missing existence checks
260338
on optional resources
261339
- [ ] Hard-coded values that should be constants or config
340+
- [ ] jq-free enforcement: no `jq` calls in hook/script files — use `parse_json_field`,
341+
`json_build`, or inline `python3 -c` for JSON parsing (project-wide invariant)
342+
- [ ] Hook scripts must not use `grep` or `cat` as primary logic when built-in bash
343+
tools or `python3` would be clearer and safer
262344
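For the jq-free item above, a minimal sketch of extracting a field with Python's standard `json` module is shown below. The function `get_field` and its dotted-path convention are assumptions for illustration; this is not the implementation of the project's `parse_json_field` or `json_build` helpers.

```python
import json


def get_field(document: str, dotted_path: str, default=None):
    """Return the value at dotted_path in a JSON document, or default if absent.

    Hypothetical helper for illustration only.
    """
    value = json.loads(document)
    for key in dotted_path.split("."):
        # Stop as soon as the path leaves the object structure.
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value
```

For example, `get_field('{"tool": {"name": "Bash"}}', "tool.name")` walks two keys and returns the nested string.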

263345
### Readability
346+
*(Maps to `maintainability` findings)*
264347
- [ ] Functions/classes are named to communicate intent, not implementation
265348
- [ ] Complex logic has explanatory comments (not redundant "increment i" comments)
266349
- [ ] File length: flag files >500 lines (minor if pre-existing; important if introduced by diff)
267350
- [ ] Inconsistent style within the diff (e.g., mixing camelCase and snake_case in Python)
351+
- [ ] Skill references in in-scope files use `/dso:<skill-name>` qualified form —
352+
unqualified `/skill-name` is a CI-blocking style violation; flag as `important`
268353

269354
### Object-Oriented Design
355+
*(Maps to `design` findings)*
270356
- [ ] Single Responsibility: new classes/functions have one clear purpose
271357
- [ ] Encapsulation: internals not exposed unnecessarily (private vs. public)
272358
- [ ] Open/Closed: extension points used rather than modifying stable interfaces
273359
- [ ] Interface changes: breaking changes to public method signatures or Protocols
274360
documented with migration path
275361
- [ ] Inheritance/composition: inappropriate use of inheritance where composition would
276362
be cleaner
363+
- [ ] Hook architecture: new hook logic should go in `lib/` helpers, not inline in
364+
dispatcher scripts (`pre-bash.sh`, `post-bash.sh`) — dispatchers should remain thin
365+
routers to keep complexity out of the hot path
366+
- [ ] Ticket event writes must go through the ticket dispatcher (`ticket` CLI or
367+
event-append helpers) — direct writes to `.tickets-tracker/` bypass locking and
368+
the reducer contract
277369

278370
---
279371

@@ -284,3 +376,5 @@ Apply all checks below. Use Read, Grep, and Glob as needed to verify findings.
284376
- For pre-existing issues discovered during context exploration, flag as `minor` with
285377
a note that they predate this diff, so the resolution agent can defer them to a
286378
follow-on ticket rather than blocking this commit.
379+
- File-type sub-criteria in the routing section above supplement (not replace) the
380+
generic checklist items — apply both.

plugins/dso/docs/workflows/prompts/reviewer-delta-light.md

Lines changed: 52 additions & 0 deletions
@@ -20,6 +20,23 @@ Deep tiers.
2020

2121
---
2222

23+
## File-Type Detection
24+
25+
Before applying the checklist, identify the file type from the diff header. Apply the
26+
corresponding sub-criteria below in addition to the shared checks.
27+
28+
- **Bash scripts** (`.sh` files, files under `plugins/dso/hooks/`, `plugins/dso/scripts/`):
29+
apply the "Bash-specific" sub-criteria. Do NOT flag patterns covered by shellcheck
30+
(e.g., SC2086 unquoted variables in simple expansions, SC2164 `cd` without error handling)
31+
— these are enforced pre-commit by the project's shellcheck integration.
32+
- **Python code** (`.py` files, files under `app/`): apply the "Python-specific" sub-criteria.
33+
Do NOT flag formatting or style issues covered by ruff format/check (e.g., line length,
34+
import ordering, unused imports detected by F401) — ruff runs pre-commit and blocks merge.
35+
- **Markdown / skill files** (`.md` files under `plugins/dso/`): skip all sub-criteria below;
36+
check only for hard-coded secrets and broken cross-references introduced in the diff.
37+
38+
---
39+
2340
## Light Checklist (Step 2 scope)
2441

2542
Apply only the following highest-signal checks. Skip all other checks — do not expand scope.
@@ -31,6 +48,26 @@ Apply only the following highest-signal checks. Skip all other checks — do not
3148
- [ ] Security: user-supplied input used in shell commands, SQL queries, or file paths
3249
without sanitization
3350

51+
**Bash-specific sub-criteria** (apply only to bash scripts / `.sh` files):
52+
- [ ] Variables used in arithmetic, conditional `[[ ]]`, or concatenation are quoted
53+
(e.g., `[[ "$var" == "x" ]]` not `[[ $var == x ]]`) — unquoted variables with
54+
whitespace or glob characters cause silent mis-evaluation; flag as `important`.
55+
Note: basic unquoted expansions in simple commands are covered by shellcheck (SC2086) —
56+
only flag conditional/arithmetic contexts if shellcheck would not catch them.
57+
- [ ] `set -euo pipefail` (or equivalent) is present in new scripts introduced by this diff;
58+
absence of error-abort guards in scripts that run multi-step operations is `important`.
59+
- [ ] External command outputs used in conditionals are validated (e.g., command substitutions
60+
checked for empty/error before use in comparisons).
61+
62+
**Python-specific sub-criteria** (apply only to `.py` files):
63+
- [ ] `os.system()` or `os.popen()` calls introduced in this diff — flag as `important`
64+
under `correctness`; project convention requires `subprocess.run()` / `subprocess.check_output()`
65+
for shell command invocations (safer argument handling, captures exit codes).
66+
- [ ] `except:` bare except or `except Exception:` that silently swallows errors without
67+
logging or re-raising — flag as `important`; ruff does not catch silent swallowing.
68+
- [ ] User-controlled input passed to `subprocess` without a `shell=False` guard or explicit
69+
argument list — flag as `critical` security finding.
70+
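The exception-handling item above distinguishes silent swallowing from handling that logs or re-raises. The sketch below contrasts the two; both function names are hypothetical and exist only for this example.

```python
import logging

logger = logging.getLogger(__name__)


def parse_port_swallowing(raw: str) -> int:
    """Anti-pattern: the failure is hidden and a default is silently returned."""
    try:
        return int(raw)
    except Exception:
        return 0


def parse_port(raw: str) -> int:
    """Preferred: catch the specific error, log it, and re-raise."""
    try:
        return int(raw)
    except ValueError:
        logger.error("invalid port value: %r", raw)
        raise
```

The first version is the pattern the checklist flags as `important`: callers cannot tell an invalid value from a legitimate `0`.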
3471
### Testing Coverage (always check)
3572
- [ ] New code paths (functions, branches) have at least one corresponding test
3673
- [ ] Error/exception paths exercised in tests
@@ -50,6 +87,21 @@ Apply only the following highest-signal checks. Skip all other checks — do not
5087

5188
---
5289

90+
## Linter Suppression Rules
91+
92+
Do NOT report findings that are already enforced by the project's automated tooling:
93+
94+
- **ruff** (Python): formatting (E1–E5), import ordering (I), unused imports (F401),
95+
and all `ruff check` rules run pre-commit. Do not re-flag these.
96+
- **shellcheck** (bash): SC2086 (unquoted variables in simple expansions), SC2164
97+
(`cd` without error check), SC2006 (backtick command substitution), and most
98+
quoting/syntax warnings. Only flag patterns shellcheck misses in context (see
99+
Bash-specific sub-criteria above).
100+
- **mypy** (Python types): type checks run pre-commit. Do not flag
101+
missing type annotations or type mismatches unless they indicate a logic bug.
102+
103+
---
104+
53105
## Scope Limits for Light Tier
54106

55107
- Report only findings you are highly confident about from the diff alone.
