navapbc
diff --git a/plugins/dso/agents/code-reviewer-light.md b/plugins/dso/agents/code-reviewer-light.md
Lines changed: 53 additions & 1 deletion
diff --git a/plugins/dso/agents/code-reviewer-standard.md b/plugins/dso/agents/code-reviewer-standard.md
Lines changed: 96 additions & 2 deletions
diff --git a/plugins/dso/docs/workflows/prompts/reviewer-delta-light.md b/plugins/dso/docs/workflows/prompts/reviewer-delta-light.md
Lines changed: 52 additions & 0 deletions
@@ -3,7 +3,7 @@ name: code-reviewer-light
 model: haiku
 description: Light-tier code reviewer: single-pass, highest-signal checklist for fast feedback on low-to-medium-risk changes.
 ---
-<!-- content-hash: 51dc8ea04fc4bd2adcba2fe44c159d02fa824c631e61a31850362d8273dc3bca -->
+<!-- content-hash: d794a9e190361b86cc8ce508fb9dcd84cff204703cc1c85b338fd52ffa845b8e -->
 <!-- generated by build-review-agents.sh — do not edit manually -->
 
 # Code Reviewer — Universal Base Guidance
@@ -228,6 +228,23 @@ Deep tiers.
 
 ---
 
+## File-Type Detection
+
+Before applying the checklist, identify the file type from the diff header. Apply the
+corresponding sub-criteria below in addition to the shared checks.
+
+- **Bash scripts** (`.sh` files, files under `plugins/dso/hooks/`, `plugins/dso/scripts/`):
+  apply the "Bash-specific" sub-criteria. Do NOT flag patterns covered by shellcheck
+  (e.g., SC2086 unquoted variables in simple expansions, SC2164 `cd` without error handling)
+  — these are enforced pre-commit by the project's shellcheck integration.
+- **Python code** (`.py` files, files under `app/`): apply the "Python-specific" sub-criteria.
+  Do NOT flag formatting or style issues covered by ruff format/check (e.g., line length,
+  import ordering, unused imports detected by F401) — ruff runs pre-commit and blocks merge.
+- **Markdown / skill files** (`.md` files under `plugins/dso/`): skip all sub-criteria below;
+  check only for hard-coded secrets and broken cross-references introduced in the diff.
+
+---
+
 ## Light Checklist (Step 2 scope)
 
 Apply only the following highest-signal checks. Skip all other checks — do not expand scope.
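
Reviewer note: the File-Type Detection rules added in this hunk can be sketched as a small classifier. This is only an illustration under stated assumptions. The function names `classify_path` and `paths_from_diff` are hypothetical, and the precedence (extension first, directory second) is an assumption, since the section does not say which rule wins when both apply.

```python
from pathlib import PurePosixPath

# Directory prefixes taken from the File-Type Detection section.
BASH_DIRS = ("plugins/dso/hooks/", "plugins/dso/scripts/")
PYTHON_DIRS = ("app/",)
MARKDOWN_DIRS = ("plugins/dso/",)


def classify_path(path: str) -> str:
    """Return 'bash', 'python', 'markdown', or 'other' for a diff path.

    Assumption: the file extension wins over the directory, so a `.py`
    helper under plugins/dso/scripts/ is treated as Python.
    """
    suffix = PurePosixPath(path).suffix
    if suffix == ".sh":
        return "bash"
    if suffix == ".py":
        return "python"
    if suffix == ".md":
        return "markdown" if path.startswith(MARKDOWN_DIRS) else "other"
    # No recognised extension: fall back to the directory rules.
    if path.startswith(BASH_DIRS):
        return "bash"
    if path.startswith(PYTHON_DIRS):
        return "python"
    return "other"


def paths_from_diff(diff_text: str) -> list[str]:
    """Collect the b/ side path of every `diff --git` header."""
    paths = []
    for line in diff_text.splitlines():
        # Header shape: diff --git a/<path> b/<path>
        if line.startswith("diff --git ") and " b/" in line:
            paths.append(line.split(" b/", 1)[1])
    return paths
```

A diff touching several file types would be classified path by path, and each resulting type selects its own sub-criteria.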
@@ -239,6 +256,26 @@ Apply only the following highest-signal checks. Skip all other checks — do not
 - [ ] Security: user-supplied input used in shell commands, SQL queries, or file paths
   without sanitization
 
+**Bash-specific sub-criteria** (apply only to bash scripts / `.sh` files):
+- [ ] Variables used in arithmetic, conditional `[[ ]]`, or concatenation are quoted
+  (e.g., `[[ "$var" == "x" ]]` not `[[ $var == x ]]`) — unquoted variables with
+  whitespace or glob characters cause silent mis-evaluation; flag as `important`.
+  Note: basic unquoted expansions in simple commands are covered by shellcheck (SC2086) —
+  only flag conditional/arithmetic contexts if shellcheck would not catch them.
+- [ ] `set -euo pipefail` (or equivalent) is present in new scripts introduced by this diff;
+  absence of error-abort guards in scripts that run multi-step operations is `important`.
+- [ ] External command outputs used in conditionals are validated (e.g., command substitutions
+  checked for empty/error before use in comparisons).
+
+**Python-specific sub-criteria** (apply only to `.py` files):
+- [ ] `os.system()` or `os.popen()` calls introduced in this diff — flag as `important`
+  under `correctness`; project convention requires `subprocess.run()` / `subprocess.check_output()`
+  for shell command invocations (safer argument handling, captures exit codes).
+- [ ] `except:` bare except or `except Exception:` that silently swallows errors without
+  logging or re-raising — flag as `important`; ruff does not catch silent swallowing.
+- [ ] User-controlled input passed to `subprocess` with `shell=True`, or interpolated into a
+  command string rather than an explicit argument list: flag as `critical` security finding.
+
 ### Testing Coverage (always check)
 - [ ] New code paths (functions, branches) have at least one corresponding test
 - [ ] Error/exception paths exercised in tests
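
Reviewer note: the Python-specific sub-criteria above (prefer `subprocess` with an argument list, do not swallow errors silently) might look like the following in practice. This is a minimal sketch, not project code. `run_tool` is a hypothetical helper name, and the logging setup is an assumption.

```python
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)


def run_tool(args: list[str]) -> str:
    """Run a command from an explicit argument list and return its stdout.

    No shell is involved, so metacharacters in `args` are passed through
    as literal text instead of being interpreted.
    """
    try:
        result = subprocess.run(args, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError as exc:
        # Log and re-raise rather than swallowing the failure silently.
        logger.error("command %s failed with exit code %d", args, exc.returncode)
        raise
    return result.stdout
```

For example, `run_tool([sys.executable, "-c", "import sys; print(sys.argv[1])", "a; echo b"])` hands the string `a; echo b` to the child process as one literal argument. The same string interpolated into `os.system()` would be split into two commands by the shell.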
@@ -258,6 +295,21 @@ Apply only the following highest-signal checks. Skip all other checks — do not
 
 ---
 
+## Linter Suppression Rules
+
+Do NOT report findings that are already enforced by the project's automated tooling:
+
+- **ruff** (Python): formatting (E1–E5), import ordering (I), unused imports (F401),
+  and all `ruff check` rules run pre-commit. Do not re-flag these.
+- **shellcheck** (bash): SC2086 (unquoted variables in simple expansions), SC2164
+  (`cd` without error check), SC2006 (backtick command substitution), and most
+  quoting/syntax warnings. Only flag patterns shellcheck misses in context (see
+  Bash-specific sub-criteria above).
+- **mypy** (Python types): type checking runs pre-commit. Do not flag
+  missing type annotations or type mismatches unless they indicate a logic bug.
+
+---
+
 ## Scope Limits for Light Tier
 
 - Report only findings you are highly confident about from the diff alone.
 
@@ -3,7 +3,7 @@ name: code-reviewer-standard
 model: sonnet
 description: Standard-tier code reviewer: comprehensive review across all five scoring dimensions for moderate-to-high-risk changes.
 ---
-<!-- content-hash: 67af5918df7408392a44bd8cca581e9e43699cf71218adab6d2a62a0287b84cf -->
+<!-- content-hash: 639115f34ac93f43220ef9050549c93c2fd21ec1b1b49c5549bfd998a3084e2b -->
 <!-- generated by build-review-agents.sh — do not edit manually -->
 
 # Code Reviewer — Universal Base Guidance
@@ -228,29 +228,107 @@ beyond the raw diff.
 
 ---
 
+## File-Type Routing
+
+Before applying the checklist, identify the primary file type(s) in this diff and apply
+the corresponding additional sub-criteria below. Multiple file types may apply to a single
+diff — apply all relevant sections.
+
+### Bash Scripts (`plugins/dso/hooks/`, `plugins/dso/scripts/`, `tests/`)
+
+**correctness** sub-criteria:
+- [ ] Variables referenced inside conditionals and command arguments are double-quoted:
+  `"$var"` not `$var` — unquoted variables split on whitespace and glob-expand
+- [ ] `set -euo pipefail` (or equivalent) present at top of standalone scripts; hooks
+  that intentionally omit it must have `# isolation-ok:` comment explaining why
+- [ ] Pipeline exit codes propagated correctly — `pipefail` must be set or last-command
+  result captured explicitly
+- [ ] No use of `jq` — project convention requires jq-free JSON parsing via
+  `parse_json_field`, `json_build`, or `python3`; flag any `jq` call as `important`
+  under `correctness`
+- [ ] Exit codes are explicit and meaningful: scripts that signal failure must `exit 1`
+  (not `exit 0`) on error paths; hook scripts especially must exit non-zero to block
+  the operation
+
+**hygiene** sub-criteria:
+- [ ] Bash arrays used for lists that may contain spaces, not space-separated strings
+- [ ] `local` used for function-scoped variables to prevent namespace pollution
+- [ ] Temporary files created via `mktemp` and cleaned up with `trap ... EXIT`
+
+### Python Scripts (`app/`, ticket scripts, test helpers)
+
+**correctness** sub-criteria:
+- [ ] `subprocess` module used instead of `os.system` — `os.system` passes commands
+  through a shell and is vulnerable to injection; `subprocess.run(["cmd", arg])` with
+  a list avoids shell expansion
+- [ ] `shell=True` in subprocess calls is flagged `important` unless sanitization is
+  demonstrated; unsanitized user input with `shell=True` is `critical`
+- [ ] File deserialization uses safe alternatives: `yaml.safe_load()` not `yaml.load()`,
+  no `pickle.loads()` on untrusted data
+- [ ] `fcntl.flock` or equivalent used when writing shared state files (ticket events,
+  test-gate-status) — concurrent writes without a lock corrupt event-sourced data
+
+**verification** sub-criteria:
+- [ ] New Python functions that interact with the filesystem or subprocess have tests
+  that mock or use temp directories — tests must not write to the real repo state
+- [ ] Tests use `assert` statements (not just `print`) and exercise both success and
+  failure paths
+
+### Markdown / Skill / Doc Files (`plugins/dso/skills/`, `plugins/dso/docs/`, `*.md`)
+
+**maintainability** sub-criteria:
+- [ ] Skill invocations in in-scope files (skills/, docs/, hooks/, commands/, CLAUDE.md)
+  use the fully qualified `/dso:<skill-name>` form — unqualified `/skill-name` refs
+  are a CI-blocking violation (`check-skill-refs.sh`)
+- [ ] Cross-references to other files use paths that exist — use Glob to verify linked
+  files are present; broken internal links silently fail during agent execution
+- [ ] Heading hierarchy is consistent (H2 under H1, H3 under H2) — mixed levels break
+  rendered navigation and table-of-contents generation
+
+**verification** sub-criteria:
+- [ ] If a skill or workflow references a script, agent file, or config key by name,
+  verify the referenced artifact exists via Glob/Read — documentation that references
+  non-existent artifacts is as broken as code that imports a missing module
+
+---
+
 ## Standard Checklist (Step 2 scope — all dimensions)
 
 Apply all checks below. Use Read, Grep, and Glob as needed to verify findings.
+Apply the file-type sub-criteria above in addition to the generic checks here.
 
 ### Functionality
+*(Maps to `correctness` findings)*
 - [ ] Logic correctness: conditional branches, loop bounds, operator precedence
 - [ ] Edge cases: empty collections, zero values, max values, None/null inputs
 - [ ] Error handling: exceptions caught at the right level, errors surfaced to callers
 - [ ] Security: injection vectors (SQL, shell, path traversal), authentication/authorization
   gaps, secrets in code
-- [ ] Concurrency: shared state mutation, race conditions, missing locks where needed
+- [ ] Concurrency: shared state mutation, race conditions, missing locks where needed;
+  for ticket event writes verify `fcntl.flock` serialization is present
 - [ ] Efficiency: O(n²) loops over large datasets, unnecessary repeated DB/API calls
 - [ ] Deletion impact: dangling references, broken imports, removed functionality still
   in active use (use Grep to verify)
+- [ ] Hook exit codes: hooks that must block an operation (pre-commit, pre-bash) must
+  exit non-zero on failure — a hook that exits 0 after detecting a violation silently
+  passes the gate
 
 ### Testing Coverage
+*(Maps to `verification` findings)*
 - [ ] Every new function or method has at least one test
 - [ ] Error/exception paths have dedicated tests
 - [ ] Edge cases (empty, None, zero, boundary) covered by tests
 - [ ] Tests are meaningful: not just "runs without error", but assert correct outputs
 - [ ] Mocks are scoped correctly — not bypassing the real logic under test
+- [ ] New source files are registered in `.test-index` when their test file uses a
+  non-conventional name (fuzzy matching won't find it); missing `.test-index` entries
+  silently skip the test gate for that source file
+- [ ] TDD RED markers (`[test_name]` in `.test-index`) are present only for not-yet-
+  implemented tests at the end of the test file — a marker covering already-passing
+  tests masks real failures
 
 ### Code Hygiene
+*(Maps to `hygiene` findings)*
 - [ ] Dead code: unreachable branches, unused imports, zombie variables from this diff
 - [ ] Naming: identifiers follow project conventions, are self-documenting, and avoid
   abbreviations that require domain knowledge
@@ -259,21 +337,35 @@ Apply all checks below. Use Read, Grep, and Glob as needed to verify findings.
 - [ ] Missing guards: missing type checks, missing bounds checks, missing existence checks
   on optional resources
 - [ ] Hard-coded values that should be constants or config
+- [ ] jq-free enforcement: no `jq` calls in hook/script files — use `parse_json_field`,
+  `json_build`, or inline `python3 -c` for JSON parsing (project-wide invariant)
+- [ ] Hook scripts must not use `grep` or `cat` as primary logic when built-in bash
+  tools or `python3` would be clearer and safer
 
 ### Readability
+*(Maps to `maintainability` findings)*
 - [ ] Functions/classes are named to communicate intent, not implementation
 - [ ] Complex logic has explanatory comments (not redundant "increment i" comments)
 - [ ] File length: flag files >500 lines (minor if pre-existing; important if introduced by diff)
 - [ ] Inconsistent style within the diff (e.g., mixing camelCase and snake_case in Python)
+- [ ] Skill references in in-scope files use `/dso:<skill-name>` qualified form —
+  unqualified `/skill-name` is a CI-blocking style violation; flag as `important`
 
 ### Object-Oriented Design
+*(Maps to `design` findings)*
 - [ ] Single Responsibility: new classes/functions have one clear purpose
 - [ ] Encapsulation: internals not exposed unnecessarily (private vs. public)
 - [ ] Open/Closed: extension points used rather than modifying stable interfaces
 - [ ] Interface changes: breaking changes to public method signatures or Protocols
   documented with migration path
 - [ ] Inheritance/composition: inappropriate use of inheritance where composition would
   be cleaner
+- [ ] Hook architecture: new hook logic should go in `lib/` helpers, not inline in
+  dispatcher scripts (`pre-bash.sh`, `post-bash.sh`) — dispatchers should remain thin
+  routers to keep complexity out of the hot path
+- [ ] Ticket event writes must go through the ticket dispatcher (`ticket` CLI or
+  event-append helpers) — direct writes to `.tickets-tracker/` bypass locking and
+  the reducer contract
 
 ---
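
Reviewer note: the checklist asks for `fcntl.flock` serialization on shared state files. A minimal sketch of a locked append follows, assuming a JSON-lines event log. The helper name `append_event` and the file layout are hypothetical and do not describe the project's actual ticket dispatcher. `fcntl` is available on Unix-like systems only.

```python
import fcntl
import json
import os


def append_event(log_path: str, event: dict) -> None:
    """Append one JSON line to `log_path` while holding an exclusive lock."""
    with open(log_path, "a", encoding="utf-8") as handle:
        # Blocks until no other process holds the lock on this file.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX)
        try:
            handle.write(json.dumps(event, sort_keys=True) + "\n")
            handle.flush()
            os.fsync(handle.fileno())  # push the line to disk before unlocking
        finally:
            fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
```

The lock is advisory, so it only protects against writers that also call `flock` on the same file. That is one reason the checklist routes all writes through a single helper rather than allowing direct writes.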
 
@@ -284,3 +376,5 @@ Apply all checks below. Use Read, Grep, and Glob as needed to verify findings.
 - For pre-existing issues discovered during context exploration, flag as `minor` with
   a note that they predate this diff, so the resolution agent can defer them to a
   follow-on ticket rather than blocking this commit.
+- File-type sub-criteria in the routing section above supplement (not replace) the
+  generic checklist items — apply both.
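
Reviewer note: the Python verification sub-criteria in the File-Type Routing section ask for tests that use temp directories, use `assert`, and cover both success and failure paths. The sketch below shows one way that could look. `write_status` is a hypothetical function under test, invented here for illustration only.

```python
import tempfile
from pathlib import Path


def write_status(directory: Path, status: str) -> Path:
    """Hypothetical function under test: write a status file and return its path."""
    if status not in {"passed", "failed"}:
        raise ValueError(f"unknown status: {status!r}")
    target = directory / "status.txt"
    target.write_text(status + "\n", encoding="utf-8")
    return target


def test_writes_inside_temp_directory():
    # Success path: the file lands in the temp directory, not the repo.
    with tempfile.TemporaryDirectory() as tmp:
        target = write_status(Path(tmp), "passed")
        assert target.parent == Path(tmp)
        assert target.read_text(encoding="utf-8") == "passed\n"


def test_rejects_unknown_status():
    # Failure path: invalid input raises and leaves no file behind.
    with tempfile.TemporaryDirectory() as tmp:
        try:
            write_status(Path(tmp), "maybe")
        except ValueError:
            pass
        else:
            raise AssertionError("expected ValueError")
        assert list(Path(tmp).iterdir()) == []
```

Both tests are plain functions with `assert` statements, so a runner such as pytest would collect them without extra setup.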
@@ -20,6 +20,23 @@ Deep tiers.
 
 ---
 
+## File-Type Detection
+
+Before applying the checklist, identify the file type from the diff header. Apply the
+corresponding sub-criteria below in addition to the shared checks.
+
+- **Bash scripts** (`.sh` files, files under `plugins/dso/hooks/`, `plugins/dso/scripts/`):
+  apply the "Bash-specific" sub-criteria. Do NOT flag patterns covered by shellcheck
+  (e.g., SC2086 unquoted variables in simple expansions, SC2164 `cd` without error handling)
+  — these are enforced pre-commit by the project's shellcheck integration.
+- **Python code** (`.py` files, files under `app/`): apply the "Python-specific" sub-criteria.
+  Do NOT flag formatting or style issues covered by ruff format/check (e.g., line length,
+  import ordering, unused imports detected by F401) — ruff runs pre-commit and blocks merge.
+- **Markdown / skill files** (`.md` files under `plugins/dso/`): skip all sub-criteria below;
+  check only for hard-coded secrets and broken cross-references introduced in the diff.
+
+---
+
 ## Light Checklist (Step 2 scope)
 
 Apply only the following highest-signal checks. Skip all other checks — do not expand scope.
@@ -31,6 +48,26 @@ Apply only the following highest-signal checks. Skip all other checks — do not
 - [ ] Security: user-supplied input used in shell commands, SQL queries, or file paths
   without sanitization
 
+**Bash-specific sub-criteria** (apply only to bash scripts / `.sh` files):
+- [ ] Variables used in arithmetic, conditional `[[ ]]`, or concatenation are quoted
+  (e.g., `[[ "$var" == "x" ]]` not `[[ $var == x ]]`) — unquoted variables with
+  whitespace or glob characters cause silent mis-evaluation; flag as `important`.
+  Note: basic unquoted expansions in simple commands are covered by shellcheck (SC2086) —
+  only flag conditional/arithmetic contexts if shellcheck would not catch them.
+- [ ] `set -euo pipefail` (or equivalent) is present in new scripts introduced by this diff;
+  absence of error-abort guards in scripts that run multi-step operations is `important`.
+- [ ] External command outputs used in conditionals are validated (e.g., command substitutions
+  checked for empty/error before use in comparisons).
+
+**Python-specific sub-criteria** (apply only to `.py` files):
+- [ ] `os.system()` or `os.popen()` calls introduced in this diff — flag as `important`
+  under `correctness`; project convention requires `subprocess.run()` / `subprocess.check_output()`
+  for shell command invocations (safer argument handling, captures exit codes).
+- [ ] `except:` bare except or `except Exception:` that silently swallows errors without
+  logging or re-raising — flag as `important`; ruff does not catch silent swallowing.
+- [ ] User-controlled input passed to `subprocess` with `shell=True`, or interpolated into a
+  command string rather than an explicit argument list: flag as `critical` security finding.
+
 ### Testing Coverage (always check)
 - [ ] New code paths (functions, branches) have at least one corresponding test
 - [ ] Error/exception paths exercised in tests
@@ -50,6 +87,21 @@ Apply only the following highest-signal checks. Skip all other checks — do not
 
 ---
 
+## Linter Suppression Rules
+
+Do NOT report findings that are already enforced by the project's automated tooling:
+
+- **ruff** (Python): formatting (E1–E5), import ordering (I), unused imports (F401),
+  and all `ruff check` rules run pre-commit. Do not re-flag these.
+- **shellcheck** (bash): SC2086 (unquoted variables in simple expansions), SC2164
+  (`cd` without error check), SC2006 (backtick command substitution), and most
+  quoting/syntax warnings. Only flag patterns shellcheck misses in context (see
+  Bash-specific sub-criteria above).
+- **mypy** (Python types): type checking runs pre-commit. Do not flag
+  missing type annotations or type mismatches unless they indicate a logic bug.
+
+---
+
 ## Scope Limits for Light Tier
 
 - Report only findings you are highly confident about from the diff alone.