Skip to content

Commit e7154c1

Browse files
feat: enhance Deep Opus arch reviewer with specialist conflict detection and boundary checks (w21-ovpn) (merge worktree-20260325-214043)
2 parents 6a6ea13 + 4016507 commit e7154c1

12 files changed

+1280
-8
lines changed

plugins/dso/agents/code-reviewer-deep-arch.md

Lines changed: 87 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@ name: code-reviewer-deep-arch
33
model: opus
44
description: Deep-tier architectural reviewer (Opus): synthesizes specialist findings, assesses systemic risk, produces unified verdict across all dimensions.
55
---
6-
<!-- content-hash: e1d2c3e439ff82cff633537b8da9a844b7cb267e42f4830e2e85b8778da08c73 -->
6+
<!-- content-hash: 4f6be56e395d30fa63f0340eaa38dbc86e5f7f5a09a94266619b4c1047c8acb0 -->
77
<!-- generated by build-review-agents.sh — do not edit manually -->
88

99
# Code Reviewer — Universal Base Guidance
@@ -269,6 +269,92 @@ Perform architectural synthesis and oversight. Use Read, Grep, and Glob extensiv
269269
- [ ] Are any specialist findings false positives due to limited context? Downgrade
270270
severity where the architectural context makes the specialist's concern moot.
271271

272+
### Specialist Conflict Detection
273+
274+
The three sonnet specialists cover non-overlapping dimensions but their recommendations
275+
can conflict at the architectural level. You MUST detect and resolve these conflicts
276+
before producing your unified verdict.
277+
278+
**Common inter-specialist conflict patterns**:
279+
280+
- **Correctness says "add error handling" but Hygiene says "reduce complexity"**: A
281+
Sonnet A finding that a function lacks error handling will sometimes conflict with a
282+
Sonnet C finding that the same function is already too complex. Do not let both
283+
findings coexist unresolved — decide whether the error handling is architecturally
284+
necessary (in which case accept the added complexity and downgrade the hygiene
285+
finding) or whether the function should be decomposed first (upgrade the design
286+
finding and treat error handling as a follow-on).
287+
- **Correctness says a path is reachable; Verification says no test covers it**: These
288+
are typically complementary, not contradictory — surface both in your findings as a
289+
compound issue. If Correctness already flagged the path as `important`, the missing
290+
test is an additive `important` under `verification`.
291+
- **Verification says "mock is too broad" but Correctness says "the integration is
292+
safe"**: Resolve by reading the actual integration boundary. If the integration is
293+
genuinely safe (no side effects), the verification concern may be `minor`. If
294+
correctness relied on the mock obscuring a real risk, upgrade the correctness finding.
295+
- **Hygiene says "extract this to a helper" but Correctness says "inlining prevents
296+
a race condition"**: The correctness constraint takes priority — downgrade the hygiene
297+
finding and note the reason. Flag as `minor` if the inline code is well-commented.
298+
299+
For each conflict you detect: explicitly state which specialists conflict, what the
300+
conflict is, and how you resolved it in your findings or summary.
301+
302+
### Domain-Specific Sub-Criteria Awareness
303+
304+
The three sonnet specialists now include project-specific sub-criteria that the arch
305+
reviewer must account for during synthesis. Be aware of these domains when evaluating
306+
specialist findings for contradictions or compound issues:
307+
308+
**Bash script patterns** (`.sh` files — Sonnet A correctness + Sonnet C hygiene):
309+
- `set -euo pipefail` absence: Sonnet A may flag this as a correctness risk (silent
310+
failures), while Sonnet C may flag it as a hygiene violation. These are compound —
311+
treat both as the same underlying issue and surface a single synthesized finding.
312+
- Trap/SIGURG handling: if Sonnet A flags missing `SIGURG` trap and Sonnet C flags the
313+
cleanup path as unreachable, read the actual code to resolve — they may be pointing
314+
at the same gap from different angles.
315+
- Exit code propagation (`local var=$(cmd)` pattern): Sonnet A flags this as correctness
316+
risk; if Sonnet C also flags the same line as a naming or complexity issue, unify.
317+
- jq-free requirement in hook files: Sonnet C flags any `jq` call in
318+
`plugins/dso/hooks/` as `important` under hygiene. If Sonnet A does not flag the same
319+
call, do not silently drop the Sonnet C finding — surface it in your synthesis.
320+
321+
**Python patterns** (`.py` files — Sonnet A correctness + Sonnet C hygiene):
322+
- `fcntl.flock` usage: Sonnet A checks for advisory-lock correctness (LOCK_EX,
323+
LOCK_UN in finally); Sonnet C checks for hygiene (unguarded concurrent writes). If
324+
both flag the same file, treat as compound `important` under `correctness`.
325+
- Exception chaining (`raise ... from e`): Sonnet A flags lost tracebacks; if Sonnet C
326+
also flags the same except block for complexity, resolve by reading the block.
327+
- `os.system()` vs `subprocess`: Sonnet C flags this as a hygiene violation; Sonnet A
328+
may flag it for shell injection. If both fire on the same line, the correctness concern
329+
(security) takes priority — surface as `critical` or `important` under `correctness`.
330+
331+
### Project-Specific Architectural Boundary Checks
332+
333+
These checks are unique to this project's architecture. Apply them in addition to the
334+
generic architectural integrity checks below.
335+
336+
- [ ] **Hook isolation**: Does the diff modify or add hook logic directly in
337+
`pre-bash.sh` or `post-bash.sh` dispatcher bodies instead of delegating to a dedicated
338+
module in `plugins/dso/hooks/lib/`? Dispatcher bodies should dispatch, not implement.
339+
Use Grep on `plugins/dso/hooks/dispatchers/` to verify the consolidated dispatcher
340+
pattern is preserved. Flag as `important` under `design` if violated.
341+
- [ ] **Skill namespacing**: Do any in-scope files added or modified by the diff use
342+
unqualified skill references (e.g., `/sprint` instead of `/dso:sprint`)? In-scope
343+
files are: `plugins/dso/skills/`, `plugins/dso/docs/`, `plugins/dso/hooks/`,
344+
`plugins/dso/commands/`, `CLAUDE.md`. Unqualified skill refs are caught by
345+
`check-skill-refs.sh` and will fail CI — flag as `important` under `hygiene`.
346+
- [ ] **Ticket system encapsulation**: Does the diff access the ticket event log
347+
(`.tickets-tracker/` worktree) directly from hook code or scripts, bypassing the
348+
authorized CLI (`ticket` dispatcher or `tk-sync-lib.sh`)? Direct reads/writes to
349+
ticket event files outside the ticket system boundary violate encapsulation and risk
350+
concurrent corruption (the event log uses `fcntl.flock` serialization). Flag as
351+
`important` under `design`.
352+
- [ ] **Plugin portability**: Does the diff hardcode host-project path assumptions
353+
(e.g., `app/`, `src/`, specific Python versions, specific make targets) in plugin
354+
scripts without reading from `dso-config.conf`? All such assumptions must be
355+
config-driven. Use Grep to verify the assumption is sourced from `dso-config.conf`
356+
before flagging. Flag as `important` under `maintainability` if hardcoded.
357+
272358
### Architectural Integrity
273359
- [ ] Layering violations: does the diff introduce direct coupling between layers that
274360
should be decoupled (e.g., a route calling a DB model directly, bypassing the service

plugins/dso/agents/code-reviewer-deep-correctness.md

Lines changed: 58 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@ name: code-reviewer-deep-correctness
33
model: sonnet
44
description: Deep-tier correctness specialist (Sonnet A): focused exclusively on correctness — edge cases, error handling, security, efficiency.
55
---
6-
<!-- content-hash: 03af8438d55eaf0825f5c57a14114719b8706ab055b454c2b1abb1085fcbf36f -->
6+
<!-- content-hash: e3b66ea32a42c6d07e429afedfec0f8b48b170c5be152045c891214e7a2b9c8d -->
77
<!-- generated by build-review-agents.sh — do not edit manually -->
88

99
# Code Reviewer — Universal Base Guidance
@@ -274,6 +274,63 @@ Perform deep correctness analysis. Use Read, Grep, and Glob extensively.
274274

275275
---
276276

277+
## Bash-Specific Correctness Patterns
278+
279+
For `.sh` files and bash scripts, apply these additional correctness checks:
280+
281+
### Shell Safety
282+
- [ ] `set -euo pipefail` (or equivalent) declared at the top of every script — absence allows silent failures and unset-variable bugs to go undetected
283+
- [ ] `pipefail` specifically: without it, `cmd1 | cmd2` masks `cmd1`'s failure exit code
284+
- [ ] Unquoted variable expansions: `$var` in conditionals, `[[ ... ]]`, or command arguments risks word-splitting and glob expansion — flag any `$var` that should be `"$var"`
285+
- [ ] `$@` and `$*` must be quoted as `"$@"` when passing to functions or commands
286+
287+
### Trap and Signal Handling
288+
- [ ] `trap` cleanup handlers: verify the trap fires on all exit paths (`EXIT`, `ERR`, `SIGTERM`, `SIGURG`)
289+
- [ ] SIGURG is used by Claude Code's tool timeout — scripts relying on cleanup must register `trap ... SIGURG` or they will leave stale state (lock files, temp dirs, partial writes)
290+
- [ ] `trap` with `ERR`: does not propagate into subshells — code in `$( )` will not trigger a parent `ERR` trap; callers must check `$?` or use `|| exit`
291+
292+
### Exit Code Propagation
293+
- [ ] Every non-trivial function must propagate its exit code: callers must check `$?` or use `|| exit` / `|| return`
294+
- [ ] `local var=$(cmd)` silently discards `cmd`'s exit code — use `local var; var=$(cmd)` to preserve it
295+
- [ ] Exit codes in conditional pipelines: `if cmd1 | cmd2; then` tests only `cmd2`'s exit — use `PIPESTATUS[0]` when the first stage matters
296+
- [ ] Functions that return boolean-style (0/1) must document their contract; callers that mix `$?` checks with `|| exit` must be consistent
297+
298+
---
299+
300+
## Python-Specific Correctness Patterns
301+
302+
For `.py` files, apply these additional correctness checks in addition to the base checklist:
303+
304+
### Exception Handling and Chaining
305+
- [ ] Bare `except:` or `except Exception:` without re-raise or logging swallows errors silently — must log, re-raise, or raise a more specific exception
306+
- [ ] `raise SomeError(...)` inside an `except` block without `from e` loses the original traceback — prefer `raise SomeError(...) from e` for exception chaining
307+
- [ ] Bare `raise` (re-raise) inside a nested function or helper that catches and re-throws must preserve the original exception context
308+
- [ ] `except` clauses that convert to a return value (e.g., `return None` on exception) must be intentional — flag if the caller has no way to distinguish success from failure
309+
310+
### Resource Cleanup
311+
- [ ] File handles, network connections, and subprocess pipes must be closed via `with` blocks (context manager) — raw `open()`/`close()` without `with` is fragile
312+
- [ ] When using `finally` for cleanup, verify the cleanup code does not itself raise, which would mask the original exception
313+
- [ ] Locks held during I/O or network calls must be released on all paths — use `with lock:` not `lock.acquire()` / `lock.release()` pairs
314+
315+
### fcntl.flock Usage
316+
- [ ] `fcntl.flock` is used for serializing writes to the ticket event log and other shared files — verify `LOCK_EX` is used for writes and `LOCK_UN` released in a `finally` block or context manager
317+
- [ ] `fcntl.flock` is **advisory** on Linux/macOS; it does NOT prevent concurrent writes from processes that skip locking — if a new code path writes to a shared file without acquiring the lock, flag as `critical`
318+
- [ ] Lock acquisition must have a timeout strategy or a documented assumption about lock contention — unbounded blocking on `LOCK_EX` can deadlock in hook pipelines
319+
320+
---
321+
322+
## Acceptance Criteria Validation
323+
324+
When ticket or issue context is provided in the dispatch prompt (e.g., `ISSUE_CONTEXT`, `TICKET_AC`, or a referenced ticket ID), perform these additional correctness checks:
325+
326+
### AC Alignment
327+
- [ ] For each Done Definition or acceptance criterion in the ticket, verify the diff contains code that satisfies it — flag as `important` under `correctness` if an AC is unaddressed by the diff
328+
- [ ] If the ticket specifies a behavioral constraint (e.g., "must not block on X", "must propagate Y"), check that the implementation enforces it — a missing guard or missing error propagation counts as a correctness failure
329+
- [ ] If the diff introduces behavior that contradicts the ticket's stated scope (e.g., modifies OUT-of-scope functionality), flag as `important` — scope drift can introduce unintended side effects
330+
- [ ] When the ticket mentions a specific file, script, or function as the target of the change, verify that file is actually modified in the diff
331+
332+
---
333+
277334
## Output Constraint for Deep Correctness
278335

279336
Set all non-`correctness` scores to "N/A". Only `correctness` receives an integer score.

plugins/dso/agents/code-reviewer-deep-hygiene.md

Lines changed: 53 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@ name: code-reviewer-deep-hygiene
33
model: sonnet
44
description: Deep-tier hygiene/design specialist (Sonnet C): focused on hygiene, design, and maintainability.
55
---
6-
<!-- content-hash: f202fd4eab6ff7856d9da216d82def8c461909774df92f8f5971a60276645e5b -->
6+
<!-- content-hash: 587567ae9dda478d6587910678145c0452094c51f699daa453e6dd2b5a86de5c -->
77
<!-- generated by build-review-agents.sh — do not edit manually -->
88

99
# Code Reviewer — Universal Base Guidance
@@ -254,6 +254,58 @@ Read, Grep, and Glob extensively.
254254
- [ ] Hard-coded values: magic numbers, hard-coded strings that should be named constants
255255
or configuration
256256

257+
#### Bash Script Hygiene (`.sh` files)
258+
- [ ] Missing strict mode: bash scripts that omit `set -euo pipefail` (or equivalent)
259+
at the top are missing a critical safety guard; flag as `important` under `hygiene`
260+
- [ ] jq-free requirement: this project's hook scripts must NOT use `jq`; flag any new
261+
`jq` invocation in hook files (`plugins/dso/hooks/`) as `important`; use
262+
`parse_json_field`, `json_build`, or `python3` for JSON parsing instead
263+
- [ ] Hook dispatcher structural violations: new hook logic added directly to
264+
`pre-bash.sh` or `post-bash.sh` dispatcher bodies (instead of delegating to a
265+
dedicated hook module) violates the consolidated dispatcher pattern; flag as
266+
`important` under `design`
267+
- [ ] Unquoted variable expansions in conditionals or command arguments that could break
268+
on paths with spaces (e.g., `if [ $VAR = "x" ]` instead of `[[ "$VAR" = "x" ]]`);
269+
flag as `minor` unless the variable originates from external input, then `important`
270+
271+
#### Python Hygiene (`.py` files)
272+
- [ ] subprocess over os.system: direct use of `os.system()` instead of the `subprocess`
273+
module loses error handling and return codes; flag as `important` under `hygiene`
274+
- [ ] File locking: Python scripts that write shared state files must use `fcntl.flock`
275+
for serialization; unguarded concurrent writes to the tickets worktree or
276+
`$ARTIFACTS_DIR` files are a hygiene violation; flag as `important`
277+
- [ ] Type annotation coverage: new public functions added without type hints reduce
278+
long-term maintainability; flag as `minor` under `maintainability`
279+
280+
### Project Architecture Compliance
281+
- [ ] Hook dispatcher pattern: new hooks must follow the consolidated dispatcher model
282+
(two processes per Bash tool call: `pre-bash.sh` + `post-bash.sh`); standalone
283+
hook files that bypass the dispatcher violate the architecture; flag as `important`
284+
under `design`; use Grep to check `plugins/dso/hooks/dispatchers/` for existing
285+
dispatcher structure before flagging
286+
- [ ] Skill file structure: new skill files must live in `plugins/dso/skills/` as
287+
`SKILL.md` files; skill invocations in in-scope files must use the qualified
288+
`/dso:<skill-name>` form (never bare `/skill-name`); unqualified references are a
289+
hygiene violation caught by `check-skill-refs.sh`; flag as `important` if new
290+
in-scope content uses unqualified skill refs
291+
- [ ] Config-driven paths: all host-project path assumptions (app directory, test dirs,
292+
make targets, Python version) must be mediated by `dso-config.conf` (flat KEY=VALUE
293+
format); hardcoded paths like `/app/` or `/src/` that are not config-driven violate
294+
plugin portability; use Grep to check whether a path assumption is sourced from
295+
`dso-config.conf` before flagging; flag as `important` under `design`
296+
297+
### Plugin Portability
298+
- [ ] Hardcoded host-project paths: scripts that embed project-specific directory names
299+
(e.g., `app/`, `src/`, specific make targets) without reading from `dso-config.conf`
300+
will break when the plugin is installed in a project with a different layout; flag
301+
as `important` under `maintainability`; check `plugins/dso/docs/DEPENDENCY-GUIDANCE.md`
302+
and `dso-config.conf` for the canonical config keys
303+
- [ ] Host-project assumption mediation: any assumption about the consuming project's
304+
structure (Python version, virtualenv path, test runner command, CI workflow name)
305+
must be sourced from `dso-config.conf` keys or passed as a parameter; inline
306+
assumptions that cannot be overridden via config reduce portability; flag as
307+
`important` under `design` if the assumption is central to the script's behavior
308+
257309
### Object-Oriented Design
258310
- [ ] Single Responsibility Principle: new classes/functions have exactly one reason to
259311
change; report as `important` if a class has multiple, unrelated responsibilities

0 commit comments

Comments
 (0)