Skip to content

Commit 7e592cb

Browse files
Merge remote-tracking branch 'origin/main' into worktree-20260321-161319
# Conflicts: # .tickets/.index.json
2 parents 9dd29c5 + 7b0cb80 commit 7e592cb

File tree

18 files changed

+809
-758
lines changed

18 files changed

+809
-758
lines changed

.tickets/.index.json

Lines changed: 59 additions & 730 deletions
Large diffs are not rendered by default.

.tickets/dso-5ooy.md

Lines changed: 85 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,85 @@
1+
---
2+
id: dso-5ooy
3+
status: open
4+
deps: [w21-ykic, w21-ovpn]
5+
links: []
6+
created: 2026-03-21T23:27:40Z
7+
type: epic
8+
priority: 2
9+
assignee: Joe Oakhart
10+
---
11+
# Conditional Security & Performance Review Overlays
12+
13+
14+
## Notes
15+
16+
<!-- note-id: k3xxzh2r -->
17+
<!-- timestamp: 2026-03-21T23:28:18Z -->
18+
<!-- origin: agent -->
19+
<!-- sync: unsynced -->
20+
21+
22+
## Context
23+
24+
Security and performance concerns cut across all review dimensions but don't belong as permanent sub-criteria in every review — they waste tokens on changes with no security/performance surface. This epic adds conditional review overlays that trigger only when the classifier detects relevant signals, orthogonal to the tier system (any tier can trigger an overlay).
25+
26+
## Brainstorm Research (to be resumed)
27+
28+
### Architecture Decision
29+
- Security and performance reviews are **conditional overlays**, not permanent dimensions
30+
- Triggered by classifier signals alongside tier routing: classifier emits trigger flags (security_review: true, performance_review: true)
31+
- Any tier level can trigger an overlay — a Light tier change touching auth still gets security review
32+
- Each overlay has its own dedicated reviewer agent, checklist, and findings that merge into reviewer-findings.json
33+
34+
### Security Review Triggers (proposed)
35+
- Code that touches external integrations
36+
- Code that touches data layer
37+
- Authentication or authorization code
38+
- Encryption-related code
39+
40+
### Security Review Criteria (from research)
41+
Source: Anthropic claude-code-security-review (OWASP-aligned)
42+
- Injection attacks: SQL, command, LDAP, XPath, NoSQL, XXE
43+
- Authentication & authorization: broken auth, privilege escalation, insecure direct object references, auth bypass, session flaws
44+
- Data exposure: hardcoded secrets, sensitive data logging, information disclosure, PII handling violations
45+
- Cryptographic issues: weak algorithms, improper key management, insecure RNG
46+
- Input validation: missing validation, improper sanitization, buffer overflows
47+
- Business logic flaws: race conditions, TOCTOU (time-of-check-time-of-use)
48+
- Configuration security: insecure defaults, missing security headers, permissive CORS
49+
- Supply chain: vulnerable dependencies, typosquatting
50+
- Code execution: RCE via deserialization, pickle injection, eval injection
51+
- XSS: reflected, stored, DOM-based
52+
- Error message information leakage (OWASP): errors that reveal internal state
53+
54+
### Performance Review Triggers (proposed)
55+
- Any operation more expensive than O(n)
56+
- Code that touches infrastructure
57+
- Code that touches data layer
58+
- Future enhancement: trigger on spike in test runtime or application latency in E2E testing (needs friction-free way to surface this data)
59+
60+
### Performance Review Criteria (from research)
61+
- N+1 query problems
62+
- Nested loops over large datasets
63+
- Inefficient algorithms or database queries
64+
- Memory usage patterns and potential leaks
65+
- Bundle size and optimization opportunities
66+
- Sequential I/O where parallel is possible (AI-specific)
67+
- Image optimization
68+
69+
### Integration Architecture (to be designed)
70+
Pipeline becomes: classifier → tier + overlay triggers → dispatch tier reviewer(s) + overlay reviewer(s) → merged findings → resolution loop
71+
- Overlay reviewers need own agent definitions and checklists
72+
- Findings merge into same reviewer-findings.json and scoring
73+
- Classifier/dispatch changes needed to trigger overlays
74+
- Performance runtime trigger (pytest --durations baseline comparison) deferred as future enhancement to avoid friction
75+
76+
### Open Questions
77+
- Exact classifier signal thresholds for triggering overlays
78+
- Whether overlays should have their own severity scale or use the existing critical/important/minor
79+
- How overlay findings interact with the autonomous resolution loop
80+
- Whether the security overlay replaces or supplements the existing dso-0wi2 sensitive-info security review
81+
82+
## Dependencies
83+
- w21-ykic (Tiered Review Architecture): requires classifier infrastructure to add overlay trigger signals
84+
- w21-ovpn (Review Intelligence & Precision): requires enriched checklist architecture (reviewer-delta files, confidence scoring, false-positive filters)
85+

.tickets/dso-6x8o.md

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,51 @@
1+
---
2+
id: dso-6x8o
3+
status: open
4+
deps: []
5+
links: []
6+
created: 2026-03-22T00:00:28Z
7+
type: bug
8+
priority: 1
9+
assignee: Joe Oakhart
10+
---
11+
# Bug: record-test-status.sh allows re-recording with stale test results — hash updated without re-running tests
12+
13+
14+
## Notes
15+
16+
<!-- note-id: mw0cb2ez -->
17+
<!-- timestamp: 2026-03-22T00:00:47Z -->
18+
<!-- origin: agent -->
19+
<!-- sync: unsynced -->
20+
21+
22+
## Observed Behavior
23+
24+
During the commit workflow, the orchestrator:
25+
1. Ran tests and recorded test-gate-status (hash A)
26+
2. Made code changes during the review resolution loop (code changed, hash now B)
27+
3. Called record-test-status.sh again — it computed the new hash B and wrote 'passed' WITHOUT re-running the tests
28+
4. The commit proceeded with hash B in test-gate-status, but the tests were only validated against hash A
29+
30+
This means record-test-status.sh trusts whatever the current diff hash is and stamps it as 'passed' based on a single test run, even if the code has changed since that run. The test gate checks that the hash matches, but it doesn't verify that the tests were actually run against the current hash.
31+
32+
## Root Cause
33+
34+
record-test-status.sh computes the current diff hash and writes it to test-gate-status with 'passed', but it does not compare the new hash against the hash from the actual test execution. If called after code changes (e.g., review fixes), it re-stamps without re-running.
35+
36+
## Expected Behavior
37+
38+
record-test-status.sh should either:
39+
(a) Always re-run the associated tests before writing 'passed' (current behavior runs tests, but if called a second time after code changes, it should detect the hash mismatch and re-run), OR
40+
(b) Track the hash at the time tests were run and refuse to write a new hash unless tests are re-executed
41+
42+
## Impact
43+
44+
The test gate can be satisfied without tests covering the actual committed code. This undermines the two-layer defense-in-depth: the gate checks hash consistency, but the recording step doesn't enforce test-to-hash correspondence on re-invocation.
45+
46+
## Reproduction
47+
48+
1. Stage files, run record-test-status.sh (records hash A)
49+
2. Edit a staged file (hash changes to B)
50+
3. Run record-test-status.sh again (records hash B with 'passed' — no tests re-run if the associated test files haven't changed)
51+

.tickets/dso-b538.md

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
---
2+
id: dso-b538
3+
status: open
4+
deps: [dso-bxng]
5+
links: []
6+
created: 2026-03-21T23:20:14Z
7+
type: story
8+
priority: 2
9+
assignee: Joe Oakhart
10+
parent: w21-ovpn
11+
---
12+
# As a DSO practitioner, the Deep Sonnet C reviewer applies deep hygiene, design, and maintainability checks
13+
14+
15+
## Notes
16+
17+
<!-- note-id: qbufm2gh -->
18+
<!-- timestamp: 2026-03-21T23:21:29Z -->
19+
<!-- origin: agent -->
20+
<!-- sync: unsynced -->
21+
22+
23+
## Description
24+
25+
**What**: Create the reviewer-delta-deep-hygiene-design-maint.md checklist for the Deep tier Sonnet C (hygiene + design + maintainability specialist) reviewer.
26+
27+
**Why**: Deep tier reviews high-complexity changes. Sonnet C owns three structural dimensions — the qualities that prevent long-term codebase decay. No ticket context needed because structural quality is ticket-independent.
28+
29+
## Acceptance Criteria
30+
31+
- When this story is complete, reviewer-delta-deep-hygiene-design-maint.md includes all Standard hygiene/design/maintainability criteria plus:
32+
- Flag functions where branching depth suggests extraction opportunities
33+
- Evaluate whether new abstractions follow single responsibility
34+
- Flag in-place mutation of shared data structures when immutable patterns are established in surrounding code
35+
- When this story is complete, the checklist includes no ticket context instructions (structural quality is ticket-independent)
36+
- When this story is complete, build-review-agents.sh regenerates the deep hygiene/design/maintainability reviewer agent successfully
37+

.tickets/dso-b7nb.md

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,40 @@
1+
---
2+
id: dso-b7nb
3+
status: open
4+
deps: [dso-bxng]
5+
links: []
6+
created: 2026-03-21T23:20:11Z
7+
type: story
8+
priority: 2
9+
assignee: Joe Oakhart
10+
parent: w21-ovpn
11+
---
12+
# As a DSO practitioner, the Deep Sonnet A reviewer applies deep correctness checks with acceptance criteria validation
13+
14+
15+
## Notes
16+
17+
<!-- note-id: 6sefb1v1 -->
18+
<!-- timestamp: 2026-03-21T23:21:18Z -->
19+
<!-- origin: agent -->
20+
<!-- sync: unsynced -->
21+
22+
23+
## Description
24+
25+
**What**: Create the reviewer-delta-deep-correctness.md checklist for the Deep tier Sonnet A (correctness specialist) reviewer.
26+
27+
**Why**: Deep tier reviews high-complexity changes (classifier score 7+). Sonnet A owns correctness with full ticket context, enabling acceptance criteria validation that other reviewers can't perform.
28+
29+
## Acceptance Criteria
30+
31+
- When this story is complete, reviewer-delta-deep-correctness.md includes all Standard correctness criteria plus:
32+
- Acceptance criteria validation against ticket (when ticket context available)
33+
- Deeper edge-case analysis with explicit escape hatch: if code handles edge cases adequately, state so — do not manufacture findings
34+
- Inaccurate naming elevated from minor to important severity (name implies different behavior than implementation)
35+
- When this story is complete, the checklist includes ticket context instructions: use full ticket (minus verbose status update notes) when available; do not block on missing ticket context
36+
- When this story is complete, build-review-agents.sh regenerates the deep correctness reviewer agent successfully
37+
38+
## Constraints
39+
- Must reference Standard checklist criteria by inclusion, not duplication — the build process composes base + standard + deep-correctness
40+

.tickets/dso-boct.md

Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
---
2+
id: dso-boct
3+
status: open
4+
deps: []
5+
links: []
6+
created: 2026-03-21T23:20:00Z
7+
type: story
8+
priority: 2
9+
assignee: Joe Oakhart
10+
parent: w21-ovpn
11+
---
12+
# As a DSO practitioner, the Light tier haiku reviewer applies a focused 6-item checklist to low-complexity changes
13+
14+
15+
## Notes
16+
17+
<!-- note-id: t9c2dk6q -->
18+
<!-- timestamp: 2026-03-21T23:20:49Z -->
19+
<!-- origin: agent -->
20+
<!-- sync: unsynced -->
21+
22+
23+
## Description
24+
25+
**What**: Create the reviewer-delta-light.md checklist for the Light tier haiku reviewer with exactly 6 high-signal items.
26+
27+
**Why**: Light tier reviews low-complexity changes (classifier score 0-2). Haiku has limited context budget — the checklist must focus on what delivers value for small changes without codebase research.
28+
29+
## Acceptance Criteria
30+
31+
- When this story is complete, reviewer-delta-light.md contains exactly 6 checklist items:
32+
1. Silent failures: swallowed exceptions, empty catch blocks
33+
2. Tolerance/assertion weakening: changes that relax existing validation
34+
3. Test-code correspondence: production change without test change in same diff (binary check)
35+
4. Type system escape hatches: Any/any/interface{} without justifying comment
36+
5. Dead code introduced in the diff: unused imports, unreachable branches
37+
6. Non-descriptive names in the diff: single letters, generic words (data, temp, result, process, handle)
38+
- When this story is complete, the checklist includes no codebase research instructions (no Grep/Read)
39+
- When this story is complete, the checklist includes no similarity pipeline or ticket context references
40+
- When this story is complete, the checklist includes escape hatch language: if no issues found, state so explicitly rather than manufacturing findings
41+
- When this story is complete, build-review-agents.sh regenerates the light reviewer agent successfully
42+
43+
## Constraints
44+
- No codebase research tools — haiku doesn't have the context budget
45+
- Items must be evaluable from the diff alone
46+

.tickets/dso-bxng.md

Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,44 @@
1+
---
2+
id: dso-bxng
3+
status: open
4+
deps: []
5+
links: []
6+
created: 2026-03-21T23:20:06Z
7+
type: story
8+
priority: 2
9+
assignee: Joe Oakhart
10+
parent: w21-ovpn
11+
---
12+
# As a DSO practitioner, the Standard tier sonnet reviewer applies full 5-dimension checklists with researched sub-criteria
13+
14+
15+
## Notes
16+
17+
<!-- note-id: 2k8hj2b0 -->
18+
<!-- timestamp: 2026-03-21T23:21:05Z -->
19+
<!-- origin: agent -->
20+
<!-- sync: unsynced -->
21+
22+
23+
## Description
24+
25+
**What**: Create the reviewer-delta-standard.md checklist for the Standard tier sonnet reviewer with full 5-dimension coverage and researched sub-criteria.
26+
27+
**Why**: Standard tier handles ~30-40% of reviews. This is the most common reviewer and needs comprehensive criteria across all dimensions. Condensed ticket context reduces false positives without exhausting context budget.
28+
29+
## Acceptance Criteria
30+
31+
- When this story is complete, reviewer-delta-standard.md contains sub-criteria for all 5 dimensions:
32+
- Correctness: edge cases/failure states with escape hatch, race conditions in async operations, silent failures, tolerance/assertion weakening, over-engineering/YAGNI
33+
- Verification: behavior-driven not implementation-driven tests, test-code correspondence in same changeset, assertion quality (meaningful vs trivial), arrange-act-assert structure, test smells (naming, fixture bloat)
34+
- Hygiene: type system escape hatches without justification, nesting depth >2 levels (suggest early returns/extraction), dead code, suppression scrutiny (noqa/type:ignore with justifying comments), explicit exclusion of linter-catchable issues
35+
- Design: SOLID adherence, architectural pattern adherence, correct file/folder placement, Rule of Three duplication via similarity pipeline, coupling/dependency direction, reuse of existing utilities
36+
- Maintainability: codebase consistency (local patterns — error handling style, return type patterns, abstraction level — not linter rules), clear and accurate naming (non-descriptive AND inaccurate), comments explain why not what, doc correspondence for public interface changes (minor severity — only when specific existing doc artifact is stale)
37+
- When this story is complete, the checklist includes anti-shortcut distribution: noqa/type:ignore -> hygiene, skipped tests -> verification, tolerances/assertions -> correctness
38+
- When this story is complete, consolidation findings are severity=minor with orchestrator ticket creation
39+
- When this story is complete, the checklist includes ticket context instructions: use condensed summary (title + acceptance criteria) when available; do not block on missing ticket context
40+
- When this story is complete, build-review-agents.sh regenerates the standard reviewer agent successfully
41+
42+
## Research Sources
43+
Google engineering practices, OWASP, test smell literature, 5 Claude Code review plugins (Anthropic official, Claude Command Suite, claude-code-skills, claude-code-showcase, wshobson commands)
44+

.tickets/dso-gfry.md

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,41 @@
1+
---
2+
id: dso-gfry
3+
status: open
4+
deps: [dso-bxng]
5+
links: []
6+
created: 2026-03-21T23:20:12Z
7+
type: story
8+
priority: 2
9+
assignee: Joe Oakhart
10+
parent: w21-ovpn
11+
---
12+
# As a DSO practitioner, the Deep Sonnet B reviewer applies deep verification checks evaluating test quality and coverage
13+
14+
15+
## Notes
16+
17+
<!-- note-id: 5t4zz04f -->
18+
<!-- timestamp: 2026-03-21T23:21:24Z -->
19+
<!-- origin: agent -->
20+
<!-- sync: unsynced -->
21+
22+
23+
## Description
24+
25+
**What**: Create the reviewer-delta-deep-verification.md checklist for the Deep tier Sonnet B (verification specialist) reviewer.
26+
27+
**Why**: Deep tier reviews high-complexity changes. Sonnet B owns verification — evaluating whether the test suite is trustworthy and covers the right behaviors. It does not identify edge cases itself; it evaluates test coverage of edge cases present in the code.
28+
29+
## Acceptance Criteria
30+
31+
- When this story is complete, reviewer-delta-deep-verification.md includes all Standard verification criteria plus:
32+
- Test as documentation: can someone read the test and understand the intended behavior?
33+
- Test isolation evaluation: are tests independent or do they depend on shared state/execution order?
34+
- When this story is complete, the checklist explicitly states: does not identify edge cases — evaluates whether test suite covers edge cases present in the code
35+
- When this story is complete, the checklist includes no ticket context instructions (verification is code-observable)
36+
- When this story is complete, build-review-agents.sh regenerates the deep verification reviewer agent successfully
37+
38+
## Scope Boundary
39+
- This story owns checklist criteria for how the reviewer evaluates test quality in the diff
40+
- dso-ppwp owns pre-commit test gate enforcement that blocks commits when tests haven't been run
41+

.tickets/dso-jf62.md

Lines changed: 47 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,47 @@
1+
---
2+
id: dso-jf62
3+
status: open
4+
deps: [dso-b7nb, dso-gfry, dso-b538]
5+
links: []
6+
created: 2026-03-21T23:20:18Z
7+
type: story
8+
priority: 2
9+
assignee: Joe Oakhart
10+
parent: w21-ovpn
11+
---
12+
# As a DSO practitioner, the Deep Opus architectural reviewer applies cross-cutting synthesis checks across all specialist findings
13+
14+
15+
## Notes
16+
17+
<!-- note-id: 62cayy1f -->
18+
<!-- timestamp: 2026-03-21T23:21:40Z -->
19+
<!-- origin: agent -->
20+
<!-- sync: unsynced -->
21+
22+
23+
## Description
24+
25+
**What**: Create the reviewer-delta-deep-architectural.md checklist for the Deep tier Opus (architectural synthesis) reviewer.
26+
27+
**Why**: The opus reviewer is the only agent that sees all 3 specialists' findings plus the full diff plus full ticket context. Its job is cross-cutting synthesis — identifying patterns and risks that no single specialist can see.
28+
29+
## Acceptance Criteria
30+
31+
- When this story is complete, reviewer-delta-deep-architectural.md includes these cross-cutting checks:
32+
- Cross-cutting coherence: resolve contradictions between specialist findings
33+
- Untested edge cases: cross-reference Sonnet A edge cases against Sonnet B test coverage findings
34+
- Architectural boundary shifts: logic/validation/data moving between layers
35+
- Pattern divergence: new approach to something the codebase already has a pattern for
36+
- Acceptance criteria completeness: does the change fulfill what the ticket asked for?
37+
- Unrelated scope: flag changes that include modifications unrelated to the stated ticket objective
38+
- Regression awareness: repeated patches to same area suggesting deeper issue (via targeted git blame)
39+
- Root cause vs. symptom: does the fix address the underlying cause or just the visible symptom?
40+
- When this story is complete, the checklist includes instructions for self-directed git history investigation (opus runs targeted git blame/log based on findings — no orchestrator pre-gathering)
41+
- When this story is complete, the checklist includes ticket context instructions: use full ticket (minus verbose status update notes) when available; do not block on missing ticket context
42+
- When this story is complete, build-review-agents.sh regenerates the deep architectural reviewer agent successfully
43+
44+
## Constraints
45+
- This reviewer does not duplicate specialist checks — it synthesizes across them
46+
- Git history investigation is self-directed and targeted, not exhaustive
47+

0 commit comments

Comments
 (0)