Agent definition refinements: reviewer identity, builder timing, merger mulch, mixed-runtime awareness

# Overstory Agent Definition Refinements

**Target repo:** https://github.com/jayminwest/overstory
**Issue type:** Enhancement
**Affects:** `src/agent-defs/` — all agent definition templates

---

## Summary

After running 35+ agent batches through overstory on a large polyglot monorepo (Python + TypeScript + React, ~600 agents spawned), we identified 10 cross-cutting issues in the default agent definitions that cause workflow failures, knowledge loss, and silent protocol violations. This document proposes specific, generalized fixes — no project-specific or runtime-specific opinions.

---

## P0: Reviewer agent has no identity — it's a copy of scout.md

### Problem
`reviewer.md` is nearly identical to `scout.md` (lines 1-59 are verbatim copies). It references scout-only capabilities (`ov spec write` is mentioned as "scout only" in a reviewer file), uses the scout's completion protocol ("Verify you have answered the research question"), and has no structured output contract.

The lead's merge pipeline (`lead.md:274`) parses "PASS" or "FAIL" from the reviewer's mail subject, but nothing in `reviewer.md` mandates this format. Reviewers can send ambiguous results that the lead can't parse.

### Proposed Fix

Give reviewer.md its own identity with:

**1. Reviewer-specific failure modes:**
```markdown
- **RUBBER_STAMP** -- Issuing a PASS verdict without running quality gates, reading the diff, or checking spec compliance. Every PASS must be backed by evidence.
- **SCOPE_EXPANSION** -- Reviewing things beyond the spec or suggesting unrelated refactors. The reviewer's job is spec compliance, not unsolicited improvements.
- **AMBIGUOUS_VERDICT** -- Sending a result mail without clearly stating PASS or FAIL in the subject line. The lead's merge pipeline depends on parsing the verdict from the subject.
- **MISSING_MULCH_RECORD** -- Closing without recording review insights. Reviews surface reusable patterns that benefit future agents.
```

**2. Structured verdict format:**
```markdown
## verdict-format

### PASS verdict
Subject: `Review: <task-id> - PASS`
Body includes: spec compliance checklist, quality gate results, suggested mulch records

### FAIL verdict
Subject: `Review: <task-id> - FAIL: <1-line summary>`
Body includes: issues with severity (HIGH/MEDIUM/LOW), file:line references, specific fix instructions, failing spec criteria
```

**3. Verdict criteria:**
- PASS if: all spec acceptance criteria met AND quality gates pass AND no HIGH-severity issues
- FAIL if: any spec criterion unmet OR any quality gate fails OR any HIGH-severity issue found

**4. Remove scout references:**
- Delete `ov spec write` mention (reviewer doesn't write specs)
- Replace exploration-oriented completion protocol with verdict-oriented protocol

---

## P0: Builder closes issue before lead verifies — timing bug

### Problem
In `builder.md`, the completion protocol is:
1. Send `worker_done` mail (step 6)
2. Close the issue (step 7)
3. Exit immediately (step 8)

But the lead's workflow (`lead.md:248-288`) is:
1. Receive `worker_done`
2. Decide whether to self-verify or spawn reviewer
3. If FAIL: send revision request to builder

The builder has already closed and exited before the lead can request revisions. The lead must reopen the issue and either fix it themselves or spawn a new builder.

### Proposed Fix

Add a failure mode and modify the completion protocol:

```markdown
# New failure mode:
- **PREMATURE_ISSUE_CLOSE** -- Closing your issue before receiving acknowledgment from your parent lead. After sending `worker_done`, wait for either acceptance or revision feedback before closing.

# Modified completion steps:
6. Send `worker_done` mail to parent.
7. Check mail once more for revision requests:
   - If revision request received: address feedback, re-run quality gates, send another `worker_done`.
   - If no revision request (or explicit acceptance): proceed to close.
8. Close the issue.
9. Exit.
```

---

## P1: Merger doesn't preserve `.mulch/` during conflict resolution

### Problem
The merger's tiered conflict resolution (Tier 1-4) focuses on source code. `.mulch/expertise/*.jsonl` files are append-only JSONL — when both branches add records to the same file, the correct resolution is concatenation, not choosing one side. But the merger has no guidance on this, and Tier 3 (AI-Resolve) or Tier 4 (Reimagine) can silently discard mulch records.

### Proposed Fix

Add failure mode and preservation rule:

```markdown
# New failure mode:
- **MULCH_DATA_LOSS** -- Discarding `.mulch/` directory changes during conflict resolution. `.mulch/expertise/*.jsonl` files are append-only JSONL — if both sides added records, concatenate them. Never resolve a `.mulch/` conflict by choosing one side.

# Add before the tier descriptions:
**`.mulch/` preservation rule (applies to ALL tiers):** `.mulch/expertise/*.jsonl` files are append-only. If both branches added records to the same file, concatenate both sets of records. After resolving `.mulch/` conflicts, verify no records were lost.
```

---

## P1: Merger sends wrong mail type — coordinator can't route it

### Problem
The coordinator (`coordinator.md:199-207`) expects to receive `merged` and `merge_failed` typed messages. But the merger (`merger.md:141-146`) sends generic `--type result`. The coordinator's mail routing logic may not handle merger completions correctly.

### Proposed Fix

Change merger's report mail to use typed messages:

```markdown
# On success:
ov mail send --to <parent-or-coordinator> \
  --subject "Merged: <branch>" \
  --body "Tier: <tier-used>. Conflicts: <list or none>. Tests: passing." \
  --type merged

# On failure:
ov mail send --to <parent-or-coordinator> \
  --subject "Merge failed: <branch>" \
  --body "Tier reached: <tier>. Conflict files: <list>. Error: <description>." \
  --type merge_failed --priority high
```

---

## P1: Lead has no guidance for mixed-runtime workers

### Problem
Overstory supports 11 runtimes, and `config.runtime.capabilities` allows per-role routing. A lead might spawn a scout on one runtime and a builder on another. But the lead's three-phase workflow assumes all workers participate in the mail protocol, which is only true for Claude/Sapling/Pi.

Non-mail runtimes (Codex, Aider, Gemini, Cursor, etc.) receive instructions via `AGENTS.md` but cannot send `worker_done`, `result`, `question`, or `error` messages. The lead's "collect scout results" and "wait for `worker_done`" steps fail silently.

### Proposed Fix

Add a `mixed-runtime-awareness` section to `lead.md`:

```markdown
## mixed-runtime-awareness

Workers may run on different runtimes depending on `config.runtime.capabilities`.

### Mail-capable runtimes (Claude, Sapling, Pi)
Use the standard three-phase workflow with mail-based coordination.

### Non-mail runtimes (Codex, Aider, Gemini, Cursor, etc.)
These receive instructions via AGENTS.md but cannot send or receive mail:

1. Do not wait for `worker_done` mail. Monitor worktree for completion:
   - Check `ov status` for agent state changes
   - Check worktree git log for commits
2. Do not send questions or revision requests via mail — the worker can't read them.
3. Self-verify output by reading the worktree diff directly.
4. Expect faster completion but no mid-task communication.
5. If a non-mail worker appears stalled (no new commits), it has likely failed silently. Replace rather than nudge.
```

---

## P1: Scout has no structured output contract

### Problem
The scout sends findings via `ov spec write` and a short result mail, but there's no defined schema for what the lead expects to receive. Leads must extract structure from free-form prose, leading to inconsistent spec quality.

### Proposed Fix

Add a `report-format` section to `scout.md`:

```markdown
## report-format

Structure spec/report output with these sections:

# Scout Report: <task-id>

## Files Found
- <path> — <purpose/role>

## Patterns Observed
- <pattern name>: <description, with file:line references>

## Types and Interfaces
- <type/interface name> at <file:line> — <what it defines, who consumes it>

## Dependencies
- <component A> depends on <component B> via <mechanism>

## Risks and Gotchas
- <risk description> — <why it matters for the task>

## Suggested Mulch Records
- domain: <domain>, type: <convention|pattern>, classification: <foundational|tactical|observational>
  description: "<insight worth preserving>"
```

This gives leads a predictable structure to extract builder specs from.

---

## P2: Mulch protocol inconsistencies across agents

### Problem
Mulch recording is inconsistent across agent types:

| Agent | Records mulch? | Required? |
|-------|---------------|-----------|
| Scout | No (suggests for parent) | No |
| Builder | Yes | Yes (failure mode) |
| **Reviewer** | **No** | **No** |
| Lead | Yes | Yes (failure mode) |
| Merger | Yes (Tier 2+) | Yes (failure mode) |
| Monitor | Yes (capability) | No explicit requirement |

Reviewers never record insights, despite finding reusable patterns (common bug types, convention violations) across multiple reviews. This is a knowledge loss.

### Proposed Fix
Add `MISSING_MULCH_RECORD` failure mode to `reviewer.md` and require mulch recording in the reviewer's completion protocol.

---

## P2: Monitor doesn't check protocol compliance

### Problem
The monitor's anomaly detection (`monitor.md:181-188`) only checks liveness (stalls, zombie agents, cascade failures). It doesn't detect protocol violations:
- Lead spawning builders without prior scout activity
- Builder completing without `.mulch/` changes
- Coordinator merging non-lead branches

### Proposed Fix
Add protocol-compliance anomalies to `monitor.md`:

```markdown
## protocol-compliance-checks

In addition to liveness anomalies, watch for protocol violations:

- **SCOUT_BYPASS_DETECTED**: Lead spawned builders without any prior scout agent on the same task. Check agent spawn history via `ov status`.
- **MULCH_MISSING**: Agent completed and their worktree branch has no `.mulch/` changes. Check via `git diff main..<agent-branch> -- .mulch/`.
- **DIRECT_MERGE_ATTEMPT**: Coordinator merged a branch not matching `overstory/lead-*`. Flag to coordinator as a BUILDER_DIRECT_MERGE violation.
```

---

## Summary of All Proposed Changes

| File | Priority | Change |
|------|----------|--------|
| `reviewer.md` | P0 | Full rewrite — own identity, PASS/FAIL contract, reviewer-specific failure modes, mulch recording |
| `builder.md` | P0 | Add PREMATURE_ISSUE_CLOSE failure mode, wait for lead acknowledgment before closing |
| `lead.md` | P1 | Add mixed-runtime-awareness section for non-mail runtimes |
| `merger.md` | P1 | Add MULCH_DATA_LOSS failure mode, `.mulch/` preservation rule, fix mail type to `merged`/`merge_failed` |
| `scout.md` | P1 | Add report-format section with structured output template |
| `reviewer.md` | P2 | Add MISSING_MULCH_RECORD (included in P0 rewrite) |
| `monitor.md` | P2 | Add protocol-compliance-checks section |

---

## Context

These findings come from running overstory on a ~600-agent campaign (35 batches) on a polyglot monorepo with mixed runtimes. The issues are not runtime-specific or project-specific — they're structural gaps in how agent roles interrelate. The P0 issues (reviewer identity, builder timing) caused real workflow failures. The P1 issues (mulch loss, mail type mismatches, mixed-runtime blindness) caused silent knowledge loss and coordination breakdowns that only surfaced during post-batch analysis.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Agent definition refinements: reviewer identity, builder timing, merger mulch, mixed-runtime awareness #134

Overstory Agent Definition Refinements

Summary

P0: Reviewer agent has no identity — it's a copy of scout.md

Problem

Proposed Fix

P0: Builder closes issue before lead verifies — timing bug

Problem

Proposed Fix

P1: Merger doesn't preserve `.mulch/` during conflict resolution

Problem

Proposed Fix

P1: Merger sends wrong mail type — coordinator can't route it

Problem

Proposed Fix

P1: Lead has no guidance for mixed-runtime workers

Problem

Proposed Fix

P1: Scout has no structured output contract

Problem

Proposed Fix

P2: Mulch protocol inconsistencies across agents

Problem

Proposed Fix

P2: Monitor doesn't check protocol compliance

Problem

Proposed Fix

Summary of All Proposed Changes

Context

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Agent	Records mulch?	Required?
Scout	No (suggests for parent)	No
Builder	Yes	Yes (failure mode)
Reviewer	No	No
Lead	Yes	Yes (failure mode)
Merger	Yes (Tier 2+)	Yes (failure mode)
Monitor	Yes (capability)	No explicit requirement

File	Priority	Change
`reviewer.md`	P0	Full rewrite — own identity, PASS/FAIL contract, reviewer-specific failure modes, mulch recording
`builder.md`	P0	Add PREMATURE_ISSUE_CLOSE failure mode, wait for lead acknowledgment before closing
`lead.md`	P1	Add mixed-runtime-awareness section for non-mail runtimes
`merger.md`	P1	Add MULCH_DATA_LOSS failure mode, `.mulch/` preservation rule, fix mail type to `merged`/`merge_failed`
`scout.md`	P1	Add report-format section with structured output template
`reviewer.md`	P2	Add MISSING_MULCH_RECORD (included in P0 rewrite)
`monitor.md`	P2	Add protocol-compliance-checks section

Uh oh!

Agent definition refinements: reviewer identity, builder timing, merger mulch, mixed-runtime awareness #134

Description

Overstory Agent Definition Refinements

Summary

P0: Reviewer agent has no identity — it's a copy of scout.md

Problem

Proposed Fix

P0: Builder closes issue before lead verifies — timing bug

Problem

Proposed Fix

P1: Merger doesn't preserve .mulch/ during conflict resolution

Problem

Proposed Fix

P1: Merger sends wrong mail type — coordinator can't route it

Problem

Proposed Fix

P1: Lead has no guidance for mixed-runtime workers

Problem

Proposed Fix

P1: Scout has no structured output contract

Problem

Proposed Fix

P2: Mulch protocol inconsistencies across agents

Problem

Proposed Fix

P2: Monitor doesn't check protocol compliance

Problem

Proposed Fix

Summary of All Proposed Changes

Context

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

P1: Merger doesn't preserve `.mulch/` during conflict resolution