Overstory Agent Definition Refinements
Target repo: https://github.com/jayminwest/overstory
Issue type: Enhancement
Affects: src/agent-defs/ — all agent definition templates
Summary
After running 35+ agent batches through overstory on a large polyglot monorepo (Python + TypeScript + React, ~600 agents spawned), we identified eight cross-cutting issues in the default agent definitions that cause workflow failures, knowledge loss, and silent protocol violations. This document proposes specific, generalized fixes, with no project-specific or runtime-specific opinions.
P0: Reviewer agent has no identity — it's a copy of scout.md
Problem
reviewer.md is nearly identical to scout.md (lines 1-59 are verbatim copies). It references scout-only capabilities (ov spec write is mentioned as "scout only" in a reviewer file), uses the scout's completion protocol ("Verify you have answered the research question"), and has no structured output contract.
The lead's merge pipeline (lead.md:274) parses "PASS" or "FAIL" from the reviewer's mail subject, but nothing in reviewer.md mandates this format. Reviewers can send ambiguous results that the lead can't parse.
Proposed Fix
Give reviewer.md its own identity with:
1. Reviewer-specific failure modes:
- **RUBBER_STAMP** -- Issuing a PASS verdict without running quality gates, reading the diff, or checking spec compliance. Every PASS must be backed by evidence.
- **SCOPE_EXPANSION** -- Reviewing things beyond the spec or suggesting unrelated refactors. The reviewer's job is spec compliance, not unsolicited improvements.
- **AMBIGUOUS_VERDICT** -- Sending a result mail without clearly stating PASS or FAIL in the subject line. The lead's merge pipeline depends on parsing the verdict from the subject.
- **MISSING_MULCH_RECORD** -- Closing without recording review insights. Reviews surface reusable patterns that benefit future agents.
2. Structured verdict format:
## verdict-format
### PASS verdict
Subject: `Review: <task-id> - PASS`
Body includes: spec compliance checklist, quality gate results, suggested mulch records
### FAIL verdict
Subject: `Review: <task-id> - FAIL: <1-line summary>`
Body includes: issues with severity (HIGH/MEDIUM/LOW), file:line references, specific fix instructions, failing spec criteria
3. Verdict criteria:
- PASS if: all spec acceptance criteria met AND quality gates pass AND no HIGH-severity issues
- FAIL if: any spec criterion unmet OR any quality gate fails OR any HIGH-severity issue found
4. Remove scout references:
- Delete the `ov spec write` mention (reviewer doesn't write specs)
- Replace exploration-oriented completion protocol with verdict-oriented protocol
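The verdict format and criteria above can be sketched as a small parser and decision rule. This is a minimal illustration, not overstory's actual parsing code: the names `Verdict`, `parse_verdict`, and `decide` are hypothetical, and the sketch assumes task IDs contain no whitespace.

```python
import re
from typing import NamedTuple, Optional


class Verdict(NamedTuple):
    task_id: str
    passed: bool
    summary: Optional[str]  # one-line summary, only present on FAIL


# Matches "Review: <task-id> - PASS" or "Review: <task-id> - FAIL: <summary>".
# Assumption: task IDs contain no whitespace.
_SUBJECT_RE = re.compile(
    r"^Review: (?P<task>\S+) - (?:(?P<ok>PASS)|FAIL: (?P<summary>.+))$"
)


def parse_verdict(subject: str) -> Optional[Verdict]:
    """Return the parsed verdict, or None if the subject is ambiguous."""
    match = _SUBJECT_RE.match(subject.strip())
    if match is None:
        return None  # AMBIGUOUS_VERDICT: the lead cannot act on this mail
    return Verdict(
        match.group("task"),
        match.group("ok") is not None,
        match.group("summary"),
    )


def decide(criteria_met: bool, gates_pass: bool, high_issues: int) -> str:
    """PASS only if every criterion holds; anything else is FAIL."""
    return "PASS" if criteria_met and gates_pass and high_issues == 0 else "FAIL"
```

A subject such as "Looks good to me" returns `None`, which is the case the AMBIGUOUS_VERDICT failure mode is meant to prevent.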
P0: Builder closes issue before lead verifies — timing bug
Problem
In builder.md, the completion protocol is:
- Send `worker_done` mail (step 6)
- Close the issue (step 7)
- Exit immediately (step 8)
But the lead's workflow (lead.md:248-288) is:
- Receive `worker_done`
- Decide whether to self-verify or spawn reviewer
- If FAIL: send revision request to builder
The builder has already closed and exited before the lead can request revisions. The lead must reopen the issue and either fix it themselves or spawn a new builder.
Proposed Fix
Add a failure mode and modify the completion protocol:
# New failure mode:
- **PREMATURE_ISSUE_CLOSE** -- Closing your issue before receiving acknowledgment from your parent lead. After sending `worker_done`, wait for either acceptance or revision feedback before closing.
# Modified completion steps:
6. Send `worker_done` mail to parent.
7. Check mail once more for revision requests:
- If revision request received: address feedback, re-run quality gates, send another `worker_done`.
- If no revision request (or explicit acceptance): proceed to close.
8. Close the issue.
9. Exit.
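The modified completion steps can be sketched as a short control loop. The callables here are hypothetical stand-ins for the agent's real actions (sending mail, checking mail, closing the issue). The `max_revisions` bound is an assumption added so the loop always terminates; it is not part of the proposal.

```python
from typing import Callable, Optional


def complete_task(
    send_worker_done: Callable[[], None],
    check_mail: Callable[[], Optional[str]],
    apply_revision: Callable[[str], None],
    close_issue: Callable[[], None],
    max_revisions: int = 3,  # assumed safety bound, not from the proposal
) -> int:
    """Run steps 6-8 and return how many worker_done mails were sent."""
    send_worker_done()  # step 6
    sent = 1
    for _ in range(max_revisions):
        feedback = check_mail()  # step 7: look for a revision request
        if feedback is None:
            break  # no revision request (or explicit acceptance)
        apply_revision(feedback)  # address feedback, re-run quality gates
        send_worker_done()
        sent += 1
    close_issue()  # step 8: only after the mail check
    return sent
```

The key ordering property is that `close_issue` is never called before at least one mail check has happened after `worker_done`.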
P1: Merger doesn't preserve .mulch/ during conflict resolution
Problem
The merger's tiered conflict resolution (Tier 1-4) focuses on source code. .mulch/expertise/*.jsonl files are append-only JSONL — when both branches add records to the same file, the correct resolution is concatenation, not choosing one side. But the merger has no guidance on this, and Tier 3 (AI-Resolve) or Tier 4 (Reimagine) can silently discard mulch records.
Proposed Fix
Add failure mode and preservation rule:
# New failure mode:
- **MULCH_DATA_LOSS** -- Discarding `.mulch/` directory changes during conflict resolution. `.mulch/expertise/*.jsonl` files are append-only JSONL — if both sides added records, concatenate them. Never resolve a `.mulch/` conflict by choosing one side.
# Add before the tier descriptions:
**`.mulch/` preservation rule (applies to ALL tiers):** `.mulch/expertise/*.jsonl` files are append-only. If both branches added records to the same file, concatenate both sets of records. After resolving `.mulch/` conflicts, verify no records were lost.
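The preservation rule above amounts to a union merge of the three versions of a file. The sketch below is one possible way to express it and is not part of overstory: `merge_jsonl` and `lost_records` are hypothetical helpers, and the sketch assumes that byte-identical lines are the same record and can be deduplicated.

```python
from typing import List


def merge_jsonl(base: str, ours: str, theirs: str) -> str:
    """Union-merge append-only JSONL: base records, then ours, then theirs."""
    merged: List[str] = []
    seen = set()
    for text in (base, ours, theirs):
        for line in text.splitlines():
            # Assumption: identical lines are duplicates of one record.
            if line.strip() and line not in seen:
                seen.add(line)
                merged.append(line)
    return "\n".join(merged) + ("\n" if merged else "")


def lost_records(merged: str, *sides: str) -> List[str]:
    """Return records present on any side but missing from the merge result."""
    kept = set(merged.splitlines())
    lost: List[str] = []
    for side in sides:
        for line in side.splitlines():
            if line.strip() and line not in kept and line not in lost:
                lost.append(line)
    return lost
```

Running `lost_records` after resolution is one way to implement the "verify no records were lost" step. Git's built-in `merge=union` attribute in `.gitattributes` offers similar line-level behavior, though whether it suits this repository is untested here.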
P1: Merger sends wrong mail type — coordinator can't route it
Problem
The coordinator (coordinator.md:199-207) expects to receive merged and merge_failed typed messages. But the merger (merger.md:141-146) sends generic --type result. The coordinator's mail routing logic may not handle merger completions correctly.
Proposed Fix
Change merger's report mail to use typed messages:
# On success:
ov mail send --to <parent-or-coordinator> \
--subject "Merged: <branch>" \
--body "Tier: <tier-used>. Conflicts: <list or none>. Tests: passing." \
--type merged
# On failure:
ov mail send --to <parent-or-coordinator> \
--subject "Merge failed: <branch>" \
--body "Tier reached: <tier>. Conflict files: <list>. Error: <description>." \
--type merge_failed --priority high
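For mergers that assemble the report programmatically, the two commands above can be built from a single merge outcome. This sketch only constructs the argument list using the flags shown above; `merge_report_argv` is a hypothetical helper, and nothing here invokes `ov` itself.

```python
from typing import List, Optional, Sequence


def merge_report_argv(
    recipient: str,
    branch: str,
    tier: str,
    conflicts: Sequence[str],
    error: Optional[str] = None,
) -> List[str]:
    """Build the `ov mail send` arguments for a merge result."""
    conflict_text = ", ".join(conflicts) if conflicts else "none"
    argv = ["ov", "mail", "send", "--to", recipient]
    if error is None:
        # Success: typed as `merged` so the coordinator can route it.
        body = f"Tier: {tier}. Conflicts: {conflict_text}. Tests: passing."
        return argv + [
            "--subject", f"Merged: {branch}",
            "--body", body,
            "--type", "merged",
        ]
    # Failure: typed as `merge_failed` and sent with high priority.
    body = f"Tier reached: {tier}. Conflict files: {conflict_text}. Error: {error}."
    return argv + [
        "--subject", f"Merge failed: {branch}",
        "--body", body,
        "--type", "merge_failed",
        "--priority", "high",
    ]
```

Deriving the mail type from the outcome in one place makes it harder to fall back to a generic `--type result` by accident.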
P1: Lead has no guidance for mixed-runtime workers
Problem
Overstory supports 11 runtimes, and config.runtime.capabilities allows per-role routing. A lead might spawn a scout on one runtime and a builder on another. But the lead's three-phase workflow assumes all workers participate in the mail protocol, which is only true for Claude/Sapling/Pi.
Non-mail runtimes (Codex, Aider, Gemini, Cursor, etc.) receive instructions via AGENTS.md but cannot send worker_done, result, question, or error messages. The lead's "collect scout results" and "wait for worker_done" steps fail silently.
Proposed Fix
Add a mixed-runtime-awareness section to lead.md:
## mixed-runtime-awareness
Workers may run on different runtimes depending on `config.runtime.capabilities`.
### Mail-capable runtimes (Claude, Sapling, Pi)
Use the standard three-phase workflow with mail-based coordination.
### Non-mail runtimes (Codex, Aider, Gemini, Cursor, etc.)
These receive instructions via AGENTS.md but cannot send or receive mail:
1. Do not wait for `worker_done` mail. Monitor worktree for completion:
- Check `ov status` for agent state changes
- Check worktree git log for commits
2. Do not send questions or revision requests via mail — the worker can't read them.
3. Self-verify output by reading the worktree diff directly.
4. Expect faster completion but no mid-task communication.
5. If a non-mail worker appears stalled (no new commits), it has likely failed silently. Replace rather than nudge.
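The decision logic in this section can be sketched as a function of the worker's runtime and observed state. The runtime names come from the list above; the function names, the returned action labels, and the 15-minute stall threshold are all illustrative assumptions.

```python
MAIL_CAPABLE = frozenset({"claude", "sapling", "pi"})


def coordination_mode(runtime: str) -> str:
    """Mail-capable runtimes use the mail protocol; all others are polled."""
    return "mail" if runtime.lower() in MAIL_CAPABLE else "poll"


def next_action(
    runtime: str,
    seconds_since_last_commit: float,
    agent_finished: bool = False,
    stall_after: float = 900.0,  # assumed threshold, tune per project
) -> str:
    """Pick the lead's next step for one worker."""
    if coordination_mode(runtime) == "mail":
        return "wait_for_worker_done"
    if agent_finished:
        return "self_verify_diff"  # read the worktree diff directly
    if seconds_since_last_commit > stall_after:
        return "replace_worker"  # likely failed silently; do not nudge
    return "keep_polling"  # check status and the worktree git log again
```

In practice `agent_finished` and `seconds_since_last_commit` would come from `ov status` and the worktree's git log, as described in step 1.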
P1: Scout has no structured output contract
Problem
The scout sends findings via ov spec write and a short result mail, but there's no defined schema for what the lead expects to receive. Leads must extract structure from free-form prose, leading to inconsistent spec quality.
Proposed Fix
Add a report-format section to scout.md:
## report-format
Structure spec/report output with these sections:
# Scout Report: <task-id>
## Files Found
- <path> — <purpose/role>
## Patterns Observed
- <pattern name>: <description, with file:line references>
## Types and Interfaces
- <type/interface name> at <file:line> — <what it defines, who consumes it>
## Dependencies
- <component A> depends on <component B> via <mechanism>
## Risks and Gotchas
- <risk description> — <why it matters for the task>
## Suggested Mulch Records
- domain: <domain>, type: <convention|pattern>, classification: <foundational|tactical|observational>
description: "<insight worth preserving>"
This gives leads a predictable structure to extract builder specs from.
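A fixed section list also lets a lead, or a pre-send check in the scout, verify a report mechanically. The sketch below assumes the report is markdown with second-level headings exactly as in the template; `missing_sections` is a hypothetical helper.

```python
from typing import List

REQUIRED_SECTIONS = (
    "Files Found",
    "Patterns Observed",
    "Types and Interfaces",
    "Dependencies",
    "Risks and Gotchas",
    "Suggested Mulch Records",
)


def missing_sections(report: str) -> List[str]:
    """Return required section names that have no '## ' heading in the report."""
    headings = {
        line[3:].strip()
        for line in report.splitlines()
        if line.startswith("## ")
    }
    return [name for name in REQUIRED_SECTIONS if name not in headings]
```

An empty result means every section of the template is present; it says nothing about the quality of the content under each heading.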
P2: Mulch protocol inconsistencies across agents
Problem
Mulch recording is inconsistent across agent types:
| Agent | Records mulch? | Required? |
|---|---|---|
| Scout | No (suggests for parent) | No |
| Builder | Yes | Yes (failure mode) |
| Reviewer | No | No |
| Lead | Yes | Yes (failure mode) |
| Merger | Yes (Tier 2+) | Yes (failure mode) |
| Monitor | Yes (capability) | No explicit requirement |
Reviewers never record insights, despite finding reusable patterns (common bug types, convention violations) across multiple reviews. This is a knowledge loss.
Proposed Fix
Add MISSING_MULCH_RECORD failure mode to reviewer.md and require mulch recording in the reviewer's completion protocol.
P2: Monitor doesn't check protocol compliance
Problem
The monitor's anomaly detection (monitor.md:181-188) only checks liveness (stalls, zombie agents, cascade failures). It doesn't detect protocol violations:
- Lead spawning builders without prior scout activity
- Builder completing without `.mulch/` changes
- Coordinator merging non-lead branches
Proposed Fix
Add protocol-compliance anomalies to monitor.md:
## protocol-compliance-checks
In addition to liveness anomalies, watch for protocol violations:
- **SCOUT_BYPASS_DETECTED**: Lead spawned builders without any prior scout agent on the same task. Check agent spawn history via `ov status`.
- **MULCH_MISSING**: Agent completed and their worktree branch has no `.mulch/` changes. Check via `git diff main..<agent-branch> -- .mulch/`.
- **DIRECT_MERGE_ATTEMPT**: Coordinator merged a branch not matching `overstory/lead-*`. Flag to coordinator as a BUILDER_DIRECT_MERGE violation.
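The three checks above reduce to simple predicates once the monitor has the raw data. In this sketch the inputs are assumptions: `spawn_order` is a list of agent roles in spawn order for one task, and `changed_paths` is the list of files the agent branch changed (for example from `git diff --name-only main..<agent-branch>`). The function names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Sequence


def scout_bypass(spawn_order: Sequence[str]) -> bool:
    """SCOUT_BYPASS_DETECTED: a builder was spawned before any scout."""
    for role in spawn_order:
        if role == "scout":
            return False
        if role == "builder":
            return True
    return False  # no builder spawned yet, nothing to flag


def mulch_missing(changed_paths: Iterable[str]) -> bool:
    """MULCH_MISSING: the branch touched nothing under .mulch/."""
    return not any(path.startswith(".mulch/") for path in changed_paths)


def direct_merge_attempt(branch: str) -> bool:
    """DIRECT_MERGE_ATTEMPT: merged branch is not a lead branch."""
    return not fnmatchcase(branch, "overstory/lead-*")
```

Keeping the predicates free of I/O means the monitor can gather status and git data however it already does and test the rules in isolation.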
Summary of All Proposed Changes
| File | Priority | Change |
|---|---|---|
| reviewer.md | P0 | Full rewrite: own identity, PASS/FAIL contract, reviewer-specific failure modes, mulch recording |
| builder.md | P0 | Add PREMATURE_ISSUE_CLOSE failure mode, wait for lead acknowledgment before closing |
| lead.md | P1 | Add mixed-runtime-awareness section for non-mail runtimes |
| merger.md | P1 | Add MULCH_DATA_LOSS failure mode, .mulch/ preservation rule, fix mail type to merged/merge_failed |
| scout.md | P1 | Add report-format section with structured output template |
| reviewer.md | P2 | Add MISSING_MULCH_RECORD (included in P0 rewrite) |
| monitor.md | P2 | Add protocol-compliance-checks section |
Context
These findings come from running overstory on a ~600-agent campaign (35 batches) on a polyglot monorepo with mixed runtimes. The issues are not runtime-specific or project-specific — they're structural gaps in how agent roles interrelate. The P0 issues (reviewer identity, builder timing) caused real workflow failures. The P1 issues (mulch loss, mail type mismatches, mixed-runtime blindness) caused silent knowledge loss and coordination breakdowns that only surfaced during post-batch analysis.