Skip to content
This repository was archived by the owner on May 28, 2026. It is now read-only.
This repository was archived by the owner on May 28, 2026. It is now read-only.

Agent definition refinements: reviewer identity, builder timing, merger mulch, mixed-runtime awareness #134

@baladithyab

Description

@baladithyab

Overstory Agent Definition Refinements

Target repo: https://github.com/jayminwest/overstory
Issue type: Enhancement
Affects: src/agent-defs/ — all agent definition templates


Summary

After running 35+ agent batches through overstory on a large polyglot monorepo (Python + TypeScript + React, ~600 agents spawned), we identified 10 cross-cutting issues in the default agent definitions that cause workflow failures, knowledge loss, and silent protocol violations. This document proposes specific, generalized fixes — no project-specific or runtime-specific opinions.


P0: Reviewer agent has no identity — it's a copy of scout.md

Problem

reviewer.md is nearly identical to scout.md (lines 1-59 are verbatim copies). It references scout-only capabilities (ov spec write is mentioned as "scout only" in a reviewer file), uses the scout's completion protocol ("Verify you have answered the research question"), and has no structured output contract.

The lead's merge pipeline (lead.md:274) parses "PASS" or "FAIL" from the reviewer's mail subject, but nothing in reviewer.md mandates this format. Reviewers can send ambiguous results that the lead can't parse.

Proposed Fix

Give reviewer.md its own identity with:

1. Reviewer-specific failure modes:

- **RUBBER_STAMP** -- Issuing a PASS verdict without running quality gates, reading the diff, or checking spec compliance. Every PASS must be backed by evidence.
- **SCOPE_EXPANSION** -- Reviewing things beyond the spec or suggesting unrelated refactors. The reviewer's job is spec compliance, not unsolicited improvements.
- **AMBIGUOUS_VERDICT** -- Sending a result mail without clearly stating PASS or FAIL in the subject line. The lead's merge pipeline depends on parsing the verdict from the subject.
- **MISSING_MULCH_RECORD** -- Closing without recording review insights. Reviews surface reusable patterns that benefit future agents.

2. Structured verdict format:

## verdict-format

### PASS verdict
Subject: `Review: <task-id> - PASS`
Body includes: spec compliance checklist, quality gate results, suggested mulch records

### FAIL verdict
Subject: `Review: <task-id> - FAIL: <1-line summary>`
Body includes: issues with severity (HIGH/MEDIUM/LOW), file:line references, specific fix instructions, failing spec criteria

3. Verdict criteria:

  • PASS if: all spec acceptance criteria met AND quality gates pass AND no HIGH-severity issues
  • FAIL if: any spec criterion unmet OR any quality gate fails OR any HIGH-severity issue found

4. Remove scout references:

  • Delete ov spec write mention (reviewer doesn't write specs)
  • Replace exploration-oriented completion protocol with verdict-oriented protocol

P0: Builder closes issue before lead verifies — timing bug

Problem

In builder.md, the completion protocol is:

  1. Send worker_done mail (step 6)
  2. Close the issue (step 7)
  3. Exit immediately (step 8)

But the lead's workflow (lead.md:248-288) is:

  1. Receive worker_done
  2. Decide whether to self-verify or spawn reviewer
  3. If FAIL: send revision request to builder

The builder has already closed and exited before the lead can request revisions. The lead must reopen the issue and either fix it themselves or spawn a new builder.

Proposed Fix

Add a failure mode and modify the completion protocol:

# New failure mode:
- **PREMATURE_ISSUE_CLOSE** -- Closing your issue before receiving acknowledgment from your parent lead. After sending `worker_done`, wait for either acceptance or revision feedback before closing.

# Modified completion steps:
6. Send `worker_done` mail to parent.
7. Check mail once more for revision requests:
   - If revision request received: address feedback, re-run quality gates, send another `worker_done`.
   - If no revision request (or explicit acceptance): proceed to close.
8. Close the issue.
9. Exit.

P1: Merger doesn't preserve .mulch/ during conflict resolution

Problem

The merger's tiered conflict resolution (Tier 1-4) focuses on source code. .mulch/expertise/*.jsonl files are append-only JSONL — when both branches add records to the same file, the correct resolution is concatenation, not choosing one side. But the merger has no guidance on this, and Tier 3 (AI-Resolve) or Tier 4 (Reimagine) can silently discard mulch records.

Proposed Fix

Add failure mode and preservation rule:

# New failure mode:
- **MULCH_DATA_LOSS** -- Discarding `.mulch/` directory changes during conflict resolution. `.mulch/expertise/*.jsonl` files are append-only JSONL — if both sides added records, concatenate them. Never resolve a `.mulch/` conflict by choosing one side.

# Add before the tier descriptions:
**`.mulch/` preservation rule (applies to ALL tiers):** `.mulch/expertise/*.jsonl` files are append-only. If both branches added records to the same file, concatenate both sets of records. After resolving `.mulch/` conflicts, verify no records were lost.

P1: Merger sends wrong mail type — coordinator can't route it

Problem

The coordinator (coordinator.md:199-207) expects to receive merged and merge_failed typed messages. But the merger (merger.md:141-146) sends generic --type result. The coordinator's mail routing logic may not handle merger completions correctly.

Proposed Fix

Change merger's report mail to use typed messages:

# On success:
ov mail send --to <parent-or-coordinator> \
  --subject "Merged: <branch>" \
  --body "Tier: <tier-used>. Conflicts: <list or none>. Tests: passing." \
  --type merged

# On failure:
ov mail send --to <parent-or-coordinator> \
  --subject "Merge failed: <branch>" \
  --body "Tier reached: <tier>. Conflict files: <list>. Error: <description>." \
  --type merge_failed --priority high

P1: Lead has no guidance for mixed-runtime workers

Problem

Overstory supports 11 runtimes, and config.runtime.capabilities allows per-role routing. A lead might spawn a scout on one runtime and a builder on another. But the lead's three-phase workflow assumes all workers participate in the mail protocol, which is only true for Claude/Sapling/Pi.

Non-mail runtimes (Codex, Aider, Gemini, Cursor, etc.) receive instructions via AGENTS.md but cannot send worker_done, result, question, or error messages. The lead's "collect scout results" and "wait for worker_done" steps fail silently.

Proposed Fix

Add a mixed-runtime-awareness section to lead.md:

## mixed-runtime-awareness

Workers may run on different runtimes depending on `config.runtime.capabilities`.

### Mail-capable runtimes (Claude, Sapling, Pi)
Use the standard three-phase workflow with mail-based coordination.

### Non-mail runtimes (Codex, Aider, Gemini, Cursor, etc.)
These receive instructions via AGENTS.md but cannot send or receive mail:

1. Do not wait for `worker_done` mail. Monitor worktree for completion:
   - Check `ov status` for agent state changes
   - Check worktree git log for commits
2. Do not send questions or revision requests via mail — the worker can't read them.
3. Self-verify output by reading the worktree diff directly.
4. Expect faster completion but no mid-task communication.
5. If a non-mail worker appears stalled (no new commits), it has likely failed silently. Replace rather than nudge.

P1: Scout has no structured output contract

Problem

The scout sends findings via ov spec write and a short result mail, but there's no defined schema for what the lead expects to receive. Leads must extract structure from free-form prose, leading to inconsistent spec quality.

Proposed Fix

Add a report-format section to scout.md:

## report-format

Structure spec/report output with these sections:

# Scout Report: <task-id>

## Files Found
- <path> — <purpose/role>

## Patterns Observed
- <pattern name>: <description, with file:line references>

## Types and Interfaces
- <type/interface name> at <file:line> — <what it defines, who consumes it>

## Dependencies
- <component A> depends on <component B> via <mechanism>

## Risks and Gotchas
- <risk description> — <why it matters for the task>

## Suggested Mulch Records
- domain: <domain>, type: <convention|pattern>, classification: <foundational|tactical|observational>
  description: "<insight worth preserving>"

This gives leads a predictable structure to extract builder specs from.


P2: Mulch protocol inconsistencies across agents

Problem

Mulch recording is inconsistent across agent types:

Agent Records mulch? Required?
Scout No (suggests for parent) No
Builder Yes Yes (failure mode)
Reviewer No No
Lead Yes Yes (failure mode)
Merger Yes (Tier 2+) Yes (failure mode)
Monitor Yes (capability) No explicit requirement

Reviewers never record insights, despite finding reusable patterns (common bug types, convention violations) across multiple reviews. This is a knowledge loss.

Proposed Fix

Add MISSING_MULCH_RECORD failure mode to reviewer.md and require mulch recording in the reviewer's completion protocol.


P2: Monitor doesn't check protocol compliance

Problem

The monitor's anomaly detection (monitor.md:181-188) only checks liveness (stalls, zombie agents, cascade failures). It doesn't detect protocol violations:

  • Lead spawning builders without prior scout activity
  • Builder completing without .mulch/ changes
  • Coordinator merging non-lead branches

Proposed Fix

Add protocol-compliance anomalies to monitor.md:

## protocol-compliance-checks

In addition to liveness anomalies, watch for protocol violations:

- **SCOUT_BYPASS_DETECTED**: Lead spawned builders without any prior scout agent on the same task. Check agent spawn history via `ov status`.
- **MULCH_MISSING**: Agent completed and their worktree branch has no `.mulch/` changes. Check via `git diff main..<agent-branch> -- .mulch/`.
- **DIRECT_MERGE_ATTEMPT**: Coordinator merged a branch not matching `overstory/lead-*`. Flag to coordinator as a BUILDER_DIRECT_MERGE violation.

Summary of All Proposed Changes

File Priority Change
reviewer.md P0 Full rewrite — own identity, PASS/FAIL contract, reviewer-specific failure modes, mulch recording
builder.md P0 Add PREMATURE_ISSUE_CLOSE failure mode, wait for lead acknowledgment before closing
lead.md P1 Add mixed-runtime-awareness section for non-mail runtimes
merger.md P1 Add MULCH_DATA_LOSS failure mode, .mulch/ preservation rule, fix mail type to merged/merge_failed
scout.md P1 Add report-format section with structured output template
reviewer.md P2 Add MISSING_MULCH_RECORD (included in P0 rewrite)
monitor.md P2 Add protocol-compliance-checks section

Context

These findings come from running overstory on a ~600-agent campaign (35 batches) on a polyglot monorepo with mixed runtimes. The issues are not runtime-specific or project-specific — they're structural gaps in how agent roles interrelate. The P0 issues (reviewer identity, builder timing) caused real workflow failures. The P1 issues (mulch loss, mail type mismatches, mixed-runtime blindness) caused silent knowledge loss and coordination breakdowns that only surfaced during post-batch analysis.

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:agent-lifecycleAgent identity, sessions, checkpoint/resume, slingarea:coordinationCoordinator, lead hierarchy, groups, multi-agent orchestrationdifficulty:complexCross-cutting, architectural, or high-riskenhancementpriority:highSignificant improvement, clear path to implement

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions