Code reviewer doesn't do a functional review, lead agent blindly accepts it and coordinator happily merges it into repo

### Steps to reproduce

1) Run `overstory`
2) Issue a task - in my case I've asked to monitor a Lambda service on port 5000 that was unreachable to reconnect automatically when it becomes available
3) As per `coordinator` feedback:



Background info on the issue itself (from my chat with the `coordinator`):

```
Found the root cause. checkLambdaAvailable() uses GetFunctionCommand — this is the Lambda Management API (GET /2015-03-31/functions/{FunctionName}). But serverless-offline
  only emulates the Lambda Invoke API, not the management API. When GetFunctionCommand hits serverless-offline, it returns an error WITHOUT a proper AWS requestId, because
  serverless-offline doesn't know how to handle management endpoints.
```

Me: `If you can see the bug now, why couldn't you see the bug BEFORE you merged it?`

```
You're right. I could see it now because you told me the symptom — "reconnection doesn't work." If I had actually thought about what GetFunctionCommand does against
  serverless-offline before merging, I would have caught it. I read the diff, saw the code was syntactically correct, and merged without asking "does serverless-offline even
  support this API?" That's the core failure — I was checking code quality instead of functional correctness.
```

This is from `https://github.com/jayminwest/overstory/blob/main/.claude/commands/pr-reviews.md`:

```
b. Code quality review
Check for correctness — does the code do what the PR claims?
Check for bugs, edge cases, and error handling gaps
Check adherence to project conventions (see CLAUDE.md): strict TypeScript, zero runtime deps, Biome formatting, tab indentation, 100-char line width
Check test coverage — are new code paths tested? Do tests follow the "never mock what you can use for real" philosophy?
Flag any security concerns (injection, unsafe input handling, etc.)
```

There's clearly a big gap here between what the `reviewer` is tasked to do and what the `lead` and `coordinator` agent are then doing. This is big problem as the AI agent has a tendency to shortcut tasks (maybe due to using a less superior model and a limited context) and deliver code that isn't doing what was requested.

As both `reviewer` and `lead` agents fail to detect this, such code gets into the repo as the `coordinator` assumes that both `reviewer` and `lead` agents did their job properly and happily merges code as long as the code is SYNTACTICALLY is correct (as per his own words).

This isn't the first time I've observed this. It just happened now twice on the same original task to handle an "unreachable" connection for this specific Lambda service.

While I certainly can always ask the `coordinator` to do a proper job I think the main failure is already at the `reviewer` and `lead` agent level where both are failing to do a proper job.

### Expected behavior

Both `reviewer` as well as `lead` agent should identify code that isn't functionally correct and push back. Right now it seems we have a `linter` instead of a `reviewer` and the `lead` agent is basically an useless overhead as `coordinator` could also spawn `builder`, `scout` and `reviewer` agents himself (or herself, I guess and AI agent doesn't have a gender).

### Actual behavior

Functionally wrong code got "approved" by the `reviewer` agent, got accepted by the `lead` agent was happily merged by the `coordinator` agent.

All 3 agents failed in doing their job properly, bad code got into the repo and the human had to do their job and push back. That turns the whole overstory AI agent "help" on its head. It becomes a burden instead of a help, I sometimes feel like "Kindergarden cop".

### overstory version

0.9.1

### Bun version

1.3.10

### Operating system

macOS 26.3.1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Code reviewer doesn't do a functional review, lead agent blindly accepts it and coordinator happily merges it into repo #131

Steps to reproduce

Expected behavior

Actual behavior

overstory version

Bun version

Operating system

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

Code reviewer doesn't do a functional review, lead agent blindly accepts it and coordinator happily merges it into repo #131

Description

Steps to reproduce

Expected behavior

Actual behavior

overstory version

Bun version

Operating system

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions