124 changes: 124 additions & 0 deletions .claude/plans/planner-agent.md
@@ -0,0 +1,124 @@
# Implementation Plan: Planner Agent — Automated Spec Expansion

## Overview

The Planner Agent introduces a pre-execution planning phase to Orchestra's assistant pipeline. When enabled via `PlannerConfig` on an Assistant, the planner intercepts the first message in a thread, expands the user's short prompt into a structured product specification (persisted as `/PLAN.md` in StateBackend), and optionally pauses for human-in-the-loop approval before the generator begins work. This follows Anthropic's harness research showing that a high-level planning phase dramatically improves the ambition and coherence of generated output.

## User Stories

1. **As an assistant creator**, I want to toggle a planner phase on my assistant so that short user prompts are automatically expanded into structured product specs before execution begins.

2. **As a user chatting with a planner-enabled assistant**, I want to see a "Planning..." indicator and then a collapsible plan panel so I understand what the assistant intends to build before it starts.

3. **As a team lead**, I want to require human approval of the generated plan (auto_approve=false) so that I can review and refine scope before the generator consumes tokens on implementation.

4. **As a power user**, I want to choose between "conservative" and "ambitious" scope levels so I can control how aggressively the planner expands my prompt.

5. **As a developer**, I want the planner to use a configurable model override (e.g., Opus for planning, Sonnet for generation) so I can balance quality and cost across phases.

## Implementation Phases

### Phase 1: Schema & Data Layer
**Files:**
- `backend/src/schemas/entities/planner.py` (new)
- `backend/src/schemas/entities/llm.py` (modify)
- `backend/migrations/versions/xxx_add_planner.py` (new)

**Tasks:**
1. Create `PlannerConfig` Pydantic model with fields: `enabled`, `auto_approve`, `model`, `scope_level`.
2. Add optional `planner: PlannerConfig` field to the `Assistant` schema.
3. Generate Alembic migration to persist planner config as a JSONB column on the assistants table.

**Acceptance Criteria:**
- PlannerConfig validates all field types and defaults correctly.
- Migration runs forward and backward cleanly.
- Existing assistants default to `planner=None` (no breaking change).
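The first acceptance criterion can be sketched directly. This is a minimal standalone sketch of the `PlannerConfig` model as specified in this plan (fields `enabled`, `auto_approve`, `model`, `scope_level`), showing the default and validation behavior Phase 1 targets; it is illustrative, not the shipped module:

```python
# Sketch of PlannerConfig validation, assuming pydantic v2.
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class PlannerConfig(BaseModel):
    enabled: bool = False
    auto_approve: bool = True
    model: Optional[str] = None
    scope_level: Literal["conservative", "ambitious"] = "ambitious"


# Defaults: planner off, auto-approve on, no model override.
cfg = PlannerConfig()
assert cfg.enabled is False and cfg.auto_approve is True
assert cfg.model is None and cfg.scope_level == "ambitious"

# The Literal constraint rejects unknown scope levels.
try:
    PlannerConfig(scope_level="reckless")
except ValidationError:
    pass  # expected
```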

### Phase 2: Planner System Prompt
**Files:**
- `backend/src/static/prompts/md/planner.md` (new)

**Tasks:**
1. Write a system prompt that instructs the planner to:
- Expand short prompts into structured product specs.
- Be ambitious about scope; identify AI feature opportunities.
- Stay high-level (product context, features, user stories, design direction).
- Avoid granular implementation details.
- Output structured markdown: Overview, Features (with user stories), Tech Stack, Design Direction.

**Acceptance Criteria:**
- Prompt produces well-structured specs from 1-4 sentence inputs.
- Output format is consistent and parseable.

### Phase 3: Pre-Execution Phase in LLMController
**Files:**
- `backend/src/controllers/llm.py` (modify)

**Tasks:**
1. On first message in a thread, check if the assistant has `planner.enabled = True`.
2. Invoke the planner with the user's message using the configured (or default) model.
3. If `auto_approve=False`, create a HITL interrupt presenting the plan for user approval.
4. On approval (or if `auto_approve=True`), persist the plan to StateBackend as `/PLAN.md`.
5. Inject the plan into the generator's context/memory sources.
6. Proceed to generator execution with original user message + plan context.

**Acceptance Criteria:**
- Planner phase only triggers on the first message of a thread.
- HITL interrupt blocks generator execution until approved.
- Plan is accessible to the generator throughout the thread lifecycle.
- Subsequent messages in the thread skip the planner phase.
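The trigger rule in the acceptance criteria reduces to a small predicate. This hypothetical helper (`should_run_planner` is not an existing Orchestra function) sketches the check: the planner runs only when the assistant opts in and the thread has no prior messages.

```python
from typing import Optional

from pydantic import BaseModel


class PlannerConfig(BaseModel):
    enabled: bool = False
    auto_approve: bool = True


def should_run_planner(planner: Optional[PlannerConfig], message_count: int) -> bool:
    """True only for the first message of a planner-enabled thread."""
    return planner is not None and planner.enabled and message_count == 0


assert should_run_planner(PlannerConfig(enabled=True), 0) is True
assert should_run_planner(PlannerConfig(enabled=True), 1) is False   # later messages skip the planner
assert should_run_planner(PlannerConfig(enabled=False), 0) is False  # planner toggled off
assert should_run_planner(None, 0) is False                          # existing assistants, planner=None
```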

### Phase 4: Frontend Display
**Files:**
- `frontend/src/components/chat/PlanPanel.tsx` (new)
- Thread view integration (modify existing thread/chat components)

**Tasks:**
1. Create `PlanPanel` component: collapsible panel rendering the plan markdown.
2. Show plan status indicator: "Planning...", "Awaiting approval", "Approved".
3. If `auto_approve=False`, render approval/rejection buttons using existing HITL interrupt UI patterns.
4. Integrate PlanPanel at the top of the thread view.

**Acceptance Criteria:**
- Plan panel renders markdown correctly and is collapsible.
- Approval UI integrates with existing HITL flow.
- Status transitions are reflected in real-time.

### Phase 5: Tests & Example
**Files:**
- `backend/tests/unit/test_planner_config.py` (new)
- `backend/tests/integration/test_planner_phase.py` (new)
- `examples/agents/planner_example.ipynb` (new)

**Tasks:**
1. Unit test: `PlannerConfig` field validation and defaults.
2. Unit test: planner phase triggers only on first message.
3. Integration test: planner expands "Build a todo app" into a multi-feature spec.
4. Integration test: `auto_approve=False` creates HITL interrupt.
5. Create example notebook demonstrating planner + generator pipeline.

**Acceptance Criteria:**
- All tests pass in CI.
- Example notebook runs end-to-end with a planner-enabled assistant.
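The integration test for task 3 can be sketched with the LLM call faked out. The `fake_run_planner` stand-in below mirrors the shape of `run_planner` from this PR but returns a canned spec; the assertion style matches the "multi-feature spec" criterion:

```python
# Integration-test sketch with the planner LLM mocked; the canned
# plan text here is illustrative.
import asyncio


async def fake_run_planner(user_message: str) -> str:
    # Stands in for run_planner(...) with the network call mocked out.
    return (
        "### Overview\nA todo app with AI-assisted task triage.\n"
        "### Features\n1. Task lists\n2. Smart prioritization\n3. Reminders\n"
    )


async def test_planner_expands_short_prompt() -> None:
    plan = await fake_run_planner("Build a todo app")
    # The plan should be a structured multi-feature spec, not an echo of the prompt.
    assert "### Features" in plan
    assert plan.count("\n") > 3


asyncio.run(test_planner_expands_short_prompt())
```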

## Dependencies & Risks

### Dependencies
- Existing HITL interrupt infrastructure (for approval flow).
- StateBackend file persistence (for `/PLAN.md` storage).
- Frontend HITL interrupt components (for approval UI).

### Risks
| Risk | Impact | Mitigation |
|------|--------|------------|
| Over-scoping by planner | Generator cannot implement the full spec | `scope_level` config; generator can flag unimplementable items |
| Latency from extra LLM round-trip | Slower time-to-first-output | Use fast model for planner; show "Planning..." state in UI |
| Plan staleness over long threads | Plan becomes irrelevant as conversation evolves | Allow re-planning via user command or after N messages |
| Migration on large assistant tables | Slow deployment | JSONB column with NULL default; no table rewrite needed |

## Testing Strategy

- **Unit tests**: Schema validation, planner trigger logic, plan persistence.
- **Integration tests**: End-to-end planner flow with mocked LLM, HITL interrupt creation and resolution.
- **Manual QA**: Create a planner-enabled assistant in the UI, send a short prompt, verify plan display and approval flow.
- **Performance**: Measure added latency from planner phase; ensure it stays under 10s for typical prompts.
101 changes: 101 additions & 0 deletions .claude/specs/planner-agent.md
@@ -0,0 +1,101 @@
# Plan: Planner Agent — Automated Spec Expansion

## Context

Anthropic's harness research shows a planner agent that expands 1–4 sentence prompts into full product specs dramatically increases the ambition and coherence of generated output. The planner stays intentionally high-level — specifying granular implementation details upfront leads to cascading errors downstream. The planner focuses on product context, features, user stories, and design direction, then lets the generator figure out implementation.

Orchestra currently jumps straight from user input to agent execution with no planning phase.

## Requirements

- `PlannerConfig` on the Assistant schema (toggleable)
- Planner runs as a pre-phase before the generator
- Expands short prompts into structured product specs
- Persists plan as `/PLAN.md` in StateBackend for generator to reference
- Optionally presents plan to user for approval before execution (HITL integration)
- Planner prompted to be ambitious about scope and identify AI feature opportunities

## Implementation Steps

### Step 1: Schema definitions

Add to `backend/src/schemas/entities/`:

```python
# planner.py
class PlannerConfig(BaseModel):
    enabled: bool = False
    auto_approve: bool = True  # False = pause for user approval via HITL
    model: Optional[str] = None  # Override model for planner (e.g. use Opus)
    scope_level: Literal["conservative", "ambitious"] = "ambitious"
```
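Since Step 6 persists this config as JSONB, a quick round-trip sketch is useful (assumes pydantic v2's `model_dump`/`model_validate`; the model name string is illustrative):

```python
import json
from typing import Literal, Optional

from pydantic import BaseModel


class PlannerConfig(BaseModel):
    enabled: bool = False
    auto_approve: bool = True
    model: Optional[str] = None
    scope_level: Literal["conservative", "ambitious"] = "ambitious"


# JSONB persistence round-trip: dump to JSON, load back, compare.
cfg = PlannerConfig(enabled=True, auto_approve=False, model="claude-opus-4")
payload = json.dumps(cfg.model_dump())                 # what lands in the JSONB column
restored = PlannerConfig.model_validate(json.loads(payload))
assert restored == cfg
```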

### Step 2: Extend Assistant schema

Add optional `planner: PlannerConfig` to `Assistant` in `backend/src/schemas/entities/llm.py`.

### Step 3: Planner system prompt

Create `backend/src/static/prompts/md/planner.md`:

- Role: expand a short user prompt into a structured product specification
- Be ambitious about scope — include features the user didn't explicitly request
- Stay high-level: product context, feature list, user stories, design language
- Do NOT specify granular implementation details (avoids cascading errors)
- Identify opportunities to weave AI-powered features into the product
- Output format: structured markdown with sections for Overview, Features (with user stories), Tech Stack, Design Direction

### Step 4: Pre-execution phase in LLMController

Modify `backend/src/controllers/llm.py`:

1. On first message in a thread, check if assistant has planner enabled
2. If yes, invoke planner assistant with user's message
3. Planner produces `/PLAN.md` content
4. If `auto_approve=False`, create HITL interrupt presenting the plan for user approval
5. If approved (or auto_approve), persist plan to StateBackend as `/PLAN.md`
6. Add plan file to generator's memory sources so it's available throughout execution
7. Proceed to generator with original user message + plan context
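The branching in steps 4-7 can be sketched with the planner, StateBackend, and HITL interrupt all stubbed out; every name here is illustrative, not Orchestra's real API:

```python
from typing import Callable, Optional


def pre_execution_phase(
    state: dict,                    # stand-in for StateBackend file storage
    auto_approve: bool,
    user_message: str,
    plan_fn: Callable[[str], str],  # stand-in for the planner LLM call
) -> Optional[str]:
    """Run the planner; return a pending plan when approval is required,
    otherwise persist it to /PLAN.md immediately."""
    plan = plan_fn(user_message)
    if not auto_approve:
        return plan  # caller raises a HITL interrupt carrying this plan
    state["/PLAN.md"] = plan
    return None


state: dict = {}
pending = pre_execution_phase(state, True, "Build a todo app", lambda m: f"# Plan\n{m}")
assert pending is None and state["/PLAN.md"].startswith("# Plan")

pending = pre_execution_phase({}, False, "Build a todo app", lambda m: "# Plan")
assert pending == "# Plan"  # generator stays blocked until the user approves
```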

### Step 5: Frontend display

- Show plan in a collapsible panel at the top of the thread
- If `auto_approve=False`, render approval UI using existing HITL interrupt components
- Plan status indicator: "Planning...", "Awaiting approval", "Approved"

### Step 6: Migration

Alembic migration for planner config persistence on assistants.

### Step 7: Tests

- Unit test: `PlannerConfig` validation
- Unit test: planner phase triggers only on first message
- Integration test: planner expands "Build a todo app" into multi-feature spec with user stories
- Integration test: `auto_approve=False` creates HITL interrupt

### Step 8: Example

Create `examples/agents/planner_example.ipynb` demonstrating planner + generator pipeline.

## File Changes

- `backend/src/schemas/entities/planner.py` — new
- `backend/src/schemas/entities/llm.py` — add planner field
- `backend/src/controllers/llm.py` — pre-execution phase
- `backend/src/static/prompts/md/planner.md` — new
- `backend/migrations/versions/xxx_add_planner.py` — new
- `frontend/src/components/chat/PlanPanel.tsx` — new
- `examples/agents/planner_example.ipynb` — new

## Risks

- Over-scoping: planner may generate specs too ambitious for the generator. Mitigation: `scope_level` config, and generator can flag spec items it can't implement.
- Latency: planner adds a full LLM round-trip before work begins. Mitigation: use a fast model for planner when possible, show "Planning..." state in UI.
- Plan staleness: if the thread evolves, the initial plan may become irrelevant. Mitigation: allow re-planning via user command or after N messages.

## GitHub Issue

**Title:** `feat: Planner Agent — auto-expand short prompts into full product specs`
**Labels:** `enhancement`, `agents`, `high-impact`, `harness-design`
**Milestone:** v0.9.0 — Harness Design
1 change: 1 addition & 0 deletions backend/src/schemas/entities/__init__.py
@@ -24,6 +24,7 @@
    PatchDefaultsRequest as PatchDefaultsRequest,
    UpsertProviderKeyRequest as UpsertProviderKeyRequest,
)
from src.schemas.entities.planner import PlannerConfig as PlannerConfig
from src.schemas.entities.hitl import (
    DecisionType as DecisionType,
    HumanDecision as HumanDecision,
2 changes: 2 additions & 0 deletions backend/src/schemas/entities/llm.py
@@ -18,6 +18,7 @@
    SystemMessage,
    ToolMessage,
)
from src.schemas.entities.planner import PlannerConfig
from src.services.prompt.defaults import get_default_system_prompt
from src.utils.format import slugify

@@ -110,6 +111,7 @@ def validate_system_prompt_or_instructions(self):
        default_factory=dict,
        description="File system storage for the assistant. Key is the file path, value is the file content.",
    )
    planner: Optional[PlannerConfig] = Field(default=None, description="Planner agent configuration")
    metadata: dict = {}
    updated_at: Optional[datetime] = None
    created_at: Optional[datetime] = None
12 changes: 12 additions & 0 deletions backend/src/schemas/entities/planner.py
@@ -0,0 +1,12 @@
from typing import Literal, Optional

from pydantic import BaseModel


class PlannerConfig(BaseModel):
    """Configuration for the planner pre-execution phase."""

    enabled: bool = False
    auto_approve: bool = True  # False = pause for user approval via HITL
    model: Optional[str] = None  # Override model for planner (e.g., use Opus)
    scope_level: Literal["conservative", "ambitious"] = "ambitious"
38 changes: 38 additions & 0 deletions backend/src/static/prompts/md/planner.md
@@ -0,0 +1,38 @@
You are a product planning agent. Your job is to expand a short user prompt into a structured product specification.

## Your Role

- Be ambitious about scope — include features the user didn't explicitly request but would clearly benefit from
- Stay high-level: product context, feature list, user stories, design direction
- Do NOT specify granular implementation details (file names, function signatures, database schemas)
- Identify opportunities to weave AI-powered features into the product
- Think about what would make this product genuinely impressive, not just functional

## Output Format

Structure your plan as markdown with these sections:

### Overview
A 2-3 sentence summary of what we're building and why.

### Features
A numbered list of features, each with:
- Feature name
- One-line description
- 1-2 user stories in "As a [user], I want [action] so that [benefit]" format

### Design Direction
- Visual style and tone
- Key UX principles
- Reference points or inspirations

### Technical Considerations
- Recommended tech stack (if relevant)
- Key architectural decisions
- Performance or scale considerations

### Success Criteria
- How we'll know this is done and done well
- Key metrics to track

Keep the plan concise but comprehensive. Aim for 300-800 words.
80 changes: 80 additions & 0 deletions backend/src/utils/planner.py
@@ -0,0 +1,80 @@
"""Planner pre-execution phase — expands short prompts into structured product specs."""

from pathlib import Path

from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage, SystemMessage

from src.schemas.entities.planner import PlannerConfig
from src.utils.llm import resolve_api_key
from src.utils.logger import logger

# Load the planner system prompt
_PLANNER_PROMPT_PATH = Path(__file__).parent.parent / "static" / "prompts" / "md" / "planner.md"
if not _PLANNER_PROMPT_PATH.exists():
    raise RuntimeError(f"Planner prompt file not found: {_PLANNER_PROMPT_PATH}")
_PLANNER_PROMPT = _PLANNER_PROMPT_PATH.read_text()


async def run_planner(
    user_message: str,
    planner_config: PlannerConfig,
    default_model: str,
    api_key: str | None = None,
    user_keys: dict[str, str] | None = None,
) -> str:
    """Run the planner to expand a short prompt into a structured spec.

    Returns the plan as markdown text.
    """
    model_name = planner_config.model or default_model

    # Resolve API key for the planner model
    planner_api_key = api_key
    if planner_config.model:
        resolved = resolve_api_key(planner_config.model, user_keys)
        if resolved:
            planner_api_key = resolved

    llm = init_chat_model(model_name, api_key=planner_api_key)

    scope_instruction = ""
    if planner_config.scope_level == "conservative":
        scope_instruction = (
            "\n\nIMPORTANT: Keep the scope conservative. "
            "Only include features explicitly requested by the user. Do not add extras."
        )
    elif planner_config.scope_level == "ambitious":
        scope_instruction = (
            "\n\nBe ambitious about scope. Include features that would make this product impressive, "
            "even if the user didn't explicitly request them."
        )
    else:
        logger.warning(f"planner_unknown_scope scope_level={planner_config.scope_level}")

    MAX_PLANNER_INPUT = 10_000  # characters
    if len(user_message) > MAX_PLANNER_INPUT:
        logger.warning(f"planner_input_truncated original_length={len(user_message)}")
        user_message = user_message[:MAX_PLANNER_INPUT]

    system_prompt = _PLANNER_PROMPT + scope_instruction

    messages = [
        SystemMessage(content=system_prompt),
        HumanMessage(content=f"Please create a product plan for the following request:\n\n{user_message}"),
    ]

    logger.info(f"planner_phase model={model_name} scope={planner_config.scope_level}")

    try:
        response = await llm.ainvoke(messages)
    except Exception as e:
        logger.error(f"planner_phase_failed model={model_name} error={e}")
        raise RuntimeError(f"Planner failed to generate plan: {e}") from e

    plan_text = response.content if isinstance(response.content, str) else str(response.content)

    logger.info(f"planner_phase_complete plan_length={len(plan_text)}")

    return plan_text