124 changes: 124 additions & 0 deletions .claude/plans/planner-agent.md
@@ -0,0 +1,124 @@
# Implementation Plan: Planner Agent — Automated Spec Expansion

## Overview

The Planner Agent introduces a pre-execution planning phase to Orchestra's assistant pipeline. When enabled via `PlannerConfig` on an Assistant, the planner intercepts the first message in a thread, expands the user's short prompt into a structured product specification (persisted as `/PLAN.md` in StateBackend), and optionally pauses for human-in-the-loop approval before the generator begins work. This follows Anthropic's harness research showing that a high-level planning phase dramatically improves the ambition and coherence of generated output.

## User Stories

1. **As an assistant creator**, I want to toggle a planner phase on my assistant so that short user prompts are automatically expanded into structured product specs before execution begins.

2. **As a user chatting with a planner-enabled assistant**, I want to see a "Planning..." indicator and then a collapsible plan panel so I understand what the assistant intends to build before it starts.

3. **As a team lead**, I want to require human approval of the generated plan (auto_approve=false) so that I can review and refine scope before the generator consumes tokens on implementation.

4. **As a power user**, I want to choose between "conservative" and "ambitious" scope levels so I can control how aggressively the planner expands my prompt.

5. **As a developer**, I want the planner to use a configurable model override (e.g., Opus for planning, Sonnet for generation) so I can balance quality and cost across phases.

## Implementation Phases

### Phase 1: Schema & Data Layer
**Files:**
- `backend/src/schemas/entities/planner.py` (new)
- `backend/src/schemas/entities/llm.py` (modify)
- `backend/migrations/versions/xxx_add_planner.py` (new)

**Tasks:**
1. Create `PlannerConfig` Pydantic model with fields: `enabled`, `auto_approve`, `model`, `scope_level`.
2. Add optional `planner: PlannerConfig` field to the `Assistant` schema.
3. Generate Alembic migration to persist planner config as a JSONB column on the assistants table.

**Acceptance Criteria:**
- PlannerConfig validates all field types and defaults correctly.
- Migration runs forward and backward cleanly.
- Existing assistants default to `planner=None` (no breaking change).
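The first acceptance criterion can be sketched directly. This is a minimal standalone sketch of the `PlannerConfig` model as specified in this plan (fields `enabled`, `auto_approve`, `model`, `scope_level`), showing the default and validation behavior Phase 1 targets; it is illustrative, not the shipped module:

```python
# Sketch of PlannerConfig validation, assuming pydantic v2.
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class PlannerConfig(BaseModel):
    enabled: bool = False
    auto_approve: bool = True
    model: Optional[str] = None
    scope_level: Literal["conservative", "ambitious"] = "ambitious"


# Defaults: planner off, auto-approve on, no model override.
cfg = PlannerConfig()
assert cfg.enabled is False and cfg.auto_approve is True
assert cfg.model is None and cfg.scope_level == "ambitious"

# The Literal constraint rejects unknown scope levels.
try:
    PlannerConfig(scope_level="reckless")
except ValidationError:
    pass  # expected
```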

### Phase 2: Planner System Prompt
**Files:**
- `backend/src/static/prompts/md/planner.md` (new)

**Tasks:**
1. Write a system prompt that instructs the planner to:
- Expand short prompts into structured product specs.
- Be ambitious about scope; identify AI feature opportunities.
- Stay high-level (product context, features, user stories, design direction).
- Avoid granular implementation details.
- Output structured markdown: Overview, Features (with user stories), Tech Stack, Design Direction.

**Acceptance Criteria:**
- Prompt produces well-structured specs from 1-4 sentence inputs.
- Output format is consistent and parseable.

### Phase 3: Pre-Execution Phase in LLMController
**Files:**
- `backend/src/controllers/llm.py` (modify)

**Tasks:**
1. On first message in a thread, check if the assistant has `planner.enabled = True`.
2. Invoke the planner with the user's message using the configured (or default) model.
3. If `auto_approve=False`, create a HITL interrupt presenting the plan for user approval.
4. On approval (or if `auto_approve=True`), persist the plan to StateBackend as `/PLAN.md`.
5. Inject the plan into the generator's context/memory sources.
6. Proceed to generator execution with original user message + plan context.

**Acceptance Criteria:**
- Planner phase only triggers on the first message of a thread.
- HITL interrupt blocks generator execution until approved.
- Plan is accessible to the generator throughout the thread lifecycle.
- Subsequent messages in the thread skip the planner phase.
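The trigger rule in the acceptance criteria reduces to a small predicate. This hypothetical helper (`should_run_planner` is not an existing Orchestra function) sketches the check: the planner runs only when the assistant opts in and the thread has no prior messages.

```python
from typing import Optional

from pydantic import BaseModel


class PlannerConfig(BaseModel):
    enabled: bool = False
    auto_approve: bool = True


def should_run_planner(planner: Optional[PlannerConfig], message_count: int) -> bool:
    """True only for the first message of a planner-enabled thread."""
    return planner is not None and planner.enabled and message_count == 0


assert should_run_planner(PlannerConfig(enabled=True), 0) is True
assert should_run_planner(PlannerConfig(enabled=True), 1) is False   # later messages skip the planner
assert should_run_planner(PlannerConfig(enabled=False), 0) is False  # planner toggled off
assert should_run_planner(None, 0) is False                          # existing assistants, planner=None
```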

### Phase 4: Frontend Display
**Files:**
- `frontend/src/components/chat/PlanPanel.tsx` (new)
- Thread view integration (modify existing thread/chat components)

**Tasks:**
1. Create `PlanPanel` component: collapsible panel rendering the plan markdown.
2. Show plan status indicator: "Planning...", "Awaiting approval", "Approved".
3. If `auto_approve=False`, render approval/rejection buttons using existing HITL interrupt UI patterns.
4. Integrate PlanPanel at the top of the thread view.

**Acceptance Criteria:**
- Plan panel renders markdown correctly and is collapsible.
- Approval UI integrates with existing HITL flow.
- Status transitions are reflected in real-time.

### Phase 5: Tests & Example
**Files:**
- `backend/tests/unit/test_planner_config.py` (new)
- `backend/tests/integration/test_planner_phase.py` (new)
- `examples/agents/planner_example.ipynb` (new)

**Tasks:**
1. Unit test: `PlannerConfig` field validation and defaults.
2. Unit test: planner phase triggers only on first message.
3. Integration test: planner expands "Build a todo app" into a multi-feature spec.
4. Integration test: `auto_approve=False` creates HITL interrupt.
5. Create example notebook demonstrating planner + generator pipeline.

**Acceptance Criteria:**
- All tests pass in CI.
- Example notebook runs end-to-end with a planner-enabled assistant.
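The integration test for task 3 can be sketched with the LLM call faked out. The `fake_run_planner` stand-in below mirrors the shape of `run_planner` from this PR but returns a canned spec; the assertion style matches the "multi-feature spec" criterion:

```python
# Integration-test sketch with the planner LLM mocked; the canned
# plan text here is illustrative.
import asyncio


async def fake_run_planner(user_message: str) -> str:
    # Stands in for run_planner(...) with the network call mocked out.
    return (
        "### Overview\nA todo app with AI-assisted task triage.\n"
        "### Features\n1. Task lists\n2. Smart prioritization\n3. Reminders\n"
    )


async def test_planner_expands_short_prompt() -> None:
    plan = await fake_run_planner("Build a todo app")
    # The plan should be a structured multi-feature spec, not an echo of the prompt.
    assert "### Features" in plan
    assert plan.count("\n") > 3


asyncio.run(test_planner_expands_short_prompt())
```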

## Dependencies & Risks

### Dependencies
- Existing HITL interrupt infrastructure (for approval flow).
- StateBackend file persistence (for `/PLAN.md` storage).
- Frontend HITL interrupt components (for approval UI).

### Risks
| Risk | Impact | Mitigation |
|------|--------|------------|
| Over-scoping by planner | Generator cannot implement the full spec | `scope_level` config; generator can flag unimplementable items |
| Latency from extra LLM round-trip | Slower time-to-first-output | Use fast model for planner; show "Planning..." state in UI |
| Plan staleness over long threads | Plan becomes irrelevant as conversation evolves | Allow re-planning via user command or after N messages |
| Migration on large assistant tables | Slow deployment | JSONB column with NULL default; no table rewrite needed |

## Testing Strategy

- **Unit tests**: Schema validation, planner trigger logic, plan persistence.
- **Integration tests**: End-to-end planner flow with mocked LLM, HITL interrupt creation and resolution.
- **Manual QA**: Create a planner-enabled assistant in the UI, send a short prompt, verify plan display and approval flow.
- **Performance**: Measure added latency from planner phase; ensure it stays under 10s for typical prompts.
101 changes: 101 additions & 0 deletions .claude/specs/planner-agent.md
@@ -0,0 +1,101 @@
# Plan: Planner Agent — Automated Spec Expansion

## Context

Anthropic's harness research shows a planner agent that expands 1–4 sentence prompts into full product specs dramatically increases the ambition and coherence of generated output. The planner stays intentionally high-level — specifying granular implementation details upfront leads to cascading errors downstream. The planner focuses on product context, features, user stories, and design direction, then lets the generator figure out implementation.

Orchestra currently jumps straight from user input to agent execution with no planning phase.

## Requirements

- `PlannerConfig` on the Assistant schema (toggleable)
- Planner runs as a pre-phase before the generator
- Expands short prompts into structured product specs
- Persists plan as `/PLAN.md` in StateBackend for generator to reference
- Optionally presents plan to user for approval before execution (HITL integration)
- Planner prompted to be ambitious about scope and identify AI feature opportunities

## Implementation Steps

### Step 1: Schema definitions

Add to `backend/src/schemas/entities/`:

```python
# planner.py
class PlannerConfig(BaseModel):
    enabled: bool = False
    auto_approve: bool = True  # False = pause for user approval via HITL
    model: Optional[str] = None  # Override model for planner (e.g. use Opus)
    scope_level: Literal["conservative", "ambitious"] = "ambitious"
```
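Since Step 6 persists this config as JSONB, a quick round-trip sketch is useful (assumes pydantic v2's `model_dump`/`model_validate`; the model name string is illustrative):

```python
import json
from typing import Literal, Optional

from pydantic import BaseModel


class PlannerConfig(BaseModel):
    enabled: bool = False
    auto_approve: bool = True
    model: Optional[str] = None
    scope_level: Literal["conservative", "ambitious"] = "ambitious"


# JSONB persistence round-trip: dump to JSON, load back, compare.
cfg = PlannerConfig(enabled=True, auto_approve=False, model="claude-opus-4")
payload = json.dumps(cfg.model_dump())                 # what lands in the JSONB column
restored = PlannerConfig.model_validate(json.loads(payload))
assert restored == cfg
```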

### Step 2: Extend Assistant schema

Add optional `planner: PlannerConfig` to `Assistant` in `backend/src/schemas/entities/llm.py`.

### Step 3: Planner system prompt

Create `backend/src/static/prompts/md/planner.md`:

- Role: expand a short user prompt into a structured product specification
- Be ambitious about scope — include features the user didn't explicitly request
- Stay high-level: product context, feature list, user stories, design language
- Do NOT specify granular implementation details (avoids cascading errors)
- Identify opportunities to weave AI-powered features into the product
- Output format: structured markdown with sections for Overview, Features (with user stories), Tech Stack, Design Direction

### Step 4: Pre-execution phase in LLMController

Modify `backend/src/controllers/llm.py`:

1. On first message in a thread, check if assistant has planner enabled
2. If yes, invoke planner assistant with user's message
3. Planner produces `/PLAN.md` content
4. If `auto_approve=False`, create HITL interrupt presenting the plan for user approval
5. If approved (or auto_approve), persist plan to StateBackend as `/PLAN.md`
6. Add plan file to generator's memory sources so it's available throughout execution
7. Proceed to generator with original user message + plan context
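The branching in steps 4-7 can be sketched with the planner, StateBackend, and HITL interrupt all stubbed out; every name here is illustrative, not Orchestra's real API:

```python
from typing import Callable, Optional


def pre_execution_phase(
    state: dict,                    # stand-in for StateBackend file storage
    auto_approve: bool,
    user_message: str,
    plan_fn: Callable[[str], str],  # stand-in for the planner LLM call
) -> Optional[str]:
    """Run the planner; return a pending plan when approval is required,
    otherwise persist it to /PLAN.md immediately."""
    plan = plan_fn(user_message)
    if not auto_approve:
        return plan  # caller raises a HITL interrupt carrying this plan
    state["/PLAN.md"] = plan
    return None


state: dict = {}
pending = pre_execution_phase(state, True, "Build a todo app", lambda m: f"# Plan\n{m}")
assert pending is None and state["/PLAN.md"].startswith("# Plan")

pending = pre_execution_phase({}, False, "Build a todo app", lambda m: "# Plan")
assert pending == "# Plan"  # generator stays blocked until the user approves
```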

### Step 5: Frontend display

- Show plan in a collapsible panel at the top of the thread
- If `auto_approve=False`, render approval UI using existing HITL interrupt components
- Plan status indicator: "Planning...", "Awaiting approval", "Approved"

### Step 6: Migration

Alembic migration for planner config persistence on assistants.

### Step 7: Tests

- Unit test: `PlannerConfig` validation
- Unit test: planner phase triggers only on first message
- Integration test: planner expands "Build a todo app" into multi-feature spec with user stories
- Integration test: `auto_approve=False` creates HITL interrupt

### Step 8: Example

Create `examples/agents/planner_example.ipynb` demonstrating planner + generator pipeline.

## File Changes

- `backend/src/schemas/entities/planner.py` — new
- `backend/src/schemas/entities/llm.py` — add planner field
- `backend/src/controllers/llm.py` — pre-execution phase
- `backend/src/static/prompts/md/planner.md` — new
- `backend/migrations/versions/xxx_add_planner.py` — new
- `frontend/src/components/chat/PlanPanel.tsx` — new
- `examples/agents/planner_example.ipynb` — new

## Risks

- Over-scoping: planner may generate specs too ambitious for the generator. Mitigation: `scope_level` config, and generator can flag spec items it can't implement.
- Latency: planner adds a full LLM round-trip before work begins. Mitigation: use a fast model for planner when possible, show "Planning..." state in UI.
- Plan staleness: if the thread evolves, the initial plan may become irrelevant. Mitigation: allow re-planning via user command or after N messages.

## GitHub Issue

**Title:** `feat: Planner Agent — auto-expand short prompts into full product specs`
**Labels:** `enhancement`, `agents`, `high-impact`, `harness-design`
**Milestone:** v0.9.0 — Harness Design
1 change: 1 addition & 0 deletions backend/src/schemas/entities/__init__.py
@@ -24,6 +24,7 @@
    PatchDefaultsRequest as PatchDefaultsRequest,
    UpsertProviderKeyRequest as UpsertProviderKeyRequest,
)
from src.schemas.entities.planner import PlannerConfig as PlannerConfig
from src.schemas.entities.hitl import (
    DecisionType as DecisionType,
    HumanDecision as HumanDecision,
2 changes: 2 additions & 0 deletions backend/src/schemas/entities/llm.py
@@ -18,6 +18,7 @@
    SystemMessage,
    ToolMessage,
)
from src.schemas.entities.planner import PlannerConfig
from src.services.prompt.defaults import get_default_system_prompt
from src.utils.format import slugify

@@ -110,6 +111,7 @@ def validate_system_prompt_or_instructions(self):
        default_factory=dict,
        description="File system storage for the assistant. Key is the file path, value is the file content.",
    )
    planner: Optional[PlannerConfig] = Field(default=None, description="Planner agent configuration")
    metadata: dict = {}
    updated_at: Optional[datetime] = None
    created_at: Optional[datetime] = None
12 changes: 12 additions & 0 deletions backend/src/schemas/entities/planner.py
@@ -0,0 +1,12 @@
from typing import Literal, Optional

from pydantic import BaseModel


class PlannerConfig(BaseModel):
    """Configuration for the planner pre-execution phase."""

    enabled: bool = False
    auto_approve: bool = True  # False = pause for user approval via HITL
    model: Optional[str] = None  # Override model for planner (e.g., use Opus)
    scope_level: Literal["conservative", "ambitious"] = "ambitious"
38 changes: 38 additions & 0 deletions backend/src/static/prompts/md/planner.md
@@ -0,0 +1,38 @@
You are a product planning agent. Your job is to expand a short user prompt into a structured product specification.

## Your Role

- Be ambitious about scope — include features the user didn't explicitly request but would clearly benefit from
- Stay high-level: product context, feature list, user stories, design direction
- Do NOT specify granular implementation details (file names, function signatures, database schemas)
- Identify opportunities to weave AI-powered features into the product
- Think about what would make this product genuinely impressive, not just functional

## Output Format

Structure your plan as markdown with these sections:

### Overview
A 2-3 sentence summary of what we're building and why.

### Features
A numbered list of features, each with:
- Feature name
- One-line description
- 1-2 user stories in "As a [user], I want [action] so that [benefit]" format

### Design Direction
- Visual style and tone
- Key UX principles
- Reference points or inspirations

### Technical Considerations
- Recommended tech stack (if relevant)
- Key architectural decisions
- Performance or scale considerations

### Success Criteria
- How we'll know this is done and done well
- Key metrics to track

Keep the plan concise but comprehensive. Aim for 300-800 words.
80 changes: 80 additions & 0 deletions backend/src/utils/planner.py
@@ -0,0 +1,80 @@
"""Planner pre-execution phase — expands short prompts into structured product specs."""

from pathlib import Path

from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage, SystemMessage

from src.schemas.entities.planner import PlannerConfig
from src.utils.llm import resolve_api_key
from src.utils.logger import logger

# Load the planner system prompt
_PLANNER_PROMPT_PATH = Path(__file__).parent.parent / "static" / "prompts" / "md" / "planner.md"
if not _PLANNER_PROMPT_PATH.exists():
    raise RuntimeError(f"Planner prompt file not found: {_PLANNER_PROMPT_PATH}")
_PLANNER_PROMPT = _PLANNER_PROMPT_PATH.read_text()


async def run_planner(
    user_message: str,
    planner_config: PlannerConfig,
    default_model: str,
    api_key: str | None = None,
    user_keys: dict[str, str] | None = None,
) -> str:
    """Run the planner to expand a short prompt into a structured spec.

    Returns the plan as markdown text.
    """
    model_name = planner_config.model or default_model

    # Resolve API key for the planner model
    planner_api_key = api_key
    if planner_config.model:
        resolved = resolve_api_key(planner_config.model, user_keys)
        if resolved:
            planner_api_key = resolved

    llm = init_chat_model(model_name, api_key=planner_api_key)

    scope_instruction = ""
    if planner_config.scope_level == "conservative":
        scope_instruction = (
            "\n\nIMPORTANT: Keep the scope conservative. "
            "Only include features explicitly requested by the user. Do not add extras."
        )
    elif planner_config.scope_level == "ambitious":
        scope_instruction = (
            "\n\nBe ambitious about scope. Include features that would make this product impressive, "
            "even if the user didn't explicitly request them."
        )
    else:
        logger.warning(f"planner_unknown_scope scope_level={planner_config.scope_level}")

    MAX_PLANNER_INPUT = 10_000  # characters
    if len(user_message) > MAX_PLANNER_INPUT:
        logger.warning(f"planner_input_truncated original_length={len(user_message)}")
        user_message = user_message[:MAX_PLANNER_INPUT]

    system_prompt = _PLANNER_PROMPT + scope_instruction

    messages = [
        SystemMessage(content=system_prompt),
        HumanMessage(content=f"Please create a product plan for the following request:\n\n{user_message}"),
    ]

    logger.info(f"planner_phase model={model_name} scope={planner_config.scope_level}")

    try:
        response = await llm.ainvoke(messages)
    except Exception as e:
        logger.error(f"planner_phase_failed model={model_name} error={e}")
        raise RuntimeError(f"Planner failed to generate plan: {e}") from e

    plan_text = response.content if isinstance(response.content, str) else str(response.content)

    logger.info(f"planner_phase_complete plan_length={len(plan_text)}")

    return plan_text