A Claude Code-based story generation Agent Runtime for computation under uncertainty
StoryForge is an independent open-source Agent Runtime built on top of Claude Code. It uses short-drama creation as the validation scenario, but the real subject of the project is not “generate a script.” The goal is to turn open-ended, generative, non-unique, human-judged work into a runtime that is controllable, auditable, and capable of convergence-aware iteration.
This repository should be read as a standalone runtime project. The tracked structure, rules, evaluations, and examples define the public project. Ignored local notes, drafts, and private design materials are intentionally outside the public runtime.
| Dimension | Description |
|---|---|
| Project type | Claude Code-based Agent Runtime |
| Core thesis | Harness Engineering for computation under uncertainty |
| Demonstration scenario | Synopsis, script, and storyboard workflows |
| Key mechanisms | Context assembly, hard gates, role isolation, review loops, convergence stops, audit trails |
| Execution model | Claude Code + Markdown-native repo structure |
Many Harness Engineering discussions focus on deterministic computation: compiling, testing, tool orchestration, data transformation, and rule-based validation. Those tasks usually have stable I/O, clear correctness criteria, and near-binary pass/fail conditions.
StoryForge is aimed at a different class of work: computation under uncertainty.
- The output is not a single correct answer but a set of plausible candidates.
- Quality is judged through rubrics, reviewer agents, and human approval rather than one assertion.
- Revision does not guarantee monotonic improvement; it can stall, diverge, or oscillate.
- Human approval is not an afterthought but a first-class runtime control point.
- Audit trails are not optional metadata but part of safe handoff and replay.
So the harness here does not treat the model as a black-box function call. It treats generation, review, revision, stop conditions, and archiving as runtime primitives.
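To make "runtime primitives" concrete, here is a minimal Python sketch of that interface. Everything in it (the type names, the method signatures) is a hypothetical illustration, not code from this repo:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Artifact:
    path: str      # e.g. "projects/demo/synopsis/v1.md" (illustrative path)
    version: int
    text: str


@dataclass
class Review:
    scores: dict[str, int]  # rubric dimension -> score on a 1-5 scale
    notes: str


class Harness(Protocol):
    """Generation, review, revision, stopping, and archiving as first-class steps."""

    def generate(self, brief: str) -> Artifact: ...
    def review(self, artifact: Artifact) -> list[Review]: ...
    def revise(self, artifact: Artifact, reviews: list[Review]) -> Artifact: ...
    def should_stop(self, history: list[list[Review]]) -> bool: ...
    def archive(self, artifact: Artifact, reviews: list[Review]) -> None: ...
```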
The three diagrams below are reused from the project's internal system design set. Together they show StoryForge from three complementary angles: end-to-end flow, runtime architecture, and the evaluation/revision loop.
This diagram shows the path from a raw idea to a final output such as a script or storyboard. The red gate marks the mandatory human approval point.
```mermaid
flowchart TD
A([User Brief]) --> B["Interview / Fill Gaps<br>interviewer.md"]
B --> C["Build story-pack.md"]
C --> D["Generate Synopsis<br>synopsis-generator.md"]
D --> E{"Synopsis Review<br>narrative-reviewer.md<br>logic-auditor.md"}
E -->|Pass| F
E -->|Revise| G[Auto Revision]
G --> D
F:::hardgate
F{"Synopsis Approved?<br>→ approved.md"}
F -->|Approved| H{Choose Output}
F -->|Feedback| D
H -->|Script| I["Generate Script<br>script-generator.md"]
H -->|Storyboard| J["Generate Storyboard<br>storyboard-generator.md"]
I --> K{"Script Review<br>narrative-reviewer.md<br>character-reviewer.md<br>logic-auditor.md<br>format-checker.md"}
K -->|Pass| L([Script Complete])
K -->|Revise| M[Auto Revision]
M --> I
J --> N{"Storyboard Review<br>narrative-reviewer.md<br>logic-auditor.md<br>format-checker.md"}
N -->|Pass| O([Storyboard Complete])
N -->|Revise| P[Auto Revision]
P --> J
classDef hardgate stroke:#e74c3c,stroke-width:3px
classDef user fill:#3498db,color:#fff
classDef gen fill:#2ecc71,color:#fff
classDef eval fill:#f39c12,color:#fff
class A,F user
class B,C,D,I,J gen
class E,K,N eval
```
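Read as code, the gating logic of this flow is a small state check. The sketch below is a hypothetical Python rendering; the file names match the diagram, but the function name and return values are invented for illustration:

```python
from pathlib import Path


def next_stage(project: Path, target: str) -> str:
    """Return the stage the runtime may run next for `target` ('script' or 'storyboard')."""
    if not (project / "story-pack.md").exists():
        return "interview"               # interviewer.md fills gaps in the brief
    if not any((project / "synopsis").glob("v*.md")):
        return "generate-synopsis"       # synopsis-generator.md
    if not (project / "synopsis" / "approved.md").exists():
        return "await-approval"          # red hard gate: a human must approve first
    return f"generate-{target}"          # script-generator.md / storyboard-generator.md
```

The point is that the gate is a file-existence check, not a prompt convention: without `synopsis/approved.md`, downstream generation is simply unreachable.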
This view emphasizes the runtime layers: Skills orchestrate, Agents execute, Knowledge supports, Rules inject constraints, and Hooks provide reminders after tool events.
```mermaid
flowchart TB
subgraph Skills[".claude/skills/ — Skills Layer"]
S1["new-project<br>New Project"]
S2["evaluate<br>Evaluate Artifact"]
S3["compare<br>Compare Branches"]
end
subgraph GenAgents[".claude/agents/ — Generation Agents"]
GA1["interviewer.md<br>Interview Agent"]
GA2["synopsis-generator.md<br>Synopsis Generation"]
GA3["script-generator.md<br>Script Generation"]
GA4["storyboard-generator.md<br>Storyboard Generation"]
end
subgraph EvalAgents[".claude/agents/ — Evaluation Agents"]
EA1["narrative-reviewer.md<br>Narrative Review"]
EA2["character-reviewer.md<br>Character Review"]
EA3["logic-auditor.md<br>Logic Audit"]
EA4["format-checker.md<br>Format Check"]
end
subgraph Knowledge["knowledge/ — Knowledge Base"]
K1["genres/<br>Genre Knowledge x4"]
K2["structure/<br>Structure Models x3"]
K3["evaluation/<br>Evaluation Rubrics x4"]
K4["style-guide/<br>Style Guides x3"]
K5["templates/<br>Output Templates x3"]
end
subgraph Rules[".claude/rules/ — Rules Layer"]
R1["script-writing.md<br>Script Rules → script/**"]
R2["storyboard-writing.md<br>Storyboard Rules → storyboard/**"]
end
subgraph Hook[".claude/settings.json — Hook Layer"]
H1["PostToolUse: Write<br>Remind to run /evaluate"]
end
Skills -->|orchestrates| GenAgents
Skills -->|orchestrates| EvalAgents
GenAgents -->|references| Knowledge
EvalAgents -->|uses| Knowledge
Rules -.->|path-matched injection| GenAgents
Hook -.->|reminder after Write| GenAgents
classDef skill fill:#9b59b6,color:#fff
classDef gen fill:#2ecc71,color:#fff
classDef eval fill:#f39c12,color:#fff
classDef know fill:#3498db,color:#fff
classDef rule fill:#e74c3c,color:#fff
classDef hook fill:#95a5a6,color:#fff
class S1,S2,S3 skill
class GA1,GA2,GA3,GA4 gen
class EA1,EA2,EA3,EA4 eval
class K1,K2,K3,K4,K5 know
class R1,R2 rule
class H1 hook
```
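The hook layer at the bottom of this diagram reduces to a small settings entry. The sketch below writes one, assuming Claude Code's `hooks` schema in `.claude/settings.json`; the reminder command is an invented example, and a real setup would merge into existing settings rather than overwrite them:

```python
import json
from pathlib import Path

# Illustrative PostToolUse hook: after any Write, print a reminder to run /evaluate.
settings = {
    "hooks": {
        "PostToolUse": [
            {
                "matcher": "Write",
                "hooks": [
                    {
                        "type": "command",
                        "command": "echo 'Artifact written: consider running /evaluate'",
                    }
                ],
            }
        ]
    }
}

Path(".claude").mkdir(exist_ok=True)
# Overwrites for simplicity; merge with existing settings in practice.
Path(".claude/settings.json").write_text(json.dumps(settings, indent=2) + "\n")
```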
This view focuses on what happens after generation. The runtime scores artifacts, detects convergence or stagnation, and stops rather than revising forever.
```mermaid
flowchart TD
A["Generate Artifact<br>synopsis/v1.md or ep01.md"] --> B["Evaluator Scoring<br>narrative-reviewer.md<br>character-reviewer.md<br>logic-auditor.md<br>format-checker.md"]
B --> C{"All dimensions >= 3/5?<br>Based on: synopsis-rubric.md<br>or script-rubric.md"}
C -->|Yes| D(["Pass<br>→ *-eval.md"])
C -->|No| E{"Check score trend<br>CLAUDE.md stop policy"}
E -->|"Converging: S2 > S1"| F{"Over hard cap?<br>Synopsis 3 rounds / Script 5 rounds"}
E -->|"Stagnation: S2 ≈ S1"| G(["Stop and hand off"])
E -->|"Diverging: S2 < S1"| H(["Stop immediately"])
E -->|"Oscillation: mixed up/down"| I(["Stop and hand off"])
F -->|Below cap| J["Targeted Revision<br>→ v2.md / v3.md"]
F -->|Cap reached| K(["Force stop"])
J --> B
classDef pass fill:#2ecc71,color:#fff
classDef stop fill:#e74c3c,color:#fff
classDef check fill:#f39c12,color:#fff
class D pass
class G,H,I,K stop
class C,E,F check
```
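The stop policy in this loop is small enough to state as code. Below is a minimal sketch assuming one dict of rubric scores (1-5) per round; the pass threshold and the hard caps come from the diagram, while the function name and verdict strings are hypothetical:

```python
def verdict(rounds: list[dict[str, int]], hard_cap: int) -> str:
    """Decide the next action after the latest evaluation round.

    `rounds` holds one mapping of rubric dimension -> score (1-5) per round,
    assuming the same dimensions each round; `hard_cap` is 3 for synopses
    and 5 for scripts, per the stop policy.
    """
    latest = rounds[-1]
    if all(score >= 3 for score in latest.values()):
        return "pass"                      # -> *-eval.md
    if len(rounds) < 2:
        return "revise"                    # no trend yet; spend the budget
    deltas = [latest[d] - rounds[-2][d] for d in latest]
    if any(d > 0 for d in deltas) and any(d < 0 for d in deltas):
        return "stop-oscillation"          # mixed up/down: hand off to a human
    if sum(deltas) < 0:
        return "stop-divergence"           # S2 < S1: stop immediately
    if sum(deltas) == 0:
        return "stop-stagnation"           # S2 ~ S1: hand off instead of looping
    return "revise" if len(rounds) < hard_cap else "force-stop"  # converging
```

Each round appends its scores, and the runtime acts on the returned verdict rather than revising unconditionally.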
The project is not about writing a longer prompt. It is about making the runtime control structure explicit:
- `synopsis/approved.md` is a hard gate; no approved synopsis means no script or storyboard generation
- generator and evaluator agents operate under separated contexts
- synopsis and script loops have bounded revision budgets and convergence-aware stop conditions
- every major action is logged into `changelog.md` (see the sketch after this list)
- artifacts and evaluation reports stay inside the project directory for replay and takeover
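For the changelog bullet above, a minimal append-only logger is enough. This is a sketch; the entry format is an assumption, not the repo's actual schema:

```python
from datetime import datetime, timezone
from pathlib import Path


def log_action(project: Path, action: str, detail: str) -> None:
    """Append one audit entry to the project's changelog.md (illustrative format)."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    with (project / "changelog.md").open("a", encoding="utf-8") as log:
        log.write(f"- {stamp} | {action}: {detail}\n")
```

A call like `log_action(Path("projects/demo"), "revise", "synopsis v2 -> v3")` keeps the audit trail plain-text and diffable.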
StoryForge is neither a conventional web app nor a standalone backend service. Its assumed runtime host is Claude Code.
In that model:
- `CLAUDE.md` defines global workflow constraints and hard gates
- `.claude/skills/` exposes structured entry points such as `/new-project`, `/evaluate`, and `/compare`
- `.claude/agents/` defines generation and review roles
- `.claude/rules/` injects domain constraints through path matching
- `projects/*` acts as transparent runtime state and artifact storage
Put simply, Claude Code is both the agent host and the operating interface for this runtime.
| Component | Role | Path |
|---|---|---|
| Skills | High-level workflow entry points | `.claude/skills/` |
| Agents | Generation, review, and audit roles | `.claude/agents/` |
| Rules | Hard constraints injected by path | `.claude/rules/` |
| Knowledge | Genre, structure, template, and rubric references | `knowledge/` |
| Projects | Runtime state, intermediate artifacts, final outputs | `projects/` |
| Evals | Evaluation cases, baselines, and regression runs | `evals/` |
In a typical run, the runtime will:

- interview and normalize a creative brief into `story-pack.md`
- generate a synopsis before any downstream output
- branch into script or storyboard generation after approval
- run rubric-based review with independent evaluator agents
- revise with convergence-aware stopping logic
- preserve changelogs, eval reports, and project artifacts in Markdown
Based on `evals/baseline.md`:
- 8 agent definitions completed
- 3 skills integrated
- 18 knowledge files loaded into runtime context
- 6/6 evaluation cases passing
You can inspect the example projects directly, or run the workflow yourself:
- Clone the repository and open it in Claude Code.
- Use `/new-project` with a creative brief.
- Review the generated `story-pack.md` and `synopsis/v1.md`.
- Run `/evaluate` on artifacts and inspect the revision loop.
- Approve the synopsis before continuing to script or storyboard generation.
StoryForge is likely most useful to:

- developers exploring Claude Code as a multi-agent runtime host
- teams building controllable workflows for content generation
- researchers interested in Harness Engineering for computation under uncertainty
- anyone who wants a Markdown-native, auditable, replayable Agent project example