A Claude Code-based story generation Agent Runtime for computation under uncertainty
StoryForge is an independent open-source Agent Runtime built on top of Claude Code. It uses short-drama creation as the validation scenario, but the real subject of the project is not “generate a script.” The goal is to turn open-ended, generative, non-unique, human-judged work into a runtime that is controllable, auditable, and capable of convergence-aware iteration.
This repository should be read as a standalone runtime project. The tracked structure, rules, evaluations, and examples define the public project. Ignored local notes, drafts, and private design materials are intentionally outside the public runtime.
| Dimension | Description |
|---|---|
| Project type | Claude Code-based Agent Runtime |
| Core thesis | Harness Engineering for computation under uncertainty |
| Demonstration scenario | Synopsis, script, and storyboard workflows |
| Key mechanisms | Context assembly, hard gates, role isolation, review loops, convergence stops, audit trails |
| Execution model | Claude Code + Markdown-native repo structure |
Many Harness Engineering discussions focus on deterministic computation: compiling, testing, tool orchestration, data transformation, and rule-based validation. Those tasks usually have stable I/O, clear correctness criteria, and near-binary pass/fail conditions.
StoryForge is aimed at a different class of work: computation under uncertainty.
- The output is not a single correct answer but a set of plausible candidates.
- Quality is judged through rubrics, reviewer agents, and human approval rather than one assertion.
- Revision does not guarantee monotonic improvement; it can stall, diverge, or oscillate.
- Human approval is not an afterthought but a first-class runtime control point.
- Audit trails are not optional metadata but part of safe handoff and replay.
So the harness here does not treat the model as a black-box function call. It treats generation, review, revision, stop conditions, and archiving as runtime primitives.
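To make "runtime primitives" concrete, here is a minimal Python sketch of that interface. Everything in it (the type names, the method signatures) is a hypothetical illustration, not code from this repo:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Artifact:
    path: str      # e.g. "projects/demo/synopsis/v1.md" (illustrative path)
    version: int
    text: str


@dataclass
class Review:
    scores: dict[str, int]  # rubric dimension -> score on a 1-5 scale
    notes: str


class Harness(Protocol):
    """Generation, review, revision, stopping, and archiving as first-class steps."""

    def generate(self, brief: str) -> Artifact: ...
    def review(self, artifact: Artifact) -> list[Review]: ...
    def revise(self, artifact: Artifact, reviews: list[Review]) -> Artifact: ...
    def should_stop(self, history: list[list[Review]]) -> bool: ...
    def archive(self, artifact: Artifact, reviews: list[Review]) -> None: ...
```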
The three diagrams below are reused from the project's internal system design set. Together they show StoryForge from three complementary angles: end-to-end flow, runtime architecture, and the evaluation/revision loop.
This diagram shows the path from a raw idea to a final output such as a script or storyboard. The red gate marks the mandatory human approval point.
```mermaid
flowchart TD
A([User Brief]) --> B["Interview / Fill Gaps<br>interviewer.md"]
B --> C["Build story-pack.md"]
C --> D["Generate Synopsis<br>synopsis-generator.md"]
D --> E{"Synopsis Review<br>narrative-reviewer.md<br>logic-auditor.md"}
E -->|Pass| F
E -->|Revise| G[Auto Revision]
G --> D
F:::hardgate
F{"Synopsis Approved?<br>→ approved.md"}
F -->|Approved| H{Choose Output}
F -->|Feedback| D
H -->|Script| I["Generate Script<br>script-generator.md"]
H -->|Storyboard| J["Generate Storyboard<br>storyboard-generator.md"]
I --> K{"Script Review<br>narrative-reviewer.md<br>character-reviewer.md<br>logic-auditor.md<br>format-checker.md"}
K -->|Pass| L([Script Complete])
K -->|Revise| M[Auto Revision]
M --> I
J --> N{"Storyboard Review<br>narrative-reviewer.md<br>logic-auditor.md<br>format-checker.md"}
N -->|Pass| O([Storyboard Complete])
N -->|Revise| P[Auto Revision]
P --> J
classDef hardgate stroke:#e74c3c,stroke-width:3px
classDef user fill:#3498db,color:#fff
classDef gen fill:#2ecc71,color:#fff
classDef eval fill:#f39c12,color:#fff
class A,F user
class B,C,D,I,J gen
class E,K,N eval
```
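Read as code, the gating logic of this flow is a small state check. The sketch below is a hypothetical Python rendering; the file names match the diagram, but the function name and return values are invented for illustration:

```python
from pathlib import Path


def next_stage(project: Path, target: str) -> str:
    """Return the stage the runtime may run next for `target` ('script' or 'storyboard')."""
    if not (project / "story-pack.md").exists():
        return "interview"               # interviewer.md fills gaps in the brief
    if not any((project / "synopsis").glob("v*.md")):
        return "generate-synopsis"       # synopsis-generator.md
    if not (project / "synopsis" / "approved.md").exists():
        return "await-approval"          # red hard gate: a human must approve first
    return f"generate-{target}"          # script-generator.md / storyboard-generator.md
```

The point is that the gate is a file-existence check, not a prompt convention: without `synopsis/approved.md`, downstream generation is simply unreachable.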
This view emphasizes the runtime layers: Skills orchestrate, Agents execute, Knowledge supports, Rules inject constraints, and Hooks provide reminders after tool events.
```mermaid
flowchart TB
subgraph Skills[".claude/skills/ — Skills Layer"]
S1["new-project<br>New Project"]
S2["evaluate<br>Evaluate Artifact"]
S3["compare<br>Compare Branches"]
end
subgraph GenAgents[".claude/agents/ — Generation Agents"]
GA1["interviewer.md<br>Interview Agent"]
GA2["synopsis-generator.md<br>Synopsis Generation"]
GA3["script-generator.md<br>Script Generation"]
GA4["storyboard-generator.md<br>Storyboard Generation"]
end
subgraph EvalAgents[".claude/agents/ — Evaluation Agents"]
EA1["narrative-reviewer.md<br>Narrative Review"]
EA2["character-reviewer.md<br>Character Review"]
EA3["logic-auditor.md<br>Logic Audit"]
EA4["format-checker.md<br>Format Check"]
end
subgraph Knowledge["knowledge/ — Knowledge Base"]
K1["genres/<br>Genre Knowledge x4"]
K2["structure/<br>Structure Models x3"]
K3["evaluation/<br>Evaluation Rubrics x4"]
K4["style-guide/<br>Style Guides x3"]
K5["templates/<br>Output Templates x3"]
end
subgraph Rules[".claude/rules/ — Rules Layer"]
R1["script-writing.md<br>Script Rules → script/**"]
R2["storyboard-writing.md<br>Storyboard Rules → storyboard/**"]
end
subgraph Hook[".claude/settings.json — Hook Layer"]
H1["PostToolUse: Write<br>Remind to run /evaluate"]
end
Skills -->|orchestrates| GenAgents
Skills -->|orchestrates| EvalAgents
GenAgents -->|references| Knowledge
EvalAgents -->|uses| Knowledge
Rules -.->|path-matched injection| GenAgents
Hook -.->|reminder after Write| GenAgents
classDef skill fill:#9b59b6,color:#fff
classDef gen fill:#2ecc71,color:#fff
classDef eval fill:#f39c12,color:#fff
classDef know fill:#3498db,color:#fff
classDef rule fill:#e74c3c,color:#fff
classDef hook fill:#95a5a6,color:#fff
class S1,S2,S3 skill
class GA1,GA2,GA3,GA4 gen
class EA1,EA2,EA3,EA4 eval
class K1,K2,K3,K4,K5 know
class R1,R2 rule
class H1 hook
```
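The hook layer at the bottom of this diagram reduces to a small settings entry. The sketch below writes one, assuming Claude Code's `hooks` schema in `.claude/settings.json`; the reminder command is an invented example, and a real setup would merge into existing settings rather than overwrite them:

```python
import json
from pathlib import Path

# Illustrative PostToolUse hook: after any Write, print a reminder to run /evaluate.
settings = {
    "hooks": {
        "PostToolUse": [
            {
                "matcher": "Write",
                "hooks": [
                    {
                        "type": "command",
                        "command": "echo 'Artifact written: consider running /evaluate'",
                    }
                ],
            }
        ]
    }
}

Path(".claude").mkdir(exist_ok=True)
# Overwrites for simplicity; merge with existing settings in practice.
Path(".claude/settings.json").write_text(json.dumps(settings, indent=2) + "\n")
```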
This view focuses on what happens after generation. The runtime scores artifacts, detects convergence or stagnation, and stops rather than revising forever.
```mermaid
flowchart TD
A["Generate Artifact<br>synopsis/v1.md or ep01.md"] --> B["Evaluator Scoring<br>narrative-reviewer.md<br>character-reviewer.md<br>logic-auditor.md<br>format-checker.md"]
B --> C{"All dimensions >= 3/5?<br>Based on: synopsis-rubric.md<br>or script-rubric.md"}
C -->|Yes| D(["Pass<br>→ *-eval.md"])
C -->|No| E{"Check score trend<br>CLAUDE.md stop policy"}
E -->|"Converging: S2 > S1"| F{"Over hard cap?<br>Synopsis 3 rounds / Script 5 rounds"}
E -->|"Stagnation: S2 ≈ S1"| G(["Stop and hand off"])
E -->|"Diverging: S2 < S1"| H(["Stop immediately"])
E -->|"Oscillation: mixed up/down"| I(["Stop and hand off"])
F -->|Below cap| J["Targeted Revision<br>→ v2.md / v3.md"]
F -->|Cap reached| K(["Force stop"])
J --> B
classDef pass fill:#2ecc71,color:#fff
classDef stop fill:#e74c3c,color:#fff
classDef check fill:#f39c12,color:#fff
class D pass
class G,H,I,K stop
class C,E,F check
```
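The stop policy in this loop is small enough to state as code. Below is a minimal sketch assuming one dict of rubric scores (1-5) per round; the pass threshold and the hard caps come from the diagram, while the function name and verdict strings are hypothetical:

```python
def verdict(rounds: list[dict[str, int]], hard_cap: int) -> str:
    """Decide the next action after the latest evaluation round.

    `rounds` holds one mapping of rubric dimension -> score (1-5) per round,
    assuming the same dimensions each round; `hard_cap` is 3 for synopses
    and 5 for scripts, per the stop policy.
    """
    latest = rounds[-1]
    if all(score >= 3 for score in latest.values()):
        return "pass"                      # -> *-eval.md
    if len(rounds) < 2:
        return "revise"                    # no trend yet; spend the budget
    deltas = [latest[d] - rounds[-2][d] for d in latest]
    if any(d > 0 for d in deltas) and any(d < 0 for d in deltas):
        return "stop-oscillation"          # mixed up/down: hand off to a human
    if sum(deltas) < 0:
        return "stop-divergence"           # S2 < S1: stop immediately
    if sum(deltas) == 0:
        return "stop-stagnation"           # S2 ~ S1: hand off instead of looping
    return "revise" if len(rounds) < hard_cap else "force-stop"  # converging
```

Each round appends its scores, and the runtime acts on the returned verdict rather than revising unconditionally.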
The project is not about writing a longer prompt. It is about making the runtime control structure explicit:
- `synopsis/approved.md` is a hard gate; no approved synopsis means no script or storyboard generation
- generator and evaluator agents operate under separated contexts
- synopsis and script loops have bounded revision budgets and convergence-aware stop conditions
- every major action is logged into `changelog.md` (see the sketch after this list)
- artifacts and evaluation reports stay inside the project directory for replay and takeover
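For the changelog bullet above, a minimal append-only logger is enough. This is a sketch; the entry format is an assumption, not the repo's actual schema:

```python
from datetime import datetime, timezone
from pathlib import Path


def log_action(project: Path, action: str, detail: str) -> None:
    """Append one audit entry to the project's changelog.md (illustrative format)."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    with (project / "changelog.md").open("a", encoding="utf-8") as log:
        log.write(f"- {stamp} | {action}: {detail}\n")
```

A call like `log_action(Path("projects/demo"), "revise", "synopsis v2 -> v3")` keeps the audit trail plain-text and diffable.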
StoryForge is neither a conventional web app nor a standalone backend service. Its assumed runtime host is Claude Code.
In that model:
- `CLAUDE.md` defines global workflow constraints and hard gates
- `.claude/skills/` exposes structured entry points such as `/new-project`, `/evaluate`, and `/compare`
- `.claude/agents/` defines generation and review roles
- `.claude/rules/` injects domain constraints through path matching
- `projects/*` acts as transparent runtime state and artifact storage
Put simply, Claude Code is both the agent host and the operating interface for this runtime.
| Component | Role | Path |
|---|---|---|
| Skills | High-level workflow entry points | `.claude/skills/` |
| Agents | Generation, review, and audit roles | `.claude/agents/` |
| Rules | Hard constraints injected by path | `.claude/rules/` |
| Knowledge | Genre, structure, template, and rubric references | `knowledge/` |
| Projects | Runtime state, intermediate artifacts, final outputs | `projects/` |
| Evals | Evaluation cases, baselines, and regression runs | `evals/` |
In a typical run, the runtime will:

- interview and normalize a creative brief into `story-pack.md`
- generate a synopsis before any downstream output
- branch into script or storyboard generation after approval
- run rubric-based review with independent evaluator agents
- revise with convergence-aware stopping logic
- preserve changelogs, eval reports, and project artifacts in Markdown
Based on `evals/baseline.md`:
- 8 agent definitions completed
- 3 skills integrated
- 18 knowledge files loaded into runtime context
- 6/6 evaluation cases passing
You can inspect the example projects directly, or run the workflow yourself:
- Clone the repository and open it in Claude Code.
- Use `/new-project` with a creative brief.
- Review the generated `story-pack.md` and `synopsis/v1.md`.
- Run `/evaluate` on artifacts and inspect the revision loop.
- Approve the synopsis before continuing to script or storyboard generation.
StoryForge is likely most useful to:

- developers exploring Claude Code as a multi-agent runtime host
- teams building controllable workflows for content generation
- researchers interested in Harness Engineering for computation under uncertainty
- anyone who wants a Markdown-native, auditable, replayable Agent project example