twelve-angry-agents

🚧 Playground / experimentation repo

Built in a single overnight session + a follow-up morning to learn the Cloudflare developer platform hands-on. Things will break. Container leaks happen. Sandboxes time out. Wrangler OAuth expires. The retry loop hasn't been runtime-stress-tested. The Claude Code / Pi harnesses haven't been exercised against a real model. Costs are unbounded. Sandbox network egress isn't restricted. There's no auth. Anyone with the URL can spend your tokens.

Not a security-audited multi-agent platform. Don't run it on a real Cloudflare account or against a real codebase without reading the code first.

No warranty, no support guarantees, no production claims. I'm using this to learn the stack and to have a polished portfolio piece. If you want a production multi-agent coding system, look at Devin, Cursor's background agents, Goose, or any of the other much-more-mature systems in this space.

If you fork this and run it: you're on your own. The good news is that running locally on wrangler dev is fairly contained — sandboxes are Docker containers; nothing is deployed unless you explicitly ask.

A multi-agent coding platform on Cloudflare. Define a team of AI agents in markdown, give them a task, watch them collaborate live in their own sandboxes, and review the full replay afterwards.

PLANNING → IMPLEMENTING ⇄ REVIEWING → DONE
              ↑________________│ (bounded retries on REJECTED)
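
The diagram above can be read as a small state machine. The following is a minimal TypeScript sketch of that reading, not the CoordinatorDO's actual code: the names `RunState`, `nextPhase`, and `maxRetries` are hypothetical, and it assumes a REJECTED review sends the Run back to IMPLEMENTING until the retry budget is spent. It only models the APPROVED and MAX_RETRIES outcomes; error handling is left out.

```typescript
// Hypothetical names throughout; the real FSM driver lives in src/coordinator.ts.
type Phase = "PLANNING" | "IMPLEMENTING" | "REVIEWING" | "DONE";
type ReviewOutcome = "APPROVED" | "REJECTED";
type Verdict = "APPROVED" | "MAX_RETRIES";

interface RunState {
  phase: Phase;
  retries: number;   // number of REJECTED reviews that triggered a retry so far
  verdict?: Verdict; // set only once phase is DONE
}

// Advance the Run by one step. `review` is only consulted in REVIEWING.
function nextPhase(state: RunState, maxRetries: number, review?: ReviewOutcome): RunState {
  switch (state.phase) {
    case "PLANNING":
      return { ...state, phase: "IMPLEMENTING" };
    case "IMPLEMENTING":
      return { ...state, phase: "REVIEWING" };
    case "REVIEWING":
      if (review === undefined) throw new Error("REVIEWING needs a review outcome");
      if (review === "APPROVED") return { ...state, phase: "DONE", verdict: "APPROVED" };
      // REJECTED: retry while budget remains, otherwise stop with MAX_RETRIES.
      if (state.retries >= maxRetries) return { ...state, phase: "DONE", verdict: "MAX_RETRIES" };
      return { ...state, phase: "IMPLEMENTING", retries: state.retries + 1 };
    case "DONE":
      return state; // terminal: no further transitions
  }
}
```

With `maxRetries = 1`, the first REJECTED loops back to IMPLEMENTING and the second one ends the Run with MAX_RETRIES.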

Each role runs in its own Cloudflare Sandbox container, with its own harness (Claude Code / OpenCode / Pi), pushing commits to a per-Run Cloudflare Artifacts repository. The whole pipeline streams live over WebSocket; status, transitions, and verdict appear in real time in a Vite + React + Tailwind UI styled per DESIGN.md.

Status: Phases 1–6 of plans/v1.md are implemented and committed. Happy-path Run (planner → implementer → reviewer → APPROVED) verified end-to-end (run-20260507-072726-tond, 215s, 235 events captured by RecorderDO). The retry loop and the Claude Code / Pi adapters are wired but haven't been runtime-tested against a real model. Phase 7 (polish) is in progress; remaining items are tracked in BACKLOG.md. See PRD.md for the full design.

Architecture

                         ┌─────────────────────────────────┐
                         │  Browser (Vite/React/Tailwind)  │
                         │   - Pick team + task            │
                         │   - Live WS event stream        │
                         └────────────────┬────────────────┘
                                          │ HTTPS + WSS
                                          ▼
┌──────────────────────────────────────────────────────────────────┐
│  Worker                                                          │
│   - Static assets (the SPA)                                      │
│   - HTTP API: /api/teams, /api/tasks, /api/runs[/:id[/...]]      │
│   - WebSocket upgrade /api/runs/:id/ws → CoordinatorDO           │
└────┬───────────────────────────────────────┬─────────────────────┘
     │                                       │
     ▼                                       ▼
┌──────────────────────┐        ┌──────────────────────────────────┐
│  CoordinatorDO       │        │  Artifacts (per-Run repo)        │
│  (one per Run)       │        │   - One repo per Run             │
│   - FSM driver       │        │   - Authenticated git remote     │
│   - WS hibernation   │        │     handed to each sandbox       │
│   - Storage backlog  │        │   - Final code lives here        │
└────┬─────────────────┘        └──────────────────────────────────┘
     │ spawns 1 sandbox at a time
     ▼
┌──────────────────────────────────────────────────────────────────┐
│  Sandbox (Cloudflare Containers via Sandbox SDK)                 │
│   - Image: cloudflare/sandbox + opencode + claude + pi +         │
│     taa-agent runner                                             │
│   - Env: ARTIFACTS_GIT_REMOTE, AGENT_ROLE, AGENT_HARNESS,        │
│     OPENCODE_AUTH_JSON or ANTHROPIC_API_KEY                      │
│   - Files: /agent/{role,task,feedback}.md                        │
│   - Process: `taa-agent run`                                     │
│       1. Provision auth                                          │
│       2. git clone the Artifacts repo                            │
│       3. Run harness with role + task as prompt                  │
│       4. git push the resulting commit                           │
└──────────────────────────────────────────────────────────────────┘

Built on

Cloudflare Workers — the orchestrator
Sandbox SDK — one isolated container per agent
Artifacts (closed beta) — Git-compatible storage, one repo per Run
Durable Objects — Coordinator (per-Run FSM + WebSocket fan-out via the Hibernation API)
Containers — under the Sandbox SDK
@cloudflare/vite-plugin — runs the Worker in workerd alongside Vite's dev server
Vite + React 19 + Tailwind v4 for the live-watch UI
portless for https://twelve-angry-agents.localhost URLs

Harnesses

Harness	Status	Auth	Model default
OpenCode	Phase 2; verified end-to-end	`OPENCODE_AUTH_JSON` (CF-internal `opencode.cloudflare.dev` JWT)	`@cf/moonshotai/kimi-k2.6`
Claude Code	Phase 5; wired, not yet runtime-tested	`ANTHROPIC_API_KEY`	Anthropic default
Pi (`@mariozechner/pi-coding-agent`)	Phase 5; wired, not yet runtime-tested	`ANTHROPIC_API_KEY` (default provider)	`claude-sonnet-4-5`

To switch a role's harness, edit the harness: field in teams/feature-team.md.
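
The Auth column of the table above implies that each harness needs a specific secret before a Run can start. As a rough illustration, here is a minimal sketch of a pre-flight check. The function `missingSecrets` and the constant `REQUIRED_SECRET` are hypothetical and not part of the repo; the harness identifiers `opencode`, `claude-code`, and `pi` are the ones used in the setup comments of this README.

```typescript
type Harness = "opencode" | "claude-code" | "pi";

// Secret each harness needs, as listed in the Harnesses table.
const REQUIRED_SECRET: Record<Harness, string> = {
  "opencode": "OPENCODE_AUTH_JSON",
  "claude-code": "ANTHROPIC_API_KEY",
  "pi": "ANTHROPIC_API_KEY",
};

// Return the secret names that the given harnesses need but `env` lacks.
// Each name appears at most once, in first-seen order.
function missingSecrets(
  harnesses: Harness[],
  env: Record<string, string | undefined>,
): string[] {
  const missing: string[] = [];
  for (const harness of harnesses) {
    const name = REQUIRED_SECRET[harness];
    if (!env[name] && missing.indexOf(name) === -1) missing.push(name);
  }
  return missing;
}
```

A team that mixes `opencode` and `pi` roles would need both secrets in .dev.vars; a team that only uses `opencode` needs just `OPENCODE_AUTH_JSON`.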

Prerequisites

macOS or Linux (tested on macOS)
Node 22+ and npm 11+
Docker / OrbStack / Colima (the Sandbox SDK builds and runs containers locally)
Cloudflare account with Artifacts beta enabled — request access at https://forms.gle/DwBoPRa3CWQ8ajFp7
portless (brew install portless or npm i -g portless); the proxy must be running before npm run dev

Setup

git clone <this repo>
cd twelve-angry-agents

# Install dependencies (postinstall regenerates Cloudflare types)
npm install

# Authenticate with Cloudflare
npx wrangler login

# Replace the YOUR_CLOUDFLARE_ACCOUNT_ID placeholder in wrangler.jsonc.
# The setup-account script does this automatically by reading
# `wrangler whoami`. (If you want to do it by hand, find your account ID
# in the wrangler whoami table and edit wrangler.jsonc directly.)
npm run setup-account

# Configure secrets:
cp .dev.vars.example .dev.vars
$EDITOR .dev.vars
#   - Required: OPENCODE_AUTH_JSON if any role uses the opencode harness.
#               If you're a CF employee with opencode.cloudflare.dev access:
#                 echo "OPENCODE_AUTH_JSON='$(jq -c . < ~/.local/share/opencode/auth.json)'" >> .dev.vars
#               Otherwise, configure OpenCode against a provider with
#                 `opencode providers` and copy the resulting auth.json.
#   - Required: ANTHROPIC_API_KEY if any role uses the claude-code or pi harness.
#   - Optional but recommended: CF_API_TOKEN_AI_GATEWAY_READ — a Cloudflare
#     API token with `AI Gateway: Read` + `Analytics: Read` scopes on the
#     account that owns CF_AI_GATEWAY_ID. The Orchestrator's `get_costs`
#     tool calls CF's GraphQL analytics endpoint with this token to answer
#     questions like "how much have I spent this week?". Create it at
#     https://dash.cloudflare.com/profile/api-tokens. Without it the
#     tool surfaces a clear `missing_env` error — the rest of the Worker
#     keeps working.

# Make sure portless's proxy daemon is up. First time only:
sudo portless proxy start --https
# Or non-privileged:
portless proxy start --port 1355 --https

Heads up before pushing back to a public fork: if you committed wrangler.jsonc after running setup-account, your account ID is in the diff. Either revert that file before pushing (git checkout wrangler.jsonc) or just don't push. Same goes for .dev.vars — gitignored, but worth a git status sanity check before any push to a public remote.

Worktree bootstrap

When parallel agent workers run in worktrees created with git worktree add, each fresh worktree needs the same four manual setup steps before npm run dev will start. The bootstrap-worktree script automates them.

# From inside a fresh worktree (NOT the main checkout):
node scripts/bootstrap-worktree.mjs

# Optional flags:
node scripts/bootstrap-worktree.mjs --dry-run      # preview, write nothing
node scripts/bootstrap-worktree.mjs --no-dev-vars  # skip the .dev.vars copy

The script is idempotent: re-running it on an already-bootstrapped worktree is a safe no-op. If invoked from the main checkout, it refuses to run and exits non-zero. The four steps it performs:

git update-index --skip-worktree wrangler.jsonc, so local edits to wrangler.jsonc cannot pollute commits.
Replace the committed YOUR_CLOUDFLARE_ACCOUNT_ID placeholder in wrangler.jsonc with the operator's real account ID, read from the main checkout's locally-modified wrangler.jsonc.
Inject a local-only dev: block: { "enable_containers": false, "inspector_port": 0 }. The first skips the Docker-dependent Sandbox container build (most worker tasks don't need it); the second dodges port-9229 collisions across parallel worktrees by letting the OS pick a free inspector port.
Copy .dev.vars from the main checkout (it is gitignored, so a fresh git worktree add does not bring it across).

After the script runs, npm install && npm run dev should start cleanly inside the worktree.
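
To make steps two and three concrete, here is a minimal sketch of the config transformation they describe. It is not the code in scripts/bootstrap-worktree.mjs: the function `applyWorktreeOverrides` and the `WranglerConfig` shape are hypothetical, and the sketch works on an already-parsed object, whereas the real script has to deal with the JSONC file on disk.

```typescript
// Hypothetical shape: only the fields this sketch touches are typed.
interface WranglerConfig {
  account_id?: string;
  dev?: { enable_containers?: boolean; inspector_port?: number };
  [key: string]: unknown;
}

const PLACEHOLDER = "YOUR_CLOUDFLARE_ACCOUNT_ID";

// Return a copy of `config` with the placeholder account ID filled in and
// the local-only dev block applied. The input object is not mutated.
function applyWorktreeOverrides(config: WranglerConfig, accountId: string): WranglerConfig {
  return {
    ...config,
    // Only replace the committed placeholder; leave a real ID untouched.
    account_id: config.account_id === PLACEHOLDER ? accountId : config.account_id,
    // enable_containers: false skips the Docker-dependent container build;
    // inspector_port: 0 lets the OS pick a free inspector port.
    dev: { ...config.dev, enable_containers: false, inspector_port: 0 },
  };
}
```

Applying the function twice gives the same result as applying it once, which mirrors the idempotency the script promises.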

When you are done with a worktree, run the inverse before git worktree remove:

node scripts/cleanup-worktree.mjs               # un-skip-worktree, restore wrangler.jsonc, keep .dev.vars
node scripts/cleanup-worktree.mjs --with-dev-vars  # also delete .dev.vars

Tested on macOS / Linux. Cross-platform Windows support is out of scope. So is auto-bootstrapping via a predev hook.

Run it

npm run dev
# Opens (and reverse-proxies) https://twelve-angry-agents.localhost:1355

Click "Start a Run" with the seed feature-team and add-string-utils task; a real LLM will plan, implement, and review a small TypeScript module in a per-Run Artifacts repo. Watch the events stream live in the UI.

A typical Run takes 2–5 minutes (LLM calls + git push + npm install).

Workspaces (artifact-only is a valid mode)

The Run primitive supports three Source variants out of the box (see CONTEXT.md for the full vocabulary):

Source of Truth. A GitHub repo registered as a Project. Sync-in clones the default branch into the Workspace; Sync-out pushes a branch and opens a PR on APPROVED.
Local Source. A directory on the operator's filesystem uploaded by the CLI (taa run --local .). Sync-in unpacks the tarball into the Workspace; Sync-out serves a diff bundle back via taa apply. No Project, no PAT.
Empty Source / artifact-only workspace. No external source. Sync-in starts the Workspace empty; agents create from scratch. History lives entirely in the per-Run Cloudflare Artifacts repo.

The third mode is a valid completion state, not "demo mode". A Run that finishes with its work archived in Cloudflare Artifacts and never pushed to a VCS is fully complete. The legacy feature-team + add-string-utils seed uses this mode for the out-of-the-box first Run, but the same path supports any one-off experiment where the operator wants the platform to produce a Workspace they can read, fork, or publish later (the per-Run Artifacts repo is Git-compatible). ADR-0009 generalises this further as the Cloud Workload Runtime: completion is the Workload reaching a terminal state, and publication to a GitHub PR (or any other Publication Target) is an optional side-effect declared on the Workload Plan.
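
One way to picture the three Source variants is as a discriminated union. The sketch below is illustrative only: the type `Source`, its `kind` tags and field names, and the function `syncOutAction` are hypothetical (CONTEXT.md holds the real vocabulary). The returned strings simply restate the Sync-out behaviour described above.

```typescript
// Hypothetical type and field names, chosen for this illustration.
type Source =
  | { kind: "source-of-truth"; repo: string } // GitHub repo registered as a Project
  | { kind: "local"; tarball: string }        // directory uploaded by the CLI
  | { kind: "empty" };                        // artifact-only workspace

// Describe what Sync-out does for each variant.
function syncOutAction(source: Source): string {
  switch (source.kind) {
    case "source-of-truth":
      return "push a branch and open a PR on APPROVED";
    case "local":
      return "serve a diff bundle back via taa apply";
    case "empty":
      return "nothing to publish; history stays in the per-Run Artifacts repo";
  }
}
```

Modelling the empty case as its own variant, rather than as a missing Project, matches the point that artifact-only is a complete mode in its own right.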

In the UI:

The dashboard's "New Task" picker offers "No project (artifact-only workspace)" as a first-class option.
The Live Runs table tags Runs from this path as Artifact-only so an operator can filter by it.
The chat surface is the Assistant (the operator-facing label for the Orchestrator from CONTEXT.md, kept distinct from the internal code identifier).

Operator CLI (`taa`)

A small terminal CLI in taa-cli/ wraps the same /api/* endpoints the SPA uses, so you can drive the system from a shell — handy for scripted runs, CI dispatches, or anyone allergic to GUIs.

# Build the CLI bundle once
npm run taa-cli:build

# Run it directly from the dist/ output
node taa-cli/dist/taa-cli.cjs --help

# Or, after `npm publish` (one-line publish dance documented in taa-cli/README.md):
npx @mcdays94/taa-cli runs list

Highlights:

taa run feature-team add-string-utils       # start a Run, stream events live
taa runs list --limit 10                    # table of recent Runs
taa runs show <run-id>                      # humanized transcript
taa runs show <run-id> --raw | jq           # NDJSON for piping
taa skills list                             # bundled skills
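
Because `--raw` emits NDJSON (one JSON value per line), the output is easy to post-process in a script as well as with jq. The following is a minimal sketch under assumptions: the helper names are hypothetical, and the `type` field used in the usage note is an assumed event field, since the event schema is not documented in this README.

```typescript
// Parse NDJSON text (one JSON value per line) into an array, skipping blank lines.
function parseNdjson(text: string): unknown[] {
  const out: unknown[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    out.push(JSON.parse(trimmed));
  }
  return out;
}

// Count events by the value of a string field. Non-object entries and entries
// where the field is missing or not a string are ignored.
function countBy(events: unknown[], field: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const ev of events) {
    if (typeof ev !== "object" || ev === null) continue;
    const value = (ev as Record<string, unknown>)[field];
    if (typeof value !== "string") continue;
    counts[value] = (counts[value] ?? 0) + 1;
  }
  return counts;
}
```

For example, `countBy(parseNdjson(rawOutput), "type")` would give a per-type event tally, provided the events really carry a `type` field.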

TAA_BASE_URL overrides the API base (default: the npm run dev URL). Self-signed *.localhost certs from portless are auto-trusted; for non-localhost dev, set TAA_INSECURE=1 to opt in. See taa-cli/README.md for the full surface.
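
The environment handling described above can be sketched as a small pure function. This is an illustration, not the CLI's actual code: `resolveCliConfig`, `hostnameOf`, and `CliConfig` are hypothetical names, the default URL is assumed to be the `npm run dev` address shown earlier, and `TAA_INSECURE=1` is read here as opting in to trusting self-signed certificates on non-localhost hosts.

```typescript
interface CliConfig {
  baseUrl: string;
  trustSelfSigned: boolean;
}

// Assumed default: the `npm run dev` URL from the "Run it" section.
const DEFAULT_BASE_URL = "https://twelve-angry-agents.localhost:1355";

// Extract the lowercase hostname from an http(s) URL, or "" if it does not match.
function hostnameOf(url: string): string {
  const match = /^https?:\/\/([^/:?#]+)/i.exec(url);
  return match?.[1]?.toLowerCase() ?? "";
}

// Resolve the API base and the certificate policy from environment variables.
function resolveCliConfig(env: Record<string, string | undefined>): CliConfig {
  const baseUrl = env["TAA_BASE_URL"] || DEFAULT_BASE_URL;
  const host = hostnameOf(baseUrl);
  // "localhost" itself or any subdomain of it counts as local dev.
  const isLocalhost = /(^|\.)localhost$/.test(host);
  return { baseUrl, trustSelfSigned: isLocalhost || env["TAA_INSECURE"] === "1" };
}
```

Taking the environment as a parameter keeps the function easy to test; a real CLI would pass in `process.env`.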

Sandbox image

The agent runs inside a Cloudflare Sandbox container built from infra/Dockerfile. For local dev (npm run dev) wrangler builds the image automatically from the Dockerfile each time — no registry needed. The scripts below are for publishing the image to a registry, useful if you're sharing it across machines, baking it into CI, or wiring up a deploy pipeline.

# 1. Build locally (produces `taa-sandbox:dev`)
npm run image:build

# 2. Tag for the registry (produces both <repo>:<short-sha> and <repo>:latest)
npm run image:tag

# 3. Push both tags
npm run image:push

# Or all three in one shot:
npm run image:release

Default registry: ghcr.io/mcdays94/taa-sandbox (GitHub Container Registry, the same account as this repo). Override the target via IMAGE_REPO:

IMAGE_REPO=ghcr.io/your-org/taa-sandbox npm run image:tag
IMAGE_REPO=ghcr.io/your-org/taa-sandbox npm run image:push

Why GHCR over the Cloudflare Containers Registry? GHCR plays nicer with forks (no Cloudflare-account binding), the auth dance is one PAT, and the rate limits are friendly to a personal-tier playground. If you need Cloudflare Containers Registry for production, the IMAGE_REPO override above will get you there.

Auth dance (one time)

GHCR needs Docker logged in with a token that has write:packages scope. Easiest path:

# Create a classic PAT at https://github.com/settings/tokens with the
# `write:packages` (and `read:packages`) scope.
echo "$GITHUB_TOKEN" | docker login ghcr.io -u <your-github-user> --password-stdin

Or, if you're pushing from GitHub Actions, no PAT is needed: the built-in ${{ secrets.GITHUB_TOKEN }} can push to GHCR as long as the workflow grants it the packages: write permission.

When to bump the image

Rebuild + push whenever you change anything that affects the agent's runtime environment:

infra/Dockerfile — base image bump, new system packages, harness version pins.
taa-agent/src/** — the in-sandbox runner CLI is bundled into the image at build time.
taa-agent/build.mjs or its dependencies.

Pure Worker code (src/, web/) does not require an image rebuild — those are deployed by wrangler deploy, which leaves the sandbox image alone.

The dirty-tree marker (<sha>-dirty) on image:tag means a half-finished commit can't accidentally publish as a canonical SHA. Always commit before tagging the canonical build.
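
The tagging rule can be sketched as follows. This is an assumption-laden illustration rather than the image:tag script itself: the function `shaTag` is hypothetical, and the 7-character short SHA is assumed from git's usual abbreviated form. The default repository and the IMAGE_REPO override are the ones documented above.

```typescript
const DEFAULT_IMAGE_REPO = "ghcr.io/mcdays94/taa-sandbox";

// Build the <repo>:<short-sha> tag, appending "-dirty" when the working tree
// has uncommitted changes so the result can never pass as a canonical SHA tag.
function shaTag(
  env: Record<string, string | undefined>,
  fullSha: string,
  dirty: boolean,
): string {
  const repo = env["IMAGE_REPO"] || DEFAULT_IMAGE_REPO;
  const shortSha = fullSha.slice(0, 7); // assumed short-SHA length
  return `${repo}:${shortSha}${dirty ? "-dirty" : ""}`;
}
```

Because the suffix is derived from the tree state at tag time, committing first is the only way to get a clean `<repo>:<short-sha>` tag.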

CI?

Image builds aren't wired into CI by design. Docker-in-CI on a free GitHub-Actions runner is slow (~3-5 min), and the image only changes when you bump a harness version or system package. The image:release script is fast to run locally before a PR that touches infra/ or taa-agent/. See #5 for the discussion.

Browse a sample Run (no LLM required)

A fully-recorded APPROVED Run is checked in at samples/ — JSON transcript + a guided walkthrough with screenshots. Open samples/README.md to see what a successful planner → implementer → reviewer flow looks like end-to-end without spinning up the LLM stack or burning API credits.

Project layout

.
├── PRD.md                # Full design document
├── plans/
│   ├── v1.md             # 7-phase implementation plan
│   └── spike-harnesses.md
├── teams/                # Team definitions (markdown + YAML frontmatter)
├── tasks/                # Task definitions (markdown + YAML frontmatter)
├── src/                  # Worker code
│   ├── index.ts            # HTTP routing
│   ├── coordinator.ts      # CoordinatorDO — per-Run FSM driver
│   ├── seeds.ts            # Team/Task parser; bundles markdown at build time
│   ├── ids.ts              # Run ID generator
│   ├── types.ts            # Shared types (Worker ↔ SPA)
│   └── env.d.ts            # Cloudflare.Env augmentations for .dev.vars
├── web/                  # Vite + React SPA
│   ├── index.html
│   ├── main.tsx          # Single-file UI with team picker + live event log
│   └── styles.css
├── taa-agent/            # In-sandbox runner CLI
│   ├── src/index.ts        # Subcommands: run, msg, done
│   ├── build.mjs           # esbuild bundle to dist/taa-agent.cjs
│   └── tsconfig.json
├── taa-cli/              # Operator CLI (run management from a terminal)
│   ├── src/                # Commands + lib
│   ├── build.mjs           # esbuild bundle to dist/taa-cli.cjs
│   ├── package.json        # Independently publishable (@mcdays94/taa-cli)
│   ├── README.md
│   └── tsconfig.json
├── infra/
│   └── Dockerfile          # cloudflare/sandbox base + harnesses + taa-agent
├── samples/                # Pre-recorded Run for offline browsing
│   ├── README.md             # Walkthrough w/ screenshots
│   ├── run-*.json            # Full event transcript
│   └── screenshots/
├── test/                   # Vitest tests (Workers pool)
├── wrangler.jsonc          # Worker config — bindings, container, account_id
├── vite.config.ts          # Vite + Cloudflare plugin
└── tsconfig.json

Concepts

Term	Meaning
Run	One end-to-end execution of a Team against a Task. Has a unique ID, a CoordinatorDO instance, and an Artifacts repo.
Team	A markdown file declaring agents, roles, harnesses, and the win condition. Stored in `teams/`.
Task	A markdown file describing what to build. Stored in `tasks/`.
Agent	One participant in a team. Has a `role` (system prompt) and a `harness` (which CLI it runs).
Workspace	The Artifacts repo for the Run. Agents clone, work, push.
Coordinator	The Durable Object that runs the FSM, spawns agents, owns the WS fan-out. One per Run.
Verdict	Final state of a Run: `APPROVED` / `REJECTED` / `MAX_RETRIES` / `ERROR`.
`taa`	The CLI inside agent sandboxes (`taa run`, `taa msg`, `taa done`). Symlink of `taa-agent`.

Known limitations

Sandbox cleanup: each Run leaks one cloudflare/proxy-everything container. Re-running many times will accumulate them. Workaround: docker ps -q --filter "ancestor=cloudflare/proxy-everything:3cb1195" | xargs docker kill.
assets.directory must NOT be set in wrangler.jsonc when using the Cloudflare Vite plugin (it's auto-derived from the build output). Setting it silently breaks /api/* routing.
Wrangler's OAuth token expires after 1h. Long Runs that span the boundary can break wrangler's remote-bindings session. Refresh with wrangler login or use a long-lived API token.
Phase 6 (full transcript replay) is not yet implemented. Currently only structurally important events are persisted — agent.stdout / agent.stderr are live-streamed only.

License

MIT — see LICENSE.
