mcdays94/twelve-angry-agents

twelve-angry-agents — a multi-agent coding platform on Cloudflare

twelve-angry-agents

🚧 Playground / experimentation repo

Built in a single overnight session + a follow-up morning to learn the Cloudflare developer platform hands-on. Things will break. Container leaks happen. Sandboxes time out. Wrangler OAuth expires. The retry loop hasn't been runtime-stress-tested. The Claude Code / Pi harnesses haven't been exercised against a real model. Costs are unbounded. Sandbox network egress isn't restricted. There's no auth. Anyone with the URL can spend your tokens.

Not a security-audited multi-agent platform. Don't run it on a real Cloudflare account or against a real codebase without reading the code first.

No warranty, no support guarantees, no production claims. I'm using this to learn the stack and to have a polished portfolio piece. If you want a production multi-agent coding system, look at Devin, Cursor's background agents, Goose, or any of the other, much more mature systems in this space.

If you fork this and run it: you're on your own. The good news is that running locally on wrangler dev is fairly contained — sandboxes are Docker containers; nothing is deployed unless you explicitly ask.

A multi-agent coding platform on Cloudflare. Define a team of AI agents in markdown, give them a task, watch them collaborate live in their own sandboxes, and review the full replay afterwards.

PLANNING → IMPLEMENTING ⇄ REVIEWING → DONE
              ↑________________│ (bounded retries on REJECTED)
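The loop above can be read as a small state machine. The following is a minimal sketch, assuming a `maxRetries` bound and a reviewer that returns APPROVED or REJECTED. The `step` function and the `RunState` shape are hypothetical and are not taken from src/coordinator.ts.

```typescript
// Hypothetical sketch of the Run FSM; names are illustrative, not the repo's API.
type Phase = "PLANNING" | "IMPLEMENTING" | "REVIEWING" | "DONE";
type Verdict = "APPROVED" | "REJECTED" | "MAX_RETRIES" | "ERROR";
type ReviewResult = "APPROVED" | "REJECTED";

interface RunState {
  phase: Phase;
  retries: number; // REJECTED reviews that have already triggered a retry
  verdict?: Verdict; // set only once phase is DONE
}

// Advance the FSM by one transition. `review` is only consulted in REVIEWING.
function step(state: RunState, maxRetries: number, review?: ReviewResult): RunState {
  switch (state.phase) {
    case "PLANNING":
      return { ...state, phase: "IMPLEMENTING" };
    case "IMPLEMENTING":
      return { ...state, phase: "REVIEWING" };
    case "REVIEWING":
      if (review === "APPROVED") {
        return { ...state, phase: "DONE", verdict: "APPROVED" };
      }
      if (review === "REJECTED") {
        // Assumption: an exhausted retry budget ends the Run with MAX_RETRIES.
        return state.retries >= maxRetries
          ? { ...state, phase: "DONE", verdict: "MAX_RETRIES" }
          : { ...state, phase: "IMPLEMENTING", retries: state.retries + 1 };
      }
      // Assumption: a missing review result is treated as an error.
      return { ...state, phase: "DONE", verdict: "ERROR" };
    case "DONE":
      return state;
  }
}
```

Keeping the transition function pure makes the retry bound easy to unit-test without spinning up a sandbox.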

Each role runs in its own Cloudflare Sandbox container, with its own harness (Claude Code / OpenCode / Pi), pushing commits to a per-Run Cloudflare Artifacts repository. The whole pipeline streams live over WebSocket; status, transitions, and verdict appear in real time in a Vite + React + Tailwind UI styled per DESIGN.md.

Status: Phases 1–6 of plans/v1.md are implemented and committed. Happy-path Run (planner → implementer → reviewer → APPROVED) verified end-to-end (run-20260507-072726-tond, 215s, 235 events captured by RecorderDO). The retry loop and the Claude Code / Pi adapters are wired but haven't been runtime-tested against a real model. Phase 7 (polish) is in progress; remaining items are tracked in BACKLOG.md. See PRD.md for the full design.


Architecture

                         ┌─────────────────────────────────┐
                         │  Browser (Vite/React/Tailwind)  │
                         │   - Pick team + task            │
                         │   - Live WS event stream        │
                         └────────────────┬────────────────┘
                                          │ HTTPS + WSS
                                          ▼
┌──────────────────────────────────────────────────────────────────┐
│  Worker                                                          │
│   - Static assets (the SPA)                                      │
│   - HTTP API: /api/teams, /api/tasks, /api/runs[/:id[/...]]      │
│   - WebSocket upgrade /api/runs/:id/ws → CoordinatorDO           │
└────┬───────────────────────────────────────┬─────────────────────┘
     │                                       │
     ▼                                       ▼
┌──────────────────────┐        ┌──────────────────────────────────┐
│  CoordinatorDO       │        │  Artifacts (per-Run repo)        │
│  (one per Run)       │        │   - One repo per Run             │
│   - FSM driver       │        │   - Authenticated git remote     │
│   - WS hibernation   │        │     handed to each sandbox       │
│   - Storage backlog  │        │   - Final code lives here        │
└────┬─────────────────┘        └──────────────────────────────────┘
     │ spawns 1 sandbox at a time
     ▼
┌──────────────────────────────────────────────────────────────────┐
│  Sandbox (Cloudflare Containers via Sandbox SDK)                 │
│   - Image: cloudflare/sandbox + opencode + claude + pi +         │
│     taa-agent runner                                             │
│   - Env: ARTIFACTS_GIT_REMOTE, AGENT_ROLE, AGENT_HARNESS,        │
│     OPENCODE_AUTH_JSON or ANTHROPIC_API_KEY                      │
│   - Files: /agent/{role,task,feedback}.md                        │
│   - Process: `taa-agent run`                                     │
│       1. Provision auth                                          │
│       2. git clone the Artifacts repo                            │
│       3. Run harness with role + task as prompt                  │
│       4. git push the resulting commit                           │
└──────────────────────────────────────────────────────────────────┘
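As a small illustration of the Worker's WebSocket route in the diagram, a client could derive the live-stream URL for a Run from the API base like this. `runWebSocketUrl` is a hypothetical helper, not a function from the repo; only the /api/runs/:id/ws path comes from the diagram.

```typescript
// Hypothetical helper: derive the live-stream WebSocket URL for a Run.
function runWebSocketUrl(baseUrl: string, runId: string): string {
  const origin = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const wsOrigin = origin.replace(/^http/, "ws"); // http -> ws, https -> wss
  return `${wsOrigin}/api/runs/${encodeURIComponent(runId)}/ws`;
}
```

For the local dev URL and the Run ID quoted in the status note, this yields wss://twelve-angry-agents.localhost:1355/api/runs/run-20260507-072726-tond/ws.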

Built on

  • Cloudflare Workers — the orchestrator
  • Sandbox SDK — one isolated container per agent
  • Artifacts (closed beta) — Git-compatible storage, one repo per Run
  • Durable Objects — Coordinator (per-Run FSM + WebSocket fan-out via the Hibernation API)
  • Containers — under the Sandbox SDK
  • @cloudflare/vite-plugin — runs the Worker in workerd alongside Vite's dev server
  • Vite + React 19 + Tailwind v4 for the live-watch UI
  • portless for https://twelve-angry-agents.localhost URLs

Harnesses

| Harness | Status | Auth | Model default |
| --- | --- | --- | --- |
| OpenCode | Phase 2; verified end-to-end | OPENCODE_AUTH_JSON (CF-internal opencode.cloudflare.dev JWT) | @cf/moonshotai/kimi-k2.6 |
| Claude Code | Phase 5; wired, not yet runtime-tested | ANTHROPIC_API_KEY | Anthropic default |
| Pi (@mariozechner/pi-coding-agent) | Phase 5; wired, not yet runtime-tested | ANTHROPIC_API_KEY (default provider) | claude-sonnet-4-5 |

To switch a role's harness, edit the harness: field in teams/feature-team.md.
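Because each harness needs a different secret, it can help to check .dev.vars against the harnesses a team actually uses. This is a rough sketch: the harness identifiers follow the setup notes (opencode, claude-code, pi), while `missingSecrets` and its inputs are hypothetical.

```typescript
// Hypothetical pre-flight check; not part of the repo.
type Harness = "opencode" | "claude-code" | "pi";

// Secret each harness needs, per the harness table.
const REQUIRED_SECRET: Record<Harness, string> = {
  opencode: "OPENCODE_AUTH_JSON",
  "claude-code": "ANTHROPIC_API_KEY",
  pi: "ANTHROPIC_API_KEY",
};

// Return the distinct secret names that are required but unset or empty.
function missingSecrets(
  harnesses: Harness[],
  vars: Record<string, string | undefined>,
): string[] {
  const missing: string[] = [];
  for (const harness of harnesses) {
    const name = REQUIRED_SECRET[harness];
    if (!vars[name] && missing.indexOf(name) === -1) {
      missing.push(name);
    }
  }
  return missing;
}
```

A team that mixes opencode and pi roles with only OPENCODE_AUTH_JSON set would get ["ANTHROPIC_API_KEY"] back.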

Prerequisites

  • macOS or Linux (tested on macOS)
  • Node 22+ and npm 11+
  • Docker / OrbStack / Colima (the Sandbox SDK builds and runs containers locally)
  • Cloudflare account with Artifacts beta enabled — request access at https://forms.gle/DwBoPRa3CWQ8ajFp7
  • portless (brew install portless or npm i -g portless); the proxy must be running before npm run dev

Setup

git clone <this repo>
cd twelve-angry-agents

# Install dependencies (postinstall regenerates Cloudflare types)
npm install

# Authenticate with Cloudflare
npx wrangler login

# Replace the YOUR_CLOUDFLARE_ACCOUNT_ID placeholder in wrangler.jsonc.
# The setup-account script does this automatically by reading
# `wrangler whoami`. (If you want to do it by hand, find your account ID
# in the wrangler whoami table and edit wrangler.jsonc directly.)
npm run setup-account

# Configure secrets:
cp .dev.vars.example .dev.vars
$EDITOR .dev.vars
#   - Required: OPENCODE_AUTH_JSON if any role uses the opencode harness.
#               If you're a CF employee with opencode.cloudflare.dev access:
#                 echo "OPENCODE_AUTH_JSON='$(jq -c . < ~/.local/share/opencode/auth.json)'" >> .dev.vars
#               Otherwise, configure OpenCode against a provider with
#                 `opencode providers` and copy the resulting auth.json.
#   - Required: ANTHROPIC_API_KEY if any role uses the claude-code or pi harness.
#   - Optional but recommended: CF_API_TOKEN_AI_GATEWAY_READ — a Cloudflare
#     API token with `AI Gateway: Read` + `Analytics: Read` scopes on the
#     account that owns CF_AI_GATEWAY_ID. The Orchestrator's `get_costs`
#     tool calls CF's GraphQL analytics endpoint with this token to answer
#     questions like "how much have I spent this week?". Create it at
#     https://dash.cloudflare.com/profile/api-tokens. Without it the
#     tool surfaces a clear `missing_env` error — the rest of the Worker
#     keeps working.

# Make sure portless's proxy daemon is up. First time only:
sudo portless proxy start --https
# Or non-privileged:
portless proxy start --port 1355 --https

Heads up before pushing back to a public fork: if you committed wrangler.jsonc after running setup-account, your account ID is in the diff. Either revert that file before pushing (git checkout wrangler.jsonc) or just don't push. Same goes for .dev.vars — gitignored, but worth a git status sanity check before any push to a public remote.

Worktree bootstrap

When parallel agent workers run in worktrees created with git worktree add, each fresh worktree needs the same four manual setup steps before npm run dev will start. The bootstrap-worktree script automates them.

# From inside a fresh worktree (NOT the main checkout):
node scripts/bootstrap-worktree.mjs

# Optional flags:
node scripts/bootstrap-worktree.mjs --dry-run      # preview, write nothing
node scripts/bootstrap-worktree.mjs --no-dev-vars  # skip the .dev.vars copy

The script is idempotent: re-running it on an already-bootstrapped worktree is a safe no-op. If invoked from the main checkout, it refuses to run and exits non-zero. The four steps it performs:

  1. git update-index --skip-worktree wrangler.jsonc, so local edits to wrangler.jsonc cannot pollute commits.
  2. Replace the committed YOUR_CLOUDFLARE_ACCOUNT_ID placeholder in wrangler.jsonc with the operator's real account ID, read from the main checkout's locally-modified wrangler.jsonc.
  3. Inject a local-only dev: block: { "enable_containers": false, "inspector_port": 0 }. The first skips the Docker-dependent Sandbox container build (most worker tasks don't need it); the second dodges port-9229 collisions across parallel worktrees by letting the OS pick a free inspector port.
  4. Copy .dev.vars from the main checkout (it is gitignored, so a fresh git worktree add does not bring it across).

After the script runs, npm install && npm run dev should start cleanly inside the worktree.
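Steps 2 and 3 amount to a small, repeatable transformation of the config. The sketch below is illustrative only and is not the code in scripts/bootstrap-worktree.mjs. It assumes comment-free input so that `JSON.parse` works (real JSONC would need a JSONC-aware parser), and `patchConfig` is a hypothetical name.

```typescript
// Hypothetical sketch of bootstrap steps 2 and 3 applied to wrangler.jsonc content.
const PLACEHOLDER = "YOUR_CLOUDFLARE_ACCOUNT_ID";

interface WranglerConfig {
  account_id?: string;
  dev?: { enable_containers?: boolean; inspector_port?: number };
  [key: string]: unknown;
}

function patchConfig(text: string, accountId: string): WranglerConfig {
  // Step 2: swap the committed placeholder for the operator's account ID.
  const withAccount = text.split(PLACEHOLDER).join(accountId);
  // Assumption: the input has no comments, so plain JSON parsing is enough here.
  const config = JSON.parse(withAccount) as WranglerConfig;
  // Step 3: local-only dev block. Containers off; port 0 lets the OS pick a free port.
  config.dev = { ...config.dev, enable_containers: false, inspector_port: 0 };
  return config;
}
```

Running the function on its own output changes nothing further, which matches the idempotence the script promises.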

When you are done with a worktree, run the inverse before git worktree remove:

node scripts/cleanup-worktree.mjs               # un-skip-worktree, restore wrangler.jsonc, keep .dev.vars
node scripts/cleanup-worktree.mjs --with-dev-vars  # also delete .dev.vars

Tested on macOS / Linux. Cross-platform Windows support is out of scope. So is auto-bootstrapping via a predev hook.

Run it

npm run dev
# Opens (and reverse-proxies) https://twelve-angry-agents.localhost:1355

Click "Start a Run" with the seed feature-team and add-string-utils task; a real LLM will plan, implement, and review a small TypeScript module in a per-Run Artifacts repo. Watch the events stream live in the UI.

A typical Run takes 2–5 minutes (LLM calls + git push + npm install).

Workspaces (artifact-only is a valid mode)

The Run primitive supports three Source variants out of the box (see CONTEXT.md for the full vocabulary):

  • Source of Truth. A GitHub repo registered as a Project. Sync-in clones the default branch into the Workspace; Sync-out pushes a branch and opens a PR on APPROVED.
  • Local Source. A directory on the operator's filesystem uploaded by the CLI (taa run --local .). Sync-in unpacks the tarball into the Workspace; Sync-out serves a diff bundle back via taa apply. No Project, no PAT.
  • Empty Source / artifact-only workspace. No external source. Sync-in starts the Workspace empty; agents create from scratch. History lives entirely in the per-Run Cloudflare Artifacts repo.

The third mode is a valid completion state, not "demo mode". A Run that finishes with its work archived in Cloudflare Artifacts and never pushed to a VCS is fully complete. The legacy feature-team + add-string-utils seed uses this mode for the out-of-the-box first Run, but the same path supports any one-off experiment where the operator wants the platform to produce a Workspace they can read, fork, or publish later (the per-Run Artifacts repo is Git-compatible). ADR-0009 generalises this further as the Cloud Workload Runtime: completion is the Workload reaching a terminal state, and publication to a GitHub PR (or any other Publication Target) is an optional side-effect declared on the Workload Plan.
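One way to picture the three Source variants is as a tagged union, with Sync-out dispatching on the tag. The type and field names below are hypothetical (CONTEXT.md holds the real vocabulary); only the described behaviours come from the list above.

```typescript
// Hypothetical modelling of the three Source variants; not the repo's types.
type Source =
  | { kind: "source-of-truth"; repoUrl: string } // GitHub repo registered as a Project
  | { kind: "local"; tarballName: string } // directory uploaded by `taa run --local .`
  | { kind: "empty" }; // artifact-only workspace

// Describe what Sync-out does for each variant.
function syncOutAction(source: Source): string {
  switch (source.kind) {
    case "source-of-truth":
      return "push a branch and open a PR on APPROVED";
    case "local":
      return "serve a diff bundle back via taa apply";
    case "empty":
      return "nothing to publish; history stays in the per-Run Artifacts repo";
  }
}
```

The exhaustive switch means adding a fourth variant would fail to compile until its Sync-out behaviour is spelled out.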

In the UI:

  • The dashboard's "New Task" picker offers "No project (artifact-only workspace)" as a first-class option.
  • The Live Runs table tags Runs from this path as Artifact-only so an operator can filter by it.
  • The chat surface is the Assistant (the operator-facing label for the Orchestrator from CONTEXT.md, kept distinct from the internal code identifier).

Operator CLI (taa)

A small terminal CLI in taa-cli/ wraps the same /api/* endpoints the SPA uses, so you can drive the system from a shell — handy for scripted runs, CI dispatches, or anyone allergic to GUIs.

# Build the CLI bundle once
npm run taa-cli:build

# Run it directly from the dist/ output
node taa-cli/dist/taa-cli.cjs --help

# Or, after `npm publish` (one-line publish dance documented in taa-cli/README.md):
npx @mcdays94/taa-cli runs list

Highlights:

taa run feature-team add-string-utils       # start a Run, stream events live
taa runs list --limit 10                    # table of recent Runs
taa runs show <run-id>                      # humanized transcript
taa runs show <run-id> --raw | jq           # NDJSON for piping
taa skills list                             # bundled skills

TAA_BASE_URL overrides the API base (default: the npm run dev URL). Self-signed *.localhost certs from portless are auto-trusted; for non-localhost dev, set TAA_INSECURE=1 to opt in. See taa-cli/README.md for the full surface.
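The environment handling described above could look roughly like this. It is a sketch under assumptions: the default is taken to be the npm run dev URL shown earlier, and `resolveCliConfig` and `skipTlsVerify` are hypothetical names rather than the CLI's real internals.

```typescript
// Hypothetical sketch of how the CLI might resolve its settings from the environment.
const DEFAULT_BASE_URL = "https://twelve-angry-agents.localhost:1355"; // assumed default

interface CliConfig {
  baseUrl: string;
  skipTlsVerify: boolean; // true when self-signed certs should be accepted
}

function resolveCliConfig(env: Record<string, string | undefined>): CliConfig {
  const baseUrl = (env["TAA_BASE_URL"] || DEFAULT_BASE_URL).replace(/\/+$/, "");
  // Strip the scheme, then everything from the first ":" or "/" to get the host.
  const host = baseUrl.replace(/^[a-z]+:\/\//i, "").replace(/[/:].*$/, "");
  const isLocalhost = /(^|\.)localhost$/i.test(host);
  // *.localhost is trusted automatically; other hosts need an explicit opt-in.
  const skipTlsVerify = isLocalhost || env["TAA_INSECURE"] === "1";
  return { baseUrl, skipTlsVerify };
}
```

In a real CLI the function would be called with `process.env`. Taking the environment as a parameter keeps the sketch testable.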

Sandbox image

The agent runs inside a Cloudflare Sandbox container built from infra/Dockerfile. For local dev (npm run dev) wrangler builds the image automatically from the Dockerfile each time — no registry needed. The scripts below are for publishing the image to a registry, useful if you're sharing it across machines, baking it into CI, or wiring up a deploy pipeline.

# 1. Build locally (produces `taa-sandbox:dev`)
npm run image:build

# 2. Tag for the registry (produces both <repo>:<short-sha> and <repo>:latest)
npm run image:tag

# 3. Push both tags
npm run image:push

# Or all three in one shot:
npm run image:release

Default registry: ghcr.io/mcdays94/taa-sandbox (GitHub Container Registry, the same account as this repo). Override the target via IMAGE_REPO:

IMAGE_REPO=ghcr.io/your-org/taa-sandbox npm run image:tag
IMAGE_REPO=ghcr.io/your-org/taa-sandbox npm run image:push

Why GHCR over the Cloudflare Containers Registry? GHCR plays nicer with forks (no Cloudflare-account binding), the auth dance is one PAT, and the rate limits are friendly to a personal-tier playground. If you need Cloudflare Containers Registry for production, the IMAGE_REPO override above will get you there.

Auth dance (one time)

GHCR needs Docker logged in with a token that has write:packages scope. Easiest path:

# Create a classic PAT at https://github.com/settings/tokens with the
# `write:packages` (and `read:packages`) scope.
echo "$GITHUB_TOKEN" | docker login ghcr.io -u <your-github-user> --password-stdin

Or, if you're pushing from GitHub Actions, log in with the built-in ${{ secrets.GITHUB_TOKEN }}; no PAT is needed, provided the workflow grants the token packages: write permission.

When to bump the image

Rebuild + push whenever you change anything that affects the agent's runtime environment:

  • infra/Dockerfile — base image bump, new system packages, harness version pins.
  • taa-agent/src/** — the in-sandbox runner CLI is bundled into the image at build time.
  • taa-agent/build.mjs or its dependencies.

Pure Worker code (src/, web/) does not require an image rebuild — those are deployed by wrangler deploy, which leaves the sandbox image alone.
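The rule above can be expressed as a check over a list of changed paths, for example from git diff --name-only. This is a simplified sketch: `needsImageRebuild` is a hypothetical helper, and it does not track changes to the dependencies of taa-agent/build.mjs.

```typescript
// Hypothetical helper: does this set of changed files affect the sandbox image?
function needsImageRebuild(changedPaths: string[]): boolean {
  return changedPaths.some(
    (path) =>
      path === "infra/Dockerfile" ||
      path === "taa-agent/build.mjs" ||
      path.indexOf("taa-agent/src/") === 0, // anything under taa-agent/src/**
  );
}
```

A change touching only src/ or web/ returns false, so wrangler deploy alone is enough.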

The dirty-tree marker (<sha>-dirty) on image:tag means a half-finished commit can't accidentally publish as a canonical SHA. Always commit before tagging the canonical build.
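The tagging scheme can be sketched as a pure function. Two details here are assumptions and are not confirmed by the repo's scripts: the short SHA is taken to be 7 characters, and a dirty tree is assumed to skip the latest tag. `imageTags` is a hypothetical name.

```typescript
// Hypothetical sketch of the tag names image:tag might produce.
function imageTags(repo: string, sha: string, dirty: boolean): string[] {
  const shortSha = sha.slice(0, 7); // assumption: 7-character short SHA
  if (dirty) {
    // Assumption: uncommitted changes get the -dirty marker and never become :latest.
    return [`${repo}:${shortSha}-dirty`];
  }
  return [`${repo}:${shortSha}`, `${repo}:latest`];
}
```

With the default registry, a clean tree at a commit whose SHA starts with 0123abc would be tagged ghcr.io/mcdays94/taa-sandbox:0123abc and ghcr.io/mcdays94/taa-sandbox:latest.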

CI?

Image builds aren't wired into CI by design. Docker-in-CI on a free GitHub-Actions runner is slow (~3-5 min), and the image only changes when you bump a harness version or system package. The image:release script is fast to run locally before a PR that touches infra/ or taa-agent/. See #5 for the discussion.

Browse a sample Run (no LLM required)

A fully-recorded APPROVED Run is checked in at samples/ — JSON transcript + a guided walkthrough with screenshots. Open samples/README.md to see what a successful planner → implementer → reviewer flow looks like end-to-end without spinning up the LLM stack or burning API credits.

Project layout

.
├── PRD.md                # Full design document
├── plans/
│   ├── v1.md             # 7-phase implementation plan
│   └── spike-harnesses.md
├── teams/                # Team definitions (markdown + YAML frontmatter)
├── tasks/                # Task definitions (markdown + YAML frontmatter)
├── src/                  # Worker code
│   ├── index.ts            # HTTP routing
│   ├── coordinator.ts      # CoordinatorDO — per-Run FSM driver
│   ├── seeds.ts            # Team/Task parser; bundles markdown at build time
│   ├── ids.ts              # Run ID generator
│   ├── types.ts            # Shared types (Worker ↔ SPA)
│   └── env.d.ts            # Cloudflare.Env augmentations for .dev.vars
├── web/                  # Vite + React SPA
│   ├── index.html
│   ├── main.tsx          # Single-file UI with team picker + live event log
│   └── styles.css
├── taa-agent/            # In-sandbox runner CLI
│   ├── src/index.ts        # Subcommands: run, msg, done
│   ├── build.mjs           # esbuild bundle to dist/taa-agent.cjs
│   └── tsconfig.json
├── taa-cli/              # Operator CLI (run management from a terminal)
│   ├── src/                # Commands + lib
│   ├── build.mjs           # esbuild bundle to dist/taa-cli.cjs
│   ├── package.json        # Independently publishable (@mcdays94/taa-cli)
│   ├── README.md
│   └── tsconfig.json
├── infra/
│   └── Dockerfile          # cloudflare/sandbox base + harnesses + taa-agent
├── samples/                # Pre-recorded Run for offline browsing
│   ├── README.md             # Walkthrough w/ screenshots
│   ├── run-*.json            # Full event transcript
│   └── screenshots/
├── test/                   # Vitest tests (Workers pool)
├── wrangler.jsonc          # Worker config — bindings, container, account_id
├── vite.config.ts          # Vite + Cloudflare plugin
└── tsconfig.json

Concepts

| Term | Meaning |
| --- | --- |
| Run | One end-to-end execution of a Team against a Task. Has a unique ID, a CoordinatorDO instance, and an Artifacts repo. |
| Team | A markdown file declaring agents, roles, harnesses, and the win condition. Stored in teams/. |
| Task | A markdown file describing what to build. Stored in tasks/. |
| Agent | One participant in a team. Has a role (system prompt) and a harness (which CLI it runs). |
| Workspace | The Artifacts repo for the Run. Agents clone, work, push. |
| Coordinator | The Durable Object that runs the FSM, spawns agents, owns the WS fan-out. One per Run. |
| Verdict | Final state of a Run: APPROVED / REJECTED / MAX_RETRIES / ERROR. |
| taa | The CLI inside agent sandboxes (taa run, taa msg, taa done). Symlink of taa-agent. |

Known limitations

  • Sandbox cleanup: each Run leaks one cloudflare/proxy-everything container. Re-running many times will accumulate them. Workaround: docker ps -q --filter "ancestor=cloudflare/proxy-everything:3cb1195" | xargs docker kill.
  • assets.directory must NOT be set in wrangler.jsonc when using the Cloudflare Vite plugin (it's auto-derived from the build output). Setting it silently breaks /api/* routing.
  • OAuth token expires after 1h. Long Runs that span the boundary can break wrangler's remote-bindings session. Refresh with wrangler login or use a long-lived API token.
  • Phase 6 (full transcript replay) is not yet implemented. Currently only structurally important events are persisted — agent.stdout / agent.stderr are live-streamed only.

License

MIT — see LICENSE.

About

A Cloudflare playground for multi-agent coding workflows. Built to learn the Sandbox SDK + Artifacts + DOs + Containers hands-on. Not production.
