Built in a single overnight session + a follow-up morning to learn the Cloudflare developer platform hands-on. Things will break. Container leaks happen. Sandboxes time out. Wrangler OAuth expires. The retry loop hasn't been runtime-stress-tested. The Claude Code / Pi harnesses haven't been exercised against a real model. Costs are unbounded. Sandbox network egress isn't restricted. There's no auth. Anyone with the URL can spend your tokens.
Not a security-audited multi-agent platform. Don't run it on a real Cloudflare account or against a real codebase without reading the code first.
No warranty, no support guarantees, no production claims. I'm using this to learn the stack and to have a polished portfolio piece. If you want a production multi-agent coding system, look at Devin, Cursor's background agents, Goose, or any of the other much more mature systems in this space.
If you fork this and run it: you're on your own. The good news is that running locally on `wrangler dev` is fairly contained: sandboxes are Docker containers, and nothing is deployed unless you explicitly ask.
A multi-agent coding platform on Cloudflare. Define a team of AI agents in markdown, give them a task, watch them collaborate live in their own sandboxes, and review the full replay afterwards.
PLANNING → IMPLEMENTING ⇄ REVIEWING → DONE
↑________________│ (bounded retries on REJECTED)
Each role runs in its own Cloudflare Sandbox container, with its own
harness (Claude Code / OpenCode / Pi), pushing commits to a per-Run
Cloudflare Artifacts repository. The whole pipeline streams live over
WebSocket; status, transitions, and verdict appear in real time in a
Vite + React + Tailwind UI styled per DESIGN.md.
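
The state machine above can be sketched as a pure transition function. This is a minimal illustration, not the code in src/coordinator.ts: the `RunState` and `RunEvent` shapes, the `step` name, and the `maxRetries` parameter are all assumptions; only the phase and verdict labels come from this README.

```typescript
// Hypothetical sketch of PLANNING → IMPLEMENTING ⇄ REVIEWING → DONE.
type Phase = "PLANNING" | "IMPLEMENTING" | "REVIEWING" | "DONE";
type Verdict = "APPROVED" | "REJECTED" | "MAX_RETRIES" | "ERROR";

interface RunState {
  phase: Phase;
  retries: number; // number of REJECTED reviews so far
  verdict?: Verdict; // set once the phase is DONE
}

// Assumed event shape; the real Coordinator may use different events.
type RunEvent =
  | { kind: "plan_ready" }
  | { kind: "implementation_pushed" }
  | { kind: "review"; outcome: "APPROVED" | "REJECTED" }
  | { kind: "failure" };

function step(state: RunState, event: RunEvent, maxRetries: number): RunState {
  // DONE is terminal: ignore anything that arrives afterwards.
  if (state.phase === "DONE") return state;
  if (event.kind === "failure") {
    return { phase: "DONE", retries: state.retries, verdict: "ERROR" };
  }
  if (state.phase === "PLANNING" && event.kind === "plan_ready") {
    return { phase: "IMPLEMENTING", retries: state.retries };
  }
  if (state.phase === "IMPLEMENTING" && event.kind === "implementation_pushed") {
    return { phase: "REVIEWING", retries: state.retries };
  }
  if (state.phase === "REVIEWING" && event.kind === "review") {
    if (event.outcome === "APPROVED") {
      return { phase: "DONE", retries: state.retries, verdict: "APPROVED" };
    }
    // REJECTED: loop back to IMPLEMENTING until the retry budget is spent.
    const retries = state.retries + 1;
    return retries > maxRetries
      ? { phase: "DONE", retries, verdict: "MAX_RETRIES" }
      : { phase: "IMPLEMENTING", retries };
  }
  // Events that do not apply to the current phase leave the state unchanged.
  return state;
}
```

Keeping the transition pure makes the retry bound easy to unit-test without spinning up a sandbox.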
Status: Phases 1–6 of `plans/v1.md` are implemented and committed. Happy-path Run (planner → implementer → reviewer → APPROVED) verified end-to-end (run-20260507-072726-tond, 215s, 235 events captured by RecorderDO). The retry loop and the Claude Code / Pi adapters are wired but haven't been runtime-tested against a real model. Phase 7 (polish) is in progress; remaining items are tracked in `BACKLOG.md`. See `PRD.md` for the full design.
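
For orientation, here is a rough sketch of how a Run ID shaped like the one above could be produced. The `run-YYYYMMDD-HHMMSS-xxxx` layout is inferred from that single example, and the use of UTC and a four-letter lowercase suffix are assumptions; the real generator in src/ids.ts may differ.

```typescript
// Hypothetical Run ID generator; format inferred from one sample ID only.
function pad2(n: number): string {
  return String(n).padStart(2, "0");
}

function makeRunId(
  now: Date = new Date(),
  random: () => number = Math.random, // injectable for deterministic tests
): string {
  // Assumption: timestamps are rendered in UTC.
  const date = `${now.getUTCFullYear()}${pad2(now.getUTCMonth() + 1)}${pad2(now.getUTCDate())}`;
  const time = `${pad2(now.getUTCHours())}${pad2(now.getUTCMinutes())}${pad2(now.getUTCSeconds())}`;

  // Assumption: the suffix is four random lowercase letters.
  const letters = "abcdefghijklmnopqrstuvwxyz";
  let suffix = "";
  for (let i = 0; i < 4; i++) {
    suffix += letters.charAt(Math.floor(random() * letters.length));
  }
  return `run-${date}-${time}-${suffix}`;
}
```

A time-ordered prefix keeps Runs sortable in listings, while the random suffix avoids collisions between Runs started in the same second.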
┌─────────────────────────────────┐
│ Browser (Vite/React/Tailwind) │
│ - Pick team + task │
│ - Live WS event stream │
└────────────────┬────────────────┘
│ HTTPS + WSS
▼
┌──────────────────────────────────────────────────────────────────┐
│ Worker │
│ - Static assets (the SPA) │
│ - HTTP API: /api/teams, /api/tasks, /api/runs[/:id[/...]] │
│ - WebSocket upgrade /api/runs/:id/ws → CoordinatorDO │
└────┬───────────────────────────────────────┬─────────────────────┘
│ │
▼ ▼
┌──────────────────────┐ ┌──────────────────────────────────┐
│ CoordinatorDO │ │ Artifacts (per-Run repo) │
│ (one per Run) │ │ - One repo per Run │
│ - FSM driver │ │ - Authenticated git remote │
│ - WS hibernation │ │ handed to each sandbox │
│ - Storage backlog │ │ - Final code lives here │
└────┬─────────────────┘ └──────────────────────────────────┘
│ spawns 1 sandbox at a time
▼
┌──────────────────────────────────────────────────────────────────┐
│ Sandbox (Cloudflare Containers via Sandbox SDK) │
│ - Image: cloudflare/sandbox + opencode + claude + pi + │
│ taa-agent runner │
│ - Env: ARTIFACTS_GIT_REMOTE, AGENT_ROLE, AGENT_HARNESS, │
│ OPENCODE_AUTH_JSON or ANTHROPIC_API_KEY │
│ - Files: /agent/{role,task,feedback}.md │
│ - Process: `taa-agent run` │
│ 1. Provision auth │
│ 2. git clone the Artifacts repo │
│ 3. Run harness with role + task as prompt │
│ 4. git push the resulting commit │
└──────────────────────────────────────────────────────────────────┘
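
The Worker box in the diagram upgrades `/api/runs/:id/ws` and hands the socket to the Run's CoordinatorDO. The path-matching half of that can be sketched as below. The function name is hypothetical, and the assumption that Run IDs never contain a `/` is mine, not something src/index.ts is known to enforce.

```typescript
// Hypothetical helper: extract the Run ID from the WebSocket route.
// Returns null for any path that is not exactly /api/runs/:id/ws.
function matchRunWebSocketPath(pathname: string): string | null {
  const match = /^\/api\/runs\/([^/]+)\/ws$/.exec(pathname);
  const id = match?.[1];
  return id === undefined ? null : decodeURIComponent(id);
}
```

In a Worker, the returned ID would typically be turned into a Durable Object stub (for example via `idFromName` on the namespace binding) and the upgrade request forwarded to it; the binding name depends on wrangler.jsonc.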
- Cloudflare Workers — the orchestrator
- Sandbox SDK — one isolated container per agent
- Artifacts (closed beta) — Git-compatible storage, one repo per Run
- Durable Objects — Coordinator (per-Run FSM + WebSocket fan-out via the Hibernation API)
- Containers — under the Sandbox SDK
- `@cloudflare/vite-plugin`: runs the Worker in workerd alongside Vite's dev server
- Vite + React 19 + Tailwind v4 for the live-watch UI
- portless for `https://twelve-angry-agents.localhost` URLs
| Harness | Status | Auth | Model default |
|---|---|---|---|
| OpenCode | Phase 2; verified end-to-end | `OPENCODE_AUTH_JSON` (CF-internal opencode.cloudflare.dev JWT) | `@cf/moonshotai/kimi-k2.6` |
| Claude Code | Phase 5; wired, not yet runtime-tested | `ANTHROPIC_API_KEY` | Anthropic default |
| Pi (`@mariozechner/pi-coding-agent`) | Phase 5; wired, not yet runtime-tested | `ANTHROPIC_API_KEY` (default provider) | `claude-sonnet-4-5` |
To switch a role's harness, edit the harness: field in teams/feature-team.md.
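
Because each harness needs a different secret, it can help to check a team's harnesses against the environment before starting a Run. The sketch below is illustrative only: the harness identifiers `opencode`, `claude-code`, and `pi` follow the names used in the setup notes, while `missingSecrets` and the lookup table are hypothetical helpers, not part of the repo.

```typescript
// Hypothetical pre-flight check: which secrets does this team still need?
type Harness = "opencode" | "claude-code" | "pi";

// Mapping taken from the harness table above.
const REQUIRED_SECRET: Record<Harness, string> = {
  "opencode": "OPENCODE_AUTH_JSON",
  "claude-code": "ANTHROPIC_API_KEY",
  "pi": "ANTHROPIC_API_KEY",
};

function missingSecrets(
  harnesses: Harness[],
  env: Record<string, string | undefined>,
): string[] {
  // De-duplicate: claude-code and pi share the same key.
  const needed = new Set(harnesses.map((h) => REQUIRED_SECRET[h]));
  return Array.from(needed).filter((name) => !env[name]);
}
```

A team mixing `opencode` and `pi` with only `OPENCODE_AUTH_JSON` configured would report `ANTHROPIC_API_KEY` as missing.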
- macOS or Linux (tested on macOS)
- Node 22+ and npm 11+
- Docker / OrbStack / Colima (the Sandbox SDK builds and runs containers locally)
- Cloudflare account with Artifacts beta enabled — request access at https://forms.gle/DwBoPRa3CWQ8ajFp7
- portless (`brew install portless` or `npm i -g portless`); the proxy must be running before `npm run dev`
git clone <this repo>
cd twelve-angry-agents
# Install dependencies (postinstall regenerates Cloudflare types)
npm install
# Authenticate with Cloudflare
npx wrangler login
# Replace the YOUR_CLOUDFLARE_ACCOUNT_ID placeholder in wrangler.jsonc.
# The setup-account script does this automatically by reading
# `wrangler whoami`. (If you want to do it by hand, find your account ID
# in the wrangler whoami table and edit wrangler.jsonc directly.)
npm run setup-account
# Configure secrets:
cp .dev.vars.example .dev.vars
$EDITOR .dev.vars
# - Required: OPENCODE_AUTH_JSON if any role uses the opencode harness.
# If you're a CF employee with opencode.cloudflare.dev access:
# echo "OPENCODE_AUTH_JSON='$(jq -c . < ~/.local/share/opencode/auth.json)'" >> .dev.vars
# Otherwise, configure OpenCode against a provider with
# `opencode providers` and copy the resulting auth.json.
# - Required: ANTHROPIC_API_KEY if any role uses the claude-code or pi harness.
# - Optional but recommended: CF_API_TOKEN_AI_GATEWAY_READ — a Cloudflare
# API token with `AI Gateway: Read` + `Analytics: Read` scopes on the
# account that owns CF_AI_GATEWAY_ID. The Orchestrator's `get_costs`
# tool calls CF's GraphQL analytics endpoint with this token to answer
# questions like "how much have I spent this week?". Create it at
# https://dash.cloudflare.com/profile/api-tokens. Without it the
# tool surfaces a clear `missing_env` error — the rest of the Worker
# keeps working.
# Make sure portless's proxy daemon is up. First time only:
sudo portless proxy start --https
# Or non-privileged:
portless proxy start --port 1355 --https

Heads up before pushing back to a public fork: if you committed
wrangler.jsonc after running setup-account, your account ID is in
the diff. Either revert that file before pushing
(git checkout wrangler.jsonc) or just don't push. Same goes for
.dev.vars: it is gitignored, but worth a git status sanity check
before any push to a public remote.
When parallel agent workers run in worktrees created with `git worktree add`,
each fresh worktree needs the same four manual setup steps before
npm run dev will start. The bootstrap-worktree script automates them.
# From inside a fresh worktree (NOT the main checkout):
node scripts/bootstrap-worktree.mjs
# Optional flags:
node scripts/bootstrap-worktree.mjs --dry-run # preview, write nothing
node scripts/bootstrap-worktree.mjs --no-dev-vars # skip the .dev.vars copy

The script is idempotent. Re-running on an already-bootstrapped worktree is a safe no-op. It refuses to run from the main checkout and exits non-zero if you try. The four steps it performs:
- Run `git update-index --skip-worktree wrangler.jsonc`, so local edits to `wrangler.jsonc` cannot pollute commits.
- Replace the committed `YOUR_CLOUDFLARE_ACCOUNT_ID` placeholder in `wrangler.jsonc` with the operator's real account ID, read from the main checkout's locally-modified `wrangler.jsonc`.
- Inject a local-only `dev:` block: `{ "enable_containers": false, "inspector_port": 0 }`. The first setting skips the Docker-dependent Sandbox container build (most worker tasks don't need it); the second dodges port-9229 collisions across parallel worktrees by letting the OS pick a free inspector port.
- Copy `.dev.vars` from the main checkout (it is gitignored, so a fresh `git worktree add` does not bring it across).
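
The two config-editing steps above (placeholder replacement and the `dev` block) can be sketched as one idempotent transform. This is a guess at the shape, not the contents of scripts/bootstrap-worktree.mjs: `localiseConfig` and the `WranglerConfig` interface are hypothetical, and the sketch assumes the JSONC file has already been parsed into a plain object.

```typescript
// Hypothetical transform over an already-parsed wrangler.jsonc.
interface WranglerConfig {
  account_id?: string;
  dev?: Record<string, unknown>;
  [key: string]: unknown;
}

const PLACEHOLDER = "YOUR_CLOUDFLARE_ACCOUNT_ID";

function localiseConfig(config: WranglerConfig, accountId: string): WranglerConfig {
  const result: WranglerConfig = {
    ...config,
    // Keep any existing dev settings, then force the two local-only values.
    dev: { ...(config.dev ?? {}), enable_containers: false, inspector_port: 0 },
  };
  // Only touch account_id while it still holds the committed placeholder,
  // so a second run changes nothing.
  if (config.account_id === PLACEHOLDER) {
    result.account_id = accountId;
  }
  return result;
}
```

Applying the function twice yields the same object as applying it once, which is the property that makes re-running the bootstrap a no-op.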
After the script runs, npm install && npm run dev should start
cleanly inside the worktree.
When you are done with a worktree, run the inverse before
git worktree remove:
node scripts/cleanup-worktree.mjs # un-skip-worktree, restore wrangler.jsonc, keep .dev.vars
node scripts/cleanup-worktree.mjs --with-dev-vars # also delete .dev.vars

Tested on macOS / Linux. Cross-platform Windows support is out of scope. So is auto-bootstrapping via a `predev` hook.
npm run dev
# Opens (and reverse-proxies) https://twelve-angry-agents.localhost:1355

Click "Start a Run" with the seed feature-team and add-string-utils task; a real LLM will plan, implement, and review a small TypeScript module in a per-Run Artifacts repo. Watch the events stream live in the UI.
A typical Run takes 2–5 minutes (LLM calls + git push + npm install).
The Run primitive supports three Source variants out of the box (see CONTEXT.md for the full vocabulary):
- Source of Truth. A GitHub repo registered as a Project. Sync-in clones the default branch into the Workspace; Sync-out pushes a branch and opens a PR on APPROVED.
- Local Source. A directory on the operator's filesystem uploaded by the CLI (`taa run --local .`). Sync-in unpacks the tarball into the Workspace; Sync-out serves a diff bundle back via `taa apply`. No Project, no PAT.
- Empty Source / artifact-only workspace. No external source. Sync-in starts the Workspace empty; agents create from scratch. History lives entirely in the per-Run Cloudflare Artifacts repo.
The third mode is a valid completion state, not "demo mode". A Run that finishes with its work archived in Cloudflare Artifacts and never pushed to a VCS is fully complete. The legacy feature-team + add-string-utils seed uses this mode for the out-of-the-box first Run, but the same path supports any one-off experiment where the operator wants the platform to produce a Workspace they can read, fork, or publish later (the per-Run Artifacts repo is Git-compatible). ADR-0009 generalises this further as the Cloud Workload Runtime: completion is the Workload reaching a terminal state, and publication to a GitHub PR (or any other Publication Target) is an optional side-effect declared on the Workload Plan.
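
One way to picture the three Source variants is as a discriminated union, with Sync-out branching on the variant. The type below is a sketch for illustration: the `kind` tags, field names, and `syncOutPlan` function are hypothetical and are not taken from CONTEXT.md or src/types.ts.

```typescript
// Hypothetical model of the three Source variants described above.
type Source =
  | { kind: "source-of-truth"; repoUrl: string } // GitHub repo registered as a Project
  | { kind: "local"; tarballPath: string }       // directory uploaded by the CLI
  | { kind: "empty" };                           // artifact-only workspace

// What Sync-out does after an APPROVED verdict, per variant.
function syncOutPlan(source: Source): string {
  if (source.kind === "source-of-truth") {
    return "push a branch and open a PR";
  }
  if (source.kind === "local") {
    return "serve a diff bundle for taa apply";
  }
  // Empty Source: nothing to publish; the Run is still complete.
  return "keep history in the per-Run Artifacts repo";
}
```

Modelling the empty case as a first-class variant, instead of a null Source, mirrors the point that an artifact-only Run is a valid completion state.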
In the UI:
- The dashboard's "New Task" picker offers "No project (artifact-only workspace)" as a first-class option.
- The Live Runs table tags Runs from this path as Artifact-only so an operator can filter by it.
- The chat surface is the Assistant (the operator-facing label for the Orchestrator from CONTEXT.md, kept distinct from the internal code identifier).
A small terminal CLI in taa-cli/ wraps the same /api/* endpoints the SPA uses, so you can drive the system from a shell — handy for scripted runs, CI dispatches, or anyone allergic to GUIs.
# Build the CLI bundle once
npm run taa-cli:build
# Run it directly from the dist/ output
node taa-cli/dist/taa-cli.cjs --help
# Or, after `npm publish` (one-line publish dance documented in taa-cli/README.md):
npx @mcdays94/taa-cli runs list

Highlights:
taa run feature-team add-string-utils # start a Run, stream events live
taa runs list --limit 10 # table of recent Runs
taa runs show <run-id> # humanized transcript
taa runs show <run-id> --raw | jq # NDJSON for piping
taa skills list # bundled skills

`TAA_BASE_URL` overrides the API base (default: the `npm run dev` URL). Self-signed `*.localhost` certs from portless are auto-trusted; for non-localhost dev, set `TAA_INSECURE=1` to opt in. See taa-cli/README.md for the full surface.
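
The environment handling described above could look roughly like this. It is a sketch under assumptions, not the code in taa-cli/: `resolveClientConfig` and `ClientConfig` are hypothetical names, and the default URL is the `npm run dev` address shown earlier in this README.

```typescript
// Hypothetical resolution of TAA_BASE_URL / TAA_INSECURE.
const DEFAULT_BASE_URL = "https://twelve-angry-agents.localhost:1355";

interface ClientConfig {
  baseUrl: string;
  allowSelfSigned: boolean; // whether to accept a self-signed TLS cert
}

function resolveClientConfig(env: Record<string, string | undefined>): ClientConfig {
  const baseUrl = env["TAA_BASE_URL"] ?? DEFAULT_BASE_URL;

  // Pull the hostname out of scheme://host[:port]/...
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^/:?#]+)/i.exec(baseUrl);
  const host = (match?.[1] ?? "").toLowerCase();

  // *.localhost is trusted automatically; anything else needs an explicit opt-in.
  const isLocalhost = host === "localhost" || host.endsWith(".localhost");
  const optedIn = env["TAA_INSECURE"] === "1";
  return { baseUrl, allowSelfSigned: isLocalhost || optedIn };
}
```

Requiring an explicit `TAA_INSECURE=1` for non-localhost hosts keeps certificate checks on by default for anything reachable beyond your own machine.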
The agent runs inside a Cloudflare Sandbox container built from infra/Dockerfile. For local dev (npm run dev) wrangler builds the image automatically from the Dockerfile each time — no registry needed. The scripts below are for publishing the image to a registry, useful if you're sharing it across machines, baking it into CI, or wiring up a deploy pipeline.
# 1. Build locally (produces `taa-sandbox:dev`)
npm run image:build
# 2. Tag for the registry (produces both <repo>:<short-sha> and <repo>:latest)
npm run image:tag
# 3. Push both tags
npm run image:push
# Or all three in one shot:
npm run image:release

Default registry: ghcr.io/mcdays94/taa-sandbox (GitHub Container Registry, the same account as this repo). Override the target via `IMAGE_REPO`:
IMAGE_REPO=ghcr.io/your-org/taa-sandbox npm run image:tag
IMAGE_REPO=ghcr.io/your-org/taa-sandbox npm run image:push

Why GHCR over the Cloudflare Containers Registry? GHCR plays nicer with forks (no Cloudflare-account binding), the auth dance is one PAT, and the rate limits are friendly to a personal-tier playground. If you need Cloudflare Containers Registry for production, the `IMAGE_REPO` override above will get you there.
GHCR needs Docker logged in with a token that has write:packages scope. Easiest path:
# Create a classic PAT at https://github.com/settings/tokens with the
# `write:packages` (and `read:packages`) scope.
echo "$GITHUB_TOKEN" | docker login ghcr.io -u <your-github-user> --password-stdin

Or, if you're pushing from GitHub Actions, use the built-in ${{ secrets.GITHUB_TOKEN }} instead of a PAT. Grant the workflow `packages: write` permission so the token can push to GHCR.
Rebuild + push whenever you change anything that affects the agent's runtime environment:
- `infra/Dockerfile`: base image bump, new system packages, harness version pins.
- `taa-agent/src/**`: the in-sandbox runner CLI is bundled into the image at build time.
- `taa-agent/build.mjs` or its dependencies.
Pure Worker code (src/, web/) does not require an image rebuild — those are deployed by wrangler deploy, which leaves the sandbox image alone.
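
If you want a quick check before a PR, the rebuild rule above can be approximated from a list of changed paths. This helper is hypothetical and deliberately narrow: it covers only the three path patterns named here and does not try to detect changes to the dependencies of `taa-agent/build.mjs`.

```typescript
// Hypothetical check: do any of these changed paths affect the sandbox image?
function needsImageRebuild(changedPaths: string[]): boolean {
  return changedPaths.some(
    (p) =>
      p === "infra/Dockerfile" ||
      p === "taa-agent/build.mjs" ||
      p.startsWith("taa-agent/src/"),
  );
}
```

The input could come from something like `git diff --name-only main...HEAD`, split on newlines.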
The dirty-tree marker (<sha>-dirty) on image:tag means a half-finished commit can't accidentally publish as a canonical SHA. Always commit before tagging the canonical build.
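
The tagging scheme can be illustrated with a small function. This is a sketch of the naming only, with a hypothetical `imageTags` helper; whether the real `image:tag` script also moves `latest` when the tree is dirty is not stated in this README, so the sketch simply returns both tags in every case.

```typescript
// Hypothetical tag computation for image:tag.
function imageTags(repo: string, shortSha: string, dirty: boolean): string[] {
  // A dirty working tree gets a marked tag, never the bare canonical SHA.
  const shaTag = dirty ? `${shortSha}-dirty` : shortSha;
  // Assumption: latest is always emitted alongside the SHA tag.
  return [`${repo}:${shaTag}`, `${repo}:latest`];
}
```

For example, a clean tree at `3cb1195` in the default registry would yield `ghcr.io/mcdays94/taa-sandbox:3cb1195` plus the `latest` tag.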
Image builds aren't wired into CI by design. Docker-in-CI on a free GitHub-Actions runner is slow (~3-5 min), and the image only changes when you bump a harness version or system package. The image:release script is fast to run locally before a PR that touches infra/ or taa-agent/. See #5 for the discussion.
A fully-recorded APPROVED Run is checked in at samples/ — JSON transcript + a guided walkthrough with screenshots. Open samples/README.md to see what a successful planner → implementer → reviewer flow looks like end-to-end without spinning up the LLM stack or burning API credits.
.
├── PRD.md # Full design document
├── plans/
│ ├── v1.md # 7-phase implementation plan
│ └── spike-harnesses.md
├── teams/ # Team definitions (markdown + YAML frontmatter)
├── tasks/ # Task definitions (markdown + YAML frontmatter)
├── src/ # Worker code
│ ├── index.ts # HTTP routing
│ ├── coordinator.ts # CoordinatorDO — per-Run FSM driver
│ ├── seeds.ts # Team/Task parser; bundles markdown at build time
│ ├── ids.ts # Run ID generator
│ ├── types.ts # Shared types (Worker ↔ SPA)
│ └── env.d.ts # Cloudflare.Env augmentations for .dev.vars
├── web/ # Vite + React SPA
│ ├── index.html
│ ├── main.tsx # Single-file UI with team picker + live event log
│ └── styles.css
├── taa-agent/ # In-sandbox runner CLI
│ ├── src/index.ts # Subcommands: run, msg, done
│ ├── build.mjs # esbuild bundle to dist/taa-agent.cjs
│ └── tsconfig.json
├── taa-cli/ # Operator CLI (run management from a terminal)
│ ├── src/ # Commands + lib
│ ├── build.mjs # esbuild bundle to dist/taa-cli.cjs
│ ├── package.json # Independently publishable (@mcdays94/taa-cli)
│ ├── README.md
│ └── tsconfig.json
├── infra/
│ └── Dockerfile # cloudflare/sandbox base + harnesses + taa-agent
├── samples/ # Pre-recorded Run for offline browsing
│ ├── README.md # Walkthrough w/ screenshots
│ ├── run-*.json # Full event transcript
│ └── screenshots/
├── test/ # Vitest tests (Workers pool)
├── wrangler.jsonc # Worker config — bindings, container, account_id
├── vite.config.ts # Vite + Cloudflare plugin
└── tsconfig.json
| Term | Meaning |
|---|---|
| Run | One end-to-end execution of a Team against a Task. Has a unique ID, a CoordinatorDO instance, and an Artifacts repo. |
| Team | A markdown file declaring agents, roles, harnesses, and the win condition. Stored in teams/. |
| Task | A markdown file describing what to build. Stored in tasks/. |
| Agent | One participant in a team. Has a role (system prompt) and a harness (which CLI it runs). |
| Workspace | The Artifacts repo for the Run. Agents clone, work, push. |
| Coordinator | The Durable Object that runs the FSM, spawns agents, owns the WS fan-out. One per Run. |
| Verdict | Final state of a Run: APPROVED / REJECTED / MAX_RETRIES / ERROR. |
| taa | The CLI inside agent sandboxes (`taa run`, `taa msg`, `taa done`). Symlink to taa-agent. |
- Sandbox cleanup: each Run leaks one `cloudflare/proxy-everything` container. Re-running many times will accumulate them. Workaround: `docker ps -q --filter "ancestor=cloudflare/proxy-everything:3cb1195" | xargs docker kill`.
- `assets.directory` must NOT be set in `wrangler.jsonc` when using the Cloudflare Vite plugin (it's auto-derived from the build output). Setting it silently breaks `/api/*` routing.
- OAuth token expires after 1h. Long Runs that span the boundary can break wrangler's remote-bindings session. Refresh with `wrangler login` or use a long-lived API token.
- Phase 6 (full transcript replay) is not yet implemented. Currently only structurally important events are persisted; `agent.stdout` / `agent.stderr` are live-streamed only.
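
The persistence rule in the last item amounts to a filter on event type. A minimal sketch, assuming events carry a string type such as `agent.stdout`; the `shouldPersist` name and the set-based lookup are hypothetical, not the Coordinator's actual code.

```typescript
// Hypothetical filter: which event types are written to storage?
// Only agent.stdout / agent.stderr are named in this README as live-only.
const LIVE_ONLY_EVENTS = new Set<string>(["agent.stdout", "agent.stderr"]);

function shouldPersist(eventType: string): boolean {
  return !LIVE_ONLY_EVENTS.has(eventType);
}
```

Full transcript replay would presumably mean emptying that set, or routing the live-only events to separate storage.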
MIT — see LICENSE.
