Skip to content

Agent Farm Setup

Ankur Nair edited this page Apr 19, 2026 · 1 revision

Agent Farm Setup

Agent Farm mode turns enrolled slaves into distributed compute nodes. Master's team hires remote teammates; each teammate's turns execute on a different machine via signed agent.execute envelopes. Results stream back through the same mailbox as local teammates.

Agent Farm mode


When to use Farm mode

  • You have multiple machines and want to pool compute
  • You want to isolate risky agent work on a dedicated device (sandboxed with different filesystem access)
  • You have ACP CLI subscriptions tied to specific devices
  • You want to distribute cost across multiple provider accounts
  • You're building a team with heterogeneous requirements (one teammate needs GPU, another doesn't)

Prerequisites

  1. Master is set up (see Master Setup Guide)
  2. At least one slave is enrolled in farm role. The second titlebar icon on a slave lets you pick Workforce or Farm — pick Farm.
  3. The slave has ACP runtimes detected (the runtimes you want to farm out — Claude Code, Gemini, etc.)
  4. Farm slave is reachable from master (normal fleet connectivity)

Hiring a farm teammate

Step 1 — Create a team on master

Regular team creation flow. The team lives on master; farm teammates will be added remotely.

Step 2 — Hire a teammate with Farm backend

  1. Team → + Hire Teammate
  2. Pick a template from the gallery (or Custom)
  3. In the Backend step:
    • Local (default) — runs on master's machine
    • Farm — pick a farm-role device from the dropdown
  4. For Farm: pick the Runtime (slave's detected runtimes are listed; each shows "on device" if confirmed available)
  5. Set the Tool allowlist — master-enforced. The farm slave will refuse any tool call not in this list.
  6. Set the Timeout — default 120s per turn. Farm teammates are subject to network latency; plan for 2-3x local timing.
  7. Hire

Step 3 — Slave auto-materializes the mirror team

The moment you click Hire, master fires team.farm_provision to the selected slave. Slave:

  1. Creates a mirror team with the same teamId
  2. Creates a local Lead ACP session (cached, idle timeout 30min)
  3. Creates a farm teammate slot bound to the remote slot ID
  4. Shows the team in the slave's Teams UI with a badge: Mirror of master's farm slot · read-only

The slave operator sees the team appear in real-time. They can view the live messages + activity but can't interact (the farm team is master-orchestrated).


What happens each turn

Master side

  1. Master's Lead decides to delegate to the farm teammate (via regular mailbox)
  2. Master builds an agent.execute envelope:
    {
      "jobId": "<uuid>",
      "slotId": "<remote slot ID>",
      "messages": [...],
      "model": "claude-sonnet-3.5",
      "temperature": 0.7,
      "toolsAllowlist": ["mcp.team.send_message", "mcp.team.update_task"],
      "timeoutMs": 120000
    }
  3. Master signs with Ed25519, queues in fleet_commands table
  4. Slave polls, picks up the command

Slave side

  1. Farm executor verifies envelope signature + replay nonce
  2. Routes the turn through its cached Lead ACP session (or spawns if not cached)
  3. Streams the output back as a multipart response to master
  4. On turn completion, master's Lead sees the farm teammate's reply in the team mailbox — identical shape to a local teammate's reply

Hybrid teams

A team can mix local + farm teammates freely. From the Lead's perspective, it's one team with one mailbox. The farm distinction is invisible to the coordination logic.


Tool allowlists (critical for farm)

When hiring a farm teammate, you set toolsAllowlist[]. The slave's farm executor enforces this list:

  • Tools in the list: allowed
  • Tools not in the list: blocked with policy_denied ack reason

Recommended allowlists by role:

Backend engineer on farm:

[
  "mcp.filesystem.read:/Users/*/projects/**",
  "mcp.filesystem.write:/Users/*/projects/**",
  "mcp.team.send_message",
  "mcp.team.update_task",
  "mcp.shell.exec"
]

Research analyst on farm:

[
  "mcp.web.fetch",
  "mcp.team.send_message",
  "mcp.team.update_task"
]

Start narrow and widen as needed. Every denied call shows in the activity log on both master + slave sides.


Per-device capacity

Each farm slave has implicit capacity:

  • Concurrent turns — bounded by slave's hardware + ACP runtime concurrency (typically 1-2 concurrent agent turns per device for smooth UX)
  • Memory — each cached ACP session uses ~300-500 MB
  • Cost — farm slave's ACP runtime auth determines who pays (the subscription is the slave's, though the master can mandate a specific account via the slave's managed config)

Master's farm picker shows each device's concurrent jobs in flight so you can balance.


Handling slave outages

If the slave goes offline mid-turn:

  • Master's timeout fires (default 120s after dispatch)
  • Ack comes back as timeout (or no ack at all after grace)
  • Master's team sees a failure message in the farm teammate's mailbox
  • Lead can re-delegate or assign to a different teammate

Operational guidance: set timeouts generously (2-3x the expected turn duration). Farm slaves under load + network latency add variance.


Auto-resume on farm

Master-side auto-resume works: on master restart, Leads wake + re-read the board. Tasks owned by farm teammates continue as normal.

Slave-side: if slave is offline when master's Lead wakes + tries to delegate, the command queues in master's DB. When slave comes back online, it polls + picks up the queued command.


Observability

Master side

Fleet → Dashboard → [device]: shows farm stats (jobs completed, failed, timed out, avg latency).

Slave side

Slave's own Observability → Activity Log shows fleet.command.executed for each agent.execute it received.

Latency tracking

Team → Settings → Farm stats per team shows per-teammate latency distribution. Outliers (p95/p99) surface network + runtime issues.


Common issues

See Troubleshooting#farm-teammate-gives-no-response for the full list.

Frequent ones:

  • runtime_unavailable — slave doesn't have the picked runtime. Install it on slave + re-detect.
  • policy_denied — tool allowlist blocks a call the agent wants to make. Expand the allowlist.
  • runtime_timeout — turn exceeded timeoutMs. Increase on master's team settings.
  • Slave's mirror team shows "Conversation not found" — slave didn't materialize the mirror properly. Manually un-hire + re-hire from master.

Security considerations

  • All envelopes are Ed25519-signed end-to-end; no unauthenticated turn can execute
  • Replay nonces prevent duplicate execution
  • Tool allowlists cap blast radius even if an agent goes off-script
  • Admin re-auth is not required for agent.execute (performance); it's required for destructive commands only

The threat model: master is trusted. Farm slaves are semi-trusted. If a slave is compromised, the attacker has access to the slave's own workspace + the JWT (which only grants farm-command participation; no access to master's config bundle push authority).


Related pages

Clone this wiki locally