Parent: spec.md · Siblings: architecture.md · skills-protocol.md · new-agent-runtime-acp.md · modes.md
The adapter layer is OD's most load-bearing design decision. We delegate the entire agent loop (model calls, tool use, context management, permission handling, resume, cancel) to the user's existing code agent CLI. OD's job is to detect it, feed it a skill + prompt + working directory, and stream its output back to the web UI.
If you're adding a new ACP-backed runtime, start with new-agent-runtime-acp.md for the expected stdio transport, JSON-RPC message flow, and process lifecycle contract.
Thesis: The code agent space has already converged on a few strong implementations (Claude Code, Codex, Devin for Terminal, Cursor Agent, Gemini CLI, OpenCode, OpenClaw, Qoder CLI). Reimplementing another one is worse than just talking to all of them.
Inspiration: multica (PATH-scan detection + daemon architecture) and cc-switch (per-agent config format knowledge + symlink-based skill distribution).
Every adapter implements this interface. The current adapter implementation lives in apps/daemon/src/agents.ts.

    interface AgentAdapter {
      readonly id: string;            // "claude-code" | "codex" | …
      readonly displayName: string;

      // -- discovery --------------------------------------------------
      detect(): Promise<AgentDetection | null>;   // null if not installed

      // -- capability negotiation ------------------------------------
      capabilities(): AgentCapabilities;

      // -- execution -------------------------------------------------
      run(params: AgentRunParams): AsyncIterable<AgentEvent>;
      cancel(runId: string): Promise<void>;
      resume?(runId: string, message: string): AsyncIterable<AgentEvent>;
    }

    interface AgentDetection {
      executablePath: string;         // absolute path to CLI
      version: string;
      configDir?: string;             // e.g. ~/.claude
      skillsDir?: string;             // e.g. ~/.claude/skills
      authState: "ok" | "missing" | "expired";
    }

    interface AgentCapabilities {
      surgicalEdit: boolean;          // can edit a targeted region without rewriting file
      nativeSkillLoading: boolean;    // picks up ~/.<agent>/skills/ automatically
      streaming: boolean;             // emits tool calls in real time
      resume: boolean;                // can continue an interrupted run
      permissionMode: "strict" | "permissive" | "none";
      contextWindowHint?: number;     // in tokens
    }

    interface AgentRunParams {
      runId: string;
      cwd: string;                    // absolute path (the artifact dir)
      systemPrompt: string;           // skill's SKILL.md body + DESIGN.md excerpt
      userPrompt: string;
      skillDir?: string;              // if set, adapter should make skill files available
      allowedTools?: string[];        // for agents that support it
      timeoutMs?: number;
    }

    type AgentEvent =
      | { type: "thinking"; text: string }
      | { type: "tool_call"; name: string; input: unknown; id: string }
      | { type: "tool_result"; id: string; output: unknown }
      | { type: "text_delta"; text: string }
      | { type: "file_write"; path: string }   // synthesized by adapter if agent doesn't emit natively
      | { type: "error"; error: string }
      | { type: "done"; reason: "completed" | "cancelled" | "error" };

Run all adapters' detect() in parallel on daemon start, then cache results in ~/.open-design/agents.json with a 24h TTL. Re-detect on daemon SIGHUP.
Each adapter uses two signals:
- PATH scan. `which <binary>` for each known executable name. Fast (<10ms).
- Config-dir probe. Check for `~/.claude/`, `~/.codex/`, `~/.cursor/`, etc. This catches cases where the CLI was installed via npm global into a shell-specific PATH.
If both signals agree, detection is confident. If only one signal fires, we mark authState: "missing" and prompt the user to run the CLI's auth flow.
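The two-signal rule and the 24h cache TTL can be sketched as two pure functions. This is a minimal sketch under stated assumptions, not the code in apps/daemon/src/agents.ts: `DetectionSignals`, `Classification`, `classifyDetection`, and `isCacheFresh` are hypothetical names introduced for illustration.

```typescript
// Hypothetical shapes for illustration; not the daemon's real types.
interface DetectionSignals {
  onPath: boolean;          // `which <binary>` found an executable
  configDirExists: boolean; // e.g. ~/.claude/ is present
}

type Classification =
  | { installed: false }
  | { installed: true; confident: boolean; authState?: "missing" };

// Combine the PATH scan and the config-dir probe as described above.
function classifyDetection(s: DetectionSignals): Classification {
  if (!s.onPath && !s.configDirExists) return { installed: false };
  if (s.onPath && s.configDirExists) return { installed: true, confident: true };
  // Exactly one signal fired: low confidence, prompt the user to run the auth flow.
  return { installed: true, confident: false, authState: "missing" };
}

const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // the 24h TTL mentioned above

// A cached detection is reusable only while it is younger than the TTL.
function isCacheFresh(detectedAtMs: number, nowMs: number): boolean {
  return nowMs - detectedAtMs < CACHE_TTL_MS;
}
```

Keeping both checks free of I/O makes them easy to unit-test; the actual `which` call and directory probe would feed `DetectionSignals` from outside.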
| Adapter | CLI command | Config dir | Skills dir | Native skill loading | Surgical edit | Streaming | Priority |
|---|---|---|---|---|---|---|---|
| claude-code | `claude` | `~/.claude/` | `~/.claude/skills/` | ✓ | ✓ | ✓ | P0 (MVP) |
| api-fallback | (direct Anthropic API) | - | - | ✗ (prompt-injected) | ~ | ✓ | P0 (MVP) |
| codex | `codex` | `~/.codex/` | `~/.codex/skills/` | ~ (varies by version) | ~ (regenerate w/ scoping) | ✓ | P1 |
| devin | `devin` | `~/.config/devin/` | `~/.config/devin/skills/` | ✓ | ✓ | ✓ (acp-json-rpc) | P1 |
| cursor-agent | `cursor-agent` | `~/.cursor/` | n/a (via project `.cursorrules`) | ✗ (prompt-injected) | ✓ | ✓ | P1 |
| gemini-cli | `gemini` | `~/.config/gemini/` | - | ✗ (prompt-injected) | ✗ (regenerate) | ✓ | P2 |
| opencode | `opencode` | `~/.opencode/` | | ~ | ~ | ? | P2 |
| openclaw | `openclaw` | `~/.openclaw/` | | ~ | ~ | ~ | P2 |
| copilot | `copilot` | `~/.copilot/` | | ✗ | ✓ (edit tool) | ✓ (`--output-format json` JSONL) | P2 |
| kiro | `kiro-cli` | `~/.kiro/` | | ? | ? | ✓ (acp-json-rpc) | P2 |
| kilo | `kilo` | - | | ? | ? | ✓ (acp-json-rpc) | P2 |
| vibe | `vibe-acp` | `~/.vibe/` | | ? | ? | ✓ (acp-json-rpc) | P2 |
| trae-cli | `traecli` | Trae CLI config | Trae CLI managed | ✗ (prompt-injected) | ? | ✓ (acp-json-rpc) | P2 |
| deepseek | `deepseek` | `~/.deepseek/` | `~/.deepseek/skills/` | ✗ (prompt-injected) | ? | ✓ (plain text) | P2 |
| qoder | `qodercli` | Qoder CLI config | Qoder CLI managed | ✗ (prompt-injected) | ? | ✓ (stream-json) | P2 |
| pi | `pi` | `~/.pi/agent/` | `~/.pi/agent/skills/` | ✗ (prompt-injected) | ? | ✓ (pi-rpc JSON-RPC) | P2 |
"P0/P1/P2" correspond to the roadmap phases in roadmap.md.
Skills travel into each agent via one of three strategies, in order of preference:
Agent scans its own ~/.<agent>/skills/ on launch. We symlink OD's skill into that dir (see skills-protocol.md §3) and let the agent pick it up natively. Zero prompt overhead.
- Works for: Claude Code, Codex (version-dependent), OpenCode.
We read the skill's SKILL.md body + any references/*.md files it has, concatenate them into the system prompt, and copy assets/ files into the cwd. The agent has no concept of "skills" but has the instructions.
- Works for: everyone. Default for API fallback, Cursor Agent, Gemini CLI.
- Cost: more tokens per run. Mitigation: prune `references/` to the files the skill body actually mentions.
For agents that support AGENTS.md / .cursorrules / similar project-level instruction files (Cursor Agent, OpenCode), we write a project-scoped instruction file in the artifact cwd before running the agent. The agent picks it up automatically.
- Works for: Cursor Agent (`.cursorrules`), some OpenCode configurations.
The adapter declares which strategy to use via capabilities().nativeSkillLoading and a private skillInjectionStrategy field.
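The prompt-injection strategy, including the pruning mitigation, might look roughly like the sketch below. It assumes a reference counts as "mentioned" when the literal string `references/<file name>` appears in the SKILL.md body; that rule, the `SkillFiles` shape, and `composeInjectedPrompt` are hypothetical and not taken from the daemon.

```typescript
// Hypothetical in-memory view of a skill directory.
interface SkillFiles {
  body: string;                       // SKILL.md body
  references: Record<string, string>; // file name under references/ -> contents
}

// Concatenate the skill body with only those references the body mentions.
function composeInjectedPrompt(skill: SkillFiles): string {
  const parts: string[] = [skill.body];
  // Sort for a deterministic prompt regardless of directory listing order.
  for (const name of Object.keys(skill.references).sort()) {
    // Assumption: "mentions" means the literal path appears in the body.
    if (skill.body.includes(`references/${name}`)) {
      parts.push(`--- references/${name} ---\n${skill.references[name]}`);
    }
  }
  return parts.join("\n\n");
}

const prompt = composeInjectedPrompt({
  body: "Use the grid rules in references/layout.md.",
  references: { "layout.md": "12-column grid", "unused.md": "never cited" },
});
console.log(prompt.includes("12-column grid")); // true
console.log(prompt.includes("never cited"));    // false
```

Copying `assets/` into the cwd is left out here because it is plain file I/O and independent of how the prompt is assembled.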
- Invocation: `claude --print --output-format stream-json --cwd <artifact-dir> "<prompt>"`.
- Streaming format: JSON Lines over stdout; each line is an event we can map to `AgentEvent` directly.
- Skill loading: native. Just ensure the skill is symlinked in `~/.claude/skills/`.
- Surgical edits: use the `Edit` tool; Claude Code's own loop handles this.
- Permission: set `--allowed-tools "Read,Edit,Write"` to restrict blast radius.
- Cancel: send `SIGTERM`; Claude Code flushes and exits.
- Gotchas: Claude Code's JSON stream schema is versioned; pin to a known version and warn on mismatch.
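Because stdout arrives in arbitrary chunks, a JSON Lines consumer has to buffer partial lines before parsing. The sketch below shows only that buffering step; `JsonLineBuffer` is a hypothetical helper, and the mapping from parsed records to `AgentEvent` is deliberately omitted because it depends on the pinned Claude Code stream schema, which this document does not spell out.

```typescript
// Buffers stdout chunks and returns one parsed record per complete line.
class JsonLineBuffer {
  private pending = "";

  push(chunk: string): { records: unknown[]; malformed: string[] } {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an incomplete line (or "" if the chunk ended in "\n").
    this.pending = lines.pop() ?? "";

    const records: unknown[] = [];
    const malformed: string[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue; // tolerate blank lines
      try {
        records.push(JSON.parse(trimmed));
      } catch {
        // Leave it to the caller to turn this into an "error" event or a warning.
        malformed.push(trimmed);
      }
    }
    return { records, malformed };
  }
}

const buffer = new JsonLineBuffer();
console.log(buffer.push('{"a":1}\n{"b"').records.length); // 1 (second line is still incomplete)
console.log(buffer.push(":2}\n").records.length);         // 1 (the completed second line)
```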
- Invocation: direct Anthropic Messages API with `stream: true`.
- Skill loading: prompt injection only; read the skill dir and inline everything.
- Tool use: we register `Read`/`Write`/`Edit` as tools, implement them in the daemon against the artifact cwd, and run the loop ourselves. This is the one place OD does own the loop, because the user has no agent at all. Keep it as dumb as possible.
- Surgical edits: approximated by regenerating the whole target file with "only change X" in the prompt.
- Model: Claude Sonnet 4.6 default; Opus 4.7 behind a flag.
- Why ship this at all? Topology C requires it (no daemon available in a pure-Vercel deploy). Also, users trying OD for the first time without a CLI installed still get a working experience.
- Invocation: `codex exec --cwd <dir> "<prompt>"`.
- Streaming: line-based; parse with a regex-based state machine. Less rich than Claude Code's JSON stream.
- Skill loading: varies. Newer Codex versions read `~/.codex/skills/`; older versions don't. Detect by version string; fall back to prompt injection.
- Surgical edits: Codex's edit tool exists, but its tool-call schema is different enough that we regenerate files instead in v1. Revisit in v2.
- Gotcha: Codex's CLI auth state can be "authenticated to wrong org." Detect by running `codex whoami` at detect time.
- Invocation: `devin --permission-mode dangerous --respect-workspace-trust false acp`.
- Install/update: macOS/Linux/WSL users can install with `curl -fsSL https://cli.devin.ai/install.sh | bash`; run `devin update` for existing installs.
- Version requirement: requires a Devin CLI build with the `devin acp` subcommand (verified with `devin 2026.5.1-1`). Check with `devin acp --help`; if the subcommand is missing, update or reinstall Devin for Terminal.
- Streaming: Agent Client Protocol JSON-RPC over stdio, handled by the daemon's shared `acp-json-rpc` transport.
- Skill loading: Devin supports `.devin/skills/` and `~/.config/devin/skills/`; OD's current daemon also prompt-injects the selected skill body into the composed prompt, so no per-project skill install is required for generation.
- Surgical edits: Devin's own edit/write tools handle targeted changes.
- Permission: `--permission-mode dangerous` avoids headless approval prompts in the web UI; `--respect-workspace-trust false` ensures Devin doesn't block on trust prompts for newly created project dirs. Org/team-level policies still apply inside Devin.
- Invocation: `cursor-agent --workspace <dir> "<prompt>"` (rough; verify with CLI docs at implementation time).
- Streaming: yes, JSON lines.
- Skill loading: no native skill concept. We write a `.cursorrules` file into the artifact dir before running. The rules file contains the skill's SKILL.md body (minus front-matter).
- Surgical edits: Cursor's inline edit tool is strong; map our `refine` call to its edit protocol.
- Gotcha: Cursor Agent operates on workspaces, not single files. Constrain the workspace to the artifact dir to prevent over-broad changes.
- Invocation: `gemini` with the composed prompt delivered via stdin (no `-p` flag). Gemini CLI enters headless mode automatically when stdin is a pipe and no `-p` flag is supplied (verified with `gemini@0.1.x`).
- Trust: `GEMINI_CLI_TRUST_WORKSPACE=true` is set in the spawned process instead of passing `--skip-trust`, which is version-fragile across Gemini CLI releases.
- Streaming: yes, `--output-format stream-json` to stdout.
- Skill loading: prompt injection only.
- Surgical edits: regenerate whole file.
- Gotcha (`spawn ENAMETOOLONG` on Windows): Passing the full composed prompt as a `-p <string>` CLI argument hits Windows' `CreateProcess` hard limit of ~32 KB for the entire command line. The fix is to set `promptViaStdin: true` in the agent definition and write the prompt to `child.stdin` after spawning. The daemon's `/api/chat` handler checks this flag and opens stdin as a pipe instead of `'ignore'`.
- Gotcha: Gemini's tool-use format is distinct; we translate our file-write tool to its `file_tool` equivalent when that feature is implemented.
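The `promptViaStdin` switch described in the Windows gotcha can be sketched as a pure planning step that decides where the prompt goes before anything is spawned. `AgentDef`, `SpawnPlan`, and `planSpawn` are hypothetical names; only the `promptViaStdin` flag and the pipe-versus-`'ignore'` choice come from the text above, and appending the prompt as the last positional argument is an assumption for the non-stdin branch.

```typescript
// Hypothetical agent definition; only `promptViaStdin` is named in this document.
interface AgentDef {
  bin: string;
  baseArgs: string[];
  promptViaStdin?: boolean;
}

interface SpawnPlan {
  args: string[];
  stdin: "pipe" | "ignore";    // first entry of the spawn `stdio` option
  stdinPayload: string | null; // written to child.stdin after spawning, if set
}

// Decide whether the prompt travels on argv or over stdin.
function planSpawn(def: AgentDef, prompt: string): SpawnPlan {
  if (def.promptViaStdin) {
    // The prompt never touches the command line, so its size is not bounded
    // by the OS command-line limit.
    return { args: [...def.baseArgs], stdin: "pipe", stdinPayload: prompt };
  }
  // Assumption: argv agents take the prompt as the last positional argument.
  return { args: [...def.baseArgs, prompt], stdin: "ignore", stdinPayload: null };
}

const plan = planSpawn(
  { bin: "gemini", baseArgs: ["--output-format", "stream-json"], promptViaStdin: true },
  "make the hero section darker",
);
console.log(plan.stdin); // "pipe"
```

A caller using Node's `child_process.spawn` would pass `stdio: [plan.stdin, "pipe", "pipe"]` and, when `stdinPayload` is not null, finish with `child.stdin.end(plan.stdinPayload)`.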
- Less-matured CLIs. Targeting P2. Expect bumps; adapter implementations will likely be the thinnest possible "shell out, parse output, synthesize events" approach.
- Invocation: `copilot -p "<prompt>" --allow-all-tools --output-format json --add-dir <skills> --add-dir <design-systems>`. `--allow-all-tools` is mandatory in non-interactive mode; without it the CLI blocks waiting for human approval on every tool call. Unlike Codex (where `exec` is a dedicated headless subcommand with auto-approve baked in) or Claude Code (which inherits its permission policy from `~/.claude/settings.json`), Copilot's `-p` mode always prompts unless this flag is passed explicitly. `--add-dir` (repeatable) widens the path-level sandbox so Copilot can read skill seeds and design-system specs that live outside the project cwd.
- Streaming: `--output-format json` emits JSONL with the same expressive shape as Claude Code's stream-json (`assistant.reasoning_delta`, `assistant.message_delta`, `tool.execution_start/complete`, `result`). `apps/daemon/src/copilot-stream.ts` maps these onto the same UI events as `claude-stream.ts`.
- Skill loading: prompt injection only. GitHub Copilot's tool catalog includes a `skill` tool, a native format worth reverse-engineering later.
- Surgical edits: dedicated `edit` tool.
- Detection assumes Copilot is already authenticated, via one of: `copilot login` (subcommand, OAuth device flow), or the interactive `/login` slash command inside `copilot` with no args.
- Invocation: `qodercli -p --output-format stream-json --permission-mode bypass_permissions --cwd <dir> [--model <id>] --add-dir <absolute-skills-dir> --add-dir <absolute-design-systems-dir>`, with the composed prompt delivered over stdin. Print mode exits after the turn, which fits the daemon's one-request chat lifecycle. Qoder is currently text-only in OD; `_imagePaths` are intentionally ignored because Qoder CLI does not expose a supported multimodal flag for this adapter path yet.
- Streaming: `--output-format stream-json` emits JSONL records such as `system/init`, `assistant`, and `result`. `apps/daemon/src/qoder-stream.ts` maps assistant content blocks to text deltas, maps assistant errors without text to typed error events, and preserves result usage, model usage, cost, duration, stop reason, and unknown records as raw events.
- Models: ships fallback hints for `default`, `lite`, `efficient`, `auto`, `performance`, and `ultimate`. Selecting `default` omits `--model` so Qoder's own CLI configuration remains authoritative.
- Skills: prompt injection only in v1. `--add-dir` is repeatable so Qoder can read absolute skill and design-system roots that live outside the active project cwd; the daemon does not forward relative extra directory entries.
- Permission: `--permission-mode bypass_permissions` avoids headless approval prompts in the web UI. Users should treat this as the same trust posture as running Qoder directly with that flag in the selected project directory.
- Gotcha: Detection only proves `qodercli --version` can run. Qoder authentication and account scope remain owned by Qoder CLI, with credentials read from Qoder's `~/.qoder/config.json`; the daemon surfaces stderr/stdout failures from the spawned run instead of running login or editing Qoder config.
- Invocation: `traecli acp serve`, using the daemon's shared ACP JSON-RPC transport. The adapter follows Trae CLI's public ACP entrypoint documented at https://www.volcengine.com/docs/86677/2227861?lang=zh.
- Streaming: `acp-json-rpc`; the daemon uses the same ACP event path as the other ACP-backed adapters.
- Models: dynamic via the ACP handshake. If model discovery fails, the picker falls back to the CLI's default configuration rather than requiring CI or startup detection to log in to Trae CLI.
- Skills: prompt injection only in v1. External MCP servers can be forwarded through the ACP launch descriptor with the existing `acp-merge` path.
- Gotcha: Detection only proves `traecli --version` and model discovery can run in the current environment. Trae CLI owns login, account scope, and model entitlement; the daemon does not run login flows or edit Trae CLI configuration.
- Invocation: `pi --mode rpc [--model <id>] [--thinking <level>] [--append-system-prompt <dir> …]`, with the composed prompt delivered over stdin via JSON-RPC. The daemon sends a `prompt` command (optionally with `images` for multimodal input) and pi streams back typed events until `agent_end`. Pi's RPC process stays alive after `agent_end` (designed for multi-prompt sessions); the daemon closes stdin and SIGTERMs after a grace period since `/api/chat` is single-shot.
- Streaming: `pi-rpc` JSON-RPC over stdio. Events include `agent_start`, `turn_start/end`, `message_update` (text deltas, thinking deltas, tool calls), `tool_execution_start/end`, `compaction_start`, `auto_retry_start/end`, `extension_error`. `apps/daemon/src/pi-rpc.ts` maps these onto the same UI event set as `claude-stream.js`/`copilot-stream.js`/`acp.js`. Error events from `extension_error` and exhausted `auto_retry_end` are routed through `sendAgentEvent` so the daemon's empty-output guard and `agentStreamError` flag apply (same path as qoder-stream-json and json-event-stream after issue #691).
- Models: dynamic. `pi --list-models` prints a TSV table to stderr that the daemon parses into provider/model picker entries. Fallback hints for the most common providers/models are shipped for when the list command times out.
- Images: pi's RPC `prompt` command supports an `images` field (base64-encoded `ImageContent` objects). The daemon reads validated `imagePaths` at session attach time and includes them in the prompt command. Unreadable images are skipped rather than failing the run.
- Skills: prompt injection in v1. `extraAllowedDirs` (skill seed and design-system directories) are forwarded as repeatable `--append-system-prompt` flags so the agent knows these directories exist and can Read files inside them. pi doesn't have an `--add-dir` sandbox flag (it uses the OS cwd), so system-prompt hints are the only available mechanism. Important: `--append-system-prompt` only hints paths in the system prompt; it does not grant sandbox or filesystem access. pi's Read tool can normally open absolute paths outside cwd, but when absolute reads fail (sandboxed environments, restricted permissions), the reliable fallback is to stage copies of the needed files into the project cwd before the run. No stronger pi flag exists for this purpose today.
- Thinking: the daemon exposes pi's `--thinking` levels (`off`, `minimal`, `low`, `medium`, `high`, `xhigh`) in the Settings model picker.
- Extension UI: auto-resolved. pi's RPC protocol can request user dialogs (`select`, `confirm`, `input`, `editor`) and fire-and-forget notifications (`setStatus`, `setWidget`, `notify`, `setTitle`, `set_editor_text`). Dialog methods are auto-approved (confirm → true, select → first option) and fire-and-forget methods are silently consumed because the web UI has no surface for them.
- Gotcha: pi's RPC `prompt` response is asynchronous: `success: true` only means the prompt was accepted, not that the agent finished. Agent failures after acceptance surface through the normal event stream (`extension_error`, `auto_retry_end` with `success: false`) and the empty-output guard.
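The extension-UI auto-resolution rule for pi can be illustrated with a small dispatcher. The request and resolution shapes below are hypothetical (pi's actual RPC payloads are not described in this document). Only the confirm → true and select → first option behaviours and the list of fire-and-forget methods come from the text above; the reply for `input` and `editor` is not stated, so the sketch leaves those unresolved rather than guessing.

```typescript
// Hypothetical request shapes; pi's real payloads may differ.
type UiRequest =
  | { method: "confirm" }
  | { method: "select"; options: string[] }
  | { method: "input" | "editor" }
  | { method: "setStatus" | "setWidget" | "notify" | "setTitle" | "set_editor_text" };

type UiResolution =
  | { kind: "reply"; value: boolean | string }
  | { kind: "consumed" }     // fire-and-forget: nothing is sent back
  | { kind: "unspecified" }; // behaviour not covered by this document

function autoResolve(req: UiRequest): UiResolution {
  switch (req.method) {
    case "confirm":
      return { kind: "reply", value: true };
    case "select": {
      const first = req.options[0];
      return first !== undefined ? { kind: "reply", value: first } : { kind: "unspecified" };
    }
    case "input":
    case "editor":
      // The text above does not say what is sent for these dialogs.
      return { kind: "unspecified" };
    default:
      // setStatus, setWidget, notify, setTitle, set_editor_text
      return { kind: "consumed" };
  }
}

console.log(autoResolve({ method: "select", options: ["dark", "light"] }));
// { kind: "reply", value: "dark" }
```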
- Invocation: `deepseek exec --auto [--model <id>] "<prompt>"`. The `deepseek` dispatcher owns the `exec`/`--auto` subcommands and delegates to a sibling `deepseek-tui` runtime binary at exec time; upstream documents both binaries as required (the npm and cargo paths install them together). We only probe the dispatcher: `deepseek-tui` on its own doesn't accept this argv shape, so advertising it as a fallback would surface the agent as available but fail on the first chat run. A future revision could teach resolution + buildArgs which binary was selected and emit a verified `deepseek-tui` invocation, with a regression test exercising that path.
- Streaming: plain text deltas to stdout in non-`--json` mode (tool-call notifications go to stderr). Skipping `--json` is intentional: `deepseek exec --json` batches the entire run into one trailing summary object instead of streaming, which would freeze the chat UI until end-of-turn.
- Auto-approval: `--auto` enables agentic mode with the YOLO permission posture. The daemon runs every CLI without a TTY, so the interactive approval prompt would otherwise hang the run.
- Skills: prompt injection only in v1. DeepSeek TUI does walk `.agents/skills`, `skills`, `.opencode/skills`, `.claude/skills`, and `~/.deepseek/skills` first-wins, so a future revision can switch to file-placed skill loading the same way Claude Code does.
- Prompt delivery: positional argv (no stdin sentinel; clap declares `prompt: String` as a required field). This means very large composed prompts can hit Windows' ~32 KB `CreateProcess` limit; for typical chat prompts this is a non-issue. Upstream support for a `-` stdin sentinel would let us flip this to `promptViaStdin: true` like the other adapters. To avoid surfacing oversized prompts as a generic `spawn ENAMETOOLONG` / `E2BIG`, the adapter declares `maxPromptArgBytes` (currently 30,000) and `/api/chat` enforces it through three complementary guards:
  - a fast pre-bin-resolution `checkPromptArgvBudget` against the raw composed prompt bytes;
  - a post-buildArgs `checkWindowsCmdShimCommandLineBudget` that, when the resolved binary is a Windows `.cmd`/`.bat` shim, recomputes the would-be `cmd.exe /d /s /c "<inner>"` command line using the same per-arg quote-doubling the platform layer applies on Windows;
  - a sibling `checkWindowsDirectExeCommandLineBudget` that, when the resolved binary is a non-shim Windows install (e.g. a cargo-built `deepseek.exe`), recomputes the same command line using libuv's `quote_cmd_arg` rules (every `"` becomes `\"`, backslashes adjacent to a quote are doubled).

  The two Windows guards are mutually exclusive on a given resolution: the cmd-shim guard owns `.cmd`/`.bat`, the direct-exe guard owns everything else. Together they catch quote-heavy prompts (code blocks, JSON-shaped skill seeds) that fit under the raw byte budget but expand past CreateProcess's 32_767-char `lpCommandLine` cap on either install path. All three guards emit the same actionable `AGENT_PROMPT_TOO_LARGE` SSE error telling the user to reduce skills/design-system context, shorten the conversation, or pick an adapter with stdin support, and all three are unit-tested (oversized + short-prompt branches, quote-heavy regressions for both Windows paths, and a mutual-exclusivity check) so the guards can't silently regress.
- Models: ships `deepseek-v4-pro` and `deepseek-v4-flash` as fallback hints (1M-token context windows, native thinking-mode streaming). Users can paste any other id (e.g. `nvidia-nim/deepseek-v4-pro`, `fireworks/deepseek-v4-flash`) via the Settings dialog's custom-model input.
- Gotcha (auth state is not auto-detected): DeepSeek TUI reads its API key from `~/.deepseek/config.toml` or `DEEPSEEK_API_KEY`. If the user hasn't run `deepseek auth set --provider deepseek` (or set the env var), the first run errors out with a non-actionable message. Detection currently only reports `available: true` based on the binary being on PATH; surface auth state via `deepseek doctor --json` in a follow-up.
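The first of the three guards, the raw byte budget, is the simplest to illustrate. The sketch below reuses the names `checkPromptArgvBudget`, `maxPromptArgBytes`, and `AGENT_PROMPT_TOO_LARGE` from the text, but the signature, the result shape, and the choice to reject only prompts strictly larger than the limit are assumptions, not the daemon's implementation. The two Windows quoting guards are not reproduced here.

```typescript
// UTF-8 length of a JS string, computed without any runtime-specific API.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) {
      bytes += 1;
    } else if (c < 0x800) {
      bytes += 2;
    } else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // valid surrogate pair encodes one 4-byte code point
        i++;
      } else {
        bytes += 3; // lone surrogate is encoded as a 3-byte replacement character
      }
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

type BudgetResult =
  | { ok: true }
  | { ok: false; code: "AGENT_PROMPT_TOO_LARGE"; bytes: number; limit: number };

// Adapters without a declared limit (e.g. stdin-fed ones) always pass.
function checkPromptArgvBudget(prompt: string, maxPromptArgBytes?: number): BudgetResult {
  if (maxPromptArgBytes === undefined) return { ok: true };
  const bytes = utf8ByteLength(prompt);
  // Assumption: a prompt exactly at the limit is still accepted.
  if (bytes > maxPromptArgBytes) {
    return { ok: false, code: "AGENT_PROMPT_TOO_LARGE", bytes, limit: maxPromptArgBytes };
  }
  return { ok: true };
}

console.log(checkPromptArgvBudget("x".repeat(30001), 30000).ok); // false
```

Measuring bytes rather than `prompt.length` matters because multi-byte characters make the encoded prompt larger than its JavaScript string length.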
The web UI reads agents.capabilities() and disables features that the active adapter can't support:
| UI feature | Requires | If missing |
|---|---|---|
| Comment mode (click to refine) | `surgicalEdit: true` | Hidden; show tooltip explaining why |
| Streaming tool-call feed | `streaming: true` | Show a spinner only |
| Resume interrupted run | `resume: true` | "Cancel + restart" only |
| Skill picker shows skill with `od.capabilities_required` | all listed caps | Skill greyed out with reason |
This is how we avoid "works on my Claude Code, breaks on your Gemini": we detect, degrade, and document.
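The gating table translates naturally into a pair of pure functions. This is a sketch under assumptions: the interface below is reduced to the boolean capabilities from `AgentCapabilities`, `FeatureGates`, `gateFeatures`, and `missingCapabilities` are hypothetical names, and `od.capabilities_required` is assumed to be a list of those boolean capability names.

```typescript
// Boolean subset of the AgentCapabilities interface defined earlier.
interface BooleanCapabilities {
  surgicalEdit: boolean;
  nativeSkillLoading: boolean;
  streaming: boolean;
  resume: boolean;
}
type CapabilityName = keyof BooleanCapabilities;

interface FeatureGates {
  commentMode: boolean;  // click-to-refine
  toolCallFeed: boolean; // otherwise show a spinner only
  resumeRun: boolean;    // otherwise "Cancel + restart" only
}

function gateFeatures(caps: BooleanCapabilities): FeatureGates {
  return {
    commentMode: caps.surgicalEdit,
    toolCallFeed: caps.streaming,
    resumeRun: caps.resume,
  };
}

// Capabilities a skill requires that the active adapter lacks.
// An empty result means the skill is selectable; otherwise it is greyed out
// and the returned names can be shown as the reason.
function missingCapabilities(
  required: CapabilityName[],
  caps: BooleanCapabilities,
): CapabilityName[] {
  return required.filter((name) => !caps[name]);
}

const geminiLike: BooleanCapabilities = {
  surgicalEdit: false, nativeSkillLoading: false, streaming: true, resume: false,
};
console.log(gateFeatures(geminiLike).commentMode);                           // false
console.log(missingCapabilities(["surgicalEdit", "streaming"], geminiLike)); // ["surgicalEdit"]
```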
The user can switch active agent per session:

    POST agents.setActive { agentId: "codex" }
      → capabilities() reported
      → web UI refreshes feature gates
      → next generation runs on Codex

Switching mid-run is not allowed (cancel first). The artifact is agent-agnostic; only the generation process differs.
If the user's preferred agent fails (crash, auth, timeout), OD offers a one-click fallback in this order:
1. User's preferred agent (e.g. Cursor Agent)
2. Any other detected agent (Claude Code, if installed)
3. API fallback (direct Anthropic, requires key)
The user explicitly opts in to fallback β we don't silently switch, because a skill may have been authored for a specific agent's capabilities.
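The fallback ordering can be computed without any side effects, which keeps the "offer, don't switch" rule easy to enforce: the function only produces candidates, and the UI decides whether to present them. `fallbackOrder` is a hypothetical helper; the adapter ids follow the table earlier in this document.

```typescript
const API_FALLBACK_ID = "api-fallback";

// Order in which agents are offered after a failure:
// preferred agent, then other detected CLIs, then the API fallback last.
function fallbackOrder(preferredId: string, detectedIds: string[]): string[] {
  const others = detectedIds.filter(
    (id) => id !== preferredId && id !== API_FALLBACK_ID,
  );
  const order = [preferredId, ...others];
  if (preferredId !== API_FALLBACK_ID && detectedIds.indexOf(API_FALLBACK_ID) !== -1) {
    order.push(API_FALLBACK_ID);
  }
  return order;
}

console.log(fallbackOrder("cursor-agent", ["claude-code", "api-fallback", "cursor-agent"]));
// ["cursor-agent", "claude-code", "api-fallback"]
```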
First run:

    $ pnpm tools-dev run web
    [od] daemon starting on :7456
    [od] detecting agents…
    [od]   ✓ claude-code   v0.6.3   (auth: ok, skills dir linked)
    [od]   ✓ codex         v0.8.1   (auth: ok)
    [od]   ✗ cursor-agent           (not installed)
    [od]   ⚠ gemini-cli             (installed but not authenticated; run `gemini auth login`)
    [od]   ✓ api-fallback           (ANTHROPIC_API_KEY found)
    [od] daemon ready; 3 agents available

Web UI mirrors this in an agent-selector dropdown, with unauthenticated agents shown greyed out with a fix-it tooltip.
We inherit the underlying agent's permission model rather than building our own. This means:
- Claude Code respects its own `--allowed-tools` and `--permission-mode` flags. OD passes through user preferences.
- Codex / Cursor sandbox by workspace; OD always sets cwd to the artifact dir so nothing outside is visible by default.
- Qoder CLI runs with `--permission-mode bypass_permissions` for non-interactive web execution and is scoped by the daemon's cwd plus explicit absolute `--add-dir` entries.
- API fallback is the one case we own. We implement a whitelist: only `Read`, `Write`, `Edit` tools, all rooted at the artifact cwd. Network access is off.
The daemon never grants more authority to an agent than it had on its own. We don't run the agent in a privileged mode "for convenience."
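For the API fallback, "rooted at the artifact cwd" implies a path check before every `Read`, `Write`, or `Edit`. The following is a minimal POSIX-only sketch of such a check: `normalizePosix` and `resolveInsideCwd` are hypothetical helpers, and the sketch works on strings only, so it does not detect symlinks that point outside the artifact dir (a real implementation would also need to resolve those on disk).

```typescript
// Collapse ".", "..", and duplicate slashes in a POSIX path.
function normalizePosix(p: string): string {
  const out: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// Returns the absolute path if it stays inside `cwd`, otherwise null.
function resolveInsideCwd(cwd: string, requested: string): string | null {
  const base = normalizePosix(cwd);
  const target = requested.startsWith("/")
    ? normalizePosix(requested)
    : normalizePosix(base + "/" + requested);
  // Compare against "base/" so that "/work/artifact-evil" does not match "/work/artifact".
  const prefix = base === "/" ? "/" : base + "/";
  return target === base || target.startsWith(prefix) ? target : null;
}

console.log(resolveInsideCwd("/work/artifact", "src/../index.html")); // "/work/artifact/index.html"
console.log(resolveInsideCwd("/work/artifact", "../secret.txt"));     // null
```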

    apps/daemon/
    ├── base.ts                 # shared interface + utility helpers
    ├── claude-code/
    │   ├── adapter.ts
    │   ├── stream-parser.ts    # JSON-lines → AgentEvent
    │   └── detect.ts
    ├── api-fallback/
    │   ├── adapter.ts
    │   ├── tool-loop.ts        # the minimal tool-use loop
    │   └── tools.ts            # Read/Write/Edit implementations
    ├── codex/                  # Phase 1
    ├── cursor-agent/           # Phase 1
    ├── gemini-cli/             # Phase 2
    ├── opencode/               # Phase 2
    └── openclaw/               # Phase 2

Each adapter is a separate module so community contributions can add new ones without touching core daemon code.
- Nested agents. What if Claude Code's agent itself spawns a subagent? We receive events from the outer process only. v1 policy: surface only top-level events; summarize subagent activity as "sub-task" placeholder.
- Billing awareness. Some agents bill per message, some per token. OD doesn't track cost in MVP; v1 adds an optional "usage" event from adapters that expose it.
- Windows support. PATH scanning and `spawn` semantics differ on Windows. v1 targets macOS and Linux; Windows is best-effort. Known issue fixed: `spawn ENAMETOOLONG` when running Gemini CLI (and other plain-text agents) on Windows, resolved by routing the composed prompt through stdin instead of as a CLI argument (see §5.5).
- Docker-contained agents. Some users run Claude Code in a container. The adapter needs a "remote" mode, probably the same interface but talking over SSH. Phase 2+.