feat(cockpit): native ACP rendering surface (Beta) for all supported agents#868

Draft
njbrake wants to merge 38 commits into main from native

Owner

@njbrake njbrake commented Apr 30, 2026

Description

Adds cockpit, an ACP-based structured rendering surface that runs alongside the existing tmux passthrough. Every aoe session is now a per-session pick: tmux (legacy, raw bytes through wterm) or cockpit (Beta, agent speaks Agent Client Protocol; aoe renders typed events as React cards). Tmux remains the default — cockpit is opt-in via aoe add --cockpit or the new substrate picker on the web wizard.

The data model (cockpit_mode: bool per session) is already merged on main; this branch is the polish + ecosystem expansion that turns cockpit into something you'd actually want to use, plus per-tool ACP support so it's not Claude-only.

What's in here

Core substrate (was already on the branch from earlier merges):

  • Per-session ACP supervisor (src/cockpit/supervisor.rs) with restart budget, drain task, fs/terminal handlers
  • Replay-buffered WebSocket fanout (src/server/cockpit_ws.rs)
  • React surface built on @assistant-ui/react primitives
  • Sandbox support via unix-socket transport

Per-tool ACP (e4ad824): verified each agent's ACP invocation against agentclientprotocol.com/get-started/agents.md and seeded the registry:

| Tool | Path |
| --- | --- |
| claude | claude-agent-acp (Zed adapter) |
| opencode | opencode acp (native, SST) |
| gemini | gemini --acp (native, Google) |
| codex | codex-acp (Zed adapter) |
| vibe | vibe-acp (native, Mistral) |
| pi | pi-acp (Hermes coding agent) |
| aoe-agent | bundled multi-provider fallback |
| aider, cursor, copilot, droid, settl, hermes | greyed out (terminal-only) |

Supervisor::pick_agent_for_tool(tool, override) replaces three copy-pasted "claude → claude-code, else aoe-agent" fallbacks.
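
A minimal sketch of what such a single lookup could look like, assuming an explicit override wins, then a registry match, then the aoe-agent fallback. The spawn commands come from the table above and "claude-code" is the agent name this PR mentions; the other agent names and the free-function shape are illustrative assumptions, not the actual Supervisor API.

```rust
/// Hypothetical registry rows: (tool, agent name, spawn command).
/// Only the commands and the "claude-code" name come from the PR text;
/// the remaining agent names are placeholders.
const REGISTRY: &[(&str, &str, &str)] = &[
    ("claude", "claude-code", "claude-agent-acp"),
    ("opencode", "opencode", "opencode acp"),
    ("gemini", "gemini", "gemini --acp"),
    ("codex", "codex", "codex-acp"),
    ("vibe", "vibe", "vibe-acp"),
    ("pi", "pi", "pi-acp"),
];

const FALLBACK_AGENT: &str = "aoe-agent";

/// Resolve the agent for a tool: explicit override first, then the
/// registry, then the bundled fallback.
fn pick_agent_for_tool<'a>(tool: &str, override_agent: Option<&'a str>) -> &'a str {
    if let Some(agent) = override_agent {
        return agent;
    }
    REGISTRY
        .iter()
        .find(|(t, _, _)| *t == tool)
        .map(|(_, agent, _)| *agent)
        .unwrap_or(FALLBACK_AGENT)
}
```

Under these assumptions, terminal-only tools such as aider simply fall through to the fallback, so callers never need their own "else aoe-agent" branch.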

UX polish (56a8822..dd249aa):

  • Composer rebuilt VSCode/Cursor-style — multi-line, lucide icons, focus glow, paper-plane Send / square Stop
  • Markdown rendering with shiki code blocks + smooth streaming (@assistant-ui/react-markdown)
  • Per-kind tool cards: bash / read / edit / search / fetch / think with proper input parsing
  • react-diff-viewer-continued for edit cards
  • @-mention file picker (assistant-ui's Unstable_TriggerPopover + workspace file index endpoint)
  • / slash commands
  • Empire-themed working spinner ("Conscripting villagers", braille rattle)
  • Hover affordances (copy/edit/regenerate via ActionBarPrimitive)
  • Approval cards realigned with tool-card visual language
  • Mode picker: real ACP-advertised modes from NewSessionResponse.modes, drop-up menu in composer footer

Reliability fixes:

  • 3cccf46 — drain-task/send_prompt deadlock (drain held client mutex across recv().await)
  • 28e8066 + 30a21f8 — TUI no longer marks cockpit sessions as errored ("tmux pane is gone")
  • d244ad6 — composer wins focus race against wterm's async init
  • c1bb7e0 — auth env forwarded by default (ANTHROPIC_API_KEY, CLAUDE_CONFIG_DIR, etc.)

Library refactor (f1298bc): replaced ~520 lines of reinvention with first-party assistant-ui primitives (TriggerPopover, MarkdownTextPrimitive, smooth streaming).

Doctor + docs:

  • aoe cockpit doctor walks the full registry and prints per-agent install hints; with --fix it runs npm install -g for the npm-distributed adapters
  • docs/cockpit.md gets a Beta callout, per-agent support/auth table, and updated doctor sample

PR Type

  • New Feature

Checklist

  • I understand the code I am submitting
  • New and existing tests pass (5/5 cockpit_acp_smoke, 13/13 e2e Playwright steps)
  • Documentation was updated (docs/cockpit.md)
  • For UI changes: included screenshot or recording (see screenshots in commits)

AI Usage

  • AI was used for drafting/refactoring

AI Model/Tool used: Claude Opus 4.7 via Claude Code

Any Additional AI Details you'd like to share:
The branch was developed iteratively across many conversations. Architecture decisions (ACP as substrate B, per-session toggle, supervisor ownership of agent processes, drain-task pattern, assistant-ui for the React surface) were human-directed; the AI handled implementation, testing, and UX iteration. Notable AI-caught issues that humans would have caught later: the deadlock in Supervisor's drain task, the focus race against wterm's async init, the missing title fallback for tools with empty raw_input. Notable AI-missed issues that humans caught: the mode picker initially said "Default" when the agent was actually in yolo mode (we weren't reading agent-advertised modes), the bash card showed $ {} for the same reason, and the early version reinvented Unstable_TriggerPopover + MarkdownTextPrimitive instead of using the assistant-ui primitives that were already in our deps.

  • I am an AI Agent filling out this form (check box if true)

How to build & run

cargo build --features serve --profile dev-release
./target/dev-release/aoe add . --cmd claude --cockpit   # one cockpit session
./target/dev-release/aoe serve                          # web dashboard

Note: the cockpit cargo feature was folded into serve (commit
ec5cfb0). If you saw earlier instructions saying --features 'serve cockpit',
just use --features serve now. Cockpit ships alongside the dashboard.

How tested

cargo test --features serve --test cockpit_acp_smoke   # 5/5 pass
cargo build --features serve --profile dev-release
cargo clippy --features serve -- -D warnings   # clean

Plus a Playwright e2e harness against a live aoe serve with a Node ACP test shim:

  • 13 steps: session create → cockpit composer mounts → user prompt → agent text streams → tool call card renders → final "done" → REQUEST_PERMISSION flow → Allow click → permission_outcome=yes
  • Verified the @ file picker + / slash command popovers
  • Verified the working spinner with verb cycling
  • Verified focus reclaim after wterm async init

Caveats

  • The five non-Claude agents (opencode/gemini/codex/vibe/pi) were verified at the documentation level (matching agentclientprotocol.com/agents.md against upstream docs) and via the supervisor/registry tests, but I didn't exercise each adapter end-to-end on real hardware in this branch; the smoke and e2e tests use the shim. The first time you run aoe add . --cmd opencode --cockpit you may hit per-agent quirks.
  • aoe cockpit doctor only checks binary presence, not auth state. A future improvement would spawn each adapter's initialize and inspect the response's auth_methods.
  • One Unstable_* primitive from assistant-ui (Unstable_TriggerPopover); if upstream renames it on a minor bump we'd need a mechanical update.

Test plan

  • claude /login then aoe add . --cmd claude --cockpit — verify cockpit conversation works end-to-end with a real Claude subscription
  • aoe add . --cmd opencode --cockpit — verify the registry expansion picks up opencode acp correctly
  • aoe cockpit doctor --fix — verify it installs the missing adapters
  • Web wizard — verify the substrate picker greys out for tools we know don't have ACP
  • TUI — verify no spurious "tmux pane is gone" errors on cockpit sessions

🤖 Generated with Claude Code

njbrake and others added 30 commits April 25, 2026 09:09
Adds the cockpit feature behind a Cargo flag. Implements the ACP client
spine: subprocess spawn, JSON-RPC handshake, session creation, and prompt
loop, plus the typed state/approval/replay-buffer modules from the v4
design. Validated end-to-end by a Node ACP shim agent that replays
scripted session/update events.

Deferred to follow-up slices:
- Permission responder side-channel (currently auto-approves yolo-style)
- Typed mapping of session/update kinds to CockpitState fields
- AcpClient hooking fs/* and terminal/* into existing handlers
- aoe-agent tool stubs that delegate via ACP
- Settings TUI wiring, CLI commands, migration, WebSocket fanout
- React components, push notifications, Docker socket transport, docs

Tests: 22 cockpit unit + 1 e2e integration, 1094 existing tests still pass.
Build: cockpit feature opt-in; default build unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the yolo auto-approve with a proper responder side-channel:
on_receive_request parks the ACP responder keyed by a server-side
nonce; resolve_permission(nonce, decision) wakes the parked future
and answers with the matching option_id from the agent's offered
options.
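
The parking pattern described above can be sketched with std channels standing in for the ACP responder. This is an illustration under stated assumptions: the `Approvals`, `Parked`, and `PermissionOption` types and the `allows` flag are hypothetical, and a counter replaces whatever nonce scheme the real code uses.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

#[derive(Clone, Copy)]
enum Decision {
    Allow,
    Deny,
}

/// One option offered by the agent (hypothetical shape).
struct PermissionOption {
    option_id: String,
    allows: bool,
}

/// A parked request: the offered options plus the channel that wakes it.
struct Parked {
    options: Vec<PermissionOption>,
    responder: Sender<String>,
}

#[derive(Default)]
struct Approvals {
    next_nonce: u64,
    parked: HashMap<u64, Parked>,
}

impl Approvals {
    /// Park a request and hand back the nonce plus the receiving end
    /// that the request handler blocks on.
    fn park(&mut self, options: Vec<PermissionOption>) -> (u64, Receiver<String>) {
        let (responder, answer) = channel();
        let nonce = self.next_nonce; // a real server would use an unguessable value
        self.next_nonce += 1;
        self.parked.insert(nonce, Parked { options, responder });
        (nonce, answer)
    }

    /// Wake the parked request with the option_id matching the decision.
    /// Each nonce can be resolved only once.
    fn resolve_permission(&mut self, nonce: u64, decision: Decision) -> Result<(), String> {
        let parked = self
            .parked
            .remove(&nonce)
            .ok_or("unknown or already-resolved nonce")?;
        let wants_allow = matches!(decision, Decision::Allow);
        let chosen = parked
            .options
            .iter()
            .find(|o| o.allows == wants_allow)
            .ok_or("agent offered no matching option")?;
        parked
            .responder
            .send(chosen.option_id.clone())
            .map_err(|_| "agent stopped waiting".to_string())
    }
}
```

Removing the entry on resolve is what makes a replayed or duplicate decision fail instead of answering the agent twice.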

Map ACP SessionUpdate variants to typed CockpitState Event variants
(AgentMessageChunk, ToolCallStarted, ToolCallCompleted, PlanUpdated,
ModeChanged) instead of passing everything through as RawAgentUpdate.

Add a permission round-trip e2e test against the test shim agent.

Tests: 26 cockpit unit + 2 e2e integration, 1094 existing tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hook the cockpit's FsPolicy + TerminalManager into the ACP client's
incoming-request callbacks. Now agents can issue fs/read_text_file,
fs/write_text_file, terminal/create, terminal/output, terminal/wait,
terminal/kill, terminal/release and aoe handles them with sandbox
enforcement (worktree-rooted FsPolicy).

Update aoe-agent to declare Read/Write/Bash tools via Vercel AI SDK 6
whose execute() bodies delegate back to aoe over ACP. The model never
touches the filesystem or shell directly.

Declare client capabilities (fs.readTextFile, fs.writeTextFile,
terminal) in the ACP initialize so agents know they can use them.

Tests: 26 cockpit unit + 4 e2e integration (added fs + terminal
round-trip tests against the shim). 1094 existing tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add migration v005_cockpit_defaults: seeds [cockpit] section in the
  global config.toml on upgrade so users can flip the flag on without
  hand-editing.

* Add CockpitConfig struct to session::config with 9 documented fields
  matching the v4 design doc: enabled, default_for_claude,
  default_agent, approval_timeout_secs,
  destructive_require_double_confirm, max_concurrent_workers,
  replay_events, replay_bytes, node_path. All with serde defaults;
  loadable from config.toml.

* Add `aoe add` flags: --cockpit, --no-cockpit, --agent <name>,
  --model <id>. The first two are mutually exclusive; the agent
  flag implies cockpit.

* Add `aoe cockpit` subcommand with:
  - doctor [--json] [--fix]: checks Node runtime + each configured
    agent's spawn command. Exits 0/1/2 for ok/fail/partial.
  - agents: lists the registry with present/missing markers.
  - logs/restart: stubs reserved for the worker supervisor slice.

Full settings TUI editing wiring (FieldKey + build_*_fields + merge
logic across 9 fields × 5 touchpoints) is deferred to a follow-up;
config loads cleanly via serde defaults today.

Tests: 1263 lib tests + 4 e2e + 5 cockpit-acp integration all green;
1095 default-feature tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add CockpitBroadcastFrame (session_id + seq + event JSON) and a
  per-AppState broadcast::Sender<CockpitBroadcastFrame> with a 256-event
  capacity. Behaves like the existing status_tx fanout.

* New WebSocket route /sessions/{id}/cockpit/ws (gated on cockpit
  feature). Subscribes to the broadcast and forwards frames matching
  the route session_id; emits a `lagged` notice frame so clients can
  request a snapshot+replay rather than diverge silently.

* trigger_approval_push() helper that fires a Web Push payload to all
  subscribers when an ApprovalRequested event is observed. Reuses the
  existing PushState + push_send infrastructure. Wired so the worker
  supervisor (next slice) can call it without further plumbing.

* Refactored build_router to use a let-bound chain so the cockpit
  route can be conditionally added under #[cfg(feature = "cockpit")].
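
To illustrate why a `lagged` notice matters, here is a rough sketch of a bounded replay buffer keyed by `seq`. It assumes limits like the `replay_events` / `replay_bytes` settings and a client that reports the last `seq` it saw; the `ReplayBuffer` type and its methods are hypothetical and are not the code in cockpit_ws.rs.

```rust
use std::collections::VecDeque;

#[derive(Clone, Debug, PartialEq)]
struct Frame {
    seq: u64,
    event_json: String,
}

/// Keeps the most recent frames, bounded by count and total payload bytes.
struct ReplayBuffer {
    max_events: usize,
    max_bytes: usize,
    bytes: usize,
    next_seq: u64,
    frames: VecDeque<Frame>,
}

impl ReplayBuffer {
    fn new(max_events: usize, max_bytes: usize) -> Self {
        Self { max_events, max_bytes, bytes: 0, next_seq: 0, frames: VecDeque::new() }
    }

    /// Append an event, evicting the oldest frames once a limit is exceeded.
    fn push(&mut self, event_json: &str) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.bytes += event_json.len();
        self.frames.push_back(Frame { seq, event_json: event_json.to_string() });
        while self.frames.len() > self.max_events || self.bytes > self.max_bytes {
            match self.frames.pop_front() {
                Some(old) => self.bytes -= old.event_json.len(),
                None => break,
            }
        }
        seq
    }

    /// Frames newer than `last_seen`, or None when some of them were
    /// already evicted, i.e. the client lagged and needs a snapshot.
    fn replay_after(&self, last_seen: Option<u64>) -> Option<Vec<Frame>> {
        let want_from = last_seen.map_or(0, |s| s + 1);
        let oldest = self.frames.front().map_or(self.next_seq, |f| f.seq);
        if want_from < oldest {
            return None;
        }
        Some(self.frames.iter().filter(|f| f.seq >= want_from).cloned().collect())
    }
}
```

Returning `None` rather than a partial list is the design point: a client that silently skips evicted frames would diverge, while an explicit signal lets it ask for a snapshot.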

Tests: lib tests now 1264 (up from 1263) with the cockpit-ws unit test
guarding publish-with-no-receivers behavior. 1095 default tests still
pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds src/cockpit/node.rs with the documented resolve order:
1. AOE_COCKPIT_NODE env
2. cockpit.node_path setting
3. node on PATH (>= 20 enforced)
4. previously-extracted bundled Node at $AOE_DATA_DIR/cockpit/node-vX
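
The resolve order above can be sketched as a pure function over already-probed candidates. The `Candidates` and `NodeSource` types are hypothetical, and the assumption that only the PATH candidate is version-checked is read off the list, not confirmed against node.rs.

```rust
use std::path::PathBuf;

const MIN_NODE_MAJOR: u32 = 20;

#[derive(Debug, PartialEq)]
enum NodeSource {
    EnvOverride(PathBuf),
    Setting(PathBuf),
    OnPath(PathBuf),
    Bundled(PathBuf),
}

/// What probing the environment found (hypothetical shape).
#[derive(Default)]
struct Candidates {
    env_override: Option<PathBuf>,   // AOE_COCKPIT_NODE
    setting: Option<PathBuf>,        // cockpit.node_path
    on_path: Option<(PathBuf, u32)>, // node on PATH plus its major version
    bundled: Option<PathBuf>,        // previously extracted bundled Node
}

/// Walk the candidates in documented priority order.
fn resolve_node(c: Candidates) -> Option<NodeSource> {
    if let Some(p) = c.env_override {
        return Some(NodeSource::EnvOverride(p));
    }
    if let Some(p) = c.setting {
        return Some(NodeSource::Setting(p));
    }
    if let Some((p, major)) = c.on_path {
        if major >= MIN_NODE_MAJOR {
            return Some(NodeSource::OnPath(p));
        }
    }
    c.bundled.map(NodeSource::Bundled)
}
```

Keeping the probing separate from the ordering makes the priority rules testable without touching PATH or the environment.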

Tarball download is stubbed with a typed NotYetWired error so the
cockpit doctor can surface a clear "install Node yourself for now"
message until the auto-download lands in a follow-up. Docker unix-
socket transport for sandboxed cockpit sessions is also deferred —
the architecture supports it (acp_client takes a generic ByteStreams)
but the spawn path needs sandbox-aware plumbing that's its own slice.

Tests: 4 new unit tests covering env/PATH/bundled paths, including a
serial pair that scrubs+restores PATH/AOE_COCKPIT_NODE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* lib/cockpitTypes.ts: typed wire model mirroring CockpitBroadcastFrame
  + a pure reducer applyEvent() that materialises CockpitState from a
  stream of frames. Bounded activity log (200 rows) and recentDiffs (16).

* hooks/useCockpit.ts: WebSocket subscription to
  /sessions/{id}/cockpit/ws, dispatched through a useReducer; lagged
  control frames flag the state so the UI can request a snapshot.
  resolveApproval helper POSTs decisions to a REST endpoint that the
  worker supervisor will wire up.

* components/cockpit/ApprovalCard: 3 phases (pending / submitting /
  rolled-back), destructive-vs-benign affordance per the design spike
  (long-press 800ms with progress ring + haptic for destructive;
  single tap for benign). Swipe never approves.

* components/cockpit/PlanPanel: sticky current step, collapsed
  completed disclosure, expanded upcoming. Cancelled steps are
  rendered as struck-through.

* components/cockpit/ActivityStream: tool rows with kind glyphs +
  colours (start=amber, complete=emerald, error=red, message=teal),
  thinking/in-flight chrome.

* components/cockpit/CockpitView: top-level mobile-first layout
  composing the above plus connection chrome (connecting / lagged /
  closed banners) and the rate-limit notice.

Type-checks pass; Vite production bundle builds clean. Mobile-vs-
desktop layout split (3-pane on >=768px) + ChatDrawer + push-tap
deep-linking deferred polish; production wiring of the REST endpoint
ships with the worker supervisor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs/cockpit.md follows the 10-section outline from the DX review:
  what cockpit is, quickstart, requirements, verify (doctor), enabling
  per-session + globally, escape hatches, tool compatibility matrix,
  approvals UX, security, troubleshooting, deferred items.

* website/scripts/sync-docs.mjs: register docs/cockpit.md in PAGES +
  URL_MAP so the nav link resolves on agent-of-empires.com.

* website/src/data/docsNav.ts: link the new page under Guides.

The upgrade messaging story is covered today by:
- v005 migration silently seeds [cockpit] section in config.toml
- aoe cockpit doctor is discoverable via aoe --help
- docs/cockpit.md is the canonical reference

Explicit first-run TUI card is deferred to a follow-up; the doctor
command serves the same affordance and is already wired.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* src/cockpit/supervisor.rs: per-aoe-process Supervisor that owns
  AcpClients keyed by session_id. Spawn/shutdown lifecycle, drain task
  bridges client events to a BroadcastSink, restart-budget bookkeeping
  (3 restarts in 60s window before parking the session in Status::Error).
  ChannelSink impl publishes to AppState::cockpit_events_tx and fires
  approval-side hooks.

* src/server/api/cockpit.rs: REST endpoints
  - POST /api/sessions/{id}/cockpit/spawn (start a worker)
  - DELETE /api/sessions/{id}/cockpit (shutdown)
  - POST /api/sessions/{id}/cockpit/prompt (send user input)
  - POST /api/sessions/{id}/cockpit/approvals/{nonce} (resolve approval)

* AppState gets cockpit_supervisor: Arc<Supervisor<ChannelSink>>; the
  router wires the new routes under #[cfg(feature = "cockpit")].

* Instance gains cockpit_mode + cockpit_agent + cockpit_model fields,
  hidden from serde when default. aoe add --cockpit/--no-cockpit/--agent
  /--model now flow through into Instance.
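
The restart-budget bookkeeping mentioned for the Supervisor can be sketched as a sliding window. This is an assumption about the mechanism (the PR only states "3 restarts in 60s window"); the `RestartBudget` type and `try_restart` name are hypothetical.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Allows at most `max_restarts` restarts inside any `window`.
struct RestartBudget {
    max_restarts: usize,
    window: Duration,
    history: VecDeque<Instant>,
}

impl RestartBudget {
    fn new(max_restarts: usize, window: Duration) -> Self {
        Self { max_restarts, window, history: VecDeque::new() }
    }

    /// Record a restart at `now` and return true, or return false when
    /// the budget is spent (the caller would then park the session in
    /// an error state). `now` is expected to be non-decreasing.
    fn try_restart(&mut self, now: Instant) -> bool {
        // Forget restarts that have aged out of the window.
        while let Some(&oldest) = self.history.front() {
            if now.duration_since(oldest) >= self.window {
                self.history.pop_front();
            } else {
                break;
            }
        }
        if self.history.len() >= self.max_restarts {
            return false;
        }
        self.history.push_back(now);
        true
    }
}
```

Passing `now` in explicitly keeps the logic deterministic under test; a caller would use `RestartBudget::new(3, Duration::from_secs(60))` and `Instant::now()`.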

Tests: 4 supervisor unit tests (spawn-unknown-agent, double-spawn,
count, restart budget) + 4 e2e + 1273 lib tests with cockpit feature
on. 1095 default-feature tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add tar + xz2 deps under the cockpit feature for tarball extraction.

* node::download() resolves the host platform (linux x64/arm64,
  macOS x64/arm64; Windows is intentionally unsupported because it
  ships .zip), fetches the pinned Node 22.21.0 tarball from
  nodejs.org/dist, verifies SHA-256 against an embedded table, and
  extracts to $AOE_DATA_DIR/cockpit/node-vX.Y.Z/.

* Pinned SHAs come straight from nodejs.org's SHASUMS256.txt; bumping
  PINNED_NODE_VERSION requires refreshing every entry. A unit test
  enforces that all four supported platforms are covered.

* aoe cockpit doctor --fix now triggers the download when no usable
  Node is on PATH. The CLI command is now async so it can await the
  fetch + extract.
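
As a sketch of the platform resolution step, the mapping from Rust's `std::env::consts` values to a nodejs.org tarball URL might look like the following. The function names are hypothetical and the SHA-256 verification is omitted because it needs a hashing crate; only the version pin and the four supported platforms come from the commit text.

```rust
const PINNED_NODE_VERSION: &str = "22.21.0";

/// Map (OS, ARCH) as reported by std::env::consts to the platform
/// suffix used in Node's release file names. Windows is left out on
/// purpose because it ships .zip archives.
fn node_platform(os: &str, arch: &str) -> Option<&'static str> {
    match (os, arch) {
        ("linux", "x86_64") => Some("linux-x64"),
        ("linux", "aarch64") => Some("linux-arm64"),
        ("macos", "x86_64") => Some("darwin-x64"),
        ("macos", "aarch64") => Some("darwin-arm64"),
        _ => None,
    }
}

/// Build the download URL for the pinned version, or None when the
/// host platform is unsupported.
fn tarball_url(os: &str, arch: &str) -> Option<String> {
    let platform = node_platform(os, arch)?;
    Some(format!(
        "https://nodejs.org/dist/v{v}/node-v{v}-{p}.tar.xz",
        v = PINNED_NODE_VERSION,
        p = platform
    ))
}
```

A caller would pass `std::env::consts::OS` and `std::env::consts::ARCH`, then look up the expected digest for the same platform key before extracting.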

Tests: 6 node unit tests (was 4; added sha256_hex against the empty-
string vector + a coverage check on the SHA table).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires every cockpit setting through the documented FieldKey + override
+ merge pipeline so they're editable in the settings TUI with profile
overrides that round-trip correctly.

* CockpitConfigOverride struct in profile_config with Option<T> for
  every field; merge_configs honors each override.
* New SettingsCategory::Cockpit; build_cockpit_fields renders all 9
  fields (3 bool, 4 number, 2 text) with inheritance markers.
* apply_field_to_global covers each field; apply_field_to_profile
  uses the existing set_profile_override helper.
* clear_profile_override sets each Option to None when the user hits
  the 'r' key.
* Re-export CockpitConfigOverride from session::mod.

Tests: 1274 lib tests with cockpit feature on (was 1268, +6 from the
config + node module additions). 1095 default-feature tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* SpawnConfig.socket_path option: when set, aoe binds a unix listener
  at that path BEFORE spawning the agent, exports AOE_ACP_SOCKET=<path>
  to the agent's env, waits up to 10s for the agent to connect, and
  uses the connected UnixStream's split halves as the ByteStreams
  transport. On task exit the socket file is unlinked.

* run_connection_task is now generic over <W: AsyncWrite, R: AsyncRead>
  so the same body handles stdio (ChildStdin/ChildStdout) and socket
  (UnixStream split halves). socket_path is also threaded in for
  cleanup.

* test-shim honors AOE_ACP_SOCKET: connects to the socket and uses it
  as the ndJsonStream transport. Falls back to stdio when unset.

* New e2e test shim_agent_round_trips_via_unix_socket exercises the
  full round-trip end-to-end: aoe creates the socket, spawns the shim,
  shim connects, prompt + session/update flow back. Same shape as the
  stdio path.
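
The bind-before-spawn sequence can be sketched with blocking std sockets (Unix only). This is a simplified stand-in, not the tokio-based code in acp_client: a thread plays the agent that would normally read `AOE_ACP_SOCKET` from its environment, and the helper names are hypothetical.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;
use std::thread;
use std::time::{Duration, Instant};

/// Poll accept() until a peer connects or the deadline passes.
fn accept_with_deadline(listener: &UnixListener, timeout: Duration) -> io::Result<UnixStream> {
    listener.set_nonblocking(true)?;
    let deadline = Instant::now() + timeout;
    loop {
        match listener.accept() {
            Ok((stream, _addr)) => {
                stream.set_nonblocking(false)?; // use blocking I/O from here on
                return Ok(stream);
            }
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
                if Instant::now() >= deadline {
                    return Err(io::Error::new(io::ErrorKind::TimedOut, "agent never connected"));
                }
                thread::sleep(Duration::from_millis(20));
            }
            Err(e) => return Err(e),
        }
    }
}

/// Bind first, then start the "agent", wait for it, read one line,
/// and unlink the socket file afterwards.
fn round_trip(socket_path: &Path) -> io::Result<String> {
    let _ = std::fs::remove_file(socket_path); // clear a stale socket, if any
    let listener = UnixListener::bind(socket_path)?;

    // Stand-in for the spawned agent, which would get this path via AOE_ACP_SOCKET.
    let agent_path = socket_path.to_path_buf();
    let agent = thread::spawn(move || -> io::Result<()> {
        let mut stream = UnixStream::connect(agent_path)?;
        stream.write_all(b"{\"hello\":\"aoe\"}\n")
    });

    let stream = accept_with_deadline(&listener, Duration::from_secs(10))?;
    let mut line = String::new();
    BufReader::new(stream).read_line(&mut line)?;
    agent.join().expect("agent thread panicked")?;
    std::fs::remove_file(socket_path)?;
    Ok(line.trim_end().to_string())
}
```

Binding before the agent starts avoids a race where the agent tries to connect to a path that does not exist yet.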

Tests: 5 cockpit e2e tests (was 4); 1274 lib tests; 1095 default tests.

Docker bind-mount integration (one -v line in
src/containers/runtime_base.rs for sandboxed cockpit sessions) lands
when the cockpit session type is wired into the sandbox spawn path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* CockpitView is now responsive via a useIsDesktop matchMedia hook:
  mobile (<768px) renders single-column stack with a chat drawer FAB;
  desktop (>=768px) renders three-pane (plan left 300px, activity
  center, chat dock right 360px).

* ChatDrawer component supports both variants:
  - mobile: bottom-anchored sheet with FAB to open/close, slides from
    bottom; close button visible
  - desktop: always-docked column on the right
  Enter sends, Shift+Enter for newline, optimistic disable while
  sending, plain hover/focus styling matching the cockpit palette.

* useCockpit gains sendPrompt(text) helper that POSTs to
  /api/sessions/{id}/cockpit/prompt and forwards through to the
  worker supervisor.

* Approval and connection chrome moves to a top header so it overlays
  on mobile but inlines on desktop.

Type-checks pass; Vite production bundle builds clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add AppStateConfig.has_seen_cockpit_intro (cfg-gated on cockpit).
  Tracked separately from has_seen_welcome / last_seen_version so the
  one-time intro fires once, regardless of which version actually
  introduced cockpit on the user's machine.

* New CockpitIntroDialog: 70x18 centered modal with the quickstart
  command, doctor command, docs URL, and a quiet note about the Node
  prereq. Same key handling as the existing welcome dialog (Enter /
  Esc / Space / q to dismiss).

* Wired into HomeView like the existing one-time dialogs:
  - cockpit_intro_dialog: Option<CockpitIntroDialog> field
  - show_cockpit_intro() helper
  - input.rs dispatch
  - render.rs dispatch (cfg-gated branch after the macro for the
    other dialogs since the macro is shared with non-cockpit builds)

* App::new fires it after the welcome+changelog flow when the flag is
  unset, then persists the flag.

Tests: 1274 lib tests, 5 cockpit e2e all pass; 1095 default tests
unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* web/App.tsx: dispatch on activeSession.cockpit_mode — sessions with
  the flag render <CockpitView/> in place of <TerminalView/>. The
  fallback for tmux-mode sessions is unchanged so existing terminal
  sessions keep working exactly as before.

* SessionResponse gains cockpit_mode (gated on the cockpit feature
  server-side; optional in the TS shape so non-cockpit builds still
  satisfy the type).

* CreateSessionBody learns cockpit_mode (defaults TRUE via
  default_cockpit_for_web so browser-created sessions land in the
  cockpit by default), cockpit_agent, cockpit_model. The fields flow
  through into the constructed Instance.

* Cockpit-mode sessions skip tmux start() — no empty pane is created
  for sessions whose backend is the ACP supervisor.

* After a successful create, if the session is cockpit_mode, kick off
  Supervisor::spawn() on a background task. claude tool defaults to
  the claude-code agent; everything else defaults to aoe-agent. Spawn
  failures (missing Node, etc.) log a warning but don't fail the
  request — the user can retry via the cockpit/spawn endpoint.

* aoe serve startup now sweeps persisted instances with cockpit_mode
  and spawns workers for them too. Same best-effort semantics; happens
  in parallel so a slow agent doesn't block the listener bind.

TUI default behavior is unchanged: NewSessionData doesn't set
cockpit_mode so it defaults false and `n` continues to create tmux-
backed sessions. Users opt in via aoe add --cockpit from the CLI.
A visible toggle in the new-session dialog is a small UI follow-up.

Tests: 1299 cockpit lib tests, 5 e2e, 1116 default. Clippy clean.
Auto-formatted by cargo fmt as part of the precommit hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first cockpit view reinvented the layout with its own three-pane
split, which collided with the app shell — <ContentSplit> in App.tsx
already owns the workspace sidebar (left) and terminal/diff (right).
The cockpit's job is just the middle pane, like Conductor's chat
window.

* CockpitView now renders a single scrollable conversation:
  - Optional plan strip pinned at the top, click to expand the steps.
  - Message-style cells: user prompts as right-aligned bubbles, agent
    text as full-width prose. Consecutive agent_message_chunk events
    fuse into one bubble.
  - Tool calls render INLINE as collapsible cards (status dot +
    one-line summary, click to reveal output).
  - Pending approvals appear inline at the bottom of the feed.
  - Thinking indicator as a small italic bubble.
  - Input area pinned at the bottom (auto-grow, Enter sends,
    Shift+Enter newline).
  - System notices (connecting / lagged / rate-limited) as a thin
    bar above the feed.
  - Auto-stick-to-bottom unless the user scrolled up >80px.

* useCockpit::sendPrompt now also dispatches a `user_prompt` action
  that appends a user-side ActivityRow so the user's outgoing turns
  appear in the conversation timeline. ActivityRow.kind gains
  `user_prompt`.

* Drop the now-stale subcomponents: ChatDrawer (replaced by inline
  Composer), PlanPanel (replaced by PlanStrip header), ActivityStream
  (replaced by ConversationFeed).

* Drop useIsDesktop / 3-pane split entirely.

Tests: 1299 lib + 5 e2e, type-check + Vite build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ssions

Two related cockpit bugs:

- ACP spawn could hang silently (e.g. `npx -y` downloading on first run).
  Gate spawn() on a 30s handshake deadline; on timeout, kill the wedged
  child and publish a new AgentStartupError event end-to-end (broadcast
  -> WS -> reducer -> red banner with `aoe cockpit doctor --fix` hint).
  Default `claude-code` agent now uses the installed `claude-agent-acp`
  binary instead of `npx -y`; `doctor --fix` runs the global npm install.

- Cockpit-mode sessions were polled like tmux sessions and surfaced a
  spurious "tmux session is gone" Error. Short-circuit
  update_status_with_metadata_inner for cockpit_mode and clear any stale
  error state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	Cargo.lock
#	src/server/api/sessions.rs
#	web/src/lib/types.ts
The supervisor's drain task held client.lock() across next_event().await,
which blocks indefinitely waiting on the inbound mpsc. Any concurrent
send_prompt (or any other Supervisor method) tried to acquire the same
mutex and hung forever, so the very first prompt from the web UI never
made it past the API layer.

Move the inbound mpsc::Receiver out of AcpClient (now Option<...>) when
the supervisor builds the worker, and let the drain task own the receiver
directly. The mutex now only guards the cmd_tx side, which is fine because
that side never await-blocks past a channel send.
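
The shape of the fix can be sketched with std threads and channels in place of tokio tasks. This is an analogy under stated assumptions: `Client`, `events_rx`, and the echoing agent thread are hypothetical stand-ins for AcpClient and the real agent process.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

struct Client {
    cmd_tx: Sender<String>,
    // Option so the drain side can take ownership exactly once.
    events_rx: Option<Receiver<String>>,
}

fn run_demo() -> Vec<String> {
    let (cmd_tx, cmd_rx) = channel::<String>();
    let (event_tx, event_rx) = channel::<String>();

    // Fake agent: echoes every command back as an event until the
    // command channel closes.
    let agent = thread::spawn(move || {
        for cmd in cmd_rx {
            if event_tx.send(format!("echo:{cmd}")).is_err() {
                break;
            }
        }
    });

    let mut client = Client { cmd_tx, events_rx: Some(event_rx) };

    // Move the receiver out BEFORE the client goes behind the mutex,
    // so draining never needs the lock.
    let rx = client.events_rx.take().expect("receiver already taken");
    let client = Arc::new(Mutex::new(client));

    // Drain thread owns the receiver and blocks on it freely.
    let drain = thread::spawn(move || rx.into_iter().collect::<Vec<String>>());

    // Senders hold the lock only for the duration of a channel send.
    for prompt in ["first", "second"] {
        let guard = client.lock().unwrap();
        guard.cmd_tx.send(prompt.to_string()).unwrap();
    }

    drop(client); // closes cmd_tx, which lets the agent and drain finish
    agent.join().unwrap();
    drain.join().unwrap()
}
```

Had the drain loop instead locked the client and then blocked on the receiver, the two `send` calls above would wait on the mutex forever, which mirrors the hang described in this commit.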

Found by an e2e test of the cockpit UI; verified by reusing the existing
cockpit_acp_smoke tests (5/5 still pass) plus a Playwright run that sends
two prompts and resolves an approval through the real React surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four small fixes that together make the first-run cockpit experience
not feel broken:

- Remove the one-time "New: Cockpit (Native Agent Rendering)" TUI popup
  and its has_seen_cockpit_intro tracking. Discoverability lives in the
  docs and the `aoe cockpit` subcommand; we don't need an extra dialog
  on every first launch.
- acp_client.spawn_subprocess now also forwards ANTHROPIC_API_KEY,
  ANTHROPIC_AUTH_TOKEN, CLAUDE_CODE_OAUTH_TOKEN, and CLAUDE_CONFIG_DIR
  by default. Without this, users who already have ANTHROPIC_API_KEY
  exported (the common case) hit "Authentication required" because the
  agent inherits an env_clear()'d environment and can't see the key.
- StartupErrorBanner branches on the error message: when the failure is
  auth-shaped (matches /authentic|login|api[_ -]?key/i), show "set
  ANTHROPIC_API_KEY or run claude /login" instead of the
  install-the-adapter hint, which is misleading once the binary is
  already on PATH.
- CockpitView and ApprovalCard now use the shared design tokens
  (surface-*, text-*, brand-*) instead of Tailwind's blue-tinted slate
  and amber palettes, matching the surrounding app shell.
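
The banner's branching on auth-shaped messages can be sketched as follows. The real check lives in the React component and uses the regex quoted above; this version hand-rolls an equivalent case-insensitive match with std only, and the `StartupHint` enum and function names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum StartupHint {
    /// Suggest setting ANTHROPIC_API_KEY or running `claude /login`.
    Auth,
    /// Suggest `aoe cockpit doctor --fix` to install the adapter.
    InstallAdapter,
}

/// Equivalent of /authentic|login|api[_ -]?key/i without a regex crate.
fn is_auth_shaped(message: &str) -> bool {
    let m = message.to_lowercase();
    if m.contains("authentic") || m.contains("login") {
        return true;
    }
    // api[_ -]?key: optional single separator between "api" and "key".
    ["apikey", "api_key", "api key", "api-key"]
        .iter()
        .any(|needle| m.contains(*needle))
}

fn hint_for(message: &str) -> StartupHint {
    if is_auth_shaped(message) {
        StartupHint::Auth
    } else {
        StartupHint::InstallAdapter
    }
}
```

Matching on message text is inherently heuristic, so the install hint stays as the default for anything unrecognised.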

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…polish)

Brings the cockpit conversation surface in line with Cursor / VSCode
agent chat patterns. All seven items from the UX research land here:

1. Agent text now renders as markdown (paragraphs, headings, lists,
   blockquotes, fenced code blocks with shiki syntax highlighting,
   inline code, links). New `Markdown.tsx` parses with `marked` and
   renders each token through tailwind-styled React components so
   colors come from the design tokens, not the library defaults.

2. Per-tool renderers in `ToolCards.tsx`, dispatched by the new
   `kind` field on `ToolCall` (plumbed from ACP `ToolKind`):
     - `execute` shows a `$ command` line with collapsible body
     - `read` shows file path + optional line range
     - `edit` / `delete` show a mini-diff with `-` / `+` styling
     - `search` shows the query (and scope when known)
     - `fetch` shows the URL
     - `think` is a one-line italic note
     - `other` falls back to a generic expandable card

3. Vertical rhythm: a soft horizontal divider above each user turn
   (except the first), tighter spacing for tool→tool runs, more
   breathing room for agent text.

4. Stop button: when the agent is thinking or has a tool in flight,
   the Send button is replaced by a stop-square that POSTs to a new
   `/api/sessions/{id}/cockpit/cancel` endpoint. The endpoint sends an
   ACP `session/cancel` notification via a new `ClientCmd::Cancel` and
   `AcpClient::cancel_prompt`.

5. Empty state: three starter prompt chips ("Explain this codebase"
   etc.) replace the bare "type a prompt" placeholder. Clicking a chip
   sends the prompt immediately.

6. Refined user bubble: smaller right-aligned chip with a rounded-br
   corner, subtle border, no longer competing with agent text for
   visual weight.

7. Composer affordances: keyboard hint (`Enter to send · Shift+Enter
   for newline`) under the textarea, auto-focus on mount so the user
   can type immediately after picking a session.

Backend: `ToolCall` gains a `kind: String` field carrying the ACP
`ToolKind` lowercased (read/edit/execute/search/...). The web
`ActivityRow` now keeps the full `ToolCall` payload on `tool_start`
rows so the UI doesn't have to look it up by id.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-ups from the cockpit overhaul:

1. wterm's async init() at node_modules/@wterm/dom/dist/wterm.js:56
   calls input.focus() unconditionally after the WASM bridge loads,
   firing 200-500ms after mount and stealing focus from the agent
   composer. The composer's onMount focus runs sync and loses the
   race. Re-claim focus at 250ms and 700ms, but only when focus is on
   document.body or inside .wterm — so an intentional click into the
   host shell during the window sticks.

2. The "Thinking…" bubble only showed during ACP AgentThoughtChunk
   events, leaving the user staring at a blank pane while the agent
   was running tools or waiting for first text chunk. Track a new
   `turnActive` flag (set on user_prompt, cleared by Stopped /
   AgentStartupError) and render a spinning glyph + contextual label
   whenever the turn is open: "Thinking…", "Running <tool>…", or
   "Working…" otherwise. Also drives the Send → Stop button swap so
   cancel is reachable for the entire turn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the plain "Working…" / "Thinking…" / "Running X…" labels with a
braille-spinner glyph and a rotating verb pool themed around Agent of
Empires' civilization-building flavor. A nod to Claude Code's
"ruminating" / "noodling" verbs and the Rust `rattles` spinner crate
the TUI already uses for ratatui status indicators.

`cockpitRattle.ts` defines:
  - SPINNER_FRAMES: 10-step braille rotation (⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏), 80ms/frame
  - WORKING_VERBS: 35 empire verbs (Conscripting villagers, Marshalling
    forces, Quarrying granite, Calibrating trebuchets, Plundering
    archives, Negotiating treaties, …)
  - THINKING_VERBS: 14 mystical/divinatory verbs for AgentThoughtChunk
    state (Consulting auguries, Casting bones, Whispering with elders,
    Studying the stars, …)
  - chooseVerb() — deterministic from a seed so the verb stays stable
    across re-renders within a tick. The seed bumps every 4s so long
    turns rotate through different verbs without flickering.

Tool runs override the verb pool: instead of a generic empire word,
the tool's own name is dressed up with a pool of action prefixes
("Wielding read", "Dispatching write", "Marshalling search", …), so
the user always sees what's actually running.
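
A rough sketch of the seed-based selection described above. The frame list and the 80ms / 4s timings come from the commit text; the verb and prefix arrays are truncated samples, and `pickFrom`, `verbSeed`, and `rattleLabel` are hypothetical names (the real `chooseVerb()` may mix its seed differently).

```typescript
const SPINNER_FRAMES = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"];
const FRAME_MS = 80;
const VERB_ROTATE_MS = 4000;
// Samples only; the real pools are larger.
const WORKING_VERBS = ["Conscripting villagers", "Marshalling forces", "Quarrying granite"];
const TOOL_PREFIXES = ["Wielding", "Dispatching", "Marshalling"];

// Deterministic pick: the same seed always yields the same entry.
function pickFrom(pool: readonly string[], seed: number): string {
  const idx = ((Math.floor(seed) % pool.length) + pool.length) % pool.length;
  return pool[idx] ?? "";
}

// The seed changes only every 4s, so re-renders inside that window agree.
function verbSeed(elapsedMs: number): number {
  return Math.floor(elapsedMs / VERB_ROTATE_MS);
}

function spinnerFrame(elapsedMs: number): string {
  return pickFrom(SPINNER_FRAMES, Math.floor(elapsedMs / FRAME_MS));
}

// A running tool replaces the generic verb with "<prefix> <tool name>".
function rattleLabel(elapsedMs: number, runningTool: string | null): string {
  const seed = verbSeed(elapsedMs);
  const text =
    runningTool !== null
      ? `${pickFrom(TOOL_PREFIXES, seed)} ${runningTool}`
      : pickFrom(WORKING_VERBS, seed);
  return `${spinnerFrame(elapsedMs)} ${text}…`;
}
```

Deriving both the frame and the verb from elapsed time keeps the component free of extra state: any render at the same timestamp produces the same string.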

Test infra: the shim now honors a "SLOW" prompt keyword that adds
800ms gaps between session/update events, so e2e tests can observe
the mid-turn UI without a real model.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…itives

Replace the hand-rolled conversation feed with assistant-ui's headless
React primitives. We keep ownership of the chat *state* (it streams
from the cockpit WS, not from a chat protocol assistant-ui knows about);
assistant-ui owns the chat *surface* — scroll viewport, message list,
keyboard shortcuts, accessibility, message-edit affordances, the
running/idle gating.

Architecture
  ws frame  →  applyEvent → CockpitState.activity (ours)
                                  │
                                  ▼
                    activityToThreadMessages()  →  ThreadMessageLike[]
                                  │
                                  ▼
                    useExternalStoreRuntime(adapter) → AssistantRuntime
                                  │
                                  ▼
                    <AssistantRuntimeProvider runtime>
                                  │
                                  ▼
                    <ThreadPrimitive.Messages components={…}>
                    <ComposerPrimitive.Root>

The new `CockpitRuntime` component wraps the cockpit and exposes the
raw state via render-prop for the bits assistant-ui doesn't own (plan
strip, system notices, startup error banner, ACP approval cards).
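
To make the `activityToThreadMessages()` step in the diagram concrete, here is a simplified sketch. The row and message types are local stand-ins invented for this example (the real `ActivityRow` and assistant-ui's `ThreadMessageLike` carry more fields), so treat it as an outline of the folding logic rather than the actual adapter.

```typescript
// Simplified stand-in types, not the real ActivityRow or ThreadMessageLike.
type ActivityRowSketch =
  | { kind: "user_prompt"; text: string }
  | { kind: "agent_text"; text: string }
  | { kind: "tool_start"; toolCallId: string; toolKind: string };

type MessagePartSketch =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolCallId: string; toolName: string };

interface ThreadMessageSketch {
  role: "user" | "assistant";
  content: MessagePartSketch[];
}

function activityToThreadMessagesSketch(rows: ActivityRowSketch[]): ThreadMessageSketch[] {
  const out: ThreadMessageSketch[] = [];
  for (const row of rows) {
    if (row.kind === "user_prompt") {
      out.push({ role: "user", content: [{ type: "text", text: row.text }] });
      continue;
    }
    // Agent rows fold into the current assistant message, or open a new one.
    let last = out[out.length - 1];
    if (last === undefined || last.role !== "assistant") {
      last = { role: "assistant", content: [] };
      out.push(last);
    }
    if (row.kind === "agent_text") {
      const tail = last.content[last.content.length - 1];
      if (tail !== undefined && tail.type === "text") {
        tail.text += row.text; // merge consecutive streamed chunks
      } else {
        last.content.push({ type: "text", text: row.text });
      }
    } else {
      last.content.push({
        type: "tool-call",
        toolCallId: row.toolCallId,
        toolName: row.toolKind,
      });
    }
  }
  return out;
}
```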

Renderers we wrote earlier all keep their place:
  - Markdown.tsx → injected as `Text` part component
  - ToolCards.tsx → injected as `tools.Override` so per-kind cards
    (read/edit/execute/search/…) render inside assistant messages
  - cockpitRattle (verbs + braille frames) → driven by the new
    `<ThreadPrimitive.If running>` gate
  - ApprovalCard.tsx → rendered below the message stream as before

Composer behaviour:
  - `<ComposerPrimitive.Input>` replaces our textarea; auto-grow + the
    wterm focus-race reclaim logic still attach to its element ref
  - Send/Stop swap via `<ThreadPrimitive.If running>`; Stop calls
    `useThreadRuntime().cancelRun()` which the runtime adapter
    forwards to our `cockpit/cancel` REST endpoint
  - `onNew` flattens AppendMessage parts to plain text (ACP only
    accepts text prompts); attachments/images dropped silently for now
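
The `onNew` flattening could look roughly like the following. The part shape is a hypothetical simplification of assistant-ui's `AppendMessage` content, and joining text parts with a newline is an assumption.

```typescript
// Hypothetical part shape; the real AppendMessage parts have more variants and fields.
type AppendPartSketch =
  | { type: "text"; text: string }
  | { type: "image"; image: string }
  | { type: "file"; data: string; mimeType: string };

// Keep only text parts; images and files are dropped without an error.
function flattenToPrompt(parts: readonly AppendPartSketch[]): string {
  const texts: string[] = [];
  for (const part of parts) {
    if (part.type === "text") texts.push(part.text);
  }
  return texts.join("\n").trim();
}
```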

Verified: e2e (13/13 steps), spinner mid-turn snapshot still shows the
empire-themed rattle, focus reclaim still beats wterm's async init,
all 5 cockpit_acp_smoke tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lift the composer out of CockpitView.tsx into its own file and rebuild
the chrome around assistant-ui's `<ComposerPrimitive>` to match the feel
of VSCode chat and Cursor agent chat.

What changed visually:
  - Tall multi-line input by default (rows=2, min-h-14, max-h-50) so
    the composer reads as a writing surface, not a one-liner. Auto-
    grows up to 200px before scrolling.
  - Top-affordance toolbar row inside the composer card with `@`
    files, `/` commands, and paperclip attachment icons (lucide-react,
    an icon family that visually resembles VSCode/Cursor). Disabled with
    "coming soon" tooltips for now; their presence sets the visual frame.
  - Send button: paper-plane icon on a brand-amber rounded pill, with
    a hover lift and 0.98 active-scale so it feels press-able.
  - Stop button: square-stop icon + "Stop" text, hover styling tinted
    rose to read as a destructive/cancel action.
  - Keyboard hint as a kbd-styled chip ("↵ Send") next to the send
    button instead of dim text floating below.
  - Subtle inner shadow on the composer well + amber focus ring (3px
    glow) on focus-within. Clear visual hierarchy: input > toolbar >
    actions.

Lifted out of the view so it can grow features (model picker, file
chips, slash command popover) without cluttering CockpitView.tsx.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tool cards

Closes the eight remaining naive UI items in one pass.

Backend (new endpoints + ACP set_mode):
  - GET  /api/sessions/:id/cockpit/files  → workspace file list (5k cap,
    skips .git/node_modules/target/dist/build/.next/.venv/.cache/etc.)
  - POST /api/sessions/:id/cockpit/mode   → ACP `session/set_mode`
    via new ClientCmd::SetMode + AcpClient::set_mode + Supervisor::
    set_mode
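
As an illustration of the `/cockpit/files` filtering, here is a TypeScript sketch (the real endpoint is Rust and is not shown in this PR text). The directory names and the 5k cap are taken from the line above; that list ends in "etc.", so the set here is partial, and `listableFiles` is a hypothetical name.

```typescript
// Partial list: only the directory names spelled out in the commit text.
const SKIPPED_DIRS = new Set([
  ".git", "node_modules", "target", "dist", "build", ".next", ".venv", ".cache",
]);
const MAX_FILES = 5000;

// Takes workspace-relative paths with "/" separators and returns the ones to list.
function listableFiles(relativePaths: readonly string[], cap: number = MAX_FILES): string[] {
  const kept: string[] = [];
  for (const relPath of relativePaths) {
    if (kept.length >= cap) break;
    // Only directory segments are checked, so a file named "target" is kept.
    const dirs = relPath.split("/").slice(0, -1);
    if (dirs.some((d) => SKIPPED_DIRS.has(d))) continue;
    kept.push(relPath);
  }
  return kept;
}
```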

Frontend additions:
  - TriggerPopover.tsx: generic @-/-trigger combobox. Detects trigger
    chars at word boundaries (whitespace before, word chars after),
    arrow-keys/Enter/Tab/Esc navigation, mousedown-insert so the
    textarea doesn't lose focus. Plugged into the composer twice —
    once for @ (file picker, fed by /cockpit/files + fuzzyFilter)
    and once for / (slash commands, hard-coded /help/clear/tools/model
    for now).
  - ModePicker in PlanStrip: clickable mode chip with Default/Plan/
    AcceptEdits/Yolo (BypassPermissions) options, each with a one-line
    hint. Click → POST /cockpit/mode. Tinted by current mode (rose
    for Yolo, amber for AcceptEdits, cyan for Plan).
  - Plan strip itself: progress bar (visual, animated transition),
    completed/total counter, chevron rotation on expand, and a thin
    always-visible bar with the mode picker even when no plan is
    active.
  - Hover affordances on messages via ActionBarPrimitive: copy + edit
    on user messages, copy + regenerate on assistant. Visible on
    hover/focus only (group-hover:opacity-100), Lucide icons.
  - useStreamReveal hook: char-budget reveal (24 chars/16ms baseline,
    accelerates when >200 chars behind). Smooths ACP's chunky text
    delivery into typewriter-style streaming.
  - Composer toolbar: @ and / icon buttons now insert the trigger char
    at the caret (with space-padding when mid-word) so the popover
    opens. Paperclip remains a coming-soon stub.
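
The word-boundary trigger detection can be sketched as a pure function over the textarea value and caret position. This is an assumed reading of the rule above (trigger at the start of a whitespace-delimited token), and `findActiveTrigger` is a hypothetical name, not the function in TriggerPopover.tsx.

```typescript
interface TriggerMatch {
  trigger: string; // "@" or "/"
  query: string;   // characters typed after the trigger, up to the caret
  start: number;   // index of the trigger character in the text
}

function findActiveTrigger(
  text: string,
  caret: number,
  triggers: readonly string[] = ["@", "/"],
): TriggerMatch | null {
  // Find where the whitespace-delimited token under the caret begins.
  let start = caret;
  while (start > 0 && !/\s/.test(text.charAt(start - 1))) start--;
  if (start === caret) return null; // caret follows whitespace or sits at position 0
  const first = text.charAt(start);
  // The trigger must open the token, so "a@b.c" never matches.
  if (triggers.indexOf(first) === -1) return null;
  return { trigger: first, query: text.slice(start + 1, caret), start };
}
```

Anchoring on the token start rather than walking back over word characters means a path query such as `@src/ma` still resolves to the `@` trigger even though it contains `/`.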

ToolCards.tsx rewrite — VSCode/Cursor-style:
  - Common CardChrome with status dot + per-kind icon + label + meta
    + collapse chevron.
  - Bash: $-prefixed command, output highlighted as `bash` lang in
    shiki, line count in meta, "Show N more" expand for long output.
  - Read: file path + line range + line count, content highlighted
    by extension, 16-line preview default.
  - Edit: real diff lines with +/− gutter, shiki-highlighted body,
    +N/−N counters in meta.
  - Search: query + scope, line-numbered match list capped at 50.
  - Fetch: URL primary, JSON-highlighted output.
  - Think: one-line italic, no chrome.
  - Generic fallback: input + output sections with copy buttons.

ApprovalCard rewrite:
  - Aligned with tool-card visual language: same border-md, same
    bg-surface-800/50 base, header strip with shield/alert icon and
    label.
  - Three-button block: Allow / Always (benign) or Hold-to-allow
    (destructive), plus Deny on the right. Lucide icons throughout.
  - Args preview lives in a max-h-32 scrollable code block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three pieces I had shipped from scratch were already in the toolbox.
Net code: similar line count after factoring out package-lock churn,
but ~520 lines of bespoke logic deleted in favor of battle-tested
upstream code that the assistant-ui team maintains.

1. TriggerPopover.tsx (287 lines) → ComposerPrimitive.Unstable_TriggerPopover
   I had the exports printed out earlier and skipped over the
   Unstable_ family. The official primitive ships:
     - Trigger detection at word boundaries
     - Arrow-key/Enter/Tab/Esc navigation with data-highlighted
     - Mousedown insertion (so the textarea doesn't lose focus)
     - Plugin registry integration with ComposerInput so cursor
       position is wired automatically
     - Search vs categories drill-down with isSearchMode
   We provide an Unstable_TriggerAdapter (categories/categoryItems/
   search) that returns items for @ files / / commands. Pairing the
   `@` trigger with .Directive (chip-into-text) and the `/` trigger
   with .Action (handler-fires-immediately) gets us both UXes
   declaratively. Empty `categories` skips the drill-down step so
   a flat file list shows the moment `@` is typed.

2. Markdown.tsx (~200 lines + useStreamReveal hook) → MarkdownTextPrimitive
   from @assistant-ui/react-markdown (already in deps, never imported).
   The primitive handles:
     - Streaming-aware rendering (incomplete fenced blocks)
     - Built-in `smooth` char-budget reveal (replaces useStreamReveal)
     - Standard markdown via remark/rehype
   We plug in a SyntaxHighlighter component backed by our existing
   shiki integration, plus a CodeHeader matching the design tokens.
   index.css gets descendant selectors for prose because the primitive
   emits unclassed elements.

3. EditToolCard custom +/- diff renderer → react-diff-viewer-continued
   The library does word-level diff, line numbers, gutter colors,
   expand-context. We override its dark theme variables to match
   our zinc/brand palette so the diff doesn't read as "pasted in
   from another app". Diff lib is loaded only inside the Edit
   card, so the bundle penalty stays in the tool-card chunk.

Composer also gains the proper plugin-registry path: the
TriggerPopoverRoot wraps ComposerPrimitive.Root, so the input fires
setCursorPosition() into all registered triggers automatically.
Both `@` and `/` use the same primitive, just with different adapters
and behaviors (Directive vs Action).

The custom code that stays:
  - CockpitRuntime.tsx (ACP-specific, not generic)
  - useFilesIndex (one-shot fetch + memo)
  - fuzzyFilter (small enough to not warrant a dep)
  - The ApprovalCard, ModePicker, and PlanStrip remain bespoke
    because they're product-specific (ACP approvals, ACP modes).
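
For a sense of why `fuzzyFilter` is small enough to keep in-tree, here is one possible subsequence-based version. The scoring rule and the names `fuzzyScore` / `fuzzyFilterSketch` are assumptions; the actual helper is not shown in this PR text and may rank differently.

```typescript
// Returns a score (lower is better) when every query character appears in
// order inside the candidate, or null when it does not match.
function fuzzyScore(query: string, candidate: string): number | null {
  const q = query.toLowerCase();
  const c = candidate.toLowerCase();
  let pos = 0;
  let score = 0;
  for (let i = 0; i < q.length; i++) {
    const found = c.indexOf(q.charAt(i), pos);
    if (found === -1) return null;
    score += found - pos; // penalise skipped characters
    pos = found + 1;
  }
  return score;
}

function fuzzyFilterSketch(query: string, items: readonly string[], limit = 20): string[] {
  const scored: { item: string; score: number }[] = [];
  for (const item of items) {
    const score = fuzzyScore(query, item);
    if (score !== null) scored.push({ item, score });
  }
  // Array.prototype.sort is stable, so ties keep their input order.
  scored.sort((a, b) => a.score - b.score);
  return scored.slice(0, limit).map((s) => s.item);
}
```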

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
njbrake and others added 3 commits April 30, 2026 18:36
Four issues from real-Claude usage:

1. Bash tool card showed literal "$ {}" when raw_input was empty.
   Forward the ACP `tool.title` through CockpitRuntime via a namespaced
   `_aoe_title` key in the args JSON so per-kind renderers can fall
   back to a descriptive label. Updated Execute/Read/Edit/Delete/
   Search/Fetch cards to chain: real arg field → forwarded title →
   bare tool kind.

2. Mode picker said "Default" even when the agent was running in
   yolo mode. Root cause: we never read agent-advertised modes —
   only listened for a CurrentModeUpdate event we never received
   because the ACP `session/update` for mode change is rare. Now:
     - Capture the mode set from `NewSessionResponse.modes` on
       session creation and emit a new `Event::ModesAvailable`
       carrying the agent's actual modes (id + name + description).
     - Map ACP `SessionUpdate::CurrentModeUpdate` to a new
       `Event::CurrentModeChanged` in addition to the legacy
       enum-based `ModeChanged`.
     - UI tracks `state.availableModes` + `state.currentModeId`
       and renders the picker from those (falls back to the
       hard-coded four-mode taxonomy when the agent doesn't
       advertise any).

3. Mode picker also moved from the top PlanStrip into the
   composer footer (Cursor-style: inline with @ / / / paperclip
   toolbar buttons, opens upward via `bottom-full`). PlanStrip is
   now hidden entirely when there's no plan and mode is Default,
   instead of rendering an empty bar to host the picker.

4. Cockpit-mode session in the TUI showed "tmux pane is gone"
   because `Instance::start_with_size_opts` unconditionally
   created a tmux session. Cockpit sessions don't have a tmux
   backing — the supervisor spawns the ACP agent process directly.
   Short-circuit start() for cockpit_mode at the top.
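
The label fallback chain from item 1 (real arg field, then forwarded title, then bare tool kind) can be sketched as below. `_aoe_title` is the key named above; which argument field counts as the "real" one for each kind is an assumption made for this example.

```typescript
// Assumed mapping from tool kind to its primary argument field.
const PRIMARY_ARG_BY_KIND: Record<string, string> = {
  execute: "command",
  read: "path",
  edit: "path",
  delete: "path",
  search: "query",
  fetch: "url",
};

function nonEmptyString(value: unknown): string | null {
  return typeof value === "string" && value.trim() !== "" ? value : null;
}

// real arg field -> forwarded ACP title -> bare tool kind
function toolCardLabel(kind: string, args: Record<string, unknown>): string {
  const field = PRIMARY_ARG_BY_KIND[kind];
  const real = field !== undefined ? nonEmptyString(args[field]) : null;
  return real ?? nonEmptyString(args["_aoe_title"]) ?? kind;
}
```

With this chain an empty `raw_input` falls through to the agent's own title instead of rendering a literal `$ {}`.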

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r (Beta)

Make cockpit a real per-session opt-in across every aoe-supported
agent that has a published ACP server, not just Claude.

Default registry (`AgentRegistry::with_defaults()`) now seeds one
entry per tool, keyed by the same name the tmux substrate uses, so
the spawn path can map `instance.tool` directly to a registry key:

    claude   → claude-agent-acp     (Zed adapter for Claude SDK)
    opencode → `opencode acp`       (native, SST)
    gemini   → `gemini --acp`       (native, Google)
    codex    → codex-acp            (Zed adapter, OpenAI Codex CLI)
    vibe     → vibe-acp             (native, Mistral)
    pi       → pi-acp               (adapter, Hermes coding agent)
    aoe-agent → bundled multi-provider fallback

Verified each invocation against agentclientprotocol.com/get-started/
agents.md and the upstream agent docs (Jan 2026). The legacy
"claude-code" key stays as an alias so persisted sessions with
cockpit_agent="claude-code" still resolve.

Spawn path (`Supervisor::pick_agent_for_tool`) replaces the
hard-coded `"claude → claude-code, else aoe-agent"` fallback in
three places:
    src/server/api/cockpit.rs   (POST /cockpit/spawn)
    src/server/api/sessions.rs  (auto-spawn after create)
    src/server/mod.rs           (auto-spawn at serve startup)

Precedence: explicit override → registry entry keyed on tool →
legacy fallback. So `aoe add . --cmd opencode --cockpit` now spawns
real opencode-via-ACP, not the generic aoe-agent.
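
The precedence rule can be written out as a small lookup. This is a TypeScript sketch of logic that lives in Rust (`Supervisor::pick_agent_for_tool`, whose body is not shown here); the registry keys follow the table above, and passing the registry as a parameter is a choice made for this example.

```typescript
const DEFAULT_REGISTRY_KEYS: readonly string[] = [
  "claude", "opencode", "gemini", "codex", "vibe", "pi", "aoe-agent",
  "claude-code", // legacy alias kept for persisted sessions
];

function pickAgentForTool(
  tool: string,
  override: string | null,
  registryKeys: readonly string[] = DEFAULT_REGISTRY_KEYS,
): string {
  // 1. An explicit override wins.
  if (override !== null && override !== "") return override;
  // 2. Registry entry keyed on the tool name.
  if (registryKeys.indexOf(tool) !== -1) return tool;
  // 3. Legacy fallback: claude -> claude-code, anything else -> aoe-agent.
  return tool === "claude" ? "claude-code" : "aoe-agent";
}
```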

`aoe cockpit doctor` walks the new registry and prints a per-agent
status with tailored install hints. `--fix` runs `npm install -g`
for the npm-distributed adapters (claude / codex / pi); native CLIs
get a one-line install hint pointing at the upstream installer.
Doctor banner now reads "Cockpit doctor (Beta)" with a one-line
explainer about substrate selection.

Web wizard gains an explicit two-card substrate picker (Cockpit Beta
vs Terminal Stable). Greys out cockpit when the selected tool isn't
in our `ACP_CAPABLE_TOOLS` allowlist (currently excluded: aider,
cursor, copilot, droid, settl, hermes). The wizard sends `cockpit_mode` on
the create-session request; server's `default_cockpit_for_web`
default still applies when omitted, but the wizard always sends it
explicitly now.

docs/cockpit.md gets a Beta callout at the top, a per-agent
support/auth table, and an updated doctor sample reflecting the
new format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
njbrake and others added 2 commits May 1, 2026 11:01
Two new endpoints + a discreet UI button per view make `cockpit_mode`
runtime-mutable. The data model was already a per-session bool; this
plumbs the actual transitions:

  POST /api/sessions/:id/cockpit/enable
  POST /api/sessions/:id/cockpit/disable

Both are idempotent (200 with no work when already in the target
state). Enable validates the tool has an ACP-capable registry entry
before flipping (so we don't strand a session in cockpit mode with no
agent to spawn). Persistence happens before the new substrate starts,
so a crash mid-swap leaves the declared end state on disk rather than
a half-broken intermediate.

Cockpit → tmux:
  - Supervisor::shutdown drops the ACP worker (UnknownSession is fine
    if startup never completed).
  - cockpit_mode = false, save.
  - Instance::start() creates a fresh tmux pane and runs the agent.

Tmux → cockpit:
  - Instance::kill() drops the tmux pane.
  - cockpit_mode = true, save.
  - Supervisor::spawn fires off a worker. If it fails (binary missing,
    auth missing, etc.) the standard AgentStartupError flows through
    to the UI's red banner — the swap itself returns 200 because the
    substrate state on disk is correct.
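
The ordering of the swap (stop the old substrate, persist, then start the new one) is the part worth pinning down, so here is a sketch with the side effects injected. All names are hypothetical, the real handlers are Rust, and the 400 status for an unsupported tool is an assumption since the text does not state which status is returned.

```typescript
interface SessionSketch { id: string; tool: string; cockpitMode: boolean; }

interface SubstrateOps {
  hasAcpAgent(tool: string): boolean;
  stopCockpit(id: string): void;
  stopTmux(id: string): void;
  save(session: SessionSketch): void;
  startCockpit(id: string): void; // failures surface later as AgentStartupError
  startTmux(id: string): void;
}

type SwitchResult =
  | { status: 200; changed: boolean }
  | { status: 400; error: string }; // assumed status code

function setCockpitMode(session: SessionSketch, enable: boolean, ops: SubstrateOps): SwitchResult {
  // Idempotent: already in the target state, nothing to do.
  if (session.cockpitMode === enable) return { status: 200, changed: false };
  // Validate before flipping so a session is never stranded without an agent.
  if (enable && !ops.hasAcpAgent(session.tool)) {
    return { status: 400, error: `no ACP agent registered for ${session.tool}` };
  }
  if (enable) ops.stopTmux(session.id);
  else ops.stopCockpit(session.id);
  session.cockpitMode = enable;
  ops.save(session); // persist before the new substrate starts
  if (enable) ops.startCockpit(session.id);
  else ops.startTmux(session.id);
  return { status: 200, changed: true };
}
```

Saving between the stop and the start is what makes a crash mid-swap leave the declared end state on disk.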

UI: new SwitchSubstrateAction component with a destructive-confirm
modal. Plugged into:
  - the cockpit composer toolbar (icon-only, next to mode picker)
  - the TerminalView top-right corner (icon-only, absolute-positioned)

The cockpit-side button always works; the terminal-side button greys
out when the tool isn't in our ACP_CAPABLE_TOOLS allowlist (currently
excluded: aider, cursor, copilot, droid, settl, hermes). Confirm dialog
explains what's destroyed (cockpit conversation log / tmux scrollback)
and what's preserved (worktree, open files, session id).

After the API returns, the session-list poll picks up the new
cockpit_mode within ~3s and the parent flips between <CockpitView>
and <TerminalView>; the explicit refresh wasn't worth threading
through.

Verified end-to-end with a Playwright suite (13/13 pass): both
directions through the UI, idempotent re-enable/re-disable through
the API, backend/frontend stay in sync, confirm dialog gating works.
The pre-existing 14-scenario comprehensive suite still passes
(23/23 + 1 skip on the pre-existing replay-buffer item).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	web/src/components/TerminalView.tsx
#	website/scripts/sync-docs.mjs
@Seluj78

Seluj78 commented May 6, 2026

do you need some testing on this?

@njbrake
Owner Author

njbrake commented May 7, 2026

@Seluj78 If you get a chance, absolutely! I think this is the next-generation feature that will make the web mode of AOE great, but I haven't had the time to finish it.

@njbrake njbrake added this to the Cockpit Mode milestone May 7, 2026
@Seluj78

Seluj78 commented May 7, 2026

Oooh, fancy! I like the new UI!

Screenshot 2026-05-07 at 11 18 15

Commands to run to get it working:

cargo build --features 'serve cockpit' --profile dev-release
cd web && npm install && npm run build && cd ..
./target/dev-release/aoe add . --cmd claude --cockpit
./target/dev-release/aoe serve

Here is my feedback so far:

  • I was not expecting this, but now that I see the new central window, it feels odd that the terminal on the right side is still a tmux pane.
  • These kind of interfaces are amazing but I am always reluctant in using them compared to claude's CLI because it might be missing features, like the AskUserQuestion tool for example, or other tools that might be added by the agents AOE supports that then need a downstream update to support them afterwards

I asked a suggested question and the right side isn't spinning
Screenshot 2026-05-07 at 11 22 22

After a bit more investigation, it looks like the session never starts
Screenshot 2026-05-07 at 11 23 27

@njbrake
Owner Author

njbrake commented May 7, 2026

> These kind of interfaces are amazing but I am always reluctant in using them compared to claude's CLI because it might be missing features, like the AskUserQuestion tool for example, or other tools that might be added by the agents AOE supports that then need a downstream update to support them afterwards

💯 this is my concern too. My hope 🤞 was that by using ACP we would have to worry about less of this and things would just work https://github.com/agentclientprotocol/agent-client-protocol .

Thank you for giving it a try. One of the biggest hurdles I'm trying to figure out is how to manage a session between TUI and Web in cockpit mode. Like, in the web dashboard I certainly want it using cockpit mode, but then if I go to use the TUI I think I would rather have it use the tmux+native Claude Code etc mode. So is there a way where I can run a single session and have it be able to switch between cockpit and tmux views. Idk 🤷

@Seluj78

Seluj78 commented May 7, 2026

> 💯 this is my concern too. My hope 🤞 was that by using ACP we would have to worry about less of this and things would just work agentclientprotocol/agent-client-protocol

Oooh, that's what this is! I didn't bother checking it out earlier. It makes so much sense! Then yeah, 100% this is the right way to go; my concern has disappeared. The only thing I would keep in mind is a way to know when new versions of the SDK are released so we know that we need to update AOE to support them (a CI check?)

> Thank you for giving it a try. One of the biggest hurdles I'm trying to figure out is how to manage a session between TUI and Web in cockpit mode. Like, in the web dashboard I certainly want it using cockpit mode, but then if I go to use the TUI I think I would rather have it use the tmux+native Claude Code etc mode. So is there a way where I can run a single session and have it be able to switch between cockpit and tmux views. Idk 🤷

From what I saw in the webUI (I didn't try the TUI in this PR) you had a toggle to switch back to tmux. So I don't understand right now the problem you're having :)

@njbrake could you update this PR (43 commits behind)? I'll then try to do some sessions on it, see how it feels, and give more feedback :)

njbrake and others added 3 commits May 7, 2026 12:48
The cockpit (ACP-based structured rendering) only ships alongside the
web dashboard, so the standalone `cockpit` cargo feature was redundant.
Consolidating means one feature flag for "I want the web surface" and
no risk of someone enabling cockpit without realising they need serve.

- Drop the `cockpit` feature from Cargo.toml; roll its deps
  (agent-client-protocol, agent-client-protocol-tokio, tar, xz2) into
  the existing `serve` feature list.
- Sweep `#[cfg(feature = "cockpit")]` → `#[cfg(feature = "serve")]`
  across 44 sites in src/ and tests/.
- Update docs/cockpit.md and the one stale comment in
  tui/settings/fields.rs to reference `serve`.
- Fix three pre-existing test fixtures missing the `kind` field on
  ToolCall (uncovered now that cockpit tests run with `--features
  serve` instead of being routed to a separate feature build).

Cockpit code still lives entirely under src/cockpit/, so the
rip-out story is unchanged: delete the module + its handler files
in src/server/, and the rest of `serve` keeps working.
Cockpit sessions spawned a `claude-agent-acp` subprocess (plus its SDK
child) that lived on past two cleanup paths:

1. `DELETE /api/sessions/{id}` ran `perform_deletion` (worktree + tmux
   teardown) but never told the cockpit supervisor to shut down. The
   worker handle stayed in `Supervisor::workers`, the spawned process
   stayed alive.

2. `aoe serve` graceful shutdown handled SIGINT/SIGTERM/SIGHUP for the
   daemon itself but never called `Supervisor::shutdown_all`, so each
   running cockpit session leaked its wrapper + SDK child when the
   daemon exited.

After repeated probe runs we'd accumulate 6+ orphan node processes per
session. Wire the supervisor shutdown into both paths; verified end-to-
end that two cockpit sessions drop their processes to zero on delete,
and a SIGTERM to the daemon reaps the rest.
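
A small model of the fix, assuming a supervisor that tracks one worker per session. This is a TypeScript sketch of Rust behaviour; `SupervisorSketch`, `deleteSession`, and the `kill` callback (standing in for terminating the ACP wrapper and its SDK child) are all hypothetical.

```typescript
class SupervisorSketch {
  private workers = new Map<string, { kill: () => void }>();

  spawn(sessionId: string, kill: () => void): void {
    this.workers.set(sessionId, { kill });
  }

  // Returns false for an unknown session, which callers may treat as fine.
  shutdown(sessionId: string): boolean {
    const worker = this.workers.get(sessionId);
    if (worker === undefined) return false;
    worker.kill();
    this.workers.delete(sessionId);
    return true;
  }

  // Called from the daemon's graceful-shutdown path.
  shutdownAll(): number {
    const ids = Array.from(this.workers.keys());
    ids.forEach((id) => { this.shutdown(id); });
    return ids.length;
  }

  get liveCount(): number {
    return this.workers.size;
  }
}

// Deletion path: stop the worker first, then tear down worktree and tmux state.
function deleteSession(
  sup: SupervisorSketch,
  sessionId: string,
  performDeletion: (id: string) => void,
): void {
  sup.shutdown(sessionId);
  performDeletion(sessionId);
}
```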

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Seluj78

Seluj78 commented May 7, 2026

@njbrake are there specific things you need tested in this PR? Or just general feedback on the experience of using it? Or something else? :)

@njbrake
Owner Author

njbrake commented May 7, 2026

@Seluj78 yes at this point just general feedback. There are a few dimensions we need to test/debug before merge.

  1. How does the web app behave? Is it a smooth experience, and can you easily view both cockpit sessions and non-cockpit sessions?
  2. How does the TUI handle the cockpit sessions, and can you cleanly convert a session between cockpit and tmux modes?
  3. What is the risk? Is there any impact to the TUI? The TUI is stable and needs to stay that way; the Web App is beta/experimental, so I'm less worried about little rough edges on the web app that we can smooth out in follow-up PRs.

Basically, I don't need this PR to handle all the edge cases, but I need to be sure that the web app is at least generally as usable as it was before the change, and that nothing in it regresses the TUI experience.

@Seluj78

Seluj78 commented May 7, 2026

@njbrake okay, well, I can't go any further in my testing because I cannot get claude to respond to me in cockpit mode,
but switching back to tmux mode and asking the same question works.

Actually I wanted to reproduce the issue above and I can't even get a new session to show up:

Screenshot 2026-05-07 at 18 04 52

I ran those commands and it said session started but nothing happened in the webui.

@Seluj78

Seluj78 commented May 7, 2026

Comment from Claude (Opus 4.7), posted by @Seluj78 — investigating the "session started but nothing in the webui" report.

Traced the flow. Two-tier bug:

1. aoe session start is a no-op for cockpit sessions.

Instance::start_with_size_opts (src/session/instance.rs:797-804) returns Ok(()) immediately when cockpit_mode is true:

#[cfg(feature = "serve")]
if self.cockpit_mode {
    return Ok(());
}

The CLI prints ✓ Started session: ... regardless, so it looks like it worked.

2. Cockpit workers only auto-spawn at aoe serve startup or via REST.

ACP worker spawn happens in three places:

  • aoe serve startup scan (src/server/mod.rs:471-511) — runs once, over sessions that exist at that moment.
  • POST /api/sessions (web-UI create) — src/server/api/sessions.rs:745-767.
  • POST /api/sessions/:id/cockpit/enable (substrate switch), at src/server/api/cockpit.rs:344-356.

Nothing watches for cockpit sessions added after aoe serve is already running. The 2 s status_poll_loop (src/server/mod.rs:1218) reloads instances from disk, so the new session does appear in GET /api/sessions, but no code path calls cockpit_supervisor.spawn(...) for it.

Repro of the exact flow in the report:

Term A: aoe serve                              # daemon up, 0 cockpit sessions known
Term B: aoe add . --cmd claude --cockpit       # written to disk only
Term B: aoe session start Ethiopians           # no-op (cockpit early return)
                                               # 2s later: poll reloads disk, session
                                               # appears in webui list, but no worker
                                               # was ever spawned → agent silent.

This matches both screenshots in the comment above: "claude doesn't respond" (worker missing for an existing cockpit session) and "session never starts" (worker missing for the freshly-added one).

Fix options:

  1. Daemon-side reconciler (recommended). Extend status_poll_loop to track attempted spawns and, on each tick, call supervisor.spawn for any cockpit session on disk that has no running worker and hasn't been attempted yet. Idempotent, mirrors the startup auto-spawn, and fixes both aoe add --cockpit while serve is running and any race where serve starts before a session is fully written. Needs an "already attempted" set so a permanently failing spawn doesn't retry every 2 s; the supervisor already has restart bookkeeping for in-process crashes, but the initial spawn is currently a one-shot.
  2. CLI → daemon IPC. aoe session start for cockpit POSTs to the running daemon. Heavier (needs daemon discovery + auth token plumbing) and only fixes that one entry point — aoe add --cockpit while serve is already up still wouldn't trigger a spawn.
  3. Loud CLI error. aoe session start on cockpit prints "cockpit sessions are managed by `aoe serve`; restart serve to spawn the worker." Cheap, but the UX is bad and the underlying mismatch stays.
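
Option 1 can be sketched as a single reconciliation pass that runs on every poll tick. This is a TypeScript outline of what would be Rust in `status_poll_loop`; `DiskSession`, `reconcileCockpitWorkers`, and the injected `spawn` callback are hypothetical names, not the code in the follow-up PR.

```typescript
interface DiskSession { id: string; cockpitMode: boolean; }

// `attempted` is owned by the caller and persists across ticks;
// `running` is the set of session ids that currently have a worker.
function reconcileCockpitWorkers(
  onDisk: readonly DiskSession[],
  running: ReadonlySet<string>,
  attempted: Set<string>,
  spawn: (id: string) => void,
): string[] {
  const spawned: string[] = [];
  for (const session of onDisk) {
    if (!session.cockpitMode) continue;
    if (running.has(session.id) || attempted.has(session.id)) continue;
    // Record first, so a permanently failing spawn is not retried every tick.
    attempted.add(session.id);
    try {
      spawn(session.id);
      spawned.push(session.id);
    } catch {
      // Initial spawn failed; the id stays in `attempted`.
    }
  }
  return spawned;
}
```

Clearing an id from `attempted` (for example when the user edits the session or explicitly retries) would be the natural hook for a manual retry, though that policy is not specified above.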

Going with #1 unless you'd rather a different shape — happy to push a fix on top of this branch.

@njbrake
Owner Author

njbrake commented May 7, 2026

@Seluj78 thanks! I'm good with you pushing, maybe you need to create a branch in your fork to be able to do this though.

@Seluj78

Seluj78 commented May 7, 2026

> @Seluj78 thanks! I'm good with you pushing, maybe you need to create a branch in your fork to be able to do this though.

Yes, cause I'm not a maintainer in this repo, I'll make a branch in my fork that targets the native branch here :) gimme a few

@Seluj78

Seluj78 commented May 7, 2026

@njbrake #953 is ready to review, it fixes a few different bugs. Let me know there if you disagree with some choices!

@Seluj78

Seluj78 commented May 7, 2026

(once this PR is merged, or at least in a working state with my PR merged into it, I will start using AOE to work on AOE 😉)
