fix(cockpit): auto-spawn ACP workers for sessions added while serve runs #953
Seluj78 wants to merge 3 commits into njbrake:native
Cockpit-mode sessions added via `aoe add . --cmd <tool> --cockpit` while `aoe serve` was already running never got an ACP worker. The dashboard showed the session (the 2s status poller picks it up from disk) but the agent was silent because no code path called `cockpit_supervisor.spawn(...)` for it. `aoe session start <name>` looked like it worked but is a no-op for cockpit sessions: `Instance::start_with_size_opts` returns early on `cockpit_mode == true`.

Worker spawning previously fired only at three entry points: serve-startup auto-scan, `POST /api/sessions` (web wizard create), and `POST /api/cockpit/sessions/:id/enable` (substrate switch). Nothing watched for cockpit sessions appearing in the on-disk list after startup.

Move the auto-spawn out of the one-shot startup block and into the existing `status_poll_loop`. Each tick reconciles cockpit-mode sessions on disk against the supervisor's worker map and spawns any that are missing. An `attempted_cockpit_spawns` HashSet guards against retry storms when a spawn permanently fails (e.g. `claude-agent-acp` not installed) and is pruned to the live id set each tick so a delete + recreate of the same id can spawn again.

Cold startup latency is unchanged: tokio interval's first tick fires immediately, so the reconciler runs at the same point the old startup loop did.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
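The reconcile pass described in this commit message can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code from the PR: `Session`, `Supervisor`, its `broken` and `spawn_calls` fields, and the `reconcile` function are hypothetical stand-ins, and the real loop runs inside `status_poll_loop` on a tokio interval.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a session record loaded from disk.
struct Session {
    id: String,
    cockpit_mode: bool,
}

/// Hypothetical stand-in for the supervisor: tracks live workers by session id.
#[derive(Default)]
struct Supervisor {
    workers: HashMap<String, ()>,
    /// Ids whose spawn always fails, simulating a missing adapter binary.
    broken: HashSet<String>,
    /// Counts spawn attempts so retry storms are observable.
    spawn_calls: usize,
}

impl Supervisor {
    fn spawn(&mut self, id: &str) -> Result<(), String> {
        self.spawn_calls += 1;
        if self.broken.contains(id) {
            return Err(format!("adapter for {id} not found"));
        }
        self.workers.insert(id.to_string(), ());
        Ok(())
    }
}

/// One reconcile pass: spawn missing cockpit workers, at most once per id.
fn reconcile(sessions: &[Session], sup: &mut Supervisor, attempted: &mut HashSet<String>) {
    // Prune to the live id set so a delete + recreate of an id can spawn again.
    let live: HashSet<&str> = sessions.iter().map(|s| s.id.as_str()).collect();
    attempted.retain(|id| live.contains(id.as_str()));

    for s in sessions.iter().filter(|s| s.cockpit_mode) {
        if sup.workers.contains_key(&s.id) || attempted.contains(&s.id) {
            continue;
        }
        // Record the attempt before spawning so a permanent failure is not retried.
        attempted.insert(s.id.clone());
        if let Err(e) = sup.spawn(&s.id) {
            eprintln!("cockpit spawn failed for {}: {e}", s.id);
        }
    }
}
```

Calling `reconcile` twice with the same session list should leave the spawn count unchanged on the second pass, which is the retry-storm guard. Removing a session from the list and adding it back clears its entry and allows one fresh attempt.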
…'ing `aoe session start <name>` (and stop/restart/attach) on a cockpit-mode session previously printed `✓ Started session: ...` while doing nothing: `Instance::start_with_size_opts` returns early on `cockpit_mode == true` because cockpit sessions aren't backed by tmux. The CLI never checked the flag, so the success line was misleading and the agent stayed silent until the user noticed (or the daemon-side reconciler kicked in).

Bail loudly from the CLI with the actual remediation: cockpit lifecycle is owned by `aoe serve`'s supervisor; use the dashboard or REST API to control individual sessions, or restart serve to force a re-spawn.

`aoe session restart --all` skips cockpit sessions instead of erroring on each one, so the batch command stays usable in mixed-substrate profiles.

Both the `cockpit_mode` field and the `bail_if_cockpit` helper are gated on the `serve` feature; TUI-only builds compile to a no-op shim and behave exactly as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
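The difference between the single-session commands (which refuse) and `restart --all` (which skips) might look like the sketch below. It is an assumption-laden illustration: `Instance` is trimmed to two fields, `refuse_if_cockpit` and `restart_all` are hypothetical names, and errors are plain `String`s so the block needs no external crates.

```rust
/// Hypothetical, trimmed-down session record.
struct Instance {
    name: String,
    cockpit_mode: bool,
}

/// Single-target commands refuse cockpit sessions with an actionable error.
fn refuse_if_cockpit(inst: &Instance, verb: &str) -> Result<(), String> {
    if inst.cockpit_mode {
        return Err(format!(
            "cockpit sessions are managed by `aoe serve`; \
             cannot `aoe session {verb}` from the CLI."
        ));
    }
    Ok(())
}

/// Batch restart: skip cockpit sessions instead of failing on each one.
/// Returns the names of the sessions that would be restarted.
fn restart_all(instances: &[Instance]) -> Vec<&str> {
    instances
        .iter()
        .filter(|inst| !inst.cockpit_mode)
        // A real implementation would restart the tmux-backed session here.
        .map(|inst| inst.name.as_str())
        .collect()
}
```

Filtering in the batch path, rather than reusing the refusing helper, keeps one cockpit session from aborting a restart of every tmux session in a mixed profile.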
Pushed a follow-up commit (…). Two commits total now; happy to split into a separate PR if you'd rather review them independently.

🤖 Generated with Claude Code
The post-add summary printed the same `aoe session start <name>` / `aoe` (TUI attach) hint regardless of substrate, which sent cockpit users straight into the lifecycle commands that now bail (`fix(cli): refuse cockpit lifecycle commands ...`). Tmux next-steps were also flat-out wrong for a cockpit session that has no pane to attach to.

For cockpit sessions, point at `aoe serve` + the dashboard. For `--launch`, surface that the flag is a no-op so the user understands why the run didn't open a terminal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
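As a rough illustration of the substrate-dependent next steps, the branching could be modeled like this. The `Hint` enum and `next_steps` function are hypothetical; the real CLI prints prose rather than enum values. The tmux branch ignoring `launch` is also an assumption made only to keep the sketch small.

```rust
/// Hypothetical hint kinds; each variant names the command or action it stands for.
#[derive(Debug, PartialEq)]
enum Hint {
    /// `aoe session start <name>`
    StartSession,
    /// `aoe` (TUI attach)
    AttachTui,
    /// `aoe serve`
    RunServe,
    /// Open the web dashboard.
    OpenDashboard,
    /// `--launch` was passed but is a no-op for cockpit sessions.
    LaunchIsNoOp,
}

/// Pick the post-add hints for a session based on its substrate.
fn next_steps(cockpit: bool, launch: bool) -> Vec<Hint> {
    if !cockpit {
        // tmux-backed sessions keep the existing lifecycle hints.
        return vec![Hint::StartSession, Hint::AttachTui];
    }
    let mut hints = vec![Hint::RunServe, Hint::OpenDashboard];
    if launch {
        // Explain why no terminal opened instead of staying silent.
        hints.push(Hint::LaunchIsNoOp);
    }
    hints
}
```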
Converted back to draft because I'm still fixing bugs :D But I'm having loads of fun
#[cfg(feature = "serve")]
fn bail_if_cockpit(inst: &crate::session::Instance, verb: &str) -> Result<()> {
    if inst.cockpit_mode {
        bail!(
            "cockpit sessions are managed by `aoe serve`; \
             cannot `aoe session {verb}` from the CLI.\n\
             The ACP worker is auto-spawned within ~2s of `aoe add --cockpit` \
             while serve is running, or on next `aoe serve` startup.\n\
             To control a cockpit session, use the web dashboard or the REST API."
        );
    }
    Ok(())
}
So this is quite opinionated, and the "easy" way to do things. I went this way because, according to Claude, doing it properly would take at least 200 more lines. Should I open a follow-up issue for this?
Hmm this means that cockpit would be a web dashboard only feature to start? I think in order to keep the PR tractable I'm ok with that as long as in the short term we have some sort of icon in the TUI to tell someone that a session exists but must be loaded via the web dash
That works for me, or I can do the long fix that correctly reconciles the cockpit sessions with the TUI. You're the boss boss
(just tell me what you want me to do)
lol. I think to make this less risky let's just say that cockpit sessions are only started outside the TUI, and that the TUI should track that the session exists, but mark it as 'cockpit session' and not let you attach to it. Then we'll worry about that piece later
Let me know if you want to do any more cleanup now that we're working under this assumption, or if you want me to merge this into my cockpit pr branch as it is now 🙏
Description

Stacks on top of #868 (the `native` branch). Fixes the "session started but nothing in the webui" report from #868's review thread, plus the surrounding silent-failure UX cliffs that surface from the same root cause.

1. Daemon-side reconciler (`d8c21bf`)

Bug. Cockpit-mode sessions added via `aoe add . --cmd <tool> --cockpit` while `aoe serve` was already running never got an ACP worker. The dashboard showed the session (the 2s status poller picks it up from disk) but the agent stayed silent because no code path called `cockpit_supervisor.spawn(...)` for it. `aoe session start <name>` looked like it worked but is a no-op for cockpit sessions: `Instance::start_with_size_opts` returns early on `cockpit_mode == true`.

Worker spawning previously fired only at three entry points:

- serve-startup auto-scan (`src/server/mod.rs`)
- `POST /api/sessions` (web wizard create)
- `POST /api/cockpit/sessions/:id/enable` (substrate switch)

Nothing watched for cockpit sessions appearing in the on-disk list after startup.

Fix. Move auto-spawn out of the one-shot startup block and into the existing `status_poll_loop`. Each tick reconciles cockpit-mode sessions on disk against the supervisor's worker map and spawns any that are missing.

- `attempted_cockpit_spawns: HashSet<String>` guards against retry storms when a spawn permanently fails (e.g. `claude-agent-acp` not installed).
- `tokio::time::interval`'s first tick fires immediately, so the reconciler runs at the same point the old startup loop did.

2. CLI bail for cockpit lifecycle commands (`ca4c112`)

`aoe session start <name>` (and `stop`/`restart`/`attach`) on a cockpit-mode session previously printed `✓ Started session: ...` while doing nothing, because `Instance::start_with_size_opts` early-returns on `cockpit_mode`. The CLI now refuses these commands with a message pointing at `aoe serve` + the dashboard.

- `aoe session restart --all` skips cockpit sessions instead of erroring on each one.
- `cockpit_mode` is gated on the `serve` feature, so TUI-only builds compile to a no-op shim.

3. `aoe add --cockpit` next-steps redirect (`ed0c90f`)

The post-add summary printed the same `aoe session start <name>` / `aoe` (TUI attach) hint regardless of substrate, sending cockpit users straight into the lifecycle commands that now bail. The cockpit branch now points at `aoe serve` + the dashboard. For `--launch`, surface that the flag is a no-op so the user understands why the run didn't open a terminal.

4. `aoe add --cockpit` precondition check (`1bf3eb6`)

Persisting a `--cockpit` session whose ACP adapter binary isn't on `$PATH` used to fail silently end-to-end: `aoe add` returned 0, the dashboard listed the session, the supervisor's reconciler tried to spawn the worker, `AcpClient::spawn` errored with "No such file or directory", and the user only learned something was wrong when the first prompt POST returned 404 ("session has no running cockpit").

Verify the resolved adapter binary is on `$PATH` at add-time and bail loudly with the install hint, the bundled-fallback (`--agent aoe-agent`) suggestion, and the tmux-passthrough escape hatch (`--no-cockpit`). Only sessions the user explicitly opted into cockpit for hit this check; the implicit default-for-claude branch keeps falling back to tmux, so users without cockpit tooling on `$PATH` aren't suddenly blocked.

Drive-by fix in `command_present` (`src/cli/cockpit.rs`): the placeholder branch was shadowed by the `/`-branch, so any agent whose command embeds a `${aoe_data_dir}/...` placeholder (notably `aoe-agent`, our bundled fallback) was reported as missing in `aoe cockpit doctor` and would have failed this new precondition. Reorder the checks so placeholders resolve at runtime as documented.

PR Type
Checklist

- `cargo test --features serve --lib server::tests` 16/16
- `cargo test --features serve --lib cockpit` 39/39
- `cargo build --features serve --profile dev-release`, `cargo build`
- `cargo clippy --features serve -- -D warnings`, `cargo clippy -- -D warnings`
- `cargo fmt --check`
- (… `aoe cockpit doctor` output is now correct for placeholder-based agents)

AI Usage
AI Model/Tool used: Claude Opus 4.7 via Claude Code
Any Additional AI Details you'd like to share:
The original investigation comment on #868 was authored by me with Claude's help; this PR implements option #1 from that comment (commit 1) and option #3 from the same comment, expanded to cover the surrounding silent-failure UX cliffs (commits 2-4). I read every change and verified the build/clippy/fmt/unit-tests locally.
Test plan

- `cargo build --features serve --profile dev-release`
- … `aoe serve` restart.
- … `claude-agent-acp` on `$PATH` (validates commit 4):
- `aoe session start <name>` and `aoe session restart <name>` (validates commit 2). Confirm both bail with the "managed by `aoe serve`" message.
- `aoe session restart --all` in a profile mixing cockpit and tmux sessions (validates commit 2). Confirm the cockpit ones are skipped silently and tmux ones restart.
- `aoe add . --cmd claude --cockpit` (validates commit 3). Confirm the next-steps block points at `aoe serve` + the dashboard, not `aoe session start`.
- `aoe cockpit doctor` (validates the drive-by). Confirm `aoe-agent` is reported as `[OK]`.

Out of scope
A separate (pre-existing) sidebar bug surfaced while testing this PR: when multiple sessions share the same `(project_path, branch)` key, `useWorkspaces` collapses them into one workspace and the sidebar renders only `workspace.sessions[0]`. Reproduces with tmux-only sessions too. I'll open a separate issue.

🤖 Generated with Claude Code