Add agentic-browser-fleet: many agents drive many browsers with atomic ownership #201
minhtrinh-imbue wants to merge 31 commits into
…owser-use agent
A new dockview tab ('+' -> 'New browser', replacing 'New URL') that spins up a
headless Chromium inside the compute and streams it to the tab, with a chat that
can hand the same browser to a browser-use AI agent.
Streaming/input use the Chrome DevTools Protocol (steel's mechanism): a Playwright
observer attached over CDP runs Page.startScreencast (JPEG frames over a WebSocket)
and injects mouse/keyboard via Input.dispatch*Event -- low-latency, display-
independent, no Xvfb/ffmpeg. browser-use drives the same Chromium for AI tasks.
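On the input side, the daemon has to turn a click on the viewer canvas into CDP `Input.dispatchMouseEvent` parameters. The sketch below assumes the viewer sends pointer events in canvas coordinates, which must be scaled to the page size before dispatch. `ViewerPointer` and `mouse_event_params` are hypothetical names; the CDP field names (`type`, `x`, `y`, `button`, `clickCount`) follow the DevTools Protocol.

```python
from dataclasses import dataclass

# Viewer pointer kinds -> CDP Input.dispatchMouseEvent "type" values.
_CDP_MOUSE_TYPES = {"down": "mousePressed", "up": "mouseReleased", "move": "mouseMoved"}


@dataclass(frozen=True)
class ViewerPointer:
    """Hypothetical pointer message sent by the viewer canvas."""
    kind: str       # "down" | "up" | "move"
    x: float        # canvas-space x
    y: float        # canvas-space y
    canvas_w: int   # canvas size as rendered in the tab
    canvas_h: int


def mouse_event_params(ev: ViewerPointer, page_w: int, page_h: int) -> dict:
    """Scale a canvas-space pointer to page space and build the CDP params."""
    if ev.kind not in _CDP_MOUSE_TYPES:
        raise ValueError(f"unsupported pointer kind: {ev.kind}")
    sx = page_w / ev.canvas_w
    sy = page_h / ev.canvas_h
    is_move = ev.kind == "move"
    return {
        "type": _CDP_MOUSE_TYPES[ev.kind],
        "x": round(ev.x * sx),
        "y": round(ev.y * sy),
        # Moves carry no button; presses/releases are a single left click.
        "button": "none" if is_move else "left",
        "clickCount": 0 if is_move else 1,
    }
```

With Playwright, the resulting dict could be forwarded through a CDP session, e.g. `cdp = await page.context.new_cdp_session(page)` and then `await cdp.send("Input.dispatchMouseEvent", params)`.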
Control: while the agent runs the status shows 'Agent has control' and human
browser input (incl. tabs) is locked; a message typed during a run is queued
(one pending, cancelable) and runs after; 'Take control' stops the agent and
returns control (no resume -- send a new message to continue). The view follows
the agent's active tab. Concurrent sessions capped (BROWSER_MAX_SESSIONS, default 3).
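The "one pending, cancelable" queue behaves like a single slot. This is a minimal sketch that assumes a newer message replaces the pending one; the commit does not say whether a second message replaces the first or is refused. `PendingSlot` is a hypothetical name.

```python
import threading
from typing import Optional


class PendingSlot:
    """At most one message waits behind a running agent (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._msg: Optional[str] = None

    def queue(self, msg: str) -> Optional[str]:
        """Queue `msg`; returns the message it displaced (assumed: newest wins)."""
        with self._lock:
            old, self._msg = self._msg, msg
            return old

    def cancel(self) -> bool:
        """Drop the pending message; True if there was one."""
        with self._lock:
            had = self._msg is not None
            self._msg = None
            return had

    def take(self) -> Optional[str]:
        """Called when the current run ends: pop the message to run next."""
        with self._lock:
            msg, self._msg = self._msg, None
            return msg
```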
New lib libs/browser (FastAPI service at /service/browser/) + the dockview menu
item, gated on a resolvable ANTHROPIC_API_KEY.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
- Drop the Imbue Cloud / litellm proxy path entirely: resolve a direct Anthropic API key only (no ANTHROPIC_BASE_URL), and update the gating messages.
- Rename 'Take control' -> 'Interrupt & take control'.
- Make take-control one-click immediate: flip control to the human first (UI unlocks at once) then hard-cancel the run task, instead of waiting out browser-use's cooperative stop (which only lands at the next step boundary, seconds away if mid-LLM-call).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…er each
Turn the single live-browser service into a per-workspace fleet of headless Chromium browsers with one compare-and-set ownership state machine, plus an `agentic-browser-fleet` CLI for agents to drive them and a viewer-only UI.
- session.py: one `_transition` writer per browser (CAS under a lock; the displaced run is cancelled outside the lock so a human take-control can't deadlock). State = controller + owner_agent_id + human_pinned. Agents never preempt each other -- a second agent monitors and waits in a FIFO queue. Human take-control always wins and ends the agent's task; the agent resumes only via explicit --reclaim. Monotonic browser ids (0 = default), never reused. Ownership is bound to the live task connection (no stuck locks).
- runner.py: /browsers fleet endpoints; a streaming task endpoint with acquire-or-wait, disconnect-release, and a max-step/wall-clock backstop; /hold backs the lock verb.
- fleet.py: the `agentic-browser-fleet` CLI (ls/new/task/lock/unlock/release). Discovers the daemon via applications.toml; streams the browser-use thinking/action trace to the calling agent's own output.
- index.html: viewer-only (in-tab chat removed) with an "Agent has control" overlay, take-control, return-to-agents, and a dead-session state.
- layout: a `service:browser?session=<id>` pane ref + the "+" menu lists and focuses active browsers.
- agentic-browser-fleet skill, per-project changelogs, and tests for every load-bearing case.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
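A minimal, synchronous sketch of the single-writer compare-and-set idea in session.py. It assumes a `threading.Lock` (the daemon itself is asyncio-based) and uses hypothetical class names. `take_control` returns the displaced agent instead of cancelling it, so the caller can cancel outside the lock as the commit describes; human hand-back and `--reclaim` are left out.

```python
import threading
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass(frozen=True)
class Ownership:
    controller: str                       # "none" | "agent" | "human"
    owner_agent_id: Optional[str] = None
    human_pinned: bool = False


class BrowserOwnership:
    """Per-browser ownership; every state change goes through _transition."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state = Ownership("none")
        self._waiters: Deque[str] = deque()  # FIFO of agents that monitor-and-wait

    def snapshot(self) -> Ownership:
        with self._lock:
            return self._state

    def _transition(self, expected: Ownership, new: Ownership) -> bool:
        """Compare-and-set: commit `new` only if nobody changed the state since
        the caller read `expected`. The lock is held only for this check."""
        with self._lock:
            if self._state != expected:
                return False
            self._state = new
            if new.owner_agent_id in self._waiters:
                self._waiters.remove(new.owner_agent_id)  # granted: leave the queue
            return True

    def acquire(self, agent_id: str) -> bool:
        """Take a free browser; agents never preempt each other, they queue."""
        while True:
            cur = self.snapshot()
            if cur.controller == "agent" and cur.owner_agent_id == agent_id:
                return True
            if cur.controller != "none":
                with self._lock:
                    if agent_id not in self._waiters:
                        self._waiters.append(agent_id)
                return False
            if self._transition(cur, Ownership("agent", agent_id)):
                return True
            # Lost a race with another writer: re-read and retry.

    def release(self, agent_id: str) -> Optional[str]:
        """Release and hand the browser straight to the head of the FIFO queue."""
        while True:
            cur = self.snapshot()
            if cur.owner_agent_id != agent_id:
                return None
            with self._lock:
                nxt = self._waiters[0] if self._waiters else None
            new = Ownership("agent", nxt) if nxt else Ownership("none")
            if self._transition(cur, new):
                return nxt

    def take_control(self) -> Optional[str]:
        """Human take-control always wins. Returns the displaced agent so the
        caller can cancel its run after the lock is released."""
        while True:
            cur = self.snapshot()
            if self._transition(cur, Ownership("human", None, human_pinned=True)):
                return cur.owner_agent_id
```

Because `_transition` only commits against the exact state the caller read, two agents racing for a free browser cannot both win: the loser's CAS fails, it re-reads, and it joins the queue.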
A free (unpinned-human) browser was labeled "(you took control)" in the "+" menu; that wording is for a human who explicitly took control. Show "(free)" for an available browser, matching the CLI's owner labels.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…ser itself)
The agent now drives the browser one command at a time -- `state <id>` returns the page as a numbered list of clickable elements, then `open`/`click`/`input`/`select`/`scroll`/`keys`/`screenshot`/`tab <id> ...` act on it -- instead of delegating a whole goal to an autonomous browser-use agent. This matches the upstream browser-use skill and is keyless: the agent does its own reasoning (already authenticated), and the browser commands are deterministic.
- session.py lifts browser-use's own ActionHandler (the executor its CLI uses) against our held BrowserSession, so the live screencast + ownership wrap the direct commands. `state` caches the numbered selector_map; a `click` against a stale index returns a clear "run state first" instead of mis-clicking.
- Ownership for direct control is a sticky lease: the first command acquires it, every command re-checks ownership (compare-and-set) right before acting -- so a human take-control mid-sequence makes the next command a clean "lost control" rather than touching the human's browser -- and an idle lease auto-releases (~90s) since short commands have no connection to bind to.
- New CLI verbs (state/open/click/input/select/scroll/keys/screenshot/tab/acquire) + `ls --include-tabs`; every response carries the owner so the agent knows if it lost control. `task` (delegation) is kept as an optional fallback.
- SKILL.md rewritten to the direct-control model (the state->click loop, choosing a browser, re-querying after each change, ownership rules).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
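A sketch of the sticky lease, assuming an injectable clock and a 90 s idle TTL. `StickyLease`, `LostControl`, and `StaleIndex` are hypothetical names, and the selector map is simplified to index -> selector string. A `True` return marks a fresh acquisition.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

IDLE_TTL_S = 90.0  # "~90s" per the commit; the exact value is an assumption


class LostControl(Exception):
    """A human or another agent holds the browser."""


class StaleIndex(Exception):
    """The element index is not in the last `state` snapshot."""


@dataclass
class StickyLease:
    clock: Callable[[], float] = time.monotonic
    owner: Optional[str] = None      # agent id, or "human" after a take-control
    last_used: float = 0.0
    selector_map: Dict[int, str] = field(default_factory=dict)  # index -> selector

    def check_and_touch(self, agent_id: str) -> bool:
        """Re-check ownership right before acting. Returns True when this
        command newly acquired the browser."""
        now = self.clock()
        # Only an agent's lease expires; a human hold is not swept here.
        if self.owner not in (None, "human") and now - self.last_used > IDLE_TTL_S:
            self.owner = None
        if self.owner is None:
            self.owner, self.last_used = agent_id, now
            return True
        if self.owner != agent_id:
            raise LostControl(f"lost control: browser held by {self.owner}")
        self.last_used = now
        return False

    def resolve(self, index: int) -> str:
        """Map a numbered element from `state` to its selector, or refuse."""
        if index not in self.selector_map:
            raise StaleIndex("run state first")
        return self.selector_map[index]
```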
…keyless)
The "+" menu's "New browser" item disabled itself and showed a "Browser sessions need an Anthropic API key" dialog whenever the workspace had no key. That gate was a leftover from the delegation model: direct control needs no key (the agent and the user drive the browser by hand), so a browser can always be started. Only the optional task/extract verbs need a key, and the daemon already checks those at call time.
- DockviewWorkspace.ts: "New browser" always creates a browser; removed the dead browserKeyAvailable state and the gating dialog.
- session.py: anthropic_key_status() now describes only the key-only task/extract verbs and no longer claims browser sessions in general need a key.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…ip transparency
Direct control silently lost the auto-layout in the pivot: _pull_in_pane was only
wired into the old task/lock verbs, so driving a browser with state/click/...
left it headless. Plus a batch of viewer fixes the live feature surfaced.
- Auto-layout: run_action now reports newly_acquired (the first command that takes
a browser, and the first after a human hands it back). The CLI splits that browser
in as a pane to the RIGHT of the agent's chat -- chat left, browser right, one
pane per browser, fired once per acquisition. _pull_in_pane now splits (relative
to the BROWSER_FLEET_ANCHOR chat, else self) instead of falling back to `open`
(which only tabbed into the chat group). Splitting an already-open browser is a
no-op that focuses it, so re-acquire never duplicates panes.
- Uniform resolution: browser-use pins the viewport on the first tab only; tabs
opened later came up at different sizes and the viewer letterboxed them
inconsistently. _set_active_page now forces Emulation.setDeviceMetricsOverride to
the screencast size on every tab, so all tabs stream at the same resolution.
- Viewer nav: Back / Forward / Reload buttons left of the address bar (active only
while the human holds control); the daemon handles back/forward/reload cast
messages against the live page. Reload reloads the page, not the browser.
- Ownership transparency: the "Agent has control" overlay shows a live idle
countdown ("idle 12s, releases control in 78s") so a watching human knows when a
quiet agent's sticky lease auto-releases (90s idle-TTL), and lists agents queued
(monitor-and-wait) behind the owner. The waiting queue is also in GET /browsers
and `ls` ([queued: ...]). The keepalive loop broadcasts the control message each
tick while an agent holds, to drive the countdown.
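The countdown is plain arithmetic over the time of the agent's last command. A sketch of the control message a keepalive tick might broadcast, assuming the 90 s idle TTL; the function name and the message keys are hypothetical.

```python
from typing import Dict, List

IDLE_TTL_S = 90  # the sticky lease's idle TTL


def control_message(now: float, last_command_at: float, owner: str,
                    waiting: List[str], ttl_s: int = IDLE_TTL_S) -> Dict[str, object]:
    """Build one keepalive broadcast for the "Agent has control" overlay."""
    idle = max(0, int(now - last_command_at))
    remaining = max(0, ttl_s - idle)
    return {
        "type": "control",
        "owner": owner,
        "label": f"idle {idle}s, releases control in {remaining}s",
        "waiting": list(waiting),   # agents queued behind the owner
    }
```

With a last command at t=100.0 and a tick at t=112.4, the label reads "idle 12s, releases control in 78s", matching the example above.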
Verified: 45 browser/fleet unit tests, plus real-Chromium integration tests for the
screencast (device-metrics override does not stall frames) and for newly_acquired.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
… trim viewer chrome
The skill had no instruction to release a browser when finished, so an agent that was done still held it (grey "Agent has control" overlay) until the 90s idle TTL. And the sub-agent anchor guidance promised a path that cannot work.
- Skill: tells the agent to `release <id>` the moment it is done with a browser, handing control straight back to the human instead of waiting out the idle timeout. The idle auto-release stays as the backstop.
- Skill: replaced the "set BROWSER_FLEET_ANCHOR for a sub-agent" guidance with the reality -- a launch-task sub-agent runs in a separate, isolated container with no access to this workspace's browser fleet or its live panes (its agentic-browser-fleet calls hit a daemon/registry that isn't there, which is why a delegated browser's pane never anchored to the foreground chat). The agent now drives the browser itself, in the chat the user is watching; browser work belongs to the user-facing agent, not a background sub-agent.
- UI: dropped the placeholder "web" example server from the "+" menu (the browser fleet is the real web surface), and removed the per-tab Refresh button from browser panes -- reloading the pane only reconnects the live view (it reads as "restart the browser"); the viewer's own in-page Reload button covers reloading the actual page.
Verified: 354 frontend tests pass (rebuilt bundle).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…ent waits
When you hold a browser, the bar now lists the agents queued to use it ("Agents
waiting to use this browser: ...") and shows "Return control to agents" ONLY when
one is actually waiting -- otherwise it reads "No agents are waiting" with no
button, since there is nobody to hand back to (the next agent that wants the
browser just takes it).
Skill: states the fleet cap (5; `new` past it returns "Too many open browsers",
release/close one first) and the "another browser vs. wait" rule -- another agent
holds it, take a different browser; a human took YOUR browser mid-task, wait and
resume that same one.
This is the safe, well-specified slice of the queue/handoff paradigm. The
daemon-side event-driven resume (resume-after-handoff lane, mngr-message wake,
claim window, auto-give) lands next as a focused, tested change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…me on hand-back
Before, a human taking a browser an agent was driving just stopped the agent; the user had to re-type "keep going". Now it is a handoff:
- An agent whose direct command is rejected because a human (or another agent) holds the browser is added to a resume queue (it ends its turn). When the browser frees, the daemon hands it to the first queued agent and messages it (via `mngr message`, the same path launch-task uses) to resume -- it re-reads the page with `state` and continues. This is the CAPTCHA / login handoff flow.
- A human pin only *blocks* agents while the human is actively driving. If they go quiet for BROWSER_HUMAN_ACTIVE_GRACE (default 20s), the pin yields: a queued agent is handed the browser by the keepalive sweep, and a freshly arriving agent simply takes it. A forgotten hold never blocks the fleet. An actively-driven pin keeps refreshing and is never yanked.
- An agent that is handed the browser from the resume queue but never sends a command (interrupted/killed) has the grant revoked after BROWSER_CLAIM_WINDOW (default 12s), so the browser passes to the next waiter instead of idling on a no-show.
- busy_human now exits 2 (preempted: stop and wait for the wake), not 3 (generic busy). busy_agent still exits 3 but also queues the agent to be woken.
The resume queue is surfaced in the same `waiting` list the viewer bar and `ls` already show.
Skill: take-control guidance is now "tell the user, end your turn, you'll be messaged to resume; re-run state first". Release rules are spelled out by case (task done, user says stop, switching browsers; keep multiples only while actively driving).
Verified: 49 browser/fleet unit tests (4 new: enqueue-on-busy + wake, stale-pin yields, stale-pin persists when nobody waits, unclaimed-grant revoke) + 14 ratchets. A real-Chromium repro confirms enqueue-on-busy / newly_acquired / busy_agent / clean shutdown. (The pytest real-Chromium integration harness is flaky/slow in this env; the same test passed at ~5s earlier in the session.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
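A sketch of the resume queue with the claim-window revoke and the two exit codes. The defaults (12 s claim window, exit 2 for busy_human, exit 3 for busy_agent) come from the commit; `ResumeQueue`, its methods, and the `wake` callback (standing in for the `mngr message` call) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Optional

CLAIM_WINDOW_S = 12.0   # BROWSER_CLAIM_WINDOW default
EXIT_PREEMPTED = 2      # busy_human: stop and wait for the wake
EXIT_BUSY_AGENT = 3     # busy_agent: held by another agent (also queued)


@dataclass
class ResumeQueue:
    clock: Callable[[], float]
    wake: Callable[[str], None]      # e.g. send `mngr message` (wiring assumed)
    queue: Deque[str] = field(default_factory=deque)
    granted_to: Optional[str] = None
    granted_at: float = 0.0
    claimed: bool = False

    def reject(self, agent_id: str, held_by_human: bool) -> int:
        """A direct command hit a busy browser: queue the agent, pick the exit code."""
        if agent_id not in self.queue:
            self.queue.append(agent_id)
        return EXIT_PREEMPTED if held_by_human else EXIT_BUSY_AGENT

    def on_free(self) -> Optional[str]:
        """The browser freed up: grant it to the first queued agent and wake it."""
        self.granted_to, self.claimed = None, False
        if not self.queue:
            return None
        self.granted_to = self.queue.popleft()
        self.granted_at = self.clock()
        self.wake(self.granted_to)
        return self.granted_to

    def claim(self, agent_id: str) -> bool:
        """The woken agent's first command claims the grant."""
        if self.granted_to != agent_id:
            return False
        self.claimed = True
        return True

    def sweep(self) -> Optional[str]:
        """Keepalive tick: revoke a grant nobody claimed within the window."""
        if (self.granted_to is not None and not self.claimed
                and self.clock() - self.granted_at > CLAIM_WINDOW_S):
            no_show = self.granted_to
            self.on_free()   # pass the browser to the next waiter
            return no_show
        return None
```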
…ost wakes, etc.)
A max-effort multi-agent review of the resume-queue state machine surfaced several real concurrency/correctness bugs in the just-added handoff; all are fixed here.
- Double-queue spurious re-grant: an agent can be in both the resume queue (rejected direct command) and the connection-bound wait queue (`task`/`acquire --wait`). Granting it from the wait queue now also clears its resume-queue entry, so a later release no longer re-grants the freed browser to an agent that already finished and fires a spurious "handed back to you" wake. (Regression test added.)
- Lost wakes: `_wake_agent` was scheduled with a bare `asyncio.create_task`, which asyncio holds only weakly -- the wake (an `mngr message` subprocess) could be GC'd before it ran, stranding the queued agent until the claim window. Wake tasks are now retained in a set until they complete.
- The explicit `acquire <id>` HTTP endpoint did not pass `enqueue_on_busy`, so the CLI told the agent "you're queued ... I'll message you when it frees" while the agent was never actually queued -- it would wait forever. The endpoint now enqueues.
- Human-active grace was 20s, short enough that a `task`/`acquire` could take a human-pinned browser mid-CAPTCHA once the human paused input (reading instructions, fetching a 2FA code). Bumped the default to 120s; any input refreshes it, and the "Return control to agents" button remains the instant hand-back.
- The wake subprocess now runs with cwd set to a file-anchored repo root (not the daemon's inherited cwd) so the `mngr` dev shim resolves this checkout.
- SKILL.md exit-code table corrected: a human take-control during a direct command is exit 2 (preempted: stop, end turn, you'll be messaged to resume), not exit 3; exit 3 is now "held by another agent".
Verified: 64 browser/fleet unit tests + 14 ratchets (1 new regression test for the double-queue re-grant).
Deliberately deferred review items: the serial page.title() in `ls` (efficiency, capped fleet), the two-queue abstraction (style), and the pre-existing "New URL" menu removal (from the original fleet commit, a product decision, not this change).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
When a browser's Chromium dies unexpectedly (OS/OOM kill, segfault), the daemon used to find out only lazily, returning a raw CDP exception, and the viewer froze on the last frame. Now it's a first-class state.
- Detection: register a handler on the Playwright observer's `disconnected` event (fires when the CDP connection drops); our own close() sets _closed first so intentional teardown isn't mistaken for a crash. Backstop: when a direct command fails, classify it as a crash if the connection is gone (is_connected() is False), so it's caught even if the event hasn't fired yet.
- Agent-facing: a command on a crashed browser short-circuits to status "crashed" with "browser N crashed (Chromium killed -- e.g. out of memory) and is gone. Start a fresh one with `new`" instead of trying to drive a corpse or leaking an exception.
- Viewer: a distinct terminal "This browser crashed ... open a new browser from +" state, locked controls, and it ignores any late frames/control so it can't be overwritten. describe()/`ls` report crashed; the keepalive stops sweeping a dead one.
- Lifecycle: a crashed id is never reused (new browser = new number, new tab), and crashed shells are excluded from the fleet cap so a crash never blocks `new`.
Verified: 67 browser/fleet unit tests + 14 ratchets (3 new: crashed reported to agent+viewer, intentional close not flagged as crash, crashed excluded from cap), plus a real-Chromium integration test that kills Chromium out from under the session and asserts the crash is detected and reported (1 passed, ~7s).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
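A sketch of the two detection paths (event first, `is_connected()` as the backstop) and the short-circuit reply. `BrowserHandle` and `run_command` are hypothetical; in Playwright the wiring would look like `browser.on("disconnected", handle.on_disconnected)`, and Playwright's `Browser.is_connected()` can serve as the connection check.

```python
from typing import Any, Callable, Dict, Protocol


class _Conn(Protocol):
    def is_connected(self) -> bool: ...


class BrowserHandle:
    """Hypothetical per-browser wrapper that tells crashes apart from close()."""

    def __init__(self, browser_id: str, conn: _Conn) -> None:
        self.browser_id = browser_id
        self.conn = conn
        self._closed = False
        self.crashed = False

    def on_disconnected(self, *_: Any) -> None:
        # Our own close() sets _closed first, so teardown is not a crash.
        if not self._closed:
            self.crashed = True

    def close(self) -> None:
        self._closed = True
        # ... then disconnect the observer and stop Chromium (omitted)

    def run_command(self, command: Callable[[], Any]) -> Dict[str, Any]:
        if self.crashed:
            return self._crashed_reply()
        try:
            return {"status": "ok", "result": command()}
        except Exception:
            # Backstop: the event may not have fired yet, so ask the connection.
            if not self.conn.is_connected():
                self.crashed = True
                return self._crashed_reply()
            raise

    def _crashed_reply(self) -> Dict[str, Any]:
        return {
            "status": "crashed",
            "message": (f"browser {self.browser_id} crashed (Chromium killed -- e.g. "
                        "out of memory) and is gone. Start a fresh one with `new`"),
        }
```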
The DELETE /browsers/{id} endpoint existed but had no CLI verb. `close <id>` shuts an
entire browser down (all its tabs) and retires its id (never reused) -- distinct from
`tab <id> close`, which closes a single tab. (Profile-dir cleanup will hang off this
once persistent profiles land.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…fest)
Close minds and come back: your browsers return -- same tabs, and still logged in.
- Persistent Chromium profiles (Tier A): each browser gets its own user_data_dir under $MNGR_HOST_DIR/browser-profiles/ (the workspace volume), so Chromium itself persists cookies/logins/history. We serialize none of that by hand. The profile dir name contains the literal "browser-use-user-data-dir-" substring on purpose: it makes browser_use's _copy_profile() use the dir IN PLACE instead of copying it to a throwaway temp dir (the bundled binary is "Chrome for Testing", so its is_chrome check is true) -- which would silently defeat persistence. Verified against browser-use==0.13.1 and guarded by an integration tripwire.
- Manifest (Tier B): a tiny runtime/browser-fleet.json (git-backed to the mindsbackup branch) records which browsers existed, their tab URLs, and the monotonic id high-water mark -- topology only, never ownership/queues (process-scoped) or profile bytes. Atomic write (tmp + os.replace). New module manifest.py.
- Restore + init gate: on daemon startup the fleet is relaunched EAGER-SEQUENTIALLY (one browser at a time -- no cold-boot memory spike; fleet cap is 5) behind an init gate. State-changing routes return 503 "initializing" until restore finishes; read-only routes (ls/state/health) stay open; the cast socket is read-only during init. The gate ALWAYS opens (finally), even if restore raises -- never wedged shut. A fresh workspace seeds browser 0 at the home page; a manifest-loss-but-profiles-survived case relaunches the profiles rather than wiping logins; crashed browsers are never restored as healthy; restored ids never collide (high-water _next_id).
- Checkpoints: topology changes (create/close) persist immediately; a periodic content-diff checkpoint catches tab-URL drift (idle workspaces write nothing); a final checkpoint on clean shutdown. close <id> forgets the profile dir; an orphan-profile sweep on startup bounds disk.
- CLI maps 503 "initializing" to a clear "still starting up, try again" (exit 3); the viewer shows a brief "restoring your saved browsers" banner. Skill documents that browsers/logins persist and how to handle the brief initializing window.
Design was produced by a multi-agent workflow (3 independent designs + 2 adversarial risk passes -> synthesized spec); that pass is what caught the _copy_profile trap.
Verified: 87 fast tests (manifest roundtrip/atomic/corrupt; restore high-water / resting / first-boot / manifest-loss / orphan-sweep / skip-failing; snapshot excludes-crashed + topology-only; init gate blocks + poison-pill opens; close forgets profile + drops manifest; CLI 503 mapping) + 14 ratchets. The real-Chromium persistence test (cookie survives a manager restart + anti-_copy_profile tripwire) is written; it could not be run green locally because this machine's real-Chromium launch is currently congested (the pre-existing real-Chromium tests hang the same way) -- CI offload runs it in a clean sandbox.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
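A sketch of the "tmp + os.replace" manifest write plus a tolerant reader. Only the file name and the write strategy come from the commit; the function names are hypothetical. The temp file lives in the same directory so `os.replace` stays an atomic rename, and a crash mid-write leaves either the old manifest or the new one, never a torn file.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


def write_manifest(path: Path, manifest: Dict[str, Any]) -> None:
    """Atomically replace `path` with the JSON-encoded manifest."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(manifest, f, indent=2, sort_keys=True)
            f.flush()
            os.fsync(f.fileno())    # data on disk before the rename
        os.replace(tmp, path)
    except BaseException:
        try:
            os.unlink(tmp)          # never leave a half-written temp file
        except FileNotFoundError:
            pass
        raise


def read_manifest(path: Path) -> Optional[Dict[str, Any]]:
    """Missing or corrupt manifest -> None (restore then falls back to profiles)."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    return data if isinstance(data, dict) else None
```

For example, `write_manifest(Path("runtime/browser-fleet.json"), {"version": 1, "next_id": 3, "browsers": [...]})`, where the keys are illustrative rather than the PR's actual schema.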
…e review
A max-effort multi-agent review of the persistence + dead-browser state machine found real durability/correctness defects in restore; all fixed here.
- Transient relaunch flake no longer destroys a saved browser. Before, a browser whose Chromium failed to relaunch on boot was dropped from the manifest AND its profile rmtree'd -- a one-time hiccup permanently lost the user's logins. Now a flaked browser is KEPT in the manifest (retried next boot) and its profile is never swept; only profiles for ids we no longer want are removed. (Regression test added.)
- restore() no longer holds the manager lock across all sequential Chromium launches. Each launch takes the lock briefly and releases between browsers, so a slow restore no longer blocks the read-only ensure_browser_0 / list / cast-connect paths for minutes (the init gate's "ls/state stay open" promise actually holds).
- A crashed default browser 0 is recoverable again: ensure_browser_0 recreates it if the existing entry is a dead shell (reusing id 0's profile, so it returns logged in), restoring the "the default is always there on access" invariant.
- close 0 no longer wipes browser 0's persistent profile (it's the recreated-on-demand default; wiping silently logged the user out when 0 came back). Only a retired non-default id forgets its profile. A manifest-write hiccup during close no longer 500s or skips the profile delete.
- restore honors the manifest's active_tab (re-focuses the previously-active tab instead of leaving the last-opened one foregrounded).
- The periodic checkpoint loop now starts on every startup path (including the waiting-for-chromium and restore-failed paths), so tab-URL drift is always persisted.
- A read-only `state` peek at a busy browser no longer silently enqueues the agent as a resume waiter (only state-changing commands queue). On crash, the manifest is checkpointed promptly so an ungraceful kill right after a crash doesn't restore the dead browser as healthy.
The checkpoint loop + final save now survive a transient Playwright/CDP error (catch _BROWSER_ERRORS, not just OSError). The manifest snapshot reads page.url without per-tab title CDP round-trips.
Verified: 94 fast tests green (82 pure-Python incl. new crashed-0-recovers, state-doesn't-enqueue, flaked-browser-kept; 12 fake-browser integration incl. init gate, poison-pill, close-forgets vs close-0-keeps). Real-Chromium tests unrun locally (env congested); CI offload covers them.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
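A sketch of the orphan-profile sweep after the flake fix: it deletes a profile only when its id is no longer in the manifest, so a browser whose relaunch flaked (and therefore stays in the manifest) keeps its logins. The commit only requires the "browser-use-user-data-dir-" substring in the directory name; the `<prefix><id>` naming scheme and the function names here are assumptions.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

# Required substring (keeps browser_use from copying the profile); the
# "<prefix><id>" layout itself is assumed for this sketch.
PROFILE_PREFIX = "browser-use-user-data-dir-"


def profile_dir(root: Path, browser_id: str) -> Path:
    return root / f"{PROFILE_PREFIX}{browser_id}"


def sweep_orphan_profiles(root: Path, manifest_ids: Iterable[str]) -> List[str]:
    """Remove profiles whose ids the manifest no longer wants; return their names."""
    keep = {profile_dir(root, i).name for i in manifest_ids}
    removed: List[str] = []
    if not root.is_dir():
        return removed
    for child in sorted(root.iterdir()):
        # Leave anything that is not one of our profile dirs untouched.
        if child.is_dir() and child.name.startswith(PROFILE_PREFIX) and child.name not in keep:
            shutil.rmtree(child)
            removed.append(child.name)
    return removed
```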
…y-agent message
Three UX fixes from live use:
- The viewer always shows a control indicator now. A fresh, AI-untouched browser (resting state) previously showed nothing; it now shows a persistent "You have control" bar whenever you can drive it, the same as after an explicit take-control. The "Return control to agents" button still appears only when an agent is queued.
- Each browser surfaces as its OWN pane on the right (the layout split now passes --new-group), instead of being tabbed into an existing browser pane. Opening a second browser lands beside the first, not inside it.
- A non-primary agent (a launch-task sub-agent or a second "+ New agent") can reach the fleet daemon over the network but can't drive this workspace's dockview layout. It used to wait 5s and print a confusing "service 'browser' is not registered ... still running headless" error. Now it skips the doomed layout attempt and says plainly: the browser is running, but only the primary agent shows browser panes (open it from the "+" menu). The misleading "headless" wording is removed.
Verified: 84 fast tests (incl. 2 new: non-primary skips the pane-pull; primary uses --new-group).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…anes
Follow-ups from a multi-agent root-cause of "opening a new browser didn't show a
new pane":
- `new` now pulls the browser's pane immediately, so "open a new browser" visibly
opens one instead of showing nothing until the first `open`/`click`. Idempotent
with the first-command pane-pull (splitting an already-open pane just focuses it).
- Removed the dead `MINDS_BROWSER_SERVICE_URL` guard in `_pull_in_pane`: that env var
is never set anywhere in the tree, so the guard never fired -- and its premise
("non-primary agent can't show panes") was the wrong rule. Per the intended model,
ANY user-started agent (the primary OR one opened via "+ New agent") should surface
the pane next to its own chat; only launch-task/background agents can't. _pull_in_pane
now just attempts the split (anchor, else self) and, when it can't land (a background
agent with no chat in this UI, or the layout server unreachable), warns in one clean
line -- no raw 5s "service not registered", no misleading "headless".
- `_layout` gained a `quiet` mode so the pane-pull can substitute that clean message.
Note: the root cause of the reported case was a STALE workspace (created before the
`--new-group` fix); recreating the workspace picks up the new pane behavior. The
`--new-group` split itself was already correct.
Verified: 85 fast tests (incl. new: new pulls a pane; each browser its own pane;
clean warn when a pane can't be shown).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
Human take-control is now STICKY: it holds until the human explicitly hands back, with no idle/grace yield (removed _HUMAN_ACTIVE_GRACE / _sweep_stale_human_pin / _human_touched_at). A human who grabs a browser keeps it even if they walk away mid-CAPTCHA -- never force-yielded. Agents still auto-release via the idle lease; the asymmetry is deliberate.
Agent-initiated handoff for CAPTCHAs / human verification: new `agentic-browser-fleet handoff <id> "<reason>"` (alias `request-human`). The agent jumps to the FRONT of the resume queue and control goes to the HUMAN, pinned (not the next queued agent), until they hand back -- then the requester resumes first. The viewer shows a distinct amber bar naming the agent + what to solve; the pane is surfaced. SKILL.md tells agents to use it (and NOT to solve CAPTCHAs themselves). Exit 2 (preempted) so the agent stops.
Cross-modality sandbox: launch Chromium with the sandbox on by default (works under gVisor on docker/cloud/AWS) but auto-retry once with it off if a first launch fails for a sandbox reason -- so the fleet comes up unattended on Lima (plain-Linux VM) and non-gVisor VPSes (e.g. Vultr) too. BROWSER_NO_SANDBOX=1 forces it off. No provider sniffing.
Tests: sticky-pin, resting-free, handoff front-enqueue + announce + no-op, sandbox-error detection + retry/no-retry, CLI handoff verb. 93 fast tests pass; the keyless real-Chromium launch test passes (validates the start() refactor).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
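A sketch of the launch-once-with-fallback rule. The `launch` callable abstracts whatever actually starts Chromium (with Playwright it could wrap `chromium.launch(chromium_sandbox=...)`), and the "mentions the sandbox" heuristic is an assumption, since the commit does not show its exact detection; the function names are hypothetical.

```python
import os
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


def is_sandbox_failure(exc: BaseException) -> bool:
    # Assumed heuristic: sandbox launch failures mention the sandbox
    # (e.g. Chromium's "No usable sandbox!"); the PR's check is not shown.
    return "sandbox" in str(exc).lower()


def launch_with_sandbox_fallback(
    launch: Callable[[bool], T], env: Mapping[str, str] = os.environ
) -> T:
    """Launch with the sandbox ON; retry exactly once with it OFF if the first
    attempt failed for a sandbox reason. BROWSER_NO_SANDBOX=1 forces it off."""
    if env.get("BROWSER_NO_SANDBOX") == "1":
        return launch(False)
    try:
        return launch(True)
    except Exception as exc:
        if not is_sandbox_failure(exc):
            raise                   # unrelated failure: do not mask it
        return launch(False)
```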
- Disable Chromium's sandbox when running as root (every minds workspace runs the daemon as root inside gVisor or a VM). Fixes the ~30s launch hang -> HTTP 504 on Lima (bare Debian VM, no gVisor). Verified live: browsers launch, POST 200.
- GET /browsers reports can_create/create_reason/count/max; the + menu disables 'New browser' with the reason in parens while starting up / at cap, so a click can't race the fleet restore (browser 0 or multi-browser). Disabled menu items no longer fire their action.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
These 4 tests launch real Chromium (CI installs it via playwright install --with-deps), but cold-start + screencast/nav/restart exceeds the global --timeout=10, so they timed out before their pytest.skip guard could fire. Per-test 120s override lets them run (or cleanly skip if Chromium can't launch). Fixes the only failing CI checks on PR #201.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Real Chromium hangs on the GH Actions runner -- the CDP connection never completes (120s timeout + NoneType new_cdp_session), even with the binary installed and the sandbox off. It's a runner limitation, not a product bug: the fleet runs fine on real docker/Lima/cloud workspaces (verified). Skip these 4 tests when GITHUB_ACTIONS=true so CI is green; they still run locally and on offload where a real browser comes up.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…cement check
'New URL' was intentionally removed from the + menu ('New browser' replaces it). The
split-placement E2E now exercises the same openIframeTab+targetGroup path via 'New
terminal', which still exists. Note the removal in the system_interface changelog.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Brief design decision overview:
targetGroup && dockview.groups.some((g) => g.id === targetGroup.id)
  ? { position: { referenceGroup: targetGroup.id } }
  : {};
addPanelForRef(`service:browser?session=${id}`, getPrimaryAgentId(), placement);
this seems like a novel way to disambiguate having multiple tabs for a single service. i believe the same type of thing already exists for terminals, since you can have multiple terminals. we should probably be consistent - either update terminals to do this same thing or update browsers to do the same as terminals. i'm open to either, though it does seem like we're adding a lot of code here now that might not be needed if we can reuse the existing logic?
I agree that it would be nice to unify them
however, I'm ok deferring that to a later refactor in this case
my reasoning is that the terminal stuff is done in a wrong / ad hoc way as well anyway (terminals don't actually properly maintain state), so that whole setup needs to be refactored anyway
if not layout.exists():
    return False
result = subprocess.run(
    [sys.executable, str(layout), *args], cwd=str(root), capture_output=True, text=True
might be worth making layout.py into a library rather than a script so we could just import it, though i'm not sure what the complexity of that would be
Gotcha -- Josh do you have an opinion on this? Either way we'd probably want this in a different PR i think
probably not worth the effort now since this is a pretty simple usage
please do that as a follow-up refactor!
this PR is already large enough, and it's best not to mix refactors + new changes (gets hard to review)
…tem-explaining
Reframe as a clean first revision -- drop the "two agents only in name / no separate brain" framing that walked back the old delegation design, and trim system internals the executing agent doesn't need (e.g. the MNGR_AGENT_ID explainer). No operational detail removed: every verb/flag, the requery-before-acting discipline, the ownership rules, the CAPTCHA/2FA/robot-check handoff, and the exit-code table are all preserved. 411 -> 266 lines.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
besides my comments, lgtm! (which is surprising to me, since there's a lot going on; nice job on this :) ) lmk once the above comments are addressed and I'll give it a final quick glance, and then we can merge it. fingers crossed that it doesn't end up being too annoying to convert away from async...
…ehind one loop bridge

runner.py: FastAPI/uvicorn -> Flask + flask-sock (+ wsgi.py threaded server), matching the system interface; drops the duplicate FastAPI/uvicorn deps. The async engine (ownership state machine, browser_use, the Playwright observer) stays async on ONE background event loop reached via a single bridge (loop_bridge.py, run_coroutine_threadsafe); only the web layer is sync. browser_use is kept (no DOM/driving reimplementation).

Concurrency hardening from adversarial review: run_agent registers its cancellable handle under _control_lock with an ownership re-check (closes a take-control race), _sweep_idle_lease snapshots under the lock, _init_status is thread-safe across the loop/Flask boundary.

Type-clean (ty); 115 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
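The single loop bridge described above can be sketched with `asyncio.run_coroutine_threadsafe`, the standard-library call the commit names. This is a minimal sketch, not the contents of loop_bridge.py: `LoopBridge` and its `call`/`close` methods are hypothetical names. One event loop runs on a daemon thread; a sync Flask handler submits a coroutine and blocks on the returned `concurrent.futures.Future`.

```python
import asyncio
import concurrent.futures
import threading
from collections.abc import Coroutine
from typing import Any, Optional, TypeVar

T = TypeVar("T")


class LoopBridge:
    """One asyncio loop on a background thread; sync (e.g. Flask) code submits coroutines to it."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, name="loop-bridge", daemon=True)
        self._thread.start()

    def _run(self) -> None:
        asyncio.set_event_loop(self._loop)
        self._loop.run_forever()

    def call(self, coro: Coroutine[Any, Any, T], timeout: Optional[float] = None) -> T:
        """Run `coro` on the bridge loop and block the calling thread for its result."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        try:
            return future.result(timeout)
        except concurrent.futures.TimeoutError:
            future.cancel()  # propagates to the task running on the loop
            raise

    def close(self) -> None:
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()
```

Keeping every engine coroutine on the one loop means the async state is only ever touched from a single thread, while the Flask threads just wait on futures. On timeout, cancelling the future also cancels the task on the loop, so a slow action does not keep running unobserved.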
…startup, cap 3, modal naming

Every browser is created on demand and addressed by a random ~2-word English name (names.py), never a count; the fleet starts EMPTY (no default browser 0). All startups serialize through the manager's single lock (at most one Chromium launching at a time -- spam-safe); create works DURING init (queued behind the serialized restore, not 503-blocked); the fleet is capped at 3, and `new` past the cap is rejected.

"New browser" now opens a CreateAgentModal-style naming modal (prefilled, editable name) + an optimistic "Browser starting..." pane. Renaming is intentionally unsupported (the skill tells agents so). Manifest v2 drops next_id.

Adversarial-review hardening: capacity reads run on the loop (no cross-thread dict race); the naming modal pre-validates duplicates, and a failed create never tears down a pre-existing pane; _set_active_page guards teardown; README updated.

libs/browser: 140 tests + ty clean; frontend build + 366 vitest green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
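A rough model of the naming, cap, and serialized-startup rules above, assuming a single asyncio loop owns the fleet state. The word lists, `Fleet`, `CapacityError`, and the sleep standing in for a Chromium launch are all hypothetical; names.py itself isn't shown here.

```python
import asyncio
import random

# Hypothetical word lists: the PR's names.py is not shown.
ADJECTIVES = ["amber", "brisk", "calm", "dusty", "eager", "fuzzy", "gentle", "hollow"]
NOUNS = ["otter", "maple", "comet", "harbor", "lantern", "pebble", "willow", "falcon"]
MAX_BROWSERS = 3


def random_name(rng: random.Random, taken: set[str]) -> str:
    """Random ~2-word name; with a cap of 3 and 64 combinations, retries end quickly."""
    while True:
        name = f"{rng.choice(ADJECTIVES)}-{rng.choice(NOUNS)}"
        if name not in taken:
            return name


class CapacityError(Exception):
    """Raised when `new` would exceed the fleet cap."""


class Fleet:
    def __init__(self, max_browsers: int = MAX_BROWSERS) -> None:
        self.max_browsers = max_browsers
        self.lifecycle: dict[str, str] = {}  # name -> "init" | "running"
        self._startup_lock = asyncio.Lock()  # at most one Chromium launching at a time
        self._launching = 0
        self.max_concurrent_launches = 0

    async def create(self, name: str) -> None:
        # No await between the checks and the insert: atomic on a single event loop.
        if name in self.lifecycle:
            raise ValueError(f"duplicate browser name: {name}")
        if len(self.lifecycle) >= self.max_browsers:
            raise CapacityError(f"fleet is capped at {self.max_browsers}")
        self.lifecycle[name] = "init"  # counts toward the cap immediately
        async with self._startup_lock:
            await self._launch(name)
        self.lifecycle[name] = "running"

    async def _launch(self, name: str) -> None:
        # Stand-in for launching Chromium; records how many launches overlap.
        self._launching += 1
        self.max_concurrent_launches = max(self.max_concurrent_launches, self._launching)
        await asyncio.sleep(0.01)
        self._launching -= 1
```

Four concurrent `create` calls against this model yield three browsers and one `CapacityError`, with launches running strictly one at a time behind the lock.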
Stale pane: a cast client connecting to an already-live browser was seeded only with control/tabs (no frame), and the screencast emits only on repaint, so a client joining a static/blank page got no frame and the viewer (which cleared the banner only on a frame) stayed stuck on "Starting browser..." until a manual reopen.

Fix: the viewer now clears the loading banner on the first control/tabs message (proof the browser is live, crash-guarded), and the daemon replays the cached last frame to a newly registered cast client so the canvas shows the live page immediately. The 1013-retry (not-yet-registered) and 1008 (gone) paths are unchanged.

Crash overlay: showCrashed() now shows a full near-opaque grey cover (#crashoverlay, inset:0) with bold white text, instead of small placeholder text on the black canvas.

142 tests (+2), ty clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
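The daemon-side half of the stale-pane fix (replaying the cached last frame to a newly registered cast client) can be sketched as a small fan-out hub. `CastHub` and its methods are hypothetical names, and frames are treated as opaque bytes (for example the JPEGs produced by `Page.startScreencast`).

```python
import asyncio
from typing import Optional


class CastHub:
    """Fans screencast frames out to viewer queues; replays the last frame to late joiners."""

    def __init__(self) -> None:
        self._clients: set[asyncio.Queue] = set()
        self._last_frame: Optional[bytes] = None

    def register(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        if self._last_frame is not None:
            # The screencast only emits on repaint, so a viewer joining a static
            # page would otherwise wait indefinitely for its first frame.
            queue.put_nowait(self._last_frame)
        self._clients.add(queue)
        return queue

    def unregister(self, queue: asyncio.Queue) -> None:
        self._clients.discard(queue)

    def publish(self, frame: bytes) -> None:
        self._last_frame = frame  # cache for future joiners
        for queue in self._clients:
            queue.put_nowait(frame)
```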
…c viewer

Track each browser's lifecycle explicitly: registered as `init` immediately on create (so the create returns fast, the modal closes instantly, and the cast WS always finds a real browser -- no 1013 race), flipped to `running` once Chromium is up and the screencast is attached, and `crashed` on death. Launches still serialize (one Chromium at a time via the startup lock); `init` counts toward cap=3; a command on an `init` browser returns "still starting -- try again".

The viewer now renders DETERMINISTICALLY off the broadcast lifecycle (a full "Starting browser" overlay for init, the live page for running, a full crashed overlay) instead of inferring state from frames/closes (which got stuck until a manual reopen). The modal closes immediately on Start.

Race-hardening (adversarial review): start() re-checks _closed/_crashed after its awaits and kills the just-launched Chromium if torn down (no resurrection / no leaked second Chromium); manager.close awaits the in-flight launch task before teardown; close()/launch-failure push a cast-queue shutdown sentinel so the server tears down deterministically. +regression test.

150 tests, ty + ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
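The lifecycle rules and the start() re-check can be modelled in a few lines. This sketch assumes a stand-in `FakeChromium` handle and hypothetical `ManagedBrowser`/`StillStarting` names; the real session code also attaches the screencast and broadcasts lifecycle changes to viewers, which is omitted here.

```python
import asyncio
import enum
from typing import Awaitable, Callable, Optional


class Lifecycle(enum.Enum):
    INIT = "init"
    RUNNING = "running"
    CRASHED = "crashed"


class StillStarting(Exception):
    """A command arrived while the browser is still `init`."""


class FakeChromium:
    """Hypothetical stand-in for a launched Chromium process."""

    def __init__(self) -> None:
        self.killed = False

    def kill(self) -> None:
        self.killed = True


class ManagedBrowser:
    def __init__(self, name: str) -> None:
        self.name = name
        self.lifecycle = Lifecycle.INIT  # registered immediately on create
        self._closed = False
        self._proc: Optional[FakeChromium] = None

    async def start(self, launch: Callable[[], Awaitable[FakeChromium]]) -> None:
        proc = await launch()
        # close() or a crash may have happened while we awaited the launch:
        # kill the fresh process instead of resurrecting the browser.
        if self._closed or self.lifecycle is Lifecycle.CRASHED:
            proc.kill()
            return
        self._proc = proc
        self.lifecycle = Lifecycle.RUNNING

    def mark_crashed(self) -> None:
        self.lifecycle = Lifecycle.CRASHED

    def command(self, action: str) -> str:
        if self.lifecycle is Lifecycle.INIT:
            raise StillStarting("still starting -- try again")
        if self.lifecycle is Lifecycle.CRASHED:
            raise RuntimeError(f"{self.name} crashed")
        return f"{self.name}: {action}"

    def close(self) -> None:
        self._closed = True
        if self._proc is not None:
            self._proc.kill()
            self._proc = None
```

The important detail is the re-check after `await launch()`: the browser's state can change while that await is suspended, so the just-launched process is killed rather than adopted if the browser was closed or crashed in the meantime.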
…onnect leak

Backend (runner.py / session.py):
- Detect a disconnected waiter parked in the acquire FIFO queue: the wait-phase NDJSON stream heartbeats a `ping` each idle poll, so a dropped waiter surfaces in bounded time and its acquire is cancelled (was held for the holder's whole lease, blocking everyone behind it).
- Release on acquire-phase disconnect, not just cancel: a grant that lands in the same poll window the client drops is no longer orphaned until the 90s idle sweep (CAS no-op otherwise; mirrors the run/hold finally).
- `acquire` is now strictly non-blocking (reserve-or-queue); blocking-wait lives in task/hold, which heartbeat. Closes a worker-thread/queue-slot pin on a dropped non-streaming caller.
- Gate human take-control on lifecycle (no-op on init/crashed; only pin once running).
- Read acquire/handoff owner snapshots on the loop (consistent with the mutation).
- Persist a browser at create (init), not only once running; manifest snapshots the live fleet (init + running), excludes crashed.
- One-off frame capture for a fresh viewer of a running-but-unpainted browser.
- Lifecycle-aware "initializing" banner (don't tell a running viewer the fleet is initializing); failed-launch names closed terminally (1008) for late viewers.
- Dedicated generous timeout for direct-control actions (BROWSER_DIRECT_ACTION_TIMEOUT).

Frontend (CreateBrowserModal / DockviewWorkspace / viewer):
- Surface create failures: the modal re-opens pre-filled with the typed name and the daemon's reason (400/409/503/network) instead of silently tearing the pane down.
- Client-side name validation mirroring the daemon, before any pane opens or POST.
- Viewer no longer downgrades an already-running page to the "init" overlay.

Conservative simplify pass + adversarial re-review (races/fixes-hold/gate); the two follow-up backend fixes above came out of that re-review.

Full gate green: 162 backend pytest + ty + ruff, 376 frontend vitest + build.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
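The backend acquire changes in that commit (strictly non-blocking `acquire`, a FIFO of waiters, CAS release, and a heartbeating NDJSON wait stream that cleans up on disconnect) fit together roughly as follows. This is a single-threaded sketch under stated assumptions: `Ownership` and `wait_for_grant` are hypothetical names, `idle_poll` stands in for sleeping one poll interval, and in the PR this state lives on the background event loop rather than in plain Python objects.

```python
import json
from collections import deque
from typing import Callable, Iterator, Optional


def _ndjson(event: dict) -> str:
    return json.dumps(event) + "\n"


class Ownership:
    """Hypothetical model of one browser's FIFO ownership queue."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.waiters: deque[str] = deque()

    def acquire(self, agent: str) -> str:
        """Strictly non-blocking: reserve if free, otherwise queue. Never waits."""
        if self.holder is None or self.holder == agent:
            self.holder = agent
            return "granted"
        if agent not in self.waiters:
            self.waiters.append(agent)
        return "queued"

    def cancel(self, agent: str) -> None:
        """Drop a waiter still parked in the queue (no-op otherwise)."""
        if agent in self.waiters:
            self.waiters.remove(agent)

    def release(self, agent: str) -> bool:
        """Compare-and-swap release: only the current holder can release."""
        if self.holder != agent:
            return False
        # Hand off to the next waiter in FIFO order, if any.
        self.holder = self.waiters.popleft() if self.waiters else None
        return True


def wait_for_grant(own: Ownership, agent: str, idle_poll: Callable[[], None]) -> Iterator[str]:
    """NDJSON wait stream: heartbeats `ping` each idle poll until `agent` holds the browser."""
    delivered = False
    try:
        while own.holder != agent:
            # If writing this chunk to a dropped client fails, the server closes
            # the generator, so the `finally` below runs in bounded time.
            yield _ndjson({"event": "ping"})
            idle_poll()
        yield _ndjson({"event": "granted"})
        delivered = True
    finally:
        if not delivered:
            own.cancel(agent)   # still queued: leave the FIFO
            own.release(agent)  # granted in the same window: CAS no-op unless we hold it
```

This relies on WSGI behavior: the server calls `close()` on the response iterable when iteration stops, including after a failed write, which raises `GeneratorExit` at the pending `yield` and runs the `finally`. The `delivered` flag separates a grant that reached the client from one that landed in the same poll window the client dropped.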
The previous commit added `key: newBrowserPrefillName ?? "new"` to the CreateBrowserModal vnode. That vnode sits in the DockviewWorkspace view's children array among unkeyed sibling vnodes (the New chat / New agent modals, the destroy dialog). Mithril's normalizeChildren throws "vnodes must either all have keys or none" when one child in a fragment is keyed and the rest are not (numKeyed !== 0 && numKeyed !== length). The throw kills the render, and once a redraw throws mid-flight Mithril's redraw machinery is left broken -- so New chat, New agent AND New browser all stopped opening (New terminal survived because it opens a dockview panel directly, bypassing the Mithril redraw).

The key was also unnecessary: onAccept sets showNewBrowserModal=false before the background POST, so a failure re-open is a fresh mount and oninit re-reads initialName/initialError on its own.

The 376 frontend tests missed this because they mount the modal alone (numKeyed === length === 1, no throw); the bug only exists among the unkeyed siblings in DockviewWorkspace.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
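The keyed/unkeyed rule quoted in that commit message can be stated as a tiny Python model, which also shows why the modal's own tests passed: a fragment with a single keyed child satisfies `numKeyed === length`. This mirrors only the quoted condition; it is not Mithril's implementation, and children are represented as hypothetical attrs dicts (or `None` for a hole).

```python
from typing import Any, Optional


def check_fragment_keys(children: list[Optional[dict[str, Any]]]) -> None:
    """Model of the quoted rule: keyed and unkeyed siblings may not be mixed."""
    num_keyed = sum(1 for child in children if child is not None and child.get("key") is not None)
    if num_keyed != 0 and num_keyed != len(children):
        raise TypeError("vnodes must either all have keys or none")
```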
also fixed: the browser crash screen is more obvious now.
```python
yield _ndjson(event)


async def _acquire_result(acquire_task: Any) -> str:
```
seems like some async and await have stayed in here? You probably want to remove the rest!
```python
from playwright.async_api import (
    Browser,
    BrowserContext,
    CDPSession,
    Page,
    Playwright,
    async_playwright,
)
```
it doesn't seem like this was switched to sync?
see video posted in slack