Add agentic-browser-fleet: many agents drive many browsers with atomic ownership#201

Open
minhtrinh-imbue wants to merge 31 commits into main from feat/agentic-browser-fleet

@minhtrinh-imbue minhtrinh-imbue commented Jun 23, 2026

See the video posted in Slack.

MT-GoCode and others added 16 commits June 22, 2026 16:16
…owser-use agent

A new dockview tab ('+' -> 'New browser', replacing 'New URL') that spins up a
headless Chromium inside the compute and streams it to the tab, with a chat that
can hand the same browser to a browser-use AI agent.

Streaming/input use the Chrome DevTools Protocol (steel's mechanism): a Playwright
observer attached over CDP runs Page.startScreencast (JPEG frames over a WebSocket)
and injects mouse/keyboard via Input.dispatch*Event -- low-latency, display-
independent, no Xvfb/ffmpeg. browser-use drives the same Chromium for AI tasks.

Control: while the agent runs the status shows 'Agent has control' and human
browser input (incl. tabs) is locked; a message typed during a run is queued
(one pending, cancelable) and runs after; 'Take control' stops the agent and
returns control (no resume -- send a new message to continue). The view follows
the agent's active tab. Concurrent sessions capped (BROWSER_MAX_SESSIONS, default 3).

New lib libs/browser (FastAPI service at /service/browser/) + the dockview menu
item, gated on a resolvable ANTHROPIC_API_KEY.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
- Drop the Imbue Cloud / litellm proxy path entirely: resolve a direct Anthropic
  API key only (no ANTHROPIC_BASE_URL), and update the gating messages.
- Rename 'Take control' -> 'Interrupt & take control'.
- Make take-control one-click immediate: flip control to the human first (UI
  unlocks at once) then hard-cancel the run task, instead of waiting out
  browser-use's cooperative stop (which only lands at the next step boundary,
  seconds away if mid-LLM-call).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…er each

Turn the single live-browser service into a per-workspace fleet of headless
Chromium browsers with one compare-and-set ownership state machine, plus an
`agentic-browser-fleet` CLI for agents to drive them and a viewer-only UI.

- session.py: one `_transition` writer per browser (CAS under a lock; the
  displaced run is cancelled outside the lock so a human take-control can't
  deadlock). State = controller + owner_agent_id + human_pinned. Agents never
  preempt each other -- a second agent monitor-and-waits in a FIFO queue. Human
  take-control always wins and ends the agent's task; the agent resumes only via
  explicit --reclaim. Monotonic browser ids (0 = default), never reused.
  Ownership is bound to the live task connection (no stuck locks).
- runner.py: /browsers fleet endpoints; a streaming task endpoint with
  acquire-or-wait, disconnect-release, and a max-step/wall-clock backstop; /hold
  backs the lock verb.
- fleet.py: the `agentic-browser-fleet` CLI (ls/new/task/lock/unlock/release).
  Discovers the daemon via applications.toml; streams the browser-use
  thinking/action trace to the calling agent's own output.
- index.html: viewer-only (in-tab chat removed) with an "Agent has control"
  overlay, take-control, return-to-agents, and a dead-session state.
- layout: a `service:browser?session=<id>` pane ref + the "+" menu lists and
  focuses active browsers.
- agentic-browser-fleet skill, per-project changelogs, and tests for every
  load-bearing case.
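
The single-writer rule described for session.py can be pictured roughly as below. This is a minimal sketch, not the PR's code: `Ownership`, `BrowserOwnership`, and the method names are hypothetical, and only the three state fields (controller, owner_agent_id, human_pinned) come from the commit message. It shows the compare-and-set happening under the lock while the displaced run is cancelled after the lock is released.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ownership:
    controller: str = "free"              # "free", "agent", or "human" (assumed labels)
    owner_agent_id: Optional[str] = None
    human_pinned: bool = False


class BrowserOwnership:
    """Hypothetical single-writer ownership cell for one browser."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._state = Ownership()
        self._run: Optional[asyncio.Task] = None   # task of the current agent run

    @property
    def state(self) -> Ownership:
        return self._state

    async def transition(self, expected: Ownership, new: Ownership,
                         new_run: Optional[asyncio.Task] = None) -> bool:
        """Swap to `new` only if the state still equals `expected`."""
        async with self._lock:
            if self._state != expected:
                return False                # lost the race; caller re-reads and retries
            displaced, self._run = self._run, new_run
            self._state = new
        # Cancel outside the lock so the displaced run's cleanup can itself
        # call transition() without deadlocking.
        if displaced is not None and displaced is not new_run and not displaced.done():
            displaced.cancel()
        return True

    async def human_take_control(self) -> None:
        """A human always wins: retry the compare-and-set until it lands."""
        pinned = Ownership(controller="human", human_pinned=True)
        while not await self.transition(self._state, pinned):
            pass
```

A second agent that finds the state changed gets `False` back and would join the wait queue instead of preempting the owner; that queue is left out of the sketch.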

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
A free (unpinned-human) browser was labeled "(you took control)" in the "+"
menu; that wording is for a human who explicitly took control. Show "(free)"
for an available browser, matching the CLI's owner labels.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…ser itself)

The agent now drives the browser one command at a time -- `state <id>` returns
the page as a numbered list of clickable elements, then `open`/`click`/`input`/
`select`/`scroll`/`keys`/`screenshot`/`tab <id> ...` act on it -- instead of
delegating a whole goal to an autonomous browser-use agent. This matches the
upstream browser-use skill and is keyless: the agent does its own reasoning
(already authenticated), and the browser commands are deterministic.

- session.py lifts browser-use's own ActionHandler (the executor its CLI uses)
  against our held BrowserSession, so the live screencast + ownership wrap the
  direct commands. `state` caches the numbered selector_map; a `click` against a
  stale index returns a clear "run state first" instead of mis-clicking.
- Ownership for direct control is a sticky lease: the first command acquires it,
  every command re-checks ownership (compare-and-set) right before acting -- so a
  human take-control mid-sequence makes the next command a clean "lost control"
  rather than touching the human's browser -- and an idle lease auto-releases
  (~90s) since short commands have no connection to bind to.
- New CLI verbs (state/open/click/input/select/scroll/keys/screenshot/tab/
  acquire) + `ls --include-tabs`; every response carries the owner so the agent
  knows if it lost control. `task` (delegation) is kept as an optional fallback.
- SKILL.md rewritten to the direct-control model (the state->click loop, choosing
  a browser, re-querying after each change, ownership rules).
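
The sticky lease for direct control might look something like the sketch below. All names here (`StickyLease`, `run_command`, the `not_owner` status) are illustrative assumptions; the commit only states that the first command acquires, every command re-checks ownership right before acting, and an idle lease releases after roughly 90s.

```python
import time
from typing import Callable, Dict, Optional

IDLE_TTL_SECONDS = 90.0   # the commit says "~90s"; the exact constant is assumed


class StickyLease:
    """Hypothetical per-browser lease for short, connectionless commands."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.owner: Optional[str] = None   # agent id, "human", or None when free
        self._last_used = 0.0

    def _expire_if_idle(self) -> None:
        # Only agent leases expire; a human hold is not subject to the idle TTL here.
        if self.owner not in (None, "human"):
            if self._clock() - self._last_used >= IDLE_TTL_SECONDS:
                self.owner = None

    def human_take_control(self) -> None:
        self.owner = "human"

    def run_command(self, agent_id: str, action: Callable[[], str]) -> Dict[str, object]:
        self._expire_if_idle()
        if self.owner is None:
            self.owner = agent_id          # the first command acquires the lease
        if self.owner != agent_id:         # re-check right before acting
            return {"status": "not_owner", "owner": self.owner}
        self._last_used = self._clock()    # each command refreshes the lease
        return {"status": "ok", "owner": self.owner, "result": action()}
```

Because every response carries `owner`, a caller can tell whether a human or another agent now holds the browser without issuing a separate query.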

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…keyless)

The "+" menu's "New browser" item disabled itself and showed a "Browser sessions
need an Anthropic API key" dialog whenever the workspace had no key. That gate was
a leftover from the delegation model: direct control needs no key (the agent and
the user drive the browser by hand), so a browser can always be started. Only the
optional task/extract verbs need a key, and the daemon already checks those at
call time.

- DockviewWorkspace.ts: "New browser" always creates a browser; removed the dead
  browserKeyAvailable state and the gating dialog.
- session.py: anthropic_key_status() now describes only the task/extract verbs
  (the only ones that need a key) and no longer claims browser sessions in
  general need one.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…ip transparency

Direct control silently lost the auto-layout in the pivot: _pull_in_pane was only
wired into the old task/lock verbs, so driving a browser with state/click/...
left it headless. Plus a batch of viewer fixes the live feature surfaced.

- Auto-layout: run_action now reports newly_acquired (the first command that takes
  a browser, and the first after a human hands it back). The CLI splits that browser
  in as a pane to the RIGHT of the agent's chat -- chat left, browser right, one
  pane per browser, fired once per acquisition. _pull_in_pane now splits (relative
  to the BROWSER_FLEET_ANCHOR chat, else self) instead of falling back to `open`
  (which only tabbed into the chat group). Splitting an already-open browser is a
  no-op that focuses it, so re-acquire never duplicates panes.

- Uniform resolution: browser-use pins the viewport on the first tab only; tabs
  opened later came up at different sizes and the viewer letterboxed them
  inconsistently. _set_active_page now forces Emulation.setDeviceMetricsOverride to
  the screencast size on every tab, so all tabs stream at the same resolution.

- Viewer nav: Back / Forward / Reload buttons left of the address bar (active only
  while the human holds control); the daemon handles back/forward/reload cast
  messages against the live page. Reload reloads the page, not the browser.

- Ownership transparency: the "Agent has control" overlay shows a live idle
  countdown ("idle 12s, releases control in 78s") so a watching human knows when a
  quiet agent's sticky lease auto-releases (90s idle-TTL), and lists agents queued
  (monitor-and-wait) behind the owner. The waiting queue is also in GET /browsers
  and `ls` ([queued: ...]). The keepalive loop broadcasts the control message each
  tick while an agent holds, to drive the countdown.
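
As a rough illustration of the countdown text, the arithmetic is just the idle TTL minus the time since the agent's last command. The function names and message fields below are hypothetical; only the 90s TTL, the example wording, and the idea of a per-tick control message with a waiting list come from the commit.

```python
from typing import Dict, List

IDLE_TTL_SECONDS = 90   # idle-TTL stated in the commit message


def idle_countdown(idle_seconds: float, idle_ttl: int = IDLE_TTL_SECONDS) -> str:
    """Format the overlay text for an agent that has been quiet for `idle_seconds`."""
    idle = int(idle_seconds)
    remaining = max(0, idle_ttl - idle)    # never show a negative countdown
    return f"idle {idle}s, releases control in {remaining}s"


def control_message(owner: str, idle_seconds: float, waiting: List[str]) -> Dict[str, object]:
    """Hypothetical shape of the message the keepalive loop broadcasts each tick."""
    return {
        "type": "control",
        "owner": owner,
        "countdown": idle_countdown(idle_seconds),
        "waiting": list(waiting),          # agents queued behind the owner
    }
```

For an agent idle for 12 seconds, `idle_countdown(12)` yields the same string quoted in the commit: "idle 12s, releases control in 78s".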

Verified: 45 browser/fleet unit tests, plus real-Chromium integration tests for the
screencast (device-metrics override does not stall frames) and for newly_acquired.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
… trim viewer chrome

The skill had no instruction to release a browser when finished, so an agent that
was done still held it (grey "Agent has control" overlay) until the 90s idle TTL.
And the sub-agent anchor guidance promised a path that cannot work.

- Skill: tells the agent to `release <id>` the moment it is done with a browser,
  handing control straight back to the human instead of waiting out the idle
  timeout. The idle auto-release stays as the backstop.

- Skill: replaced the "set BROWSER_FLEET_ANCHOR for a sub-agent" guidance with the
  reality -- a launch-task sub-agent runs in a separate, isolated container with no
  access to this workspace's browser fleet or its live panes (its
  agentic-browser-fleet calls hit a daemon/registry that isn't there, which is why
  a delegated browser's pane never anchored to the foreground chat). The agent now
  drives the browser itself, in the chat the user is watching; browser work belongs
  to the user-facing agent, not a background sub-agent.

- UI: dropped the placeholder "web" example server from the "+" menu (the browser
  fleet is the real web surface), and removed the per-tab Refresh button from
  browser panes -- it only reconnects the live view, yet reads as "restart the
  browser"; the viewer's own in-page Reload button covers reloading the actual
  page.

Verified: 354 frontend tests pass (rebuilt bundle).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…ent waits

When you hold a browser, the bar now lists the agents queued to use it ("Agents
waiting to use this browser: ...") and shows "Return control to agents" ONLY when
one is actually waiting -- otherwise it reads "No agents are waiting" with no
button, since there is nobody to hand back to (the next agent that wants the
browser just takes it).

Skill: states the fleet cap (5; `new` past it returns "Too many open browsers",
release/close one first) and the "another browser vs. wait" rule -- if another
agent holds it, take a different browser; if a human took YOUR browser mid-task,
wait and resume that same one.

This is the safe, well-specified slice of the queue/handoff paradigm. The
daemon-side event-driven resume (resume-after-handoff lane, mngr-message wake,
claim window, auto-give) lands next as a focused, tested change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…me on hand-back

Before, a human taking a browser an agent was driving just stopped the agent; the
user had to re-type "keep going". Now it is a handoff:

- An agent whose direct command is rejected because a human (or another agent) holds
  the browser is added to a resume queue (it ends its turn). When the browser frees,
  the daemon hands it to the first queued agent and messages it (via `mngr message`,
  the same path launch-task uses) to resume -- it re-reads the page with `state` and
  continues. This is the CAPTCHA / login handoff flow.

- A human pin only *blocks* agents while the human is actively driving. If they go
  quiet for BROWSER_HUMAN_ACTIVE_GRACE (default 20s), the pin yields: a queued agent
  is handed the browser by the keepalive sweep, and a freshly arriving agent simply
  takes it. A forgotten hold never blocks the fleet. An actively-driven pin keeps
  refreshing and is never yanked.

- An agent that is handed the browser from the resume queue but never sends a command
  (interrupted/killed) has the grant revoked after BROWSER_CLAIM_WINDOW (default 12s),
  so the browser passes to the next waiter instead of idling on a no-show.

- busy_human now exits 2 (preempted: stop and wait for the wake), not 3 (generic
  busy). busy_agent still exits 3 but also queues the agent to be woken. The resume
  queue is surfaced in the same `waiting` list the viewer bar and `ls` already show.
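
One way to picture the exit-code split is a small lookup from daemon status to CLI exit code. This is a sketch under assumptions: the enum, the function names, and the 0/1 codes are invented for illustration; only `busy_human` -> 2 and `busy_agent` -> 3, and the fact that both statuses queue the agent for a wake, come from the commit.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    OK = 0          # assumed: conventional success code
    ERROR = 1       # assumed: catch-all for statuses not named in the commit
    PREEMPTED = 2   # busy_human: a human holds the browser; stop and wait for the wake
    BUSY = 3        # busy_agent: another agent holds the browser


_STATUS_TO_EXIT = {
    "ok": ExitCode.OK,
    "busy_human": ExitCode.PREEMPTED,
    "busy_agent": ExitCode.BUSY,
}


def exit_code_for(status: str) -> ExitCode:
    """Map a daemon status string to the exit code the CLI would return."""
    return _STATUS_TO_EXIT.get(status, ExitCode.ERROR)


def queues_for_wake(status: str) -> bool:
    """Both busy statuses put the caller in the resume queue to be woken later."""
    return status in ("busy_human", "busy_agent")
```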

Skill: take-control guidance is now "tell the user, end your turn, you'll be messaged
to resume; re-run state first". Release rules spelled out by case (task done, user
says stop, switching browsers; keep multiples only while actively driving).

Verified: 49 browser/fleet unit tests (4 new: enqueue-on-busy + wake, stale-pin
yields, stale-pin persists when nobody waits, unclaimed-grant revoke) + 14 ratchets.
A real-Chromium repro confirms enqueue-on-busy / newly_acquired / busy_agent / clean
shutdown. (The pytest real-Chromium integration harness is flaky/slow in this env;
the same test passed at ~5s earlier in the session.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…ost wakes, etc.)

A max-effort multi-agent review of the resume-queue state machine surfaced several
real concurrency/correctness bugs in the just-added handoff; all are fixed here.

- Double-queue spurious re-grant: an agent can be in both the resume queue (rejected
  direct command) and the connection-bound wait queue (`task`/`acquire --wait`).
  Granting it from the wait queue now also clears its resume-queue entry, so a later
  release no longer re-grants the freed browser to an agent that already finished and
  fires a spurious "handed back to you" wake. (Regression test added.)

- Lost wakes: `_wake_agent` was scheduled with a bare `asyncio.create_task`, which
  asyncio holds only weakly -- the wake (an `mngr message` subprocess) could be GC'd
  before it ran, stranding the queued agent until the claim window. Wake tasks are
  now retained in a set until they complete.

- The explicit `acquire <id>` HTTP endpoint did not pass `enqueue_on_busy`, so the
  CLI told the agent "you're queued ... I'll message you when it frees" while the
  agent was never actually queued -- it would wait forever. The endpoint now enqueues.

- Human-active grace was 20s, short enough that a `task`/`acquire` could take a
  human-pinned browser mid-CAPTCHA once the human paused input (reading instructions,
  fetching a 2FA code). Bumped the default to 120s; any input refreshes it, and the
  "Return control to agents" button remains the instant hand-back.

- The wake subprocess now runs with cwd set to a file-anchored repo root (not the
  daemon's inherited cwd) so the `mngr` dev shim resolves this checkout.

- SKILL.md exit-code table corrected: a human take-control during a direct command is
  exit 2 (preempted: stop, end turn, you'll be messaged to resume), not exit 3; exit 3
  is now "held by another agent".

Verified: 64 browser/fleet unit tests + 14 ratchets (1 new regression test for the
double-queue re-grant). Deliberately deferred review items: the serial page.title()
in `ls` (efficiency, capped fleet), the two-queue abstraction (style), and the
pre-existing "New URL" menu removal (from the original fleet commit, a product
decision, not this change).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
When a browser's Chromium dies unexpectedly (OS/OOM kill, segfault) the daemon used
to only find out lazily, returning a raw CDP exception, and the viewer froze on the
last frame. Now it's a first-class state.

- Detection: register a handler on the Playwright observer's `disconnected` event
  (fires when the CDP connection drops); our own close() sets _closed first so
  intentional teardown isn't mistaken for a crash. Backstop: when a direct command
  fails, classify it as a crash if the connection is gone (is_connected() is False),
  so it's caught even if the event hasn't fired yet.

- Agent-facing: a command on a crashed browser short-circuits to status "crashed"
  with "browser N crashed (Chromium killed -- e.g. out of memory) and is gone. Start
  a fresh one with `new`" instead of trying to drive a corpse or leaking an exception.

- Viewer: a distinct terminal "This browser crashed ... open a new browser from +"
  state, locked controls, and it ignores any late frames/control so it can't be
  overwritten. describe()/`ls` report crashed; the keepalive stops sweeping a dead one.

- Lifecycle: a crashed id is never reused (new browser = new number, new tab), and
  crashed shells are excluded from the fleet cap so a crash never blocks `new`.
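
The two detection paths (the `disconnected` event plus the failed-command backstop) could be modelled along these lines. `CrashTracker` and its methods are hypothetical, and the connection check is injected as a plain callable so the sketch runs without Playwright; in the PR it would be the observer's `is_connected()`.

```python
from typing import Callable, Dict, Optional


class CrashTracker:
    """Hypothetical crash bookkeeping for one browser."""

    def __init__(self, browser_id: int, is_connected: Callable[[], bool]) -> None:
        self.browser_id = browser_id
        self._is_connected = is_connected
        self._closed = False
        self.crashed = False

    def close(self) -> None:
        # Set the flag FIRST so the disconnect that follows an intentional
        # teardown is not mistaken for a crash.
        self._closed = True

    def on_disconnected(self) -> None:
        """Handler for the observer's `disconnected` event."""
        if not self._closed:
            self.crashed = True

    def classify_failure(self) -> bool:
        """Backstop: after a failed command, treat a dead connection as a crash."""
        if not self._closed and not self._is_connected():
            self.crashed = True
        return self.crashed

    def short_circuit(self) -> Optional[Dict[str, str]]:
        """Return the agent-facing error for a crashed browser, else None."""
        if not self.crashed:
            return None
        return {
            "status": "crashed",
            "message": f"browser {self.browser_id} crashed and is gone. "
                       "Start a fresh one with `new`",
        }
```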

Verified: 67 browser/fleet unit tests + 14 ratchets (3 new: crashed reported to
agent+viewer, intentional close not flagged as crash, crashed excluded from cap),
plus a real-Chromium integration test that kills Chromium out from under the session
and asserts the crash is detected and reported (1 passed, ~7s).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
The DELETE /browsers/{id} endpoint existed but had no CLI verb. `close <id>` shuts an
entire browser down (all its tabs) and retires its id (never reused) -- distinct from
`tab <id> close`, which closes a single tab. (Profile-dir cleanup will hang off this
once persistent profiles land.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…fest)

Close minds and come back: your browsers return -- same tabs, and still logged in.

- Persistent Chromium profiles (Tier A): each browser gets its own user_data_dir
  under $MNGR_HOST_DIR/browser-profiles/ (the workspace volume), so Chromium itself
  persists cookies/logins/history. We serialize none of that by hand. The profile
  dir name contains the literal "browser-use-user-data-dir-" substring on purpose:
  it makes browser_use's _copy_profile() use the dir IN PLACE instead of copying it
  to a throwaway temp dir (the bundled binary is "Chrome for Testing", so its
  is_chrome check is true) -- which would silently defeat persistence. Verified
  against browser-use==0.13.1 and guarded by an integration tripwire.

- Manifest (Tier B): a tiny runtime/browser-fleet.json (git-backed to the
  mindsbackup branch) records which browsers existed, their tab URLs, and the
  monotonic id high-water mark -- topology only, never ownership/queues (process-
  scoped) or profile bytes. Atomic write (tmp + os.replace). New module manifest.py.

- Restore + init gate: on daemon startup the fleet is relaunched EAGER-SEQUENTIALLY
  (one browser at a time -- no cold-boot memory spike; fleet cap is 5) behind an
  init gate. State-changing routes return 503 "initializing" until restore finishes;
  read-only (ls/state/health) stay open; the cast socket is read-only during init.
  The gate ALWAYS opens (finally), even if restore raises -- never wedged shut. A
  fresh workspace seeds browser 0 at the home page; a manifest-loss-but-profiles-
  survived case relaunches the profiles rather than wiping logins; crashed browsers
  are never restored as healthy; restored ids never collide (high-water _next_id).

- Checkpoints: topology changes (create/close) persist immediately; a periodic
  content-diff checkpoint catches tab-URL drift (idle workspaces write nothing); a
  final checkpoint on clean shutdown. close <id> forgets the profile dir; an orphan-
  profile sweep on startup bounds disk.

- CLI maps 503 "initializing" to a clear "still starting up, try again" (exit 3);
  the viewer shows a brief "restoring your saved browsers" banner. Skill documents
  that browsers/logins persist and how to handle the brief initializing window.
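
The "tmp + os.replace" write mentioned for the manifest is a standard pattern; a minimal sketch follows. The function names and the manifest fields used in the usage note are assumptions, not the contents of manifest.py.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def write_manifest_atomic(path: Path, manifest: dict) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(manifest, tmp, indent=2, sort_keys=True)
            tmp.flush()
            os.fsync(tmp.fileno())         # push bytes to disk before the rename
        os.replace(tmp_name, path)         # atomic swap of the directory entry
    except BaseException:
        # Leave the previous manifest untouched and remove the partial temp file.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_manifest(path: Path) -> Optional[dict]:
    """Return the manifest, or None if it is missing or corrupt."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    return data if isinstance(data, dict) else None
```

A caller might pass something like `{"next_id": 3, "browsers": [{"id": 0, "tabs": ["https://example.com"]}]}`; treating a corrupt file as `None` lets restore fall back to its first-boot or manifest-loss path instead of raising.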

Design was produced by a multi-agent workflow (3 independent designs + 2 adversarial
risk passes -> synthesized spec); that pass is what caught the _copy_profile trap.

Verified: 87 fast tests (manifest roundtrip/atomic/corrupt; restore high-water /
resting / first-boot / manifest-loss / orphan-sweep / skip-failing; snapshot
excludes-crashed + topology-only; init gate blocks + poison-pill opens; close forgets
profile + drops manifest; CLI 503 mapping) + 14 ratchets. The real-Chromium
persistence test (cookie survives a manager restart + anti-_copy_profile tripwire) is
written; it could not be run green locally because this machine's real-Chromium
launch is currently congested (the pre-existing real-Chromium tests hang the same
way) -- CI offload runs it in a clean sandbox.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…e review

A max-effort multi-agent review of the persistence + dead-browser state machine
found real durability/correctness defects in restore; all fixed here.

- Transient relaunch flake no longer destroys a saved browser. Before, a browser
  whose Chromium failed to relaunch on boot was dropped from the manifest AND its
  profile rmtree'd -- a one-time hiccup permanently lost the user's logins. Now a
  flaked browser is KEPT in the manifest (retried next boot) and its profile is never
  swept; only profiles for ids we no longer want are removed. (Regression test added.)

- restore() no longer holds the manager lock across all sequential Chromium launches.
  Each launch takes the lock briefly and releases between browsers, so a slow restore
  no longer blocks the read-only ensure_browser_0 / list / cast-connect paths for
  minutes (the init gate's "ls/state stay open" promise actually holds).

- A crashed default browser 0 is recoverable again: ensure_browser_0 recreates it if
  the existing entry is a dead shell (reusing id 0's profile, so it returns logged in),
  restoring the "the default is always there on access" invariant.

- close 0 no longer wipes browser 0's persistent profile (it's the recreated-on-demand
  default; wiping silently logged the user out when 0 came back). Only a retired
  non-default id forgets its profile. A manifest-write hiccup during close no longer
  500s or skips the profile delete.

- restore honors the manifest's active_tab (re-focuses the previously-active tab
  instead of leaving the last-opened one foregrounded).

- The periodic checkpoint loop now starts on every startup path (including the
  waiting-for-chromium and restore-failed paths), so tab-URL drift is always persisted.

- A read-only `state` peek at a busy browser no longer silently enqueues the agent as
  a resume waiter (only state-changing commands queue). On crash, the manifest is
  checkpointed promptly so an ungraceful kill right after a crash doesn't restore the
  dead browser as healthy. The checkpoint loop + final save now survive a transient
  Playwright/CDP error (catch _BROWSER_ERRORS, not just OSError). The manifest snapshot
  reads page.url without per-tab title CDP round-trips.
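
The "keep a flaked browser, sweep only unwanted profiles" rule reduces to simple set logic. The sketch below is an assumption-laden model: `restore`, `profiles_to_sweep`, and the `try_launch` callback are hypothetical, and it presumes the manifest loaded successfully (the manifest-loss case is handled separately in the PR).

```python
from typing import Callable, Iterable, List, Set, Tuple


def restore(manifest_ids: Iterable[int],
            try_launch: Callable[[int], bool]) -> Tuple[List[int], List[int]]:
    """Relaunch browsers one at a time; return (live ids, flaked ids).

    Flaked ids are NOT dropped: the caller keeps them in the manifest so the
    next boot retries them.
    """
    live: List[int] = []
    flaked: List[int] = []
    for browser_id in sorted(manifest_ids):
        if try_launch(browser_id):
            live.append(browser_id)
        else:
            flaked.append(browser_id)
    return live, flaked


def profiles_to_sweep(profiles_on_disk: Iterable[int],
                      manifest_ids: Iterable[int]) -> Set[int]:
    """Profiles safe to delete: on disk, but for ids the manifest no longer wants.

    Id 0 is always kept because the default browser is recreated on demand.
    """
    wanted = set(manifest_ids) | {0}
    return set(profiles_on_disk) - wanted
```

Since the sweep is computed from the manifest rather than from the set of browsers that launched, a one-time launch failure cannot delete a profile.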

Verified: 94 fast tests green (82 pure-Python incl. new crashed-0-recovers,
state-doesn't-enqueue, flaked-browser-kept; 12 fake-browser integration incl. init
gate, poison-pill, close-forgets vs close-0-keeps). Real-Chromium tests unrun locally
(env congested); CI offload covers them.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
…y-agent message

Three UX fixes from live use:

- The viewer always shows a control indicator now. A fresh, AI-untouched browser
  (resting state) previously showed nothing; it now shows a persistent "You have
  control" bar whenever you can drive it, the same as after an explicit take-control.
  The "Return control to agents" button still appears only when an agent is queued.

- Each browser surfaces as its OWN pane on the right (the layout split now passes
  --new-group), instead of being tabbed into an existing browser pane. Opening a
  second browser lands beside the first, not inside it.

- A non-primary agent (a launch-task sub-agent or a second "+ New agent") can reach
  the fleet daemon over the network but can't drive this workspace's dockview layout.
  It used to wait 5s and print a confusing "service 'browser' is not registered ...
  still running headless" error. Now it skips the doomed layout attempt and says
  plainly: the browser is running, but only the primary agent shows browser panes
  (open it from the "+" menu). The misleading "headless" wording is removed.

Verified: 84 fast tests (incl. 2 new: non-primary skips the pane-pull; primary uses
--new-group).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
MT-GoCode and others added 5 commits June 24, 2026 15:33
…anes

Follow-ups from a multi-agent root-cause of "opening a new browser didn't show a
new pane":

- `new` now pulls the browser's pane immediately, so "open a new browser" visibly
  opens one instead of showing nothing until the first `open`/`click`. Idempotent
  with the first-command pane-pull (splitting an already-open pane just focuses it).

- Removed the dead `MINDS_BROWSER_SERVICE_URL` guard in `_pull_in_pane`: that env var
  is never set anywhere in the tree, so the guard never fired -- and its premise
  ("non-primary agent can't show panes") was the wrong rule. Per the intended model,
  ANY user-started agent (the primary OR one opened via "+ New agent") should surface
  the pane next to its own chat; only launch-task/background agents can't. _pull_in_pane
  now just attempts the split (anchor, else self) and, when it can't land (a background
  agent with no chat in this UI, or the layout server unreachable), warns in one clean
  line -- no raw 5s "service not registered", no misleading "headless".

- `_layout` gained a `quiet` mode so the pane-pull can substitute that clean message.

Note: the root cause of the reported case was a STALE workspace (created before the
`--new-group` fix); recreating the workspace picks up the new pane behavior. The
`--new-group` split itself was already correct.

Verified: 85 fast tests (incl. new: new pulls a pane; each browser its own pane;
clean warn when a pane can't be shown).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011wJbP6ZMnE3FB4sDYikqS8
Human take-control is now STICKY: it holds until the human explicitly hands
back, with no idle/grace yield (removed _HUMAN_ACTIVE_GRACE /
_sweep_stale_human_pin / _human_touched_at). A human who grabs a browser keeps
it even if they walk away mid-CAPTCHA -- never force-yielded. Agents still
auto-release via the idle lease; the asymmetry is deliberate.

Agent-initiated handoff for CAPTCHAs / human verification: new
`agentic-browser-fleet handoff <id> "<reason>"` (alias `request-human`). The
agent jumps to the FRONT of the resume queue and control goes to the HUMAN,
pinned (not the next queued agent), until they hand back -- then the requester
resumes first. Viewer shows a distinct amber bar naming the agent + what to
solve; the pane is surfaced. SKILL.md tells agents to use it (and NOT to solve
CAPTCHAs themselves). Exit 2 (preempted) so the agent stops.
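
The queue behaviour behind `handoff` can be sketched as a deque where the requester is pushed to the front while control is pinned to the human. Everything named here (`ResumeQueue`, its fields and methods) is illustrative; the commit only specifies the ordering and the pinning.

```python
from collections import deque
from typing import Deque, List, Optional


class ResumeQueue:
    """Hypothetical per-browser resume queue with agent-initiated handoff."""

    def __init__(self) -> None:
        self.controller: Optional[str] = None    # agent id, "human", or None
        self.human_pinned = False
        self.handoff_reason: Optional[str] = None
        self._waiting: Deque[str] = deque()

    @property
    def waiting(self) -> List[str]:
        return list(self._waiting)

    def enqueue(self, agent_id: str) -> None:
        """Ordinary waiters join the back of the queue, once."""
        if agent_id not in self._waiting:
            self._waiting.append(agent_id)

    def handoff(self, agent_id: str, reason: str) -> None:
        """The requester jumps to the FRONT; control goes to the human, pinned."""
        if agent_id in self._waiting:
            self._waiting.remove(agent_id)
        self._waiting.appendleft(agent_id)
        self.controller = "human"
        self.human_pinned = True
        self.handoff_reason = reason

    def human_hand_back(self) -> Optional[str]:
        """Explicit hand-back: the first waiter (the requester) resumes."""
        self.human_pinned = False
        self.handoff_reason = None
        self.controller = self._waiting.popleft() if self._waiting else None
        return self.controller
```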

Cross-modality sandbox: launch Chromium with the sandbox on by default (works
under gVisor on docker/cloud/AWS) but auto-retry once with it off if a first
launch fails for a sandbox reason -- so the fleet comes up unattended on Lima
(plain-Linux VM) and non-gVisor VPSes (e.g. Vultr) too. BROWSER_NO_SANDBOX=1
forces it off. No provider sniffing.
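
A sketch of the retry-once policy, assuming the launch is a callable that takes a `sandbox` flag. The helper names and the substring test for "a sandbox reason" are assumptions; only BROWSER_NO_SANDBOX=1 and the on-by-default, retry-once-with-it-off behaviour come from the commit.

```python
import os
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


def looks_like_sandbox_error(exc: BaseException) -> bool:
    """Assumed heuristic: the failure message mentions the sandbox."""
    return "sandbox" in str(exc).lower()


def launch_with_sandbox_fallback(launch: Callable[[bool], T],
                                 env: Mapping[str, str] = os.environ) -> T:
    """Launch with the sandbox on; retry once with it off on a sandbox failure."""
    if env.get("BROWSER_NO_SANDBOX") == "1":
        return launch(False)               # operator forced the sandbox off
    try:
        return launch(True)
    except Exception as exc:
        if not looks_like_sandbox_error(exc):
            raise                          # unrelated failure: do not mask it
        return launch(False)               # single retry, sandbox off
```

Keying the retry on the failure itself rather than on the hosting provider is what lets the same code path serve gVisor and plain-VM hosts.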

Tests: sticky-pin, resting-free, handoff front-enqueue + announce + no-op,
sandbox-error detection + retry/no-retry, CLI handoff verb. 93 fast tests pass;
the keyless real-Chromium launch test passes (validates the start() refactor).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Disable Chromium's sandbox when running as root (every minds workspace runs the
  daemon as root inside gVisor or a VM). Fixes the ~30s launch hang -> HTTP 504 on
  Lima (bare Debian VM, no gVisor). Verified live: browsers launch, POST 200.
- GET /browsers reports can_create/create_reason/count/max; the + menu disables
  'New browser' with the reason in parens while starting up / at cap, so a click
  can't race the fleet restore (browser 0 or multi-browser). Disabled menu items
  no longer fire their action.
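
The four fields reported by GET /browsers could be derived roughly as follows. This is a sketch: the function name, the `crashed` key, and the reason strings are assumptions; the field names, the cap of 5, and the rule that crashed shells do not count toward the cap come from the commits in this PR.

```python
from typing import Dict, Iterable, Mapping

MAX_BROWSERS = 5   # fleet cap stated in the skill


def creation_status(browsers: Iterable[Mapping[str, object]],
                    initializing: bool = False,
                    max_browsers: int = MAX_BROWSERS) -> Dict[str, object]:
    """Compute can_create/create_reason/count/max for the '+' menu."""
    # Crashed shells are excluded so a crash never blocks `new`.
    count = sum(1 for b in browsers if not b.get("crashed", False))
    if initializing:
        reason = "starting up"     # assumed wording
    elif count >= max_browsers:
        reason = "at cap"          # assumed wording
    else:
        reason = ""
    return {
        "can_create": reason == "",
        "create_reason": reason,
        "count": count,
        "max": max_browsers,
    }
```

The menu can then disable "New browser" whenever `can_create` is false and show `create_reason` in parentheses.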

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
These 4 tests launch real Chromium (CI installs it via playwright install --with-deps),
but cold-start + screencast/nav/restart exceeds the global --timeout=10, so they timed
out before their pytest.skip guard could fire. Per-test 120s override lets them run (or
cleanly skip if Chromium can't launch). Fixes the only failing CI checks on PR #201.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@minhtrinh-imbue minhtrinh-imbue marked this pull request as ready for review June 25, 2026 11:25
MT-GoCode and others added 2 commits June 25, 2026 04:29
Real Chromium hangs on the GH Actions runner -- the CDP connection never completes
(120s timeout + NoneType new_cdp_session), even with the binary installed and the
sandbox off. It's a runner limitation, not a product bug: the fleet runs fine on real
docker/Lima/cloud workspaces (verified). Skip these 4 tests when GITHUB_ACTIONS=true so
CI is green; they still run locally and on offload where a real browser comes up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…cement check

'New URL' was intentionally removed from the + menu ('New browser' replaces it). The
split-placement E2E now exercises the same openIframeTab+targetGroup path via 'New
terminal', which still exists. Note the removal in the system_interface changelog.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
MT-GoCode commented Jun 25, 2026

Brief design decision overview:

  • browser-use and its Claude skill are already immensely popular, so the agentic-browser-fleet daemon and skill I made are a meta/wrapper skill that wraps the browser-use API and adds multi-browser management on top
  • Many agents + many browsers + human co-interaction = a high chance of race conditions, inconsistent state, and agent confusion, so I built a control lock and queue system into agentic-browser-fleet to handle interaction cases in a generalized way. Intelligent handoff to a human or to other agents slots nicely into this system.

Comment thread .agents/skills/agentic-browser-fleet/SKILL.md Outdated
Comment thread .agents/skills/agentic-browser-fleet/SKILL.md Outdated
targetGroup && dockview.groups.some((g) => g.id === targetGroup.id)
  ? { position: { referenceGroup: targetGroup.id } }
  : {};
addPanelForRef(`service:browser?session=${id}`, getPrimaryAgentId(), placement);

@gnguralnick gnguralnick Jun 26, 2026

Contributor

this seems like a novel way to disambiguate having multiple tabs for a single service. i believe the same type of thing already exists for terminals, since you can have multiple terminals. we should probably be consistent - either update terminals to do this same thing or update browsers to do the same as terminals. i'm open to either, though it does seem like we're adding a lot of code here now that might not be needed if we can reuse the existing logic?

Contributor

I agree that it would be nice to unify them

however, I'm ok deferring that to a later refactor in this case

my reasoning is that the terminal stuff is also done in a wrong / ad hoc way (terminals don't actually maintain state properly), so that whole setup needs to be refactored anyway

Comment thread apps/system_interface/frontend/src/views/DockviewWorkspace.ts Outdated
Comment thread libs/browser/src/browser/runner.py Outdated
if not layout.exists():
    return False
result = subprocess.run(
    [sys.executable, str(layout), *args], cwd=str(root), capture_output=True, text=True

Contributor

might be worth making layout.py into a library rather than a script so we could just import it, though i'm not sure what the complexity of that would be

Gotcha -- Josh do you have an opinion on this? Either way we'd probably want this in a different PR i think

Contributor

probably not worth the effort now since this is a pretty simple usage

Contributor

please do that as a follow-up refactor!

this PR is already large enough, and it's best not to mix refactors + new changes (gets hard to review)

…tem-explaining

Reframe as a clean first revision -- drop the "two agents only in name / no
separate brain" framing that walked back the old delegation design, and trim
system internals the executing agent doesn't need (e.g. the MNGR_AGENT_ID
explainer). No operational detail removed: every verb/flag, the requery-before-
acting discipline, the ownership rules, the CAPTCHA/2FA/robot-check handoff, and
the exit-code table are all preserved. 411 -> 266 lines.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread libs/browser/src/browser/session.py Outdated
Comment thread libs/browser/src/browser/session.py Outdated
@joshalbrecht

Contributor

besides my comments, lgtm! (which is surprising to me -- nice job on this :) there's a lot going on )

lmk once the above comments are addressed and I'll give it a final quick glance and then we can merge it

fingers crossed that it doesn't end up being too annoying to convert away from async...

MT-GoCode and others added 7 commits June 26, 2026 10:59
…ehind one loop bridge

runner.py: FastAPI/uvicorn -> Flask + flask-sock (+ wsgi.py threaded server),
matching the system interface; drops the duplicate FastAPI/uvicorn deps. The async
engine (ownership state machine, browser_use, the Playwright observer) stays async
on ONE background event loop reached via a single bridge (loop_bridge.py,
run_coroutine_threadsafe); only the web layer is sync. browser_use is kept (no
DOM/driving reimplementation).
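
A rough sketch of the single-bridge pattern this commit describes, for readers unfamiliar with it. This is an illustration under assumptions, not the contents of loop_bridge.py: the `LoopBridge` class, its `call` and `close` methods, and `_engine_status` are hypothetical names. Only `asyncio.run_coroutine_threadsafe` is taken from the commit message.

```python
import asyncio
import threading
from typing import Any, Coroutine, Optional, TypeVar

T = TypeVar("T")


class LoopBridge:
    """Hypothetical sketch: one background event loop reached from sync threads."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        # The loop runs forever on its own daemon thread; sync code never touches it directly.
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def call(self, coro: Coroutine[Any, Any, T], timeout: Optional[float] = None) -> T:
        """Schedule `coro` on the background loop and block the caller for its result."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def close(self) -> None:
        """Stop the loop from outside its thread, then release its resources."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def _engine_status() -> str:
    # Stand-in for an async engine call (ownership state machine, observer, ...).
    await asyncio.sleep(0)
    return "running"


bridge = LoopBridge()
status = bridge.call(_engine_status(), timeout=5)  # what a sync web handler would do
bridge.close()
```

With this shape, every sync handler goes through one `call`, so the async engine's state is only ever mutated on its own loop.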

Concurrency hardening from adversarial review: run_agent registers its cancellable
handle under _control_lock with an ownership re-check (closes a take-control race),
_sweep_idle_lease snapshots under the lock, _init_status is thread-safe across the
loop/Flask boundary. Type-clean (ty); 115 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…startup, cap 3, modal naming

Every browser is created on demand and addressed by a random ~2-word english name
(names.py), never a count; the fleet starts EMPTY (no default browser 0). All
startups serialize through the manager's single lock (at most one Chromium
launching at a time -- spam-safe); create works DURING init (queued behind the
serialized restore, not 503-blocked); fleet capped at 3, `new` past the cap is
rejected. "New browser" now opens a CreateAgentModal-style naming modal (prefilled
editable name) + an optimistic "Browser starting..." pane. Renaming is intentionally
unsupported (the skill tells agents so). manifest v2 drops next_id.
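
The startup rules above (name-addressed browsers, one launch at a time, a cap of 3) could look roughly like the following. A minimal sketch, assuming an asyncio lock guards launches; `FleetManager`, `FleetFullError`, the `"starting"` state label, and the example names are all hypothetical, and the real manager also handles restore-on-init, which is not modeled here.

```python
import asyncio

MAX_BROWSERS = 3  # cap taken from the commit message


class FleetFullError(Exception):
    """Raised when a create is requested past the cap."""


class FleetManager:
    """Hypothetical sketch of name-addressed, serialized browser startup."""

    def __init__(self) -> None:
        self._browsers = {}  # name -> state; the fleet starts empty
        self._startup_lock = asyncio.Lock()
        self._launching = 0
        self.max_concurrent_launches = 0  # instrumentation for the demo only

    async def create(self, name: str) -> None:
        if name in self._browsers:
            raise ValueError(f"browser {name!r} already exists")
        if len(self._browsers) >= MAX_BROWSERS:
            raise FleetFullError(f"fleet is capped at {MAX_BROWSERS}")
        self._browsers[name] = "starting"  # reserve the slot before any await
        async with self._startup_lock:  # at most one launch in flight
            self._launching += 1
            self.max_concurrent_launches = max(self.max_concurrent_launches, self._launching)
            await asyncio.sleep(0.01)  # stand-in for launching Chromium
            self._launching -= 1
            self._browsers[name] = "running"


async def _demo():
    fleet = FleetManager()
    names = ("amber-fox", "quiet-lake", "red-pine", "tall-oak")  # illustrative names
    results = await asyncio.gather(
        *(fleet.create(n) for n in names), return_exceptions=True
    )
    return fleet, results


fleet, results = asyncio.run(_demo())
```

Reserving the slot before the first `await` is what makes the cap hold under a burst of simultaneous requests: the fourth create sees three reserved names and is rejected even though no Chromium has finished launching yet.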

Adversarial-review hardening: capacity reads run on the loop (no cross-thread dict
race); the naming modal pre-validates duplicates and a failed create never tears
down a pre-existing pane; _set_active_page guards teardown; README updated.
libs/browser 140 tests + ty clean; frontend build + 366 vitest green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Stale pane: a cast client connecting to an already-live browser was seeded only
control/tabs (no frame), and the screencast emits only on repaint, so a client
joining a static/blank page got no frame and the viewer (which cleared the banner
only on a frame) stayed stuck on "Starting browser..." until a manual reopen. Fix:
the viewer now clears the loading banner on the first control/tabs message (proof
the browser is live, crash-guarded), and the daemon replays the cached last frame
to a newly-registered cast client so the canvas shows the live page immediately.
1013-retry (not-yet-registered) and 1008 (gone) paths unchanged.
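
The cached-frame replay can be sketched as follows. This is a single-threaded illustration under assumptions, not the daemon's code: `CastHub`, `publish`, and `register` are hypothetical names, and frames are plain bytes rather than real JPEG screencast payloads.

```python
import queue
from typing import List, Optional


class CastHub:
    """Hypothetical sketch: fan frames out to viewers and replay the last one."""

    def __init__(self) -> None:
        self._clients: List[queue.Queue] = []
        self._last_frame: Optional[bytes] = None

    def publish(self, frame: bytes) -> None:
        """Called on each repaint; remember the frame and push it to every viewer."""
        self._last_frame = frame
        for client in self._clients:
            client.put(frame)

    def register(self) -> queue.Queue:
        """Add a viewer; seed it with the cached frame so a static page still shows."""
        client: queue.Queue = queue.Queue()
        if self._last_frame is not None:
            client.put(self._last_frame)
        self._clients.append(client)
        return client


hub = CastHub()
early = hub.register()
hub.publish(b"frame-1")
late = hub.register()  # joins a static page: no further repaint will arrive
late_first = late.get_nowait()
```

Without the replay in `register`, `late` would stay empty until the page repainted, which is the stuck "Starting browser..." case described above.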

Crash overlay: showCrashed() now shows a full near-opaque grey cover (#crashoverlay,
inset:0) with bold white text, instead of small placeholder text on the black canvas.

142 tests (+2), ty clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…c viewer

Track each browser's lifecycle explicitly: registered as `init` immediately on
create (so the create returns fast, the modal closes instantly, and the cast WS
always finds a real browser -- no 1013 race), flipped to `running` once Chromium
is up + the screencast attached, `crashed` on death. Launches still serialize
(one Chromium at a time via the startup lock); `init` counts toward cap=3; a
command on an `init` browser returns "still starting -- try again". The viewer
now renders DETERMINISTICALLY off the broadcast lifecycle -- full "Starting
browser" overlay for init, the live page for running, a full crashed overlay --
instead of inferring state from frames/closes (which got stuck until a manual
reopen). Modal closes immediately on Start.
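
One way to picture the explicit lifecycle and the deterministic viewer is the sketch below. The three state names and the "still starting -- try again" message come from the commit message. The transition table, the overlay labels, and the function names are assumptions for illustration (in particular, treating `crashed` as terminal and rejecting commands on it).

```python
from enum import Enum


class Lifecycle(str, Enum):
    INIT = "init"
    RUNNING = "running"
    CRASHED = "crashed"


# Assumed transition table: init -> running -> crashed, with crashed terminal.
_ALLOWED = {
    Lifecycle.INIT: {Lifecycle.RUNNING, Lifecycle.CRASHED},
    Lifecycle.RUNNING: {Lifecycle.CRASHED},
    Lifecycle.CRASHED: set(),
}


def transition(current: Lifecycle, target: Lifecycle) -> Lifecycle:
    """Return the new state, or raise if the move is not in the table."""
    if target not in _ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def viewer_overlay(state: Lifecycle) -> str:
    """Choose what the viewer renders purely from the broadcast lifecycle."""
    return {
        Lifecycle.INIT: "starting-overlay",
        Lifecycle.RUNNING: "live-page",
        Lifecycle.CRASHED: "crashed-overlay",
    }[state]


def command_reply(state: Lifecycle) -> str:
    """Reply a command would get in each state (only the init text is from the source)."""
    if state is Lifecycle.INIT:
        return "still starting -- try again"
    if state is Lifecycle.CRASHED:
        return "browser crashed"  # hypothetical wording
    return "ok"


state = Lifecycle.INIT                         # registered immediately on create
reply_while_starting = command_reply(state)
state = transition(state, Lifecycle.RUNNING)   # Chromium up, screencast attached
```

Because `viewer_overlay` is a pure function of the state, a viewer that joins late or misses frames still renders the right thing as soon as it receives one lifecycle broadcast.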

Race-hardening (adversarial review): start() re-checks _closed/_crashed after its
awaits and kills the just-launched Chromium if torn down (no resurrection / no
leaked second Chromium); manager.close awaits the in-flight launch task before
teardown; close()/launch-failure push a cast-queue shutdown sentinel so the
server tears down deterministically. +regression test. 150 tests, ty + ruff clean.
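
The "re-check after the awaits" hardening is a general asyncio pattern and can be sketched in isolation. A minimal illustration, assuming a `_closed` flag as named in the commit message; `SessionSketch`, `_launch`, and the `killed` list are hypothetical stand-ins for the real session and process handling.

```python
import asyncio


class SessionSketch:
    """Hypothetical sketch: start() must not resurrect a session closed mid-launch."""

    def __init__(self) -> None:
        self._closed = False
        self.process = None
        self.killed = []  # records processes we had to kill (demo instrumentation)

    async def _launch(self) -> str:
        await asyncio.sleep(0.01)  # stand-in for launching Chromium
        return "chromium-proc"

    async def start(self) -> bool:
        proc = await self._launch()
        # State may have changed while we were suspended, so check again.
        if self._closed:
            self.killed.append(proc)  # kill the just-launched process, keep nothing
            return False
        self.process = proc
        return True

    def close(self) -> None:
        self._closed = True


async def _demo():
    session = SessionSketch()
    task = asyncio.create_task(session.start())
    await asyncio.sleep(0)  # let start() reach its await
    session.close()         # teardown lands while the launch is in flight
    started = await task
    return session, started


session, started = asyncio.run(_demo())
```

The check has to sit after the `await`, not before it: any flag read before suspension can be stale by the time the coroutine resumes.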

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…onnect leak

Backend (runner.py / session.py):
- Detect a disconnected waiter parked in the acquire FIFO queue: the wait-phase
  NDJSON stream heartbeats a `ping` each idle poll, so a dropped waiter surfaces
  in bounded time and its acquire is cancelled (was held for the holder's whole
  lease, blocking everyone behind it).
- Release on acquire-phase disconnect, not just cancel: a grant that lands in the
  same poll window the client drops is no longer orphaned until the 90s idle sweep
  (CAS no-op otherwise; mirrors the run/hold finally).
- `acquire` is now strictly non-blocking (reserve-or-queue); blocking-wait lives in
  task/hold, which heartbeat. Closes a worker-thread/queue-slot pin on a dropped
  non-streaming caller.
- Gate human take-control on lifecycle (no-op on init/crashed; only pin once running).
- Read acquire/handoff owner snapshots on the loop (consistent with the mutation).
- Persist a browser at create (init), not only once running; manifest snapshots the
  live fleet (init + running), excludes crashed.
- One-off frame capture for a fresh viewer of a running-but-unpainted browser.
- Lifecycle-aware "initializing" banner (don't tell a running viewer the fleet is
  initializing); failed-launch names closed terminally (1008) for late viewers.
- Dedicated generous timeout for direct-control actions (BROWSER_DIRECT_ACTION_TIMEOUT).
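
The heartbeat idea in the first two bullets can be sketched with a plain generator. This is an illustration under assumptions, not the code in runner.py: `wait_stream`, `on_abandon`, the event dictionaries, and the poll interval are hypothetical. The sketch relies on the usual WSGI behavior that a server closes the response iterable when a write to the client fails, which is why an idle stream needs periodic writes to notice a dropped client.

```python
import json
import queue
from typing import Callable, Iterator


def wait_stream(
    events: queue.Queue,
    on_abandon: Callable[[], None],
    poll_seconds: float = 0.05,
) -> Iterator[str]:
    """Yield NDJSON lines, emitting a `ping` on each idle poll."""
    delivered = False
    try:
        while True:
            try:
                event = events.get(timeout=poll_seconds)
            except queue.Empty:
                # Nothing happened this poll: write a heartbeat so a dead socket surfaces.
                yield json.dumps({"type": "ping"}) + "\n"
                continue
            yield json.dumps(event) + "\n"
            if event.get("type") == "granted":
                delivered = True  # only reached if the client consumed the grant line
                return
    finally:
        if not delivered:
            # Client went away while queued, or before the grant reached it:
            # cancel the pending acquire or release the orphaned grant.
            on_abandon()


abandoned = []
idle_events: queue.Queue = queue.Queue()
stream = wait_stream(idle_events, lambda: abandoned.append(True))
first_line = next(stream)  # idle poll -> heartbeat
stream.close()             # simulates the server dropping a disconnected client
```

Setting `delivered` only after the grant line has been yielded covers the second bullet's case: if the client drops in the same window the grant lands, the generator is closed at that `yield` and the cleanup still runs.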

Frontend (CreateBrowserModal / DockviewWorkspace / viewer):
- Surface create failures: the modal re-opens pre-filled with the typed name and the
  daemon's reason (400/409/503/network) instead of silently tearing the pane down.
- Client-side name validation mirroring the daemon, before any pane opens or POST.
- Viewer no longer downgrades an already-running page to the "init" overlay.

Conservative simplify pass + adversarial re-review (races/fixes-hold/gate); the two
follow-up backend fixes above came out of that re-review. Full gate green: 162 backend
pytest + ty + ruff, 376 frontend vitest + build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The previous commit added `key: newBrowserPrefillName ?? "new"` to the
CreateBrowserModal vnode. That vnode sits in the DockviewWorkspace view's
children array among unkeyed sibling vnodes (the New chat / New agent modals,
the destroy dialog). Mithril's normalizeChildren throws "vnodes must either all
have keys or none" when one child in a fragment is keyed and the rest are not
(numKeyed !== 0 && numKeyed !== length). The throw kills the render, and once a
redraw throws mid-flight Mithril's redraw machinery is left broken -- so New
chat, New agent AND New browser all stopped opening (New terminal survived
because it opens a dockview panel directly, bypassing the Mithril redraw).

The key was also unnecessary: onAccept sets showNewBrowserModal=false before the
background POST, so a failure re-open is a fresh mount and oninit re-reads
initialName/initialError on its own.

The 376 frontend tests missed this because they mount the modal alone
(numKeyed === length === 1, no throw); the bug only exists among the unkeyed
siblings in DockviewWorkspace.
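
To make the failure mode concrete, here is a small Python model of the check quoted above. It is not Mithril's code: `check_sibling_keys` is a hypothetical function that mirrors only the `numKeyed !== 0 && numKeyed !== length` condition, with `None` standing for an unkeyed vnode.

```python
from typing import Optional, Sequence


def check_sibling_keys(keys: Sequence[Optional[str]]) -> None:
    """Model of the rule: sibling vnodes are either all keyed or all unkeyed."""
    num_keyed = sum(key is not None for key in keys)
    if num_keyed != 0 and num_keyed != len(keys):
        raise TypeError("vnodes must either all have keys or none")


# The modal mounted alone, as in the unit tests: one child, one key, no error.
check_sibling_keys(["new"])

# The modal among unkeyed siblings, as in DockviewWorkspace: mixed, so it throws.
try:
    check_sibling_keys([None, None, "new", None])
    mixed_raised = False
except TypeError:
    mixed_raised = True
```

The two calls show why isolated component tests could not catch this: the condition depends on the sibling list, which only exists in the parent view.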

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@minhtrinh-imbue

Author

also fixed: the browser crash screen is now more obvious.
cleaner logic + UI for opening a new browser: the pane shows that the browser is initializing before it can be used.
browser startups are now serialized through a queue, for memory protection and to handle the edge case of simultaneous startup requests.

yield _ndjson(event)


async def _acquire_result(acquire_task: Any) -> str:

Contributor

seems like some async and await have stayed in here? You probably want to remove the rest!

Comment on lines +59 to +66
from playwright.async_api import (
    Browser,
    BrowserContext,
    CDPSession,
    Page,
    Playwright,
    async_playwright,
)

Contributor

it doesn't seem like this was switched to sync?

4 participants