This document scopes the hosted Android-device model for mobilebridge
when it is used behind a larger control plane such as VulpineOS or the
paid Vulpine API.
It is a planning and integration spec, not a promise that every described control-plane feature already exists inside this repo.
Goals:

- host a pool of real Android Chrome devices behind a stable service
- allocate short-lived CDP sessions safely to workers or API jobs
- expose health and capacity state without leaking device internals to downstream clients
- make failure handling explicit enough for reliable hosted operation

Non-goals:

- multi-tenant concurrent use of one attached browser target
- cross-worker reuse of a live local loopback endpoint
- iOS orchestration in this public repo
- billing, auth, or tenant storage logic inside mobilebridge
mobilebridge already provides the low-level primitive the hosted model needs:

```go
session, err := mobilebridge.StartAttachedServer(ctx, serial, "127.0.0.1:9222")
```

That call creates a local attached session for one Android device and returns:

- a public `Endpoint`
- a `Done()` channel for permanent upstream loss
- a `Close()` path that tears down the server and ADB forward cleanly
The hosted design should treat that attached server as an ephemeral worker-local lease, not as a globally shared network service.
The recommended control plane has three record types.
A device record represents a physical Android phone or emulator.
Suggested fields:

- `device_id`
- `serial`
- `state`: `discovered`, `ready`, `reserved`, `attached`, `draining`, `offline`
- `model`
- `android_version`
- `sdk_level`
- `last_seen_at`
- `last_healthy_at`
- `capabilities`: browser socket, webview socket, screen recording
- `worker_id`
- `labels`: region, rack, usb-hub, reliability tier
A session record represents one allocated client session against one device.
Suggested fields:

- `session_id`
- `device_id`
- `tenant_id`
- `worker_id`
- `status`: `allocating`, `attached`, `releasing`, `expired`, `failed`
- `endpoint`
- `created_at`
- `expires_at`
- `released_at`
- `failure_reason`
A worker record represents the host process that can see a set of USB-attached devices.
Suggested fields:

- `worker_id`
- `hostname`
- `advertise_addr`
- `device_count`
- `active_sessions`
- `queue_depth`
- `max_sessions`
- `failure_rate`
- `last_error`
- `healthy`
- `last_heartbeat_at`
Use a lease-based allocator.
Recommended rules:
- one active lease per device by default
- device must be `ready` before allocation
- allocator reserves the device before starting `StartAttachedServer`
- lease is owned by the worker that created the local loopback endpoint
- downstream callers receive a worker-routable endpoint or a worker-owned action surface, never a raw ADB concept
Selection order:
- required capability filters
- explicit device id, if requested
- sticky reuse for the same tenant or workflow when safe
- lowest recent failure rate
- oldest idle device
The normal lifecycle is:
- `discovered`: device appears from ADB
- health probe promotes it to `ready`
- allocator marks it `reserved`
- worker starts `StartAttachedServer`
- successful attach promotes the device to `attached` and the lease to `attached`
- client uses the returned CDP endpoint
- release path calls `Close()`
- janitor clears the lease and returns the device to `ready`
If attach fails:
- lease becomes `failed`
- device returns to `ready` or `offline` depending on health result
- the failure is recorded with the attempted socket type and error
If the proxy Done() channel closes:
- lease becomes `failed`
- device becomes `offline` pending fresh health probes
- allocator should not hand the device out again until a probe passes
Hosted mobile work fails when the control plane cannot distinguish "attached but flaky" from "gone". Use a three-layer health model.
Cheap checks:

- `adb devices -l` reports the device as `device`
- the target serial still exists

Medium checks:

- devtools socket classification succeeds: `chrome_devtools_remote` or a valid webview socket is present
- `/json/version` responds after `StartAttachedServer`

Expensive checks, run sparingly:

- `/json/list` returns at least one target
- optional `CreateTarget` probe in a canary workflow
The allocator should use cheap and medium checks in the steady state and reserve expensive checks for bootstrapping, canaries, or devices with a recent failure streak.
Use these limits unless real measurements justify widening them.
- one attached lease per device
- one worker-local HTTP server per lease
- short session TTLs with explicit renewals at the control-plane layer
- bounded reconnect attempts before marking the lease failed
- no assumption that a local `127.0.0.1` endpoint is portable across workers
For a first hosted rollout, the simplest safe model is:
- one job or operator session per device
- no oversubscription
- release immediately after the job completes
Expected failure classes:
- ADB disappears
- Chrome devtools socket is gone
- device is connected but unauthorized
- attached server starts but `/json/version` never becomes healthy
- proxy reconnect exhausts retries
Recommended control-plane behavior:
- mark the lease failed
- mark the device unavailable until the next successful health probe
- increment a device failure counter
- move repeated offenders to `draining`
- keep release idempotent
The current API work already matches the intended shape:
- inventory endpoint for device listing
- attach endpoint for session allocation
- release endpoint for explicit teardown
- target creation and recording actions on active Android sessions
The missing hosted pieces are control-plane concerns around those primitives:
- allocator policy
- worker heartbeats
- lease TTL and renewal
- janitor cleanup for stale leases
- readiness scoring and draining
The public repo should document those behaviors here, while product-level auth, billing, and tenant persistence stay in the API service.
Planned hosted additions:

- worker heartbeat and device registry
- persistent lease records with TTL and stale-session cleanup
- allocator filters and sticky reuse
- health score plus draining state
- operator metrics and audit surfaces
Operator metrics worth emitting:

- devices discovered
- devices ready
- allocation latency
- attach success rate
- reconnect recoveries
- reconnect exhaustion count
- average session duration
- release cleanup failures
- device failure streak
This repo should stay Android-only and public.
Safe to document here:
- Android pooling model
- ADB/Chrome socket assumptions
- worker-local attached server lifecycle
- hosted allocation and health concepts
Not for this repo:
- private mobile device implementations
- private product internals
- tenant secrets or credential flows