Instructions for AI coding agents working in this repo.
`CLAUDE.md` is a symlink to this file for tools that look for the older name.
Conference demo: Temporal durable execution + Google ADK multi-agent reasoning, visualized as an ice cream delivery fleet on the Las Vegas Strip.
./run.sh # starts Temporal dev server + worker process + server process

`run.sh` starts three processes: Temporal dev server, worker (`python -m agent_fleet.worker`), and FastAPI server (`python -m agent_fleet.server`). No manual Temporal setup needed.
- Two separate processes: FastAPI server (`server.py`) queries Temporal workflows for state and sends signals only: no workers, no FleetState reads. Workers run in a separate process (`worker.py`) with live/mock mode selection at startup (`GOOGLE_API_KEY` set → live, not set → mock).
- Workflows own state (`workflows.py`):
  - `MeltdownDemoWorkflow` owns driver positions, order assignments, and disconnect status. It builds `DriverSnapshot`s and passes them to activities as inputs. Capacity guardrail: if ADK assigns to a full (3 orders) or disconnected driver, the workflow auto-reassigns to the next available driver. Orders assigned while Fleet Agent is offline get a `degraded=True` flag.
  - `DriverRouteWorkflow` is a per-driver child workflow: it batch-picks up to 3 orders at Ziggy's, delivers sequentially (hotel A → hotel B → ...), then returns. It tracks status, is_disconnected, is_recovering, path_history, and current_orders.
  - Disconnect uses Temporal-native retry: activities check FleetState for disconnect and fail if disconnected; Temporal retries with backoff until reconnected. The driver completes the delivery, stays at the hotel, and can't report back until reconnected. On reconnect, the `sync_driver_position` activity reads the actual position from FleetState, so there is no teleporting. Completed deliveries are not repeated; the batch continues from the next pending order.
  - HITL hold pattern: this is operator-in-the-loop, not agent-in-the-loop. The change is initiated externally (operator submits a customer change via REST) and a human supervisor approves it. The ADK agents never see the change; the gate lives in the workflow, not in any agent tool (contrast: an `ask_user`-style `@function_tool` where the LLM itself pauses for clarification). When the change is submitted, the parent signals the child with `update_pending`; the driver navigates to the hotel but holds before delivering (`awaiting_update` status, `wait_condition`). On approval, the parent signals `resolve_update` with the decision: cancel → skip delivery, address_change → reroute to new destination, release → deliver normally. Two `wait_condition` patterns: parent waits for human, child waits for parent. For pending/batched orders, changes apply directly without a hold. Customer changes process serially in the parent (`_drain_pending_signals`), which is simpler and matches the demo flow (changes submitted one at a time).
  - The child's HITL state is a per-order dict (`_pending_holds: dict[str, PendingHold]`): `update_pending` creates an entry, `resolve_update` fills in the decision for that specific order, and the delivery loop waits on the hold for the order it's currently processing. No single-slot overwrite: two changes for different orders on the same driver each get their own slot. `deliver_order` now returns `success=False` when a cancel wins the race, so the workflow skips the `order_delivered` parent signal for cancelled orders. The child's HITL hold also escapes on `self._stop` so demo shutdown can't leave a parked child hanging the parent's `await handle` join.
  - `OrderGenerationWorkflow` is a child workflow that generates orders on a randomized timer and signals the parent. Parent handles assignment.
- Server reads FleetState (`server.py`): WebSocket data comes from `fleet.snapshot()` (SQLite). Server also writes disconnect/reconnect state directly. Temporal queries are used for structural state during development; FleetState is the display authority.
- Activities are pure (`activities.py`): receive all decision data as inputs, never read FleetState for logic. `@activity.defn` with no `name=` override (function names are activity names).
- FleetState (`simulation.py`): SQLite WAL-backed UI projection. Backed by `fleet_state.db` for cross-process sharing: activities in the worker write positions/statuses, server reads for the frontend WebSocket. In production this would be Redis or Postgres.
- 3-queue workers (`worker.py`): workflows + local activities, delivery, agents. `GoogleAdkPlugin` is on both workflow and agents workers (sandbox + determinism on workflow side, `invoke_model` activity on agents side). Agents use the upstream `TemporalModel` with `summary_fn=_build_summary`; `_build_summary` in `agents.py` generates context-aware summaries (agent name, order, phase) shown in the Temporal UI per invoke_model activity. `_activity_tool.py` builds its own dynamic summaries for tool-call activities from the bound arguments. `publish_agent_event` and `publish_agent_events_batch` are registered on the workflow worker for local activity execution (UI projection with minimal history).
- ADK agents (`agents.py`): Fleet Agent + Customer Agent (parallel) → Dispatch Agent (sequential). Live path runs ADK inline in the workflow via `_run_adk_assignment()`. No fallback to mock: if an activity fails, Temporal retries. Fleet Agent tools fail fast when disconnected (2 attempts), and the error is returned to the LLM via the `_activity_tool.py` catch; the Dispatch Agent assigns with available data but orders are flagged as `degraded`. The workflow publishes short summary events to FleetState via a batched local activity after ADK completes (summary from `output_key` fields).
- Mock mode (`agent_fleet/mock/`): completely separate folder with its own `activities.py` and `worker.py`. Live code has zero mock awareness. Decision at startup: `GOOGLE_API_KEY` set → live workers, not set → mock workers. Mock activities use `name=` overrides to match live activity names.
- Server (`server.py`): disconnect/reconnect endpoints write to FleetState (SQLite) for immediate frontend display AND signal Temporal workflows for durable state.
- Frontend (`frontend/index.html`): single-file SPA with Leaflet map, WebSocket state feed, agent reasoning panels.
- PydanticPayloadConverter on `Client.connect` in both server and worker for `LlmResponse` serialization.
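
The capacity guardrail described under "Workflows own state" can be sketched in plain Python. This is an illustrative sketch, not the code in `workflows.py`: the `DriverSnapshot` fields, the helper names `can_accept` and `apply_capacity_guardrail`, and the reading of "next available" as "first eligible driver in list order" are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

DRIVER_CAPACITY = 3  # drivers batch up to 3 orders


@dataclass
class DriverSnapshot:
    # Hypothetical fields; the real DriverSnapshot in models.py may differ.
    driver_id: str
    order_count: int
    is_disconnected: bool = False


def can_accept(driver: DriverSnapshot) -> bool:
    """A driver can take an order only if connected and below capacity."""
    return not driver.is_disconnected and driver.order_count < DRIVER_CAPACITY


def apply_capacity_guardrail(
    proposed_id: str, drivers: List[DriverSnapshot]
) -> Optional[str]:
    """Keep the proposed driver if eligible, else fall back to another one.

    Assumption: "next available" means the first eligible driver in list
    order. Returns None when no driver can accept the order.
    """
    by_id = {d.driver_id: d for d in drivers}
    proposed = by_id.get(proposed_id)
    if proposed is not None and can_accept(proposed):
        return proposed_id
    for driver in drivers:
        if can_accept(driver):
            return driver.driver_id
    return None
```

Because the function takes snapshots as input and touches no shared state, a check like this could run inside workflow code without breaking determinism.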
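
The per-order hold bookkeeping in the child workflow can be modeled without Temporal. The sketch below is a simplified stand-in, not the repo's implementation: `HoldBook`, `is_waiting`, `next_action`, and the `PendingHold` fields are hypothetical names, and the Temporal wait itself is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class PendingHold:
    # Hypothetical shape; the real PendingHold may carry more fields.
    decision: Optional[str] = None  # "cancel", "address_change", or "release"
    new_destination: Optional[str] = None


class HoldBook:
    """Per-order hold slots keyed by order ID (no single shared slot)."""

    def __init__(self) -> None:
        self._pending_holds: Dict[str, PendingHold] = {}

    def update_pending(self, order_id: str) -> None:
        # Open a slot for this order only; other orders are untouched.
        self._pending_holds.setdefault(order_id, PendingHold())

    def resolve_update(
        self, order_id: str, decision: str, new_destination: Optional[str] = None
    ) -> None:
        hold = self._pending_holds.get(order_id)
        if hold is None:
            return  # no hold was opened for this order; ignore the decision
        hold.decision = decision
        hold.new_destination = new_destination

    def is_waiting(self, order_id: str) -> bool:
        hold = self._pending_holds.get(order_id)
        return hold is not None and hold.decision is None

    def next_action(self, order_id: str, destination: str) -> Tuple[str, Optional[str]]:
        """Map the hold state for one order to what the delivery loop does."""
        hold = self._pending_holds.get(order_id)
        if hold is None or hold.decision == "release":
            return ("deliver", destination)
        if hold.decision is None:
            return ("hold", destination)
        if hold.decision == "cancel":
            return ("skip", None)
        if hold.decision == "address_change":
            return ("reroute", hold.new_destination)
        raise ValueError(f"unknown decision: {hold.decision!r}")
```

In a real workflow the "hold" state would presumably be a `workflow.wait_condition` on something like `not is_waiting(order_id)` or the stop flag, so that shutdown can release a parked child.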
- Dataclass models for all Temporal payloads (`models.py`)
- Activities and workflows in separate files
- Mock mode in `agent_fleet/mock/` when `GOOGLE_API_KEY` is not set
- Two API keys required: `GOOGLE_API_KEY` (Gemini, Generative Language API) and `GOOGLE_MAPS_API_KEY` (Directions API); they cannot be combined
- `DEFAULT_MODEL` defaults to `gemini-2.5-flash` (swappable via env)
- Random order generation from 3 Las Vegas venues (`locations.py`)
- Drivers use letter IDs: `driver-a` through `driver-e`, displayed as `Driver-A` etc.
- Ice cream shop is "Ziggy's Ice Cream" (`WAREHOUSE_LABEL` in `locations.py`)
- Max 50 orders per demo run, drivers batch up to 3 orders (`DRIVER_CAPACITY`)
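
Several of the conventions above are small enough to express as helpers. This is a hedged sketch only: the function names `select_mode`, `model_name`, and `display_name` are hypothetical, and treating an empty `GOOGLE_API_KEY` as "not set" is an assumption rather than documented behavior.

```python
from typing import Mapping


def select_mode(env: Mapping[str, str]) -> str:
    """Return "live" when GOOGLE_API_KEY is present, else "mock".

    Assumption: an empty value counts as not set.
    """
    return "live" if env.get("GOOGLE_API_KEY") else "mock"


def model_name(env: Mapping[str, str]) -> str:
    """DEFAULT_MODEL falls back to gemini-2.5-flash when not overridden."""
    return env.get("DEFAULT_MODEL", "gemini-2.5-flash")


def display_name(driver_id: str) -> str:
    """Turn an internal ID such as driver-a into the label Driver-A."""
    prefix, _, letter = driver_id.partition("-")
    return f"{prefix.capitalize()}-{letter.upper()}"
```

Passing the environment in as a mapping (for example `select_mode(os.environ)`) keeps the helpers easy to test without mutating process state.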
Dependencies are managed with `uv`: `uv sync --all-extras` creates `.venv/` and installs runtime + dev deps. `uv run <cmd>` runs in that env.
uv sync --all-extras # install / refresh deps (creates .venv/)
uv run ruff check . # lint
uv run ruff format . # format
uv run pytest # run tests
make lint # ruff check + format check (via uv)
make fmt # ruff format (via uv)
make test # pytest (via uv)
make run # start the demo