Durable Skies


Durable multi-agent drone delivery demo built on Google ADK and Temporal: a fleet of four autonomous drones executes delivery missions under the supervision of LLM-powered agents (Anthropic Claude), with every LLM call and every tool invocation running as a durable Temporal Activity — crashes, restarts, and deploys never lose state mid-mission.


Features

  • Durable agents — every ADK LLM call and tool call is recorded as a Temporal Activity, so agent reasoning is replayed deterministically after any crash.
  • Agents at decision points — a dispatcher agent picks the best drone for each incoming order, and an anomaly handler agent chooses a recovery action when an in-flight incident occurs. The mission itself is a deterministic activity loop.
  • Entity-per-drone orchestration — a FleetWorkflow supervisor routes orders to long-lived per-drone DroneWorkflow entities, each spawning a DeliveryWorkflow child per order. A per-order OrderWorkflow makes every order individually queryable in the Temporal UI.
  • Claude via LiteLLM — the Google ADK talks to Anthropic's Claude models through the LiteLLM adapter: Sonnet for decision-makers, Haiku for the dispatcher's analyst sub-agents.
  • Live operations frontend — a Nuxt 4 dashboard with the fleet map, a per-drone agent panel, and a streaming event log.
  • One-command local stack — a Compose file brings up a Temporal dev-server container (serving both the gRPC frontend and the built-in Web UI) alongside a Redis container used for live drone telemetry, the fleet event log, and the drone availability registry; make targets start the worker, the API, and the frontend.

Prerequisites

  • Docker (or a Compose-compatible runtime such as Podman) for the local stack
  • An ANTHROPIC_API_KEY for Claude

Getting Started

Clone the repo, set your API key, and launch the full stack with Compose:

git clone https://github.com/alexandreroman/durable-skies.git
cd durable-skies

cp .env.example .env
# Edit .env and set ANTHROPIC_API_KEY=sk-ant-...

make run   # or: docker-compose up

This brings up Temporal, Redis, the worker, the API, and the Nuxt frontend in one shot.

Open http://localhost:3000 and click Submit Orders to see a drone mission run end-to-end.

Usage

Submit an order programmatically. Valid pickup bases are base-north, base-south, and base-east; valid delivery points are dp-1 through dp-8:

curl -X POST http://localhost:8000/orders \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "order-001",
    "pickup_base_id": "base-north",
    "dropoff_point_id": "dp-1",
    "payload_kg": 1.2,
    "created_at": "2026-04-22T10:00:00Z",
    "status": "pending"
  }'
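The same request can be sent from Python. This is a minimal sketch using only the standard library; the field names and id lists come from the curl example above, while the helper names build_order and submit_order and the client-side validation are assumptions made for illustration, not part of the project's API.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Ids listed in the Usage text above.
VALID_BASES = {"base-north", "base-south", "base-east"}
VALID_DROPOFFS = {f"dp-{n}" for n in range(1, 9)}  # dp-1 .. dp-8


def build_order(order_id: str, pickup: str, dropoff: str, payload_kg: float) -> dict:
    """Build an order payload (hypothetical helper), rejecting unlisted ids early."""
    if pickup not in VALID_BASES:
        raise ValueError(f"unknown pickup base: {pickup}")
    if dropoff not in VALID_DROPOFFS:
        raise ValueError(f"unknown delivery point: {dropoff}")
    return {
        "id": order_id,
        "pickup_base_id": pickup,
        "dropoff_point_id": dropoff,
        "payload_kg": payload_kg,
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "status": "pending",
    }


def submit_order(order: dict, api_base: str = "http://localhost:8000") -> int:
    """POST the order to /orders and return the HTTP status code."""
    request = urllib.request.Request(
        f"{api_base}/orders",
        data=json.dumps(order).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

With the stack running, `submit_order(build_order("order-002", "base-south", "dp-4", 0.8))` sends one order. The response body is not documented here, so the sketch returns only the status code.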

Inspect workflows and activities in the Temporal UI at http://localhost:8233 — each agent step shows up as a workflow or activity you can replay.

Configuration

Settings are read from environment variables or from a .env file at the project root. Every variable except ANTHROPIC_API_KEY has a sensible default, so the API key is the only value you must set.

| Variable | Description | Default |
| --- | --- | --- |
| ANTHROPIC_API_KEY | Anthropic API key | (required) |
| TEMPORAL_ADDRESS | Temporal frontend host:port | localhost:7233 |
| TEMPORAL_NAMESPACE | Temporal namespace | default |
| REDIS_URL | Redis URL for telemetry, events, availability | redis://localhost:6379/0 |
| ANTHROPIC_MODEL | Claude model for decision-making agents | anthropic/claude-sonnet-4-6 |
| ANTHROPIC_FAST_MODEL | Claude model for summarizer sub-agents | anthropic/claude-haiku-4-5 |
| API_HOST | FastAPI bind address | 0.0.0.0 |
| API_PORT | FastAPI listen port | 8000 |

The Nuxt frontend reads NUXT_PUBLIC_API_BASE (default http://localhost:8000); set it if you serve the API on a different host.
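To make the defaults concrete, here is a rough sketch of how these backend variables could be resolved in Python. The Settings class and load_settings function are hypothetical names, not the project's actual settings module, and the sketch reads only a mapping of environment variables (it does not parse a .env file).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the backend variables documented above."""
    anthropic_api_key: str
    temporal_address: str = "localhost:7233"
    temporal_namespace: str = "default"
    redis_url: str = "redis://localhost:6379/0"
    anthropic_model: str = "anthropic/claude-sonnet-4-6"
    anthropic_fast_model: str = "anthropic/claude-haiku-4-5"
    api_host: str = "0.0.0.0"
    api_port: int = 8000


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve settings from a mapping, falling back to the documented defaults."""
    api_key = env.get("ANTHROPIC_API_KEY", "")
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY is required")
    defaults = Settings(anthropic_api_key=api_key)
    return Settings(
        anthropic_api_key=api_key,
        temporal_address=env.get("TEMPORAL_ADDRESS", defaults.temporal_address),
        temporal_namespace=env.get("TEMPORAL_NAMESPACE", defaults.temporal_namespace),
        redis_url=env.get("REDIS_URL", defaults.redis_url),
        anthropic_model=env.get("ANTHROPIC_MODEL", defaults.anthropic_model),
        anthropic_fast_model=env.get("ANTHROPIC_FAST_MODEL", defaults.anthropic_fast_model),
        api_host=env.get("API_HOST", defaults.api_host),
        api_port=int(env.get("API_PORT", defaults.api_port)),  # env values are strings
    )
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dict instead of mutating os.environ.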

Development

For iterative work with hot-reload, run the backend and frontend directly on your host against the Compose-managed Temporal and Redis:

make -C backend install   # install Python deps
make infra-up             # start Temporal + Redis only
make dev                  # worker + API + frontend

make dev runs the worker, the API, and the frontend with hot-reload in one shot. You can also run make worker, make api, and make ui in separate terminals if you prefer.

This flow additionally requires Python 3.12+, uv, Node.js 20+, and pnpm on your host.

Architecture

graph TD
    FE[Frontend<br/>map · agent panel · event log]
    API[Backend]
    ORDER[OrderWorkflow<br/>per-order]
    FLEET[FleetWorkflow<br/>dispatcher]
    DRONE[DroneWorkflow<br/>per-drone entity]
    DELIV[DeliveryWorkflow<br/>per-order child]
    DISP[ADK Dispatcher Agent]
    ANOM[ADK Anomaly Agent]
    ACTS[Drone + world<br/>activities]
    TEMPORAL[(Temporal Service)]
    REDIS[(Redis)]
    CLAUDE[(Anthropic API)]

    FE <--> API
    API -->|signal / query| TEMPORAL
    API -->|read telemetry| REDIS
    TEMPORAL --> ORDER
    TEMPORAL --> FLEET
    TEMPORAL --> DRONE
    ORDER -->|signal order| FLEET
    FLEET --> DISP
    FLEET -->|signal| DRONE
    DRONE -->|child workflow| DELIV
    DELIV --> ANOM
    DELIV --> ACTS
    DISP -->|TemporalModel| CLAUDE
    ANOM -->|TemporalModel| CLAUDE
    DRONE -->|availability| REDIS
    ACTS -->|telemetry + events| REDIS
| Module | Description |
| --- | --- |
| backend | Python package with the FastAPI HTTP API, Temporal workflows, activities, and ADK dispatcher + anomaly agents. |
| frontend | Nuxt 4 + Vue 3 + Tailwind 4 dashboard for monitoring the fleet. |

Agents

Two ADK agents sit at the decision points of the fleet. Everything else — takeoff, navigation, pickup, dropoff, landing — runs as a deterministic Temporal activity loop with no LLM in the critical path.

Both agents run through the ADK × Temporal integration: every LLM call goes through TemporalModel, so each model invocation is recorded as a Temporal Activity and replayed deterministically after a crash. Each activity carries a human-readable summary (for example Dispatcher · Fleet analyst) so agent steps show up labelled in the Temporal UI.

The tools the agents expose — submit_dispatch and submit_recovery — are pure in-memory writes to ADK session state; they are not wrapped as activities because they carry no side effects. The workflow reads the decision back from session state after the agent run and branches on a validated string.
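The write-then-read-back pattern can be sketched without ADK at all. In the example below, FakeToolContext is a stand-in for the ADK tool context (only its mutable state mapping is modelled), the value of DISPATCH_DECISION_KEY and the shape of the stored record are assumptions, and read_decision is a hypothetical name for the workflow-side step.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical value; the text names the constant but does not show its value.
DISPATCH_DECISION_KEY = "dispatch_decision"


@dataclass
class FakeToolContext:
    """Stand-in for the ADK tool context: just the mutable session state."""
    state: dict = field(default_factory=dict)


def submit_dispatch(drone_id: str, reasoning: str, tool_context: FakeToolContext) -> dict:
    """Tool body: record the decision in session state. No I/O, no side effects."""
    tool_context.state[DISPATCH_DECISION_KEY] = {
        "drone_id": drone_id,
        "reasoning": reasoning,
    }
    return {"status": "recorded"}


def read_decision(state: dict) -> Optional[str]:
    """Workflow side: pull a plain string back out, or None if nothing usable."""
    decision = state.get(DISPATCH_DECISION_KEY)
    if not isinstance(decision, dict):
        return None
    drone_id = decision.get("drone_id")
    return drone_id if isinstance(drone_id, str) and drone_id else None
```

Because the tool only mutates an in-memory mapping, re-running it during replay is harmless, which is consistent with leaving it outside an activity.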

Dispatcher

Picks the best idle drone for each pending order. Invoked from FleetWorkflow whenever at least one drone is dispatchable (IDLE with battery > 40%). Source: agents/dispatcher.py.
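As an illustration of the dispatchability rule, the filter might look like the sketch below. DroneStatus and its field names are hypothetical; only the IDLE state and the 40% threshold come from the text.

```python
from dataclasses import dataclass

MIN_DISPATCH_BATTERY_PCT = 40.0  # threshold stated in the text (strictly greater than)


@dataclass(frozen=True)
class DroneStatus:
    """Hypothetical snapshot of one drone; field names are assumptions."""
    drone_id: str
    state: str          # "IDLE" comes from the text; other state names are assumed
    battery_pct: float


def is_dispatchable(drone: DroneStatus) -> bool:
    """A drone is dispatchable when it is IDLE with battery above 40%."""
    return drone.state == "IDLE" and drone.battery_pct > MIN_DISPATCH_BATTERY_PCT


def dispatchable_drones(fleet: list[DroneStatus]) -> list[DroneStatus]:
    """Keep fleet order so later selection stays deterministic."""
    return [drone for drone in fleet if is_dispatchable(drone)]
```

Note that a drone at exactly 40% is excluded, since the rule is a strict inequality.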

The dispatcher is a SequentialAgent with two stages:

  1. Analysts — a ParallelAgent running two fast-model sub-agents (Haiku by default):
    • fleet_analyst summarizes the pool of idle drones (id, name, home base, battery).
    • order_analyst summarizes the pending order (pickup base, dropoff point, payload weight).
  2. Picker: dispatcher_picker on the main model (Sonnet by default). It receives both analyses as template variables and picks one drone by calling submit_dispatch(drone_id, reasoning).
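The control flow of the two stages can be mimicked with plain asyncio. This is not ADK code: the three coroutines are stand-ins for the model-backed agents, the state keys are hypothetical, and the highest-battery choice is only a placeholder for the decision the real picker delegates to the LLM.

```python
import asyncio


async def fleet_analyst(state: dict) -> None:
    # Stand-in for a fast-model call: summarize the idle drones into shared state.
    state["fleet_analysis"] = ", ".join(
        f"{d['id']} ({d['battery']}%)" for d in state["idle_drones"]
    )


async def order_analyst(state: dict) -> None:
    # Stand-in for a fast-model call: summarize the pending order.
    order = state["order"]
    state["order_analysis"] = (
        f"{order['pickup']} -> {order['dropoff']}, {order['payload_kg']} kg"
    )


async def dispatcher_picker(state: dict) -> None:
    # Stand-in for the main-model call; it can only run once both analyses exist.
    if "fleet_analysis" not in state or "order_analysis" not in state:
        raise RuntimeError("analyst stage did not complete")
    best = max(state["idle_drones"], key=lambda d: d["battery"])  # placeholder choice
    state["decision"] = best["id"]


async def run_dispatcher(state: dict) -> dict:
    # Stage 1: both analysts run concurrently and write to separate keys.
    await asyncio.gather(fleet_analyst(state), order_analyst(state))
    # Stage 2: the picker runs only after stage 1 has finished.
    await dispatcher_picker(state)
    return state
```

The analysts write to distinct keys, so running them concurrently cannot produce conflicting updates; the picker is the only step that depends on ordering.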

The workflow reads the choice back from session state under DISPATCH_DECISION_KEY, validates the drone id against the current dispatchable list, and signals DroneWorkflow.assign_order. Any failure — LLM error, invalid id, session hiccup — falls back to a deterministic round-robin picker so orders keep flowing.
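One way to picture the validate-then-fall-back step is the sketch below. RoundRobinPicker and choose_drone are hypothetical names, and the rotation scheme is an assumption about what "deterministic round-robin" could mean here; the project's picker may rotate differently.

```python
from itertools import count
from typing import Optional


class RoundRobinPicker:
    """Deterministic fallback: cycle through whatever is dispatchable right now."""

    def __init__(self) -> None:
        self._turn = count()  # monotonically increasing turn counter

    def pick(self, dispatchable_ids: list[str]) -> str:
        if not dispatchable_ids:
            raise ValueError("no dispatchable drone")
        return dispatchable_ids[next(self._turn) % len(dispatchable_ids)]


def choose_drone(
    agent_choice: Optional[str],
    dispatchable_ids: list[str],
    fallback: RoundRobinPicker,
) -> str:
    """Accept the agent's pick only if it names a currently dispatchable drone."""
    if agent_choice in dispatchable_ids:
        return agent_choice
    # Covers a missing decision (None) as well as an id that is not dispatchable.
    return fallback.pick(dispatchable_ids)
```

An LLM error or session problem would surface as agent_choice being None, which takes the same fallback path as an invalid id.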

Anomaly handler

Picks a recovery action after an in-flight incident. Invoked from DeliveryWorkflow's exception handler when any activity in the mission loop raises (typically battery_critical during flight). Source: agents/anomaly.py.

The anomaly handler is a single main-model Agent (Sonnet by default). Its prompt describes the incident and includes live telemetry — current position, home-base distance, and nearest-base distance — read from Redis through the read_drone_telemetry activity. The agent picks one of three recovery actions by calling submit_recovery(action, reasoning):

| Action | Behaviour |
| --- | --- |
| abort_return_home | Fly straight back to the drone's home base. Order fails. |
| emergency_land_nearest_base | Land at the closest base, which may not be home. Order fails. |
| divert_to_recharge | Fly to the nearest base, recharge, then fly home. Order fails. |

submit_recovery coerces any unknown action to abort_return_home. If the agent run itself fails, the workflow also defaults to abort_return_home as a safety net, so the drone always has a defined recovery path. Recovery flights are executed through the fly_drone_to_base activity so the drone streams live telemetry on the way rather than teleporting.
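The two safety nets described above (coercing unknown actions, and defaulting when the agent run fails) could be expressed roughly as follows. The action names come from the table; coerce_action and resolve_recovery are hypothetical helper names, and run_agent stands for any callable that returns the agent's chosen action.

```python
from typing import Callable

RECOVERY_ACTIONS = (
    "abort_return_home",
    "emergency_land_nearest_base",
    "divert_to_recharge",
)
DEFAULT_RECOVERY = "abort_return_home"


def coerce_action(action: object) -> str:
    """Map anything that is not a known action onto the default."""
    if isinstance(action, str) and action in RECOVERY_ACTIONS:
        return action
    return DEFAULT_RECOVERY


def resolve_recovery(run_agent: Callable[[], object]) -> str:
    """Run the agent; if the run itself raises, fall back to the default action."""
    try:
        return coerce_action(run_agent())
    except Exception:
        return DEFAULT_RECOVERY
```

Either way the caller always receives one of the three known strings, so the branch that executes the recovery flight never has to handle an unexpected value.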

License

This project is licensed under the Apache-2.0 License — see LICENSE for details.
