36 changes: 36 additions & 0 deletions python/agents/livekit-adk/.gitignore
@@ -0,0 +1,36 @@
# Virtual environments
.venv/
venv/
ENV/
env/

# Environment files (contains keys/secrets)
.env
.env.*


# Python Cache
__pycache__/
*.py[cod]
*$py.class
.pytest_cache/
.mypy_cache/

# OS generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# Logs
*.log
app/*.log
app/debug*.log

# SQLite Database files
*.db
*.sqlite
app/*.db
43 changes: 43 additions & 0 deletions python/agents/livekit-adk/Dockerfile
@@ -0,0 +1,43 @@
# Build stage: official uv image (Python 3.12 on Debian bookworm-slim)
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS builder

# Enable bytecode compilation for slightly faster startup times
ENV UV_COMPILE_BYTECODE=1
ENV UV_LINK_MODE=copy

WORKDIR /app

# Install git since some dependencies are pulled from git repositories
RUN apt-get update && apt-get install -y --no-install-recommends git && rm -rf /var/lib/apt/lists/*

# Install dependencies first to take advantage of Docker layer caching
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-install-project --no-dev

# Copy the application source code
COPY app /app/app
COPY README.md /app/README.md

# Install the project itself now that the source has been copied
RUN uv sync --frozen --no-dev

# Final running container
FROM python:3.12-slim-bookworm

WORKDIR /app

# Copy the virtual environment and code from the builder stage
COPY --from=builder /app /app

# Expose port 8080 (default for Google Cloud Run)
EXPOSE 8080

# Ensure Python finds our app and travel_booking packages
ENV PYTHONPATH=/app/app
ENV PORT=8080

# Set default environment variables
ENV USE_DATABASE_SESSION=false

# Run the FastAPI server
CMD ["/app/.venv/bin/python3", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]
239 changes: 239 additions & 0 deletions python/agents/livekit-adk/README.md
@@ -0,0 +1,239 @@
# Travel Booking Multi-Agent Assistant (LiveKit ADK)

A multi-agent voice orchestration system built on **Google's Agent Development Kit (ADK)** and **LiveKit**. The application bridges WebRTC audio to the Gemini Live API with low latency, so users can converse naturally with specialized travel agents to search, book, and manage flights and hotels in a single session.

---

## 1. System Architecture & Flow

The system receives WebRTC audio streams, translates them into real-time bidirectional ADK session events, and exchanges those events with Gemini Live API models.

```
       ┌───────────────────────────────────────┐
       │            Client Browser             │
       └───────────────────────────────────────┘
             ▲                           │
WebRTC       │ (Render Audio /           │ (User Audio /
DataChannel  │  Data Transcription)      │  WebRTC Media)
             │                           ▼
       ┌───────────────────────────────────────┐
       │          LiveKit SFU Server           │
       └───────────────────────────────────────┘
             ▲                           │
             │ Audio Frames              │ Audio Frames
             │ (Gemini -> LK)            │ (LK -> Gemini)
             │                           ▼
┌─────────────────────────────────────────────────────┐
│             FastAPI Application Server              │
│                                                     │
│   ┌─────────────────────────────────────────────┐   │
│   │            app/livekit_bridge.py            │   │
│   │  - Converts 48kHz LK Audio -> 16kHz PCM     │   │
│   │  - Feeds LiveRequestQueue                   │   │
│   └──────────────────────┬──────────────────────┘   │
│                          │                          │
│                          ▼                          │
│               ┌─────────────────────┐               │
│               │  LiveRequestQueue   │               │
│               └──────────┬──────────┘               │
│                          │                          │
│                          ▼                          │
│               ┌─────────────────────┐               │
│               │     ADK Runner      │               │
│               └──────────┬──────────┘               │
│                          │                          │
└──────────────────────────┼──────────────────────────┘
                           │ Bidirectional WebSockets (v1alpha)
       ┌───────────────────┴───────────────────┐
       │            Gemini Live API            │
       │        (gemini-live-2.5-flash)        │
       └───────────────────────────────────────┘
```

### Protocol Lifecycle:
1. **Session Establishment**: The client requests a token from `/token`, which triggers background instantiation of the `LiveKitGeminiBridge`.
2. **WebRTC Join**: The client browser joins the LiveKit room. The `LiveKitGeminiBridge` joins the same room as a virtual audio participant.
3. **Upstream Pipeline (User $\rightarrow$ Gemini)**:
   - The client streams user audio to the LiveKit room over **WebRTC**.
   - `LiveKitGeminiBridge` subscribes to the track, extracts the 48kHz frames, downsamples them to **16kHz mono PCM**, and pushes them to the ADK `LiveRequestQueue`.
   - The ADK `Runner` streams the queue content to the Gemini Live API over an underlying **secure WebSocket connection (`bidiGenerateContent` protocol)**.
4. **Downstream Pipeline (Gemini $\rightarrow$ User)**:
   - Gemini responds in real time with audio buffers and text transcriptions over the same secure WebSocket.
   - The ADK `Runner` surfaces these events and passes them to the bridge.
   - The bridge pushes the audio (resampled to 24kHz) back into LiveKit through the `LocalAudioTrack` and publishes transcription text through WebRTC **DataChannels**.
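
The two pipelines above can be pictured as concurrent tasks joined by queues. The following is only a minimal, stdlib-only sketch of that shape: `asyncio.Queue` stands in for the ADK `LiveRequestQueue`, `fake_model` is a hypothetical placeholder for the Gemini Live session, and none of the function names come from this project.

```python
import asyncio


async def upstream(mic_frames, request_queue):
    """User -> model: forward each captured PCM frame to the request queue."""
    for frame in mic_frames:
        await request_queue.put(frame)
    await request_queue.put(None)  # sentinel: the user stream has ended


async def fake_model(request_queue, event_queue):
    """Hypothetical stand-in for the live model: one event per input frame."""
    while (frame := await request_queue.get()) is not None:
        await event_queue.put({"audio": frame, "text": f"{len(frame)} bytes heard"})
    await event_queue.put(None)  # sentinel: no more model events


async def downstream(event_queue, speaker, transcripts):
    """Model -> user: audio goes to the audio track, text to the data channel."""
    while (event := await event_queue.get()) is not None:
        speaker.append(event["audio"])     # stands in for the published audio track
        transcripts.append(event["text"])  # stands in for the DataChannel


async def run_bridge(mic_frames):
    """Run all three stages concurrently and return what the user would receive."""
    request_queue, event_queue = asyncio.Queue(), asyncio.Queue()
    speaker, transcripts = [], []
    await asyncio.gather(
        upstream(mic_frames, request_queue),
        fake_model(request_queue, event_queue),
        downstream(event_queue, speaker, transcripts),
    )
    return speaker, transcripts


if __name__ == "__main__":
    audio, text = asyncio.run(run_bridge([b"\x00" * 640, b"\x01" * 640]))
    print(len(audio), text)
```

Because the stages only meet at the queues, a slow consumer on one side does not block audio capture on the other, which is presumably why the real bridge feeds a queue rather than calling the model directly.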

---

## 2. The Booking Agents (Multi-Agent Orchestrator)

Inside the `app/travel_booking/` package, we implement a **three-tier hierarchical multi-agent orchestrator**:

```
                   ┌──────────────────────┐
                   │ session_orchestrator │
                   │   (Central Router)   │
                   └──────────┬───────────┘
           ┌──────────────────┴──────────────────┐
           ▼                                     ▼
┌─────────────────────┐               ┌─────────────────────┐
│ FlightBookingAgent  │◀─────────────▶│ HotelBookingAgent   │
│ (Flight search/book)│ A2A Transfer  │ (Hotel search/book) │
└─────────────────────┘               └─────────────────────┘
```

1. **Session Orchestrator** (`app/travel_booking/agent.py`):
   - The root agent. It listens for the user's intent and delegates control to the specialist sub-agents (`FlightBookingAgent` or `HotelBookingAgent`) via ADK's native agent-routing tool calls.
2. **FlightBookingAgent** (`app/travel_booking/agents/flight_booking.py`):
   - Specialized in flight search and reservations.
   - Equipped with 3 mock tools: `search_flights`, `book_flight`, `cancel_flight`.
   - Configured with `HotelBookingAgent` as a sub-agent to enable a direct, silent handoff when the user switches context to hotel bookings.
3. **HotelBookingAgent** (`app/travel_booking/agents/hotel_booking.py`):
   - Specialized in lodging discovery and booking.
   - Equipped with 3 mock tools: `search_hotels`, `book_hotel`, `cancel_hotel`.
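
To make "mock tools" concrete: ADK function tools are ordinary Python callables, so the three flight tools could look roughly like the sketch below. The signatures, return fields, and sample data are all hypothetical; the project's actual implementations are not shown in this README.

```python
import itertools

# Hypothetical in-memory inventory and booking store (illustration only).
_FLIGHTS = [
    {"flight_id": "FL100", "origin": "SFO", "destination": "JFK", "price_usd": 320},
    {"flight_id": "FL200", "origin": "SFO", "destination": "SEA", "price_usd": 140},
]
_BOOKINGS = {}
_confirmation_numbers = itertools.count(1)


def search_flights(origin: str, destination: str) -> dict:
    """Return mock flights matching the requested route (case-insensitive)."""
    matches = [
        f for f in _FLIGHTS
        if f["origin"] == origin.upper() and f["destination"] == destination.upper()
    ]
    return {"status": "success", "flights": matches}


def book_flight(flight_id: str, passenger_name: str) -> dict:
    """Create a mock booking and return its confirmation code."""
    if not any(f["flight_id"] == flight_id for f in _FLIGHTS):
        return {"status": "error", "message": f"Unknown flight {flight_id}"}
    confirmation = f"CONF{next(_confirmation_numbers):04d}"
    _BOOKINGS[confirmation] = {"flight_id": flight_id, "passenger": passenger_name}
    return {"status": "success", "confirmation": confirmation}


def cancel_flight(confirmation: str) -> dict:
    """Cancel a mock booking if it exists."""
    if _BOOKINGS.pop(confirmation, None) is None:
        return {"status": "error", "message": f"No booking {confirmation}"}
    return {"status": "success", "cancelled": confirmation}
```

Returning a status dictionary instead of raising lets the model read a failure and explain it to the user in conversation; whether the project's tools follow this convention is an assumption.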

---

## 3. LiveKit Bridge & Configuration Overview

### `app/livekit_bridge.py` Overview
The `LiveKitGeminiBridge` class acts as the low-latency audio converter and routing engine.
- **Audio Buffering & Downsampling**: Converts LiveKit's 48kHz stereo stream to 16kHz mono PCM and sends it in 20ms buffers (640 bytes each) to keep WebSocket latency low.
- **Realtime Event Mapping**: Consumes the `runner.run_live` event generator, prints model text output to the server console, and publishes data events onto WebRTC DataChannels.
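
The 640-byte figure follows from 16,000 samples/s × 0.020 s × 2 bytes per sample, assuming 16-bit PCM. The sketch below checks that arithmetic and shows one naive way to perform the conversion with the standard library. It is not the bridge's actual code: the function names are hypothetical, and the plain decimation used here skips the low-pass filtering a production resampler would apply.

```python
import array

INPUT_RATE = 48_000    # LiveKit stream rate named above
OUTPUT_RATE = 16_000   # rate pushed to the LiveRequestQueue
BYTES_PER_SAMPLE = 2   # assumption: signed 16-bit PCM
CHUNK_MS = 20

# 16,000 samples/s * 0.020 s * 2 bytes = 640 bytes per mono chunk
CHUNK_BYTES = OUTPUT_RATE * CHUNK_MS // 1000 * BYTES_PER_SAMPLE


def downsample_to_mono_16k(pcm_48k_stereo: bytes) -> bytes:
    """Average L/R into mono, then keep every 3rd sample (48 kHz -> 16 kHz).

    Naive decimation without an anti-aliasing filter; illustration only.
    """
    samples = array.array("h")
    samples.frombytes(pcm_48k_stereo)  # interleaved L, R, L, R, ...
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples) - 1, 2)]
    step = INPUT_RATE // OUTPUT_RATE   # 3
    return array.array("h", mono[::step]).tobytes()


def chunk_pcm(pcm: bytes, chunk_bytes: int = CHUNK_BYTES):
    """Yield full fixed-size chunks; trailing bytes that do not fill one are dropped."""
    for start in range(0, len(pcm) - chunk_bytes + 1, chunk_bytes):
        yield pcm[start:start + chunk_bytes]


if __name__ == "__main__":
    # 40 ms of 48 kHz stereo audio: 1920 frames of (left, right)
    stereo = array.array("h", [100, 300] * 1920).tobytes()
    chunks = list(chunk_pcm(downsample_to_mono_16k(stereo)))
    print(CHUNK_BYTES, len(chunks))
```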

### Runner Configuration (`app/main.py`)
- The `Runner` uses either `InMemorySessionService` or `DatabaseSessionService` to hold dialogue history; only the latter persists it across restarts.
- It is integrated with a custom **`SessionResumptionIsolationPlugin`**, which clears the session resumption handles when control transfers between sub-agents, preventing errors from the key-based Gemini API during handoffs.

```python
runner = Runner(
    app_name="livekit-adk",
    agent=agent.root_agent,
    session_service=session_service,
    auto_create_session=True,
    plugins=[SessionResumptionIsolationPlugin()],
)
```

### Run Configuration (`app/livekit_bridge.py`)
Configured specifically to handle high-performance native audio models over WebRTC:
```python
run_config = RunConfig(
    streaming_mode=StreamingMode.BIDI,
    response_modalities=["AUDIO"],
    input_audio_transcription=types.AudioTranscriptionConfig(),
    output_audio_transcription=types.AudioTranscriptionConfig(),
    session_resumption=types.SessionResumptionConfig(),
    enable_affective_dialog=True,
)
```
- **`response_modalities=["AUDIO"]`**: Crucial for native audio Gemini models; enforces direct-to-audio responses to ensure natural voice cadence.
- **`output_audio_transcription`**: Tells Gemini to send text transcripts along with the audio stream so the bridge can display them on the UI.

---

## 4. Configuration & Run Instructions

### 1. Environment Setup (`app/.env`)
Create `app/.env` in the project and set the following variables:

```bash
# Backend selection (Set to FALSE to use key-based Gemini API)
GOOGLE_GENAI_USE_VERTEXAI=FALSE

# For Gemini Live API
GOOGLE_API_KEY=your_google_gemini_api_key_here
GEMINI_API_KEY=your_google_gemini_api_key_here

# Model selection
DEMO_AGENT_MODEL="gemini-live-2.5-flash-native-audio"

# LiveKit Settings (Local development values)
USE_LIVEKIT=true
LIVEKIT_URL=ws://localhost:7880
LIVEKIT_API_KEY=devkey
LIVEKIT_API_SECRET=secret
```
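
The template mixes boolean spellings (`FALSE`, `true`), so whatever reads these values needs to be case-insensitive. How the application actually parses them is not shown here; the sketch below is one hedged way to do it with the standard library. The helper names and the set of accepted "true" spellings are assumptions.

```python
import os

# Assumption: these spellings count as "true"; anything else is false.
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False, environ=os.environ) -> bool:
    """Read a boolean flag such as USE_LIVEKIT=true or GOOGLE_GENAI_USE_VERTEXAI=FALSE."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().strip('"').lower() in TRUE_VALUES


def missing_livekit_settings(environ=os.environ) -> list:
    """Return the names of LiveKit variables that are unset or empty."""
    required = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    if env_flag("USE_LIVEKIT"):
        print("missing:", missing_livekit_settings())
```

Checking for missing settings at startup surfaces a misconfigured `.env` immediately instead of at the first room join.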

### 2. Installing & Running LiveKit (macOS)

For local development, you need a running LiveKit server instance. On macOS, install and run it using Homebrew:

```bash
# Install LiveKit Server
brew install livekit

# Start the server in development mode
livekit-server --dev
```

The `--dev` flag automatically starts the server on `localhost:7880` using the default credentials:
- **API Key**: `devkey`
- **API Secret**: `secret`

*(These credentials match the default `.env` template above.)*
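
For context on what the key and secret are used for: a LiveKit access token is an HS256-signed JWT whose issuer is the API key and whose `video` claim carries the room grant. The project's `/token` handler is not shown in this README and most likely relies on the LiveKit server SDK; the stdlib-only sketch below merely illustrates the token's shape. The function name, identity, and room name are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_join_token(api_key, api_secret, identity, room, ttl_seconds=3600, now=None):
    """Build an HS256 JWT carrying a LiveKit-style room-join grant."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,                  # LiveKit API key
        "sub": identity,                 # participant identity
        "nbf": issued,                   # not valid before
        "exp": issued + ttl_seconds,     # expiry
        "video": {"room": room, "roomJoin": True},
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    signature = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


if __name__ == "__main__":
    # Development credentials from `livekit-server --dev`; never use them in production.
    print(make_join_token("devkey", "secret", "demo-user", "travel-room"))
```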

### 3. Launching the Application

Ensure you are running commands using the project's virtual environment.

**Step 1: Sync dependencies**
```bash
uv sync
```

**Step 2: Start the Dev Server**
Navigate into the `app` folder and run:
```bash
uv run --project .. python3 -m uvicorn main:app --reload
```

The application starts a FastAPI server on `http://127.0.0.1:8000`. Open `http://127.0.0.1:8000/static/livekit` in your browser to interact with the voice assistant.

---

### 4. Running Automated Tests (Pytest)

We maintain a comprehensive automated test suite using `pytest` to validate mock travel tools, agent configurations, and sub-agent orchestration hierarchies.

To execute the test suite:
```bash
# Rebuild / sync local environment dependencies
uv sync

# Run tests
.venv/bin/pytest -v
```

---

### 5. Deploying to Google Cloud Run

The application is optimized for serverless container deployment on Google Cloud Run using **Google Cloud Build** and **Artifact Registry**.

An interactive deployment helper script is provided:

```bash
# Make the deploy script executable (if not already)
chmod +x deploy.sh

# Run the deployment script
./deploy.sh
```

The script will automatically:
1. Prompt you to select or input your Google Cloud Project ID and deployment Region.
2. Enable required Google Cloud APIs (Artifact Registry, Cloud Build, Cloud Run, Secret Manager).
3. Build and tag the application image securely on Google Cloud Build.
4. Deploy the image to a managed Google Cloud Run service, setting the necessary environment variables (`GOOGLE_API_KEY`, `LIVEKIT_URL`, `LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET`).

---

## 5. Authors

- **Kishore Jagannath (kishorerj)**

