Commit b7fd3a8

add fastapi files
1 parent 00b3869 commit b7fd3a8

41 files changed

Lines changed: 2229 additions & 3 deletions

server-fastapi/README.md

Lines changed: 256 additions & 0 deletions
@@ -0,0 +1,256 @@
## ElatoAI: Realtime Voice AI Models on FastAPI

`server-fastapi` is the simplest self-hosted Elato backend for people who want a normal Python server instead of an edge runtime.

Use this if you want:

- a FastAPI server you can run on your own machine or VM
- a classic `STT -> LLM -> TTS` voice pipeline
- a smaller provider surface that is easy to understand
- the same ESP32 transport shape as the rest of Elato

If you are new to the project, read these first:

- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/README.md`
- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/README.md`

## The Simple Provider Set

To keep onboarding straightforward, the classic FastAPI route is centered around a small set of providers.

### LLM

- `openai`
- `claude`
- `gemini`
- `grok`

### STT

- `deepgram`
- `whisper`

### TTS

- `elevenlabs`
- `cartesia`
- `deepgram`
- `openai`

The code still uses the `models/llm`, `models/stt`, and `models/tts` layout, but the active registry is intentionally trimmed so the default experience stays simple.

## Default Setup

The default classic route is:

- STT: `deepgram`
- LLM: `openai`
- TTS: `elevenlabs`

That gives people one obvious path to get running before they start swapping providers.

## Project Layout

```text
server-fastapi/
├── bot.py
├── classic_route.py
├── esp32_transport.py
├── server.py
├── env.example
└── models/
    ├── llm/
    ├── stt/
    └── tts/
```

## How The FastAPI Server Fits Into Elato

Elato has three backend options right now:

- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/deno`
- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/cloudflare`
- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi`

A clean way to think about them is:

- `Deno`: edge-first, mature provider integrations
- `Cloudflare`: Workers + Durable Objects + Workers AI
- `FastAPI`: normal Python server, easy to self-host, easy to reason about

## Quick Start

### 1. Create or activate your Python environment

Use whatever you prefer. If you already use `uv`, that is a good default.

### 2. Install dependencies

This repo uses `pyproject.toml`, so install from that file rather than from a `requirements.txt` file.

With `uv`:

```bash
cd /Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi
uv sync
```

Or with plain pip in your venv:

```bash
cd /Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi
pip install -e .
```

### 3. Create your env file

Copy the example values from:

- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/env.example`

Minimum example for the default route:

```env
DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key

CURRENT_VOICE_ROUTE=classic
CLASSIC_STT_PROVIDER=deepgram
CLASSIC_LLM_PROVIDER=openai
CLASSIC_TTS_PROVIDER=elevenlabs

ESP32_INPUT_SAMPLE_RATE=16000
BROWSER_INPUT_SAMPLE_RATE=16000
AUDIO_OUTPUT_SAMPLE_RATE=24000
PIPELINE_AUDIO_IN_SAMPLE_RATE=16000
PIPELINE_AUDIO_OUT_SAMPLE_RATE=24000

ALLOWED_ORIGINS=*
HOST=0.0.0.0
PORT=7860
```

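Before starting the server, it can help to confirm that the API keys needed by the selected providers are actually set. The following is a minimal sketch, not part of this commit: the helper `missing_keys` is hypothetical, and the provider-to-key mapping is inferred from the variable names in `env.example`, so the real provider modules may read different variables. `whisper` is left out of the mapping because no key for it appears in `env.example`.

```python
import os

# Assumed mapping from provider name to API key variable, inferred from
# env.example. The actual provider modules may use different variables.
PROVIDER_KEYS = {
    "deepgram": "DEEPGRAM_API_KEY",
    "openai": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "grok": "XAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
    "cartesia": "CARTESIA_API_KEY",
}

# Selector variables and their defaults, as used by the classic route.
SELECTORS = {
    "CLASSIC_STT_PROVIDER": "deepgram",
    "CLASSIC_LLM_PROVIDER": "openai",
    "CLASSIC_TTS_PROVIDER": "elevenlabs",
}


def missing_keys(env=None):
    """Return the API key variables that the selected providers need but lack."""
    env = os.environ if env is None else env
    missing = []
    for selector, default in SELECTORS.items():
        provider = env.get(selector, default)
        key = PROVIDER_KEYS.get(provider)  # None for providers not in the mapping
        if key and not env.get(key) and key not in missing:
            missing.append(key)
    return missing


if __name__ == "__main__":
    print(missing_keys() or "all required keys are set")
```

Note that this only checks for presence, so placeholder values such as `your_openai_api_key` would still pass.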
### 4. Run the server

If you use `uv`:

```bash
cd /Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi
uv run server.py
```

If you use your activated venv directly:

```bash
cd /Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi
python server.py
```

### 5. Point your ESP32 at the FastAPI backend

Update the firmware config so your hardware connects to this server instead of the Deno or Cloudflare backend.

The ESP32 route is:

```text
/ws/esp32
```

For browser or Next.js testing, the server also exposes:

- `/ws/browser`
- `/ws/nextjs`

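To make the routes above concrete, here is a small sketch of how a full WebSocket URL for a client might be assembled. The helper `websocket_url` and the example address `192.168.1.50` are hypothetical; the only values taken from this README are the paths and the default port `7860`. One thing worth keeping in mind is that `HOST=0.0.0.0` is a bind address for the server, so a device has to connect to the machine's reachable IP or hostname instead.

```python
def websocket_url(host: str, port: int = 7860, path: str = "/ws/esp32",
                  secure: bool = False) -> str:
    """Build a client-side WebSocket URL for one of the server's routes."""
    if host == "0.0.0.0":
        # 0.0.0.0 means "listen on all interfaces"; clients cannot dial it.
        raise ValueError("use the server's reachable IP or hostname, not 0.0.0.0")
    if not path.startswith("/"):
        path = "/" + path
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}:{port}{path}"


if __name__ == "__main__":
    # Hypothetical LAN address of the machine running server.py.
    print(websocket_url("192.168.1.50"))
    print(websocket_url("192.168.1.50", path="/ws/browser"))
```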
## How Provider Selection Works

The classic route reads three env vars:

- `CLASSIC_STT_PROVIDER`
- `CLASSIC_LLM_PROVIDER`
- `CLASSIC_TTS_PROVIDER`

So changing providers is just an env change.

Examples:

### OpenAI + Deepgram + ElevenLabs

```env
CLASSIC_STT_PROVIDER=deepgram
CLASSIC_LLM_PROVIDER=openai
CLASSIC_TTS_PROVIDER=elevenlabs
```

### Whisper + Claude + Cartesia

```env
CLASSIC_STT_PROVIDER=whisper
CLASSIC_LLM_PROVIDER=claude
CLASSIC_TTS_PROVIDER=cartesia
```

### Deepgram + Gemini + OpenAI TTS

```env
CLASSIC_STT_PROVIDER=deepgram
CLASSIC_LLM_PROVIDER=gemini
CLASSIC_TTS_PROVIDER=openai
```

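The selection logic described in this section can be sketched as a small resolver that reads the three variables, falls back to the defaults, and rejects names outside the documented provider set. This is an illustration only: `resolve_providers` is a hypothetical helper, the allowed names are taken from the provider lists in this README (the registries in the code also accept some aliases), and the lowercase/strip normalization is an assumption rather than something the commit does.

```python
import os

# Env var -> (default, allowed names), taken from the README's provider lists.
CLASSIC_PROVIDERS = {
    "CLASSIC_STT_PROVIDER": ("deepgram", {"deepgram", "whisper"}),
    "CLASSIC_LLM_PROVIDER": ("openai", {"openai", "claude", "gemini", "grok"}),
    "CLASSIC_TTS_PROVIDER": ("elevenlabs", {"elevenlabs", "cartesia", "deepgram", "openai"}),
}


def resolve_providers(env=None):
    """Return the selected provider per env var, validating each name."""
    env = os.environ if env is None else env
    selected = {}
    for var, (default, allowed) in CLASSIC_PROVIDERS.items():
        # Normalization is an assumption made for this sketch.
        name = env.get(var, default).strip().lower()
        if name not in allowed:
            raise ValueError(f"{var}={name!r} is not one of {sorted(allowed)}")
        selected[var] = name
    return selected


if __name__ == "__main__":
    print(resolve_providers())
```

Failing early on an unknown name gives a clearer error than a missing-module failure deeper in the pipeline.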
## Unified Experience Across Elato

A simple way to keep the product understandable is:

- keep the Next.js frontend focused on character creation and device management
- keep the ESP32 firmware focused on one transport protocol
- let users choose one backend runtime:
  - Deno
  - Cloudflare
  - FastAPI
- inside each backend, expose the same conceptual knobs:
  - `STT`
  - `LLM`
  - `TTS`

That means the hardware story stays stable:

- one firmware
- one websocket-style mental model
- three server deployment choices

The cleanest unification strategy is not “every backend supports every provider.”
It is:

- every backend should expose the same categories
- each backend should have one recommended default stack
- advanced users can swap providers later

## Recommended Defaults

If you want a simple opinionated experience for users, keep one default combo per backend.

Suggested defaults:

- `Deno`: OpenAI realtime
- `Cloudflare`: Workers AI STT/TTS + OpenAI LLM
- `FastAPI`: Deepgram + OpenAI + ElevenLabs

That gives users one obvious starting point without taking away flexibility.

## Important Files

If you want to change the FastAPI backend, start here:

- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/server.py`
- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/classic_route.py`
- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/esp32_transport.py`
- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/models/llm/__init__.py`
- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/models/stt/__init__.py`
- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/models/tts/__init__.py`

## Current Notes

- The filesystem still contains many scaffolded provider modules from the earlier broader experiment.
- The active provider registry is now intentionally much smaller.
- That means the codebase stays extensible, but the user-facing default path stays simple.
1.91 KB
Binary file not shown.

server-fastapi/classic_route.py

Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,55 @@
"""Classic STT -> LLM -> TTS pipeline builder."""

from __future__ import annotations

import os

from character_prompt import LANGUAGE_LEARNING_PAL_PROMPT
from loguru import logger
from models.llm import create_llm_service
from models.stt import create_stt_service
from models.tts import create_tts_service
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)


def build_classic_route(input_processor, context: LLMContext):
    stt_provider = os.getenv("CLASSIC_STT_PROVIDER", "deepgram")
    llm_provider = os.getenv("CLASSIC_LLM_PROVIDER", "openai")
    tts_provider = os.getenv("CLASSIC_TTS_PROVIDER", "elevenlabs")

    logger.info(
        "Building classic route with stt={} llm={} tts={}",
        stt_provider,
        llm_provider,
        tts_provider,
    )

    stt = create_stt_service(stt_provider)
    llm = create_llm_service(
        llm_provider,
        system_instruction=LANGUAGE_LEARNING_PAL_PROMPT,
    )
    tts = create_tts_service(tts_provider)

    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=1))
        ),
    )

    processors = [
        input_processor,
        stt,
        user_aggregator,
        llm,
        tts,
    ]

    return processors, assistant_aggregator

server-fastapi/env.example

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
GEMINI_API_KEY=your_gemini_api_key
XAI_API_KEY=your_xai_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
CARTESIA_API_KEY=your_cartesia_api_key

# Classic route providers
CURRENT_VOICE_ROUTE=classic
CLASSIC_STT_PROVIDER=deepgram
CLASSIC_LLM_PROVIDER=openai
CLASSIC_TTS_PROVIDER=elevenlabs

# Transport and pipeline sample rates
ESP32_INPUT_SAMPLE_RATE=16000
BROWSER_INPUT_SAMPLE_RATE=16000
AUDIO_OUTPUT_SAMPLE_RATE=24000
PIPELINE_AUDIO_IN_SAMPLE_RATE=16000
PIPELINE_AUDIO_OUT_SAMPLE_RATE=24000

# Browser / Next.js access
ALLOWED_ORIGINS=*

# WebSocket server settings
HOST=0.0.0.0
PORT=7860
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
"""LLM provider registry."""

from __future__ import annotations

from models._provider_loader import load_provider_factory

LLM_REGISTRY = {
    "claude": "models.llm.anthropic",
    "anthropic": "models.llm.anthropic",
    "gemini": "models.llm.google_gemini",
    "google_gemini": "models.llm.google_gemini",
    "google_vertex_ai": "models.llm.google_vertex_ai",
    "grok": "models.llm.grok",
    "openai": "models.llm.openai",
}


def create_llm_service(provider_name: str, **kwargs):
    factory = load_provider_factory(LLM_REGISTRY, provider_name, "LLM")
    return factory(**kwargs)
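The registry above maps provider names to module paths and hands the lookup to `load_provider_factory`, whose source in `models/_provider_loader` is not shown on this page. As a rough sketch of how such a lazy loader could work, the version below resolves the name, imports the module on demand with `importlib`, and returns a factory callable from it. Everything beyond the three positional arguments is an assumption: the `factory_attr` parameter, its default name `create_service`, and the name normalization are hypothetical and may not match the real loader.

```python
import importlib


def load_provider_factory(registry, provider_name, kind, factory_attr="create_service"):
    """Look up a provider's module path, import it lazily, return its factory.

    `factory_attr` is a hypothetical convention for this sketch; the real
    loader may locate the factory differently.
    """
    key = provider_name.strip().lower()
    try:
        module_path = registry[key]
    except KeyError:
        known = ", ".join(sorted(registry))
        raise ValueError(
            f"Unknown {kind} provider {provider_name!r}; expected one of: {known}"
        ) from None
    # Import only the selected provider so unused SDKs are never loaded.
    module = importlib.import_module(module_path)
    return getattr(module, factory_attr)


if __name__ == "__main__":
    # Demo with a stdlib module standing in for a provider module.
    dumps = load_provider_factory({"demo": "json"}, "demo", "LLM", factory_attr="dumps")
    print(dumps({"ok": True}))
```

Keeping module paths as strings means a provider's dependencies only need to be installed when that provider is actually selected.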
797 Bytes
Binary file not shown.
Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
"""STT provider registry."""

from __future__ import annotations

from models._provider_loader import load_provider_factory

STT_REGISTRY = {
    "deepgram": "models.stt.deepgram",
    "openai": "models.stt.openai",
    "whisper": "models.stt.whisper",
}


def create_stt_service(provider_name: str, **kwargs):
    factory = load_provider_factory(STT_REGISTRY, provider_name, "STT")
    return factory(**kwargs)
677 Bytes
Binary file not shown.
Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
"""TTS provider registry."""

from __future__ import annotations

from models._provider_loader import load_provider_factory

TTS_REGISTRY = {
    "cartesia": "models.tts.cartesia",
    "deepgram": "models.tts.deepgram",
    "elevenlabs": "models.tts.elevenlabs",
    "openai": "models.tts.openai",
}


def create_tts_service(provider_name: str, **kwargs):
    factory = load_provider_factory(TTS_REGISTRY, provider_name, "TTS")
    return factory(**kwargs)
719 Bytes
Binary file not shown.
