pipecat-sarvam-azure-starter

A production-ready Pipecat voice-agent server: Sarvam STT → Azure OpenAI LLM → Sarvam TTS. Speaks 10 Indic languages plus Indian English natively.

```bash
# 60 seconds to a working Hindi voice bot:
git clone https://github.com/dpkdhingra91/pipecat-sarvam-azure-starter
cd pipecat-sarvam-azure-starter
cp .env.example .env  &&  vi .env   # fill SARVAM_API_KEY + AZURE_OPENAI_*
docker compose up

# Open http://localhost:7860 → click Start → speak Hindi.
```

That's it. No JSON parsing, no protocol code, no torch download.

Why this exists

Sarvam's Indic STT/TTS is the best you can get for Hindi, Tamil, Telugu, Bengali and the rest: natural-sounding, low-latency, and code-switching-aware. Azure OpenAI gives you GPT-4o-class LLMs with a configurable content-filter (RAI) policy that you can tune for spontaneous speech. Pipecat ties them together.

But nobody has open-sourced this combination as a working starter. The Pipecat docs show Daily/Cartesia; Sarvam's docs show curl examples. You end up spending 2-3 days wiring up the `/connect` handshake, the turn-gate state machine, Azure timeout config, RTVI event routing, and audio sample-rate alignment.

This repo is that work, finished and sanitized. Clone, drop in your API keys, talk to a Hindi voice bot in 60 seconds.

Architecture

```text
  Browser                 FastAPI                Pipecat pipeline
  ┌─────────┐  POST       ┌─────────────┐       ┌──────────────────────────────┐
  │ index   │ /connect ──▶│ /connect    │       │ ws_transport.input()         │
  │ .html   │             │ /store-     │       │   ↓                          │
  │         │             │  context    │       │ Sarvam STT (16 kHz mono PCM) │
  │ @pipecat│             │ /health     │       │   ↓                          │
  │ /client │ WSS ───────▶│ /ws         │ ────▶ │ user_aggregator              │
  │ -js     │             └─────────────┘       │   ↓                          │
  │         │                                   │ ResilientAzureLLMService     │
  │         │ ◀───── 24 kHz PCM audio ──────────│   ↓ (12s timeout, 4 retries) │
  │         │ ◀───── RTVI envelopes (JSON) ─────│ Sarvam TTS (24 kHz mono PCM) │
  └─────────┘                                   │   ↓                          │
                                                │ ws_transport.output()        │
                                                │   ↓                          │
                                                │ BotSpeakingObserver ─▶ Gate  │
                                                └──────────────────────────────┘
```
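The diagram runs two sample rates: 16 kHz into Sarvam STT and 24 kHz out of Sarvam TTS. Here is a minimal sketch of the arithmetic behind "audio sample-rate alignment," assuming 16-bit mono PCM and 20 ms frames. Both are assumptions for illustration, not values read from `server/bot.py`. If the client plays 24 kHz audio as if it were 16 kHz, it sounds slow and low-pitched, so both ends must agree on these numbers.

```python
def pcm_frame_bytes(sample_rate_hz: int, frame_ms: int, channels: int = 1, sample_width: int = 2) -> int:
    """Bytes in one frame of raw PCM (sample_width=2 means 16-bit samples)."""
    samples_per_channel = sample_rate_hz * frame_ms // 1000
    return samples_per_channel * channels * sample_width


def pcm_duration_s(num_bytes: int, sample_rate_hz: int, channels: int = 1, sample_width: int = 2) -> float:
    """Playback time of a raw PCM buffer, e.g. to estimate when a client's queue drains."""
    return num_bytes / (sample_rate_hz * channels * sample_width)


STT_INPUT_RATE_HZ = 16_000   # browser mic -> Sarvam STT
TTS_OUTPUT_RATE_HZ = 24_000  # Sarvam TTS -> browser speaker

for label, rate in (("STT input", STT_INPUT_RATE_HZ), ("TTS output", TTS_OUTPUT_RATE_HZ)):
    print(f"{label}: {rate} Hz -> {pcm_frame_bytes(rate, 20)} bytes per 20 ms frame")
```

Under these assumptions, a 20 ms frame is 640 bytes on the way in and 960 bytes on the way out. One second of bot audio is 48,000 bytes.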

A separate TurnGate state machine lets the user speak only when all three conditions hold: (a) the client reports that its mic is ready, (b) the client reports that the bot's audio has finished playing, and (c) the server is not currently in a bot turn.

No echo cancellation problems. No "user interrupts bot before bot finishes" race.
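The three gate conditions, plus the grace period and fallback timer from the configuration reference, fit in a small state machine. The sketch below is hypothetical and is not the repo's `TurnGate`. Its method names are illustrative. It also assumes the fallback timer starts when the bot stops speaking.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TurnGate:
    """Hypothetical sketch of a server-authoritative turn gate; not the repo's TurnGate."""

    grace_s: float = 0.150   # cf. POST_PLAYBACK_GRACE_MS=150
    fallback_s: float = 8.0  # cf. POST_PLAYBACK_FALLBACK_S=8.0
    clock: Callable[[], float] = time.monotonic
    mic_ready: bool = False
    bot_turn: bool = False
    playback_drained: bool = True
    _drained_at: Optional[float] = field(default=None, init=False, repr=False)
    _bot_stopped_at: Optional[float] = field(default=None, init=False, repr=False)

    def on_mic_ready(self) -> None:
        """Client reports its microphone is capturing (condition a)."""
        self.mic_ready = True

    def on_bot_started_speaking(self) -> None:
        """Bot turn begins: close the gate until playback is confirmed (conditions b and c)."""
        self.bot_turn = True
        self.playback_drained = False
        self._drained_at = None
        self._bot_stopped_at = None

    def on_bot_stopped_speaking(self) -> None:
        """Server has sent the last TTS audio; the client may still be playing it."""
        self.bot_turn = False
        self._bot_stopped_at = self.clock()

    def on_playback_drained(self) -> None:
        """Client reports the bot's audio finished playing (e.g. client:playback_drained)."""
        self.playback_drained = True
        self._drained_at = self.clock()

    def is_open(self) -> bool:
        """True when user audio should be forwarded to STT."""
        if not self.mic_ready or self.bot_turn:
            return False
        now = self.clock()
        if self.playback_drained:
            # Short grace period after playback ends so the speaker tail is not captured.
            return self._drained_at is None or now - self._drained_at >= self.grace_s
        # Fallback: the client never confirmed drain, so force-open after fallback_s.
        return self._bot_stopped_at is not None and now - self._bot_stopped_at >= self.fallback_s
```

In a real pipeline, Pipecat's `BotStartedSpeakingFrame` and `BotStoppedSpeakingFrame` (the frames an observer such as BotSpeakingObserver watches) would presumably drive the two bot-speaking callbacks. Client messages would drive the other two. Injecting `clock` keeps the timing logic testable without real sleeps.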

Quickstart

1. Get API keys

| Service | Where | Notes |
|---|---|---|
| Sarvam | https://dashboard.sarvam.ai → API Keys | Free tier covers a few hours of conversation |
| Azure OpenAI | Azure portal → OpenAI resource → Keys + Endpoint | You also need a deployment name (e.g. `gpt-4o-mini`) |

2. Configure

```bash
cp .env.example .env
# Edit .env: set SARVAM_API_KEY, AZURE_OPENAI_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT.
```

3. Run

```bash
# Docker (recommended)
docker compose up

# Or directly with Python
pip install -e .
python -m server.main
```

Open http://localhost:7860 — the included browser client picks a Hindi tutor persona by default. Click Start, allow mic access, talk.

4. Try the other examples

```bash
curl -s -X POST http://localhost:7860/connect \
  -H 'Content-Type: application/json' \
  -d @examples/english-support.json | jq
```
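The same call can be made from Python using only the standard library. This sketch assumes nothing beyond what the curl command shows: a JSON POST to `/connect`. The response schema isn't documented here, so the helper just returns the parsed JSON. `build_connect_request` and `connect_with_example` are hypothetical names.

```python
import json
import urllib.request
from pathlib import Path
from typing import Any


def build_connect_request(payload: dict, base_url: str = "http://localhost:7860") -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the starter's /connect endpoint."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/connect",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def connect_with_example(example_path: str, base_url: str = "http://localhost:7860") -> Any:
    """Send one of the examples/*.json files to /connect and return the parsed reply."""
    payload = json.loads(Path(example_path).read_text(encoding="utf-8"))
    request = build_connect_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `connect_with_example("examples/english-support.json")` should behave like the curl command, minus the `jq` pretty-printing.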

Examples

| File | Persona | Language | What it does |
|---|---|---|---|
| `hindi-tutor.json` | tutor | 🇮🇳 Hindi | Conversational Hindi practice; corrects gently by modeling correct phrasing |
| `english-support.json` | support | 🇮🇳 Indian English | SaaS customer-support agent with ticket-logging fallback |
| `tamil-journal.json` | journal | 🇮🇳 Tamil | Voice journal companion; listens warmly, asks gentle follow-ups |

Add your own persona in 3 lines of code — see examples/README.md.
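This section doesn't show the persona mechanism itself. What follows is a purely hypothetical sketch of what a persona registry in the spirit of `server/system_prompts.py` could look like. `Persona`, `PERSONAS`, `build_system_prompt`, and the `coach` persona are illustrative names, not the repo's API. The real extension point is documented in examples/README.md.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Persona:
    name: str
    instructions: str


# Hypothetical registry; the real server/system_prompts.py may be structured differently.
PERSONAS: Dict[str, Persona] = {
    "tutor": Persona("tutor", "You are a patient language tutor. Correct gently by modeling the right phrasing."),
    "support": Persona("support", "You are a SaaS support agent. If you cannot resolve an issue, offer to log a ticket."),
    "journal": Persona("journal", "You are a warm voice-journal companion. Listen, then ask one gentle follow-up."),
}

# Adding a persona is one more entry in the registry.
PERSONAS["coach"] = Persona("coach", "You are an upbeat public-speaking coach. Give one concrete tip per turn.")


def build_system_prompt(persona: str, language_name: str) -> str:
    """Combine persona instructions with voice-friendly output rules."""
    try:
        base = PERSONAS[persona].instructions
    except KeyError:
        raise ValueError(f"unknown persona: {persona!r}") from None
    return (
        f"{base}\n"
        f"Always reply in {language_name}. "
        "Keep replies short and speakable: no markdown, bullet lists, or emoji."
    )
```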

What's included

| Component | What | Source |
|---|---|---|
| Pipecat pipeline | Sarvam STT → Azure LLM → Sarvam TTS, with turn-gate orchestration | `server/bot.py` |
| ResilientAzureLLMService | Azure OpenAI client with bounded timeout + 4 retries | `server/bot.py` |
| TurnGate state machine | Server-authoritative turn control: no echo, no race conditions | `server/bot.py` |
| BotSpeakingObserver | Drives the turn gate from Pipecat's frame stream | `server/bot.py` |
| FastAPI server | `/connect`, `/store-context`, `/ws`, `/health` endpoints | `server/main.py` |
| Voice mapping | 10 Indic languages + Indian English, per-language voice override via env | `server/voice_config.py` |
| System prompt builder | 3 example personas (tutor, support, journal) + extension point | `server/system_prompts.py` |
| Browser client | Plain HTML + ES modules, uses `@pipecat-ai/client-js` from CDN | `client/index.html` |
| Reconnect support | `POST /store-context` → resume conversation on a fresh WebSocket | `server/main.py` |
| Docker + compose | One-command deploy | `Dockerfile`, `docker-compose.yml` |
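At its core, reconnect support means keeping the message list somewhere, keyed by a session id. Here is a minimal in-memory sketch. It assumes a hex session id like the `sid=<hex>` query parameter and a 15-minute expiry. The TTL is an assumption, and `ContextStore` is a hypothetical name, not the code in `server/main.py`.

```python
import secrets
import time
from typing import Any, Dict, List, Optional, Tuple

Message = Dict[str, Any]


class ContextStore:
    """Hypothetical in-memory stand-in for the storage behind POST /store-context."""

    def __init__(self, ttl_s: float = 15 * 60) -> None:
        self._ttl_s = ttl_s  # assumed expiry; not taken from the repo
        self._items: Dict[str, Tuple[float, List[Message]]] = {}

    def save(self, messages: List[Message], sid: Optional[str] = None) -> str:
        """Store a copy of the conversation and return its session id."""
        sid = sid or secrets.token_hex(16)  # 32 hex characters
        self._items[sid] = (time.monotonic(), list(messages))
        return sid

    def load(self, sid: str) -> Optional[List[Message]]:
        """Return the saved messages, or None if the sid is unknown or expired."""
        item = self._items.get(sid)
        if item is None:
            return None
        saved_at, messages = item
        if time.monotonic() - saved_at > self._ttl_s:
            del self._items[sid]
            return None
        return list(messages)
```

In a FastAPI app, `POST /store-context` would call `save()`, and `/connect?sid=<hex>` would call `load()` to seed the new session's LLM context. A multi-process deployment would need a shared store (for example Redis) instead of a process-local dict.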

Languages supported

Out of the box: English (Indian), Hindi, Tamil, Telugu, Kannada, Malayalam, Bengali, Gujarati, Marathi, Punjabi, Odia.

Override the voice per language with a `TTS_VOICE_<CODE>` environment variable, where `<CODE>` is the language code (HI, TA, ...). The default voice is shubh (Sarvam's friendly default v3 speaker); swap to anushka, karun, etc. for variety.

```bash
TTS_VOICE_HI=anushka
TTS_VOICE_TA=karun
```
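Here is a minimal sketch of how the `TTS_VOICE_<CODE>` override could resolve. It assumes Sarvam-style `xx-IN` language codes (with Odia as `od-IN`) and the `shubh` default named above. `SUPPORTED_LANGUAGES` and `tts_voice_for` are hypothetical names; the actual logic lives in `server/voice_config.py` and may differ.

```python
import os
from typing import Mapping, Optional

DEFAULT_VOICE = "shubh"

# Assumed Sarvam-style language codes for the 11 supported languages.
SUPPORTED_LANGUAGES = {
    "en-IN": "English (Indian)",
    "hi-IN": "Hindi",
    "ta-IN": "Tamil",
    "te-IN": "Telugu",
    "kn-IN": "Kannada",
    "ml-IN": "Malayalam",
    "bn-IN": "Bengali",
    "gu-IN": "Gujarati",
    "mr-IN": "Marathi",
    "pa-IN": "Punjabi",
    "od-IN": "Odia",
}


def tts_voice_for(language_code: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the TTS voice: TTS_VOICE_<CODE> if set and non-empty, else the default."""
    if language_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language_code!r}")
    env = os.environ if env is None else env
    code = language_code.split("-")[0].upper()  # "hi-IN" -> "HI"
    return env.get(f"TTS_VOICE_{code}") or DEFAULT_VOICE
```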

Service matrix

If you want to swap providers later:

| Component | Default | Swap candidates | Code touch |
|---|---|---|---|
| STT | Sarvam `saaras:v3` | Deepgram, Google, Whisper, AssemblyAI, Azure Speech | One line in `bot.py` |
| LLM | Azure OpenAI | OpenAI, Anthropic, Google Gemini, Together, vLLM | One line + import |
| TTS | Sarvam `bulbul:v3` | Cartesia, ElevenLabs, Azure Speech, OpenAI TTS | One line in `bot.py` |
| Transport | FastAPI WebSocket | Daily WebRTC, LiveKit, Twilio | Substantial: different framework |

Pipecat has services for all of these. The Sarvam + Azure combo is what's wired here because that's what's missing from the OSS ecosystem.

Resilience built in

| Failure mode | What this repo does |
|---|---|
| Azure OpenAI stalls (>12 s) | First-token timeout fires; the SDK retries up to 4× with exponential backoff |
| Azure RAI false-positive content filter | Drop in `pipecat-content-filter-fallback` (one line) |
| Sarvam TTS silently returns no audio | Drop in `pipecat-outbound-audio-counter`, which logs `[tts_silent_fail]` |
| Client WS drops mid-conversation | `POST /store-context` saves messages; reconnecting via `/connect?sid=<hex>` resumes |
| User starts speaking over the bot | TurnGate keeps the user lane closed until the client confirms the audio has drained |
| Client never sends `client:playback_drained` | Fallback timer (8 s default) force-opens the gate |
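The first row combines a bounded timeout with retries. The starter delegates retries to the SDK: the `openai` Python clients, including `AsyncAzureOpenAI`, accept `timeout` and `max_retries` constructor arguments. The standalone sketch below only illustrates the same idea in plain `asyncio`. It bounds each attempt, then retries timeouts with jittered exponential backoff. `call_with_timeout_and_retries` is a hypothetical helper, and its backoff constants are assumptions.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_timeout_and_retries(
    make_call: Callable[[], Awaitable[T]],
    timeout_s: float = 12.0,    # cf. LLM_REQUEST_TIMEOUT_S
    max_retries: int = 4,       # cf. LLM_MAX_RETRIES
    base_delay_s: float = 0.5,  # assumed backoff base
    max_delay_s: float = 8.0,   # assumed backoff cap
) -> T:
    """Run make_call() with a per-attempt timeout, retrying timeouts with jittered exponential backoff."""
    attempt = 0
    while True:
        try:
            # make_call is a factory so every attempt gets a fresh coroutine.
            return await asyncio.wait_for(make_call(), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt >= max_retries:
                raise  # out of retries: surface the timeout
            delay = min(max_delay_s, base_delay_s * (2 ** attempt))
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
            attempt += 1
```

This bounds the whole awaited call. A true first-token timeout on a streamed response would wrap only the wait for the first chunk.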

Configuration reference

Only the four keys under `# Required` must be set; everything else is optional and has a sane default:

```bash
# Required
SARVAM_API_KEY=
AZURE_OPENAI_KEY=
AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com/
AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini

# Server
PORT=7860
LOG_LEVEL=INFO
CORS_ORIGINS=*

# Azure LLM resilience
LLM_REQUEST_TIMEOUT_S=12
LLM_CONNECT_TIMEOUT_S=3
LLM_MAX_RETRIES=4

# Turn-gate timing
POST_PLAYBACK_GRACE_MS=150
POST_PLAYBACK_FALLBACK_S=8.0

# Sarvam model overrides
SARVAM_STT_MODEL=saaras:v3
SARVAM_TTS_MODEL=bulbul:v3

# Per-language voice overrides
TTS_VOICE_HI=anushka
TTS_VOICE_TA=karun
# ... TTS_VOICE_TE, TTS_VOICE_KN, etc.
```
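Here is a sketch of loading the variables above into a typed settings object that uses the listed defaults and fails fast when a required key is missing. `Settings` and `from_env` are hypothetical names. Treating `CORS_ORIGINS` as a comma-separated list is an assumption. The per-language `TTS_VOICE_*` overrides are left out for brevity.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

REQUIRED_KEYS = ("SARVAM_API_KEY", "AZURE_OPENAI_KEY", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_DEPLOYMENT")


def _get(env: Mapping[str, str], key: str, default: str) -> str:
    """Treat unset and empty (KEY=) the same way: fall back to the default."""
    value = env.get(key)
    return value if value else default


@dataclass(frozen=True)
class Settings:
    sarvam_api_key: str
    azure_openai_key: str
    azure_openai_endpoint: str
    azure_openai_deployment: str
    port: int = 7860
    log_level: str = "INFO"
    cors_origins: Tuple[str, ...] = ("*",)
    llm_request_timeout_s: float = 12.0
    llm_connect_timeout_s: float = 3.0
    llm_max_retries: int = 4
    post_playback_grace_ms: int = 150
    post_playback_fallback_s: float = 8.0
    sarvam_stt_model: str = "saaras:v3"
    sarvam_tts_model: str = "bulbul:v3"

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        """Read settings from a mapping (os.environ by default), applying the defaults above."""
        env = os.environ if env is None else env
        missing = [key for key in REQUIRED_KEYS if not env.get(key)]
        if missing:
            raise RuntimeError(f"missing required settings: {', '.join(missing)}")
        origins = tuple(o.strip() for o in _get(env, "CORS_ORIGINS", "*").split(",") if o.strip())
        return cls(
            sarvam_api_key=env["SARVAM_API_KEY"],
            azure_openai_key=env["AZURE_OPENAI_KEY"],
            azure_openai_endpoint=env["AZURE_OPENAI_ENDPOINT"],
            azure_openai_deployment=env["AZURE_OPENAI_DEPLOYMENT"],
            port=int(_get(env, "PORT", "7860")),
            log_level=_get(env, "LOG_LEVEL", "INFO").upper(),
            cors_origins=origins or ("*",),
            llm_request_timeout_s=float(_get(env, "LLM_REQUEST_TIMEOUT_S", "12")),
            llm_connect_timeout_s=float(_get(env, "LLM_CONNECT_TIMEOUT_S", "3")),
            llm_max_retries=int(_get(env, "LLM_MAX_RETRIES", "4")),
            post_playback_grace_ms=int(_get(env, "POST_PLAYBACK_GRACE_MS", "150")),
            post_playback_fallback_s=float(_get(env, "POST_PLAYBACK_FALLBACK_S", "8.0")),
            sarvam_stt_model=_get(env, "SARVAM_STT_MODEL", "saaras:v3"),
            sarvam_tts_model=_get(env, "SARVAM_TTS_MODEL", "bulbul:v3"),
        )
```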

Why this vs. alternatives

| Option | Setup time | Indic-language support | Vendor lock-in | Customization |
|---|---|---|---|---|
| This repo | 60 sec to `docker compose up` | ★★★ Native: 10 Indic languages + Indian English | None (swap providers in 1 line) | Full source, MIT |
| Vapi | 5 min to dashboard | ★ English mostly, limited Indic | Heavy: closed source, hosted only | Limited (templates only) |
| Retell AI | 5 min to dashboard | ★ English mostly | Heavy: closed source, hosted only | Limited |
| Raw Pipecat + write your own | 2–3 days | Depends on your STT/TTS choice | None | Full |
| LiveKit Agents | 30 min | ★★ Via Cartesia/Deepgram | Low | Full, but a bigger framework |

The closed-source platforms (Vapi, Retell) are great if you want a hosted dashboard and don't care about source. This repo is for the case where you're going to deploy this on your own VM with your own keys and your own data flow — and you want to read every line of pipeline code.

Related projects

I extracted a few specialized pieces of this codebase into their own repos. Mix and match:

- 🔌 voice-agent-qa: Python client to drive Pipecat servers programmatically. Use it for nightly smoke tests of this starter.
- 🛡️ pipecat-content-filter-fallback: Catches Azure OpenAI RAI false positives and replaces them with a fallback turn.
- 💾 pipecat-transcript-checkpoint: Per-turn transcript persistence. Survives dropped calls.
- 📡 pipecat-ws-protocol-docs: Independent reference for the Pipecat WebSocket protocol (for client implementers in Python/Go/Rust).
- 🎙️ pipecat-bot-speaking-observer: Standalone version of the turn-gate observer.
- 📊 pipecat-outbound-audio-counter: TTS silent-failure detector.

Roadmap

- Daily WebRTC transport variant (for browser-without-ws-proxy deploys)
- Per-turn cost telemetry (token + audio second accounting)
- Optional Whisper post-pass for higher-quality transcripts
- Test suite via voice-agent-qa
- Cartesia + OpenAI variant (pipecat-cartesia-openai-starter)

Star + watch if any of these matter to you — PRs welcome.

License

MIT — see LICENSE.

Credits

Pipecat — the framework that makes all of this possible.
Sarvam AI — for the only Indic STT/TTS that actually sounds natural.
Built and battle-tested in production at AI Interview Agents.
