AI-powered voice intake system that automates client qualification through natural conversation. Built with a hybrid Python + .NET architecture for enterprise scalability.
Live Demo: https://voicebot-frontend-hnha.onrender.com (may take ~30s to wake up on first load)
VoiceBot conducts real-time voice conversations to collect structured client information (name, email, phone, project details) before human sales involvement. Think of it as an AI receptionist that:
- Speaks naturally in English and Polish
- Extracts and validates data in real-time
- Handles interruptions and corrections gracefully
- Stores structured leads in the backend database
```mermaid
flowchart LR
    FE[Next.js Frontend] <-->|WebSocket Audio/JSON| PY[Python AI FastAPI]
    PY <-->|RabbitMQ Events| NET[.NET Backend ASP.NET]
    PY --> Redis[(Redis Sessions)]
    NET --> PG[(PostgreSQL Leads)]
```
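The frontend and the Python AI service exchange binary audio frames and JSON control messages over the same WebSocket. As an illustration only (the field names below are assumptions, not the project's actual protocol), a JSON envelope for a transcript event might look like:

```python
import json

# Hypothetical control message sent from the Python AI service to the
# frontend after a speech-to-text pass; all field names are illustrative.
transcript_event = {
    "type": "transcript",        # message discriminator
    "session_id": "abc123",      # ties the event to a Redis-backed session
    "text": "My name is Anna",   # STT output for the last utterance
    "final": True,               # False for interim results
}

encoded = json.dumps(transcript_event)
decoded = json.loads(encoded)
print(decoded["type"])  # -> transcript
```

Binary frames (raw audio) and text frames (JSON like the above) can share one socket because WebSocket distinguishes the two frame types natively.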
Why Hybrid?
- Python excels at AI/ML (STT, LLM, TTS) with rich ecosystem
- .NET provides enterprise-grade APIs, auth, and data management
- Combining the two keeps each concern in the ecosystem best suited to it
| Layer | Technology |
|---|---|
| Frontend | Next.js 16, TypeScript, Tailwind, shadcn/ui |
| AI Layer | Python 3.13, FastAPI, OpenAI (gpt-4o-transcribe, gpt-4o-mini), ElevenLabs TTS |
| Backend | .NET 10, ASP.NET Core, EF Core, MassTransit |
| Infrastructure | PostgreSQL, Redis, RabbitMQ, Docker |
| Observability | LangFuse (LLM tracing), Structured logging |
- Real-time Voice Pipeline: STT, LLM, and TTS processing in a single turn
- Multi-language: English and Polish with native voices
- Push-to-Talk + Barge-in: User controls when to speak, can interrupt anytime
- Smart Extraction: Instructor-powered structured data extraction with validation
- Confirmation Flow: Reads back collected data, handles corrections
- Template-driven: YAML templates define conversation flow and fields
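In the real pipeline, extraction is done with Instructor on top of an LLM. As a stdlib-only sketch of the kind of validated record that extraction targets (the field set and validation rules here are illustrative assumptions, not the project's schema):

```python
import re
from dataclasses import dataclass

# Simple email shape check; the real validation rules live in the
# templates and Instructor models.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

@dataclass
class Lead:
    """Hypothetical lead record populated turn by turn."""
    name: str
    email: str
    phone: str
    project_details: str = ""

    def missing_fields(self) -> list[str]:
        # Fields still empty drive what the bot asks for next.
        return [k for k, v in self.__dict__.items() if not v]

    def is_valid_email(self) -> bool:
        return bool(EMAIL_RE.match(self.email))

lead = Lead(name="Anna", email="anna@example.com", phone="")
print(lead.missing_fields())   # -> ['phone', 'project_details']
print(lead.is_valid_email())   # -> True
```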
```
voicebot/
├── src/
│   ├── python-ai/        # AI layer (STT, LLM, TTS, WebSocket)
│   ├── dotnet-backend/   # Business API (leads, sessions, webhooks)
│   └── frontend/         # Next.js voice interface
├── templates/            # Conversation templates (YAML)
├── infrastructure/       # Docker Compose, K8s configs
└── docs/                 # Architecture docs
```
- Docker & Docker Compose
- OpenAI API key
- ElevenLabs API key
```bash
# Clone and configure
cp src/python-ai/.env.example src/python-ai/.env
# Add your API keys to .env

# Start all services
cd infrastructure/docker
docker-compose up --build

# Access:
# Frontend:  http://localhost:3000
# Python AI: http://localhost:8000
# .NET API:  http://localhost:5000
```

| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key for STT + LLM |
| `ELEVENLABS_API_KEY` | ElevenLabs API key for TTS |
| `LLM_MODEL` | Model to use (default: `gpt-4o-mini`) |
| `REDIS_URL` | Redis connection string |
| `RABBITMQ_URL` | RabbitMQ connection string |
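A minimal sketch of how the Python AI service might read this configuration at startup; the fallback URLs for Redis and RabbitMQ below are assumptions, only the `gpt-4o-mini` default comes from the table above:

```python
import os

# Read service configuration from the environment, with illustrative
# local-development fallbacks for the connection strings.
config = {
    "openai_api_key": os.environ.get("OPENAI_API_KEY", ""),
    "elevenlabs_api_key": os.environ.get("ELEVENLABS_API_KEY", ""),
    "llm_model": os.environ.get("LLM_MODEL", "gpt-4o-mini"),
    "redis_url": os.environ.get("REDIS_URL", "redis://localhost:6379/0"),
    "rabbitmq_url": os.environ.get("RABBITMQ_URL", "amqp://guest:guest@localhost:5672/"),
}
print(config["llm_model"])  # -> gpt-4o-mini when LLM_MODEL is unset
```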
Templates in templates/ define:
- Fields to collect (name, email, phone, etc.)
- Prompts and validation rules
- Language and voice settings
- Confirmation messages
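The bullet points above might map onto a template like the following hypothetical sketch (the actual schema used in `templates/` may differ):

```yaml
# Hypothetical intake template; key names are illustrative,
# not the project's actual schema.
language: en
voice: rachel            # ElevenLabs voice name (assumed)
fields:
  - name: full_name
    prompt: "May I have your name, please?"
    required: true
  - name: email
    prompt: "What email address can we reach you at?"
    validate: email
confirmation: "Just to confirm: {full_name}, reachable at {email}. Is that correct?"
```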
1. User speaks → audio streamed via WebSocket
2. STT (OpenAI gpt-4o-transcribe) → text transcript
3. LLM (gpt-4o-mini + Instructor) → response + extracted fields
4. TTS (ElevenLabs) → natural voice response
5. On completion → lead created in the .NET backend via RabbitMQ
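The steps above can be sketched as a single turn function. The stubs below stand in for the real OpenAI STT, LLM + Instructor, ElevenLabs TTS, and RabbitMQ calls, and the completion condition is illustrative:

```python
# Sketch of one conversational turn with stand-in stubs for the
# external services; not the project's actual implementation.
def transcribe(audio: bytes) -> str:                     # stub: gpt-4o-transcribe
    return "My email is anna@example.com"

def respond_and_extract(text: str) -> tuple[str, dict]:  # stub: LLM + Instructor
    return "Got it, thanks!", {"email": "anna@example.com"}

def synthesize(text: str) -> bytes:                      # stub: ElevenLabs TTS
    return text.encode()

def publish_lead(fields: dict) -> None:                  # stub: RabbitMQ event
    print(f"lead.created: {fields}")

def handle_turn(audio: bytes, collected: dict) -> bytes:
    transcript = transcribe(audio)                       # steps 1-2: audio -> text
    reply, fields = respond_and_extract(transcript)      # step 3: response + fields
    collected.update(fields)
    if "email" in collected:                             # step 5: completion (illustrative)
        publish_lead(collected)
    return synthesize(reply)                             # step 4: text -> audio

state: dict = {}
audio_out = handle_turn(b"\x00\x01", state)
print(state)  # -> {'email': 'anna@example.com'}
```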
- Sales Qualification: Collect lead info before human handoff
- Appointment Booking: Gather details for scheduling
- Customer Intake: Onboarding data collection
- Support Triage: Initial issue categorization
- Core voice pipeline (STT, LLM, TTS)
- Multi-language support (EN/PL)
- Push-to-talk with barge-in
- Confirmation flow with corrections
- Lead management backend
- Admin dashboard
- Phone line (PSTN) integration
- Voice Activity Detection (hands-free mode)
- Sentiment analysis
