
VoiceBot

AI-powered voice intake system that automates client qualification through natural conversation. Built with a hybrid Python + .NET architecture for enterprise scalability.

Live Demo: https://voicebot-frontend-hnha.onrender.com (may take ~30s to wake up on first load)

[Screenshot: VoiceBot interface]

What It Does

VoiceBot conducts real-time voice conversations to collect structured client information (name, email, phone, project details) before human sales involvement. Think of it as an AI receptionist that:

  • Speaks naturally in English and Polish
  • Extracts and validates data in real-time
  • Handles interruptions and corrections gracefully
  • Stores structured leads in the backend database

Architecture

flowchart LR
    FE[Next.js Frontend] <-->|WebSocket Audio/JSON| PY[Python AI FastAPI]
    PY <-->|RabbitMQ Events| NET[.NET Backend ASP.NET]
    PY --> Redis[(Redis Sessions)]
    NET --> PG[(PostgreSQL Leads)]
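To make the Frontend ↔ Python AI arrow concrete, here is a minimal sketch of a WebSocket message envelope for mixed audio/JSON traffic. The message types and field names are illustrative assumptions, not the project's actual wire protocol (the real service may send raw binary frames for audio rather than base64-in-JSON).

```python
import base64
import json
from dataclasses import dataclass

# Hypothetical envelope for the Frontend <-> Python AI WebSocket.
# Audio travels as base64 inside JSON here purely for illustration.

@dataclass
class WsMessage:
    type: str      # e.g. "audio_chunk", "transcript", "tts_audio", "fields" (assumed names)
    payload: dict

    def to_json(self) -> str:
        return json.dumps({"type": self.type, "payload": self.payload})

    @staticmethod
    def from_json(raw: str) -> "WsMessage":
        obj = json.loads(raw)
        return WsMessage(type=obj["type"], payload=obj["payload"])

def audio_chunk(pcm_bytes: bytes) -> WsMessage:
    """Wrap a chunk of microphone PCM audio for transport."""
    return WsMessage("audio_chunk", {"audio_b64": base64.b64encode(pcm_bytes).decode()})
```

A single duplex socket carrying both directions keeps latency low, which matters for barge-in: the client can keep sending audio while TTS chunks are still arriving.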

Why Hybrid?

  • Python excels at AI/ML (STT, LLM, TTS) with a rich ecosystem
  • .NET provides enterprise-grade APIs, auth, and data management
  • Best of both worlds without compromise

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | Next.js 16, TypeScript, Tailwind, shadcn/ui |
| AI Layer | Python 3.13, FastAPI, OpenAI (gpt-4o-transcribe, gpt-4o-mini), ElevenLabs TTS |
| Backend | .NET 10, ASP.NET Core, EF Core, MassTransit |
| Infrastructure | PostgreSQL, Redis, RabbitMQ, Docker |
| Observability | LangFuse (LLM tracing), structured logging |

Key Features

  • Real-time Voice Pipeline: STT, LLM, and TTS processing in a single turn
  • Multi-language: English and Polish with native voices
  • Push-to-Talk + Barge-in: User controls when to speak, can interrupt anytime
  • Smart Extraction: Instructor-powered structured data extraction with validation
  • Confirmation Flow: Reads back collected data, handles corrections
  • Template-driven: YAML templates define conversation flow and fields
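The "Smart Extraction" feature uses Instructor-powered structured extraction in the real service; as a dependency-free illustration of the same idea, here is a stdlib-only sketch. The field names and validation rules below are assumptions, not the project's actual schema.

```python
import re
from dataclasses import dataclass

# Stdlib stand-in for an Instructor/Pydantic extraction model.
# Field names and validation rules are illustrative assumptions.

@dataclass
class Lead:
    name: str = ""
    email: str = ""
    phone: str = ""
    project_details: str = ""

    def missing_fields(self) -> list:
        """Fields the bot still needs to ask for."""
        return [k for k, v in self.__dict__.items() if not v]

    def validation_errors(self) -> list:
        """Cheap format checks before the confirmation read-back."""
        errors = []
        if self.email and not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", self.email):
            errors.append("email looks malformed")
        if self.phone and not re.fullmatch(r"\+?[\d\s\-()]{7,}", self.phone):
            errors.append("phone looks malformed")
        return errors
```

Keeping "what is still missing" and "what fails validation" as separate queries lets the dialogue manager decide whether to ask a new question or re-prompt for a correction.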

Project Structure

voicebot/
├── src/
│   ├── python-ai/       # AI Layer (STT, LLM, TTS, WebSocket)
│   ├── dotnet-backend/  # Business API (Leads, Sessions, Webhooks)
│   └── frontend/        # Next.js voice interface
├── templates/           # Conversation templates (YAML)
├── infrastructure/      # Docker Compose, K8s configs
└── docs/               # Architecture docs

Quick Start

Prerequisites

  • Docker & Docker Compose
  • OpenAI API key
  • ElevenLabs API key

Run Locally

# Clone and configure
git clone https://github.com/lukuch/VoiceBot.git
cd VoiceBot
cp src/python-ai/.env.example src/python-ai/.env
# Add your API keys to src/python-ai/.env

# Start all services
cd infrastructure/docker
docker-compose up --build

# Access
# Frontend: http://localhost:3000
# Python AI: http://localhost:8000
# .NET API: http://localhost:5000

Configuration

Environment Variables (Python AI)

| Variable | Description |
| --- | --- |
| OPENAI_API_KEY | OpenAI API key for STT + LLM |
| ELEVENLABS_API_KEY | ElevenLabs API key for TTS |
| LLM_MODEL | Model to use (default: gpt-4o-mini) |
| REDIS_URL | Redis connection string |
| RABBITMQ_URL | RabbitMQ connection string |
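A minimal sketch of how the Python AI service might load these variables. The Settings class and the fallback connection strings are assumptions (the real service may use pydantic-settings); only the LLM_MODEL default comes from the table above.

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    openai_api_key: str
    elevenlabs_api_key: str
    llm_model: str
    redis_url: str
    rabbitmq_url: str

    @staticmethod
    def from_env() -> "Settings":
        return Settings(
            openai_api_key=os.environ["OPENAI_API_KEY"],          # required, no default
            elevenlabs_api_key=os.environ["ELEVENLABS_API_KEY"],  # required, no default
            llm_model=os.environ.get("LLM_MODEL", "gpt-4o-mini"), # default per the table
            # The two fallback URLs below are illustrative local-dev values.
            redis_url=os.environ.get("REDIS_URL", "redis://localhost:6379/0"),
            rabbitmq_url=os.environ.get("RABBITMQ_URL", "amqp://guest:guest@localhost:5672/"),
        )
```

Failing fast on the two required keys (via a KeyError at startup) is usually preferable to a confusing 401 deep inside the first conversation.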

Conversation Templates

Templates in templates/ define:

  • Fields to collect (name, email, phone, etc.)
  • Prompts and validation rules
  • Language and voice settings
  • Confirmation messages
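An illustrative template covering those four concerns might look like this. Every key, field name, and value below is an assumption for the sake of example; consult the files in templates/ for the actual schema.

```yaml
# Hypothetical intake template; the real schema may differ.
language: en
voice: rachel
fields:
  - name: name
    prompt: "May I have your full name?"
  - name: email
    prompt: "What's the best email to reach you?"
    validation: email
  - name: phone
    prompt: "And a phone number?"
    validation: phone
  - name: project_details
    prompt: "Briefly, what does your project involve?"
confirmation: "Let me read that back: {summary}. Is everything correct?"
```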

How It Works

  1. User speaks - Audio streamed via WebSocket
  2. STT (OpenAI gpt-4o-transcribe) - Text transcript
  3. LLM (GPT-4o-mini + Instructor) - Response + extracted fields
  4. TTS (ElevenLabs) - Natural voice response
  5. On completion - Lead created in .NET backend via RabbitMQ
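Stubbing out the three model calls, steps 1–4 of a single turn can be sketched as follows. The function names, return shapes, and stub behavior are assumptions, not the project's API; the stubs stand in for gpt-4o-transcribe, gpt-4o-mini with Instructor, and ElevenLabs.

```python
# One conversational turn: audio in -> transcript -> reply + fields -> audio out.

def transcribe(audio: bytes) -> str:
    """Stub for the STT stage (gpt-4o-transcribe in the real pipeline)."""
    return "My name is Ada Lovelace"

def generate_reply(transcript: str, state: dict) -> tuple:
    """Stub for the LLM stage: produce a reply and update extracted fields."""
    if "name" in transcript.lower():
        state["name"] = "Ada Lovelace"
    return "Thanks! And what's your email?", state

def synthesize(text: str) -> bytes:
    """Stub for the TTS stage (ElevenLabs in the real pipeline)."""
    return text.encode("utf-8")

def handle_turn(audio: bytes, state: dict) -> tuple:
    """Process one push-to-talk turn; return (tts_audio, updated_state)."""
    transcript = transcribe(audio)
    reply, state = generate_reply(transcript, state)
    return synthesize(reply), state
```

Because session state lives in Redis rather than in the process, each turn can be handled statelessly: load state, run the three stages, persist state, stream audio back.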

Use Cases

  • Sales Qualification: Collect lead info before human handoff
  • Appointment Booking: Gather details for scheduling
  • Customer Intake: Onboarding data collection
  • Support Triage: Initial issue categorization

Roadmap

  • Core voice pipeline (STT, LLM, TTS)
  • Multi-language support (EN/PL)
  • Push-to-talk with barge-in
  • Confirmation flow with corrections
  • Lead management backend
  • Admin dashboard
  • Phone line (PSTN) integration
  • Voice Activity Detection (hands-free mode)
  • Sentiment analysis