🛒 Voice-Based E-commerce Agent

An AI-powered voice calling agent for e-commerce that helps customers find products through natural phone conversations. Built with Twilio, ElevenLabs, Groq LLM, and ChromaDB.

📖 Project Overview

This project is a Voice-based AI Shopping Assistant that allows customers to call a phone number and shop for products using natural conversation. Here's how we built it step-by-step:

🔄 Complete Pipeline

1️⃣ Data Collection

I sourced my product catalog from the Walmart Products Dataset:

📦 Dataset: Walmart Dataset Samples
Contains product names, descriptions, prices, categories, and more
Raw CSV file stored in DataCleaning/walmart-products.csv

2️⃣ Data Cleaning & Preprocessing

The raw data needed cleaning before it could be used effectively:

Removed duplicate products and null values
Normalized price formats (removed $ symbols, converted to float)
Extracted brand names from product titles
Categorized products (Laptops, Smartphones, etc.)
Output: DataCleaning/cleaned_data.csv

3️⃣ Vector Database Setup (ChromaDB)

Converted cleaned data into searchable embeddings:

Used HuggingFace Embeddings (all-MiniLM-L6-v2) to create vector representations
Stored in ChromaDB with both content and metadata
Page Content: Product name + description (for semantic search)
Metadata: Price, Brand, Category (for filtering)

This allows us to search products semantically ("show me gaming laptops") while also filtering by budget, brand, or category.

4️⃣ Retriever with Smart Filtering

Built a hybrid retriever that merges semantic dense embeddings with BM25 sparse keyword searches using Reciprocal Rank Fusion (RRF):

User says "Samsung phone under 20000"
System extracts: brand=Samsung, category=Smartphone, budget=20000
Hybrid query: Dense vectors + BM25 index + $and filters
Automatically caches query embeddings and handles case-insensitive metadata variations.

5️⃣ Session Memory

Implemented two types of memory for natural conversations:

Conversation Memory: Remembers last 10 exchanges in the call
User Preferences: Tracks budget, brand, and category mentioned by user

6️⃣ LLM Engine (Groq)

Connected everything to a fast LLM for response generation:

Uses Groq with llama-3.1-8b-instant (super fast - 500+ tokens/sec)
Takes: Retrieved products + Conversation history + User preferences + Current query
Generates: Short, voice-friendly responses (1-2 sentences)

7️⃣ Voice & Web UI Integration

Final integration for multi-modal interaction:

Twilio Voice: Handles incoming calls and speech-to-text
ElevenLabs: Converts AI responses to natural human-like voice
Chat Web UI: A sleek, dark-themed frontend (static/index.html) using the new /api/chat endpoint.
FastAPI: Backend server that orchestrates Twilio webhooks, LLM queries, and the frontend server.

Result: Customer calls or types in the web interface → interacts naturally → Gets AI response synthesized back seamlessly! 🛒

🏗 Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        VOICE E-COMMERCE AGENT                        │
└─────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────┐
│  📞 TWILIO VOICE                                                     │
│  ├── Incoming call handling                                          │
│  ├── Speech-to-Text (built-in)                                       │
│  └── Webhook endpoints                                               │
└─────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────┐
│  🧠 PROCESSING PIPELINE                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │
│  │  Retriever  │→ │   Memory    │→ │  LLM Engine │→ │   Response  │ │
│  │  (ChromaDB) │  │ (Session)   │  │   (Groq)    │  │  Generation │ │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────┐
│  🎙 ELEVENLABS TTS                                                   │
│  ├── Natural voice synthesis                                         │
│  ├── Turbo model (low latency)                                       │
│  └── Audio streaming to Twilio                                       │
└─────────────────────────────────────────────────────────────────────┘

✨ Features

📞 Phone-based shopping - Customers call and shop via voice
🎯 Smart product search - Semantic search with filters (brand, category, budget)
🧠 Conversation memory - Remembers context within the call
👤 User preferences - Tracks budget, brand, category preferences
🗣️ Natural voice - ElevenLabs for human-like responses
⚡ Low latency - Groq LLM (500+ tokens/sec) + ElevenLabs Turbo

🛠 Tech Stack

Component	Technology	Purpose
Voice Gateway	Twilio Voice	Phone calls, STT
Text-to-Speech	ElevenLabs	Natural voice synthesis
LLM	Groq (Llama 3.1 8B)	Response generation
Vector DB	ChromaDB	Product embeddings & search
Embeddings	HuggingFace (all-MiniLM-L6-v2)	Text embeddings
Backend	FastAPI	API server
Data Processing	Pandas	Data cleaning

🚀 Setup Guide

1. Clone & Install

git clone https://github.com/yourusername/voice_ecomm.git
cd voice_ecomm

# Using uv (recommended)
uv sync

# Or using pip
pip install -r requirements.txt

2. Environment Variables

Create .env file:

# LLM
GROQ_API_KEY=your_groq_api_key

# Voice
ELEVEN_LABS=your_elevenlabs_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key  # Required for streaming voice

# Embeddings
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2

# Twilio (voice calls + browser voice)
TWILIO_ACCOUNT_SID=your_twilio_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token
TWILIO_API_KEY_SID=your_twilio_api_key_sid
TWILIO_API_KEY_SECRET=your_twilio_api_key_secret
TWILIO_TWIML_APP_SID=your_twiml_app_sid

# Public base URL for Twilio callbacks / audio playback (ngrok or HTTPS domain)
PUBLIC_BASE_URL=https://your-ngrok-url.ngrok.io

3. Data Pipeline

Step 1: Data Cleaning

# Run the data cleaning notebook
jupyter notebook DataCleaning/file.ipynb

The cleaning process:

Removes duplicates and null values
Normalizes price formats
Extracts brand names
Categorizes products
Outputs cleaned_data.csv

Step 2: Ingest to ChromaDB

python Ingestion/chroma.py

This creates vector embeddings with metadata:

product_name - Product title
description - Product description
price - Numeric price (for filtering)
brand - Brand name (for filtering)
category - Product category (for filtering)

4. Run the Server

# Start FastAPI server
python app.py

5. Expose with Ngrok

# In another terminal
ngrok http 8000

6. Configure Twilio

Go to Twilio Console
Buy a phone number (or use trial)
Configure Voice webhook:
- URL: https://your-ngrok-url.ngrok.io/webhooks/voice/incoming
- Method: POST
Save and call the number!

7. Streaming Voice (Phone Calls)

If DEEPGRAM_API_KEY is set, incoming calls use Twilio Media Streams with real-time STT + streaming TTS:

Ensure your public URL supports wss:// (ngrok works).
Keep the Voice webhook pointing to:
- https://your-ngrok-url.ngrok.io/webhooks/voice/incoming
Twilio will open a WebSocket to:
- wss://your-ngrok-url.ngrok.io/ws/twilio

8. Browser Voice (Web UI, no Twilio SDK)

The web voice button now uses your browser microphone + Deepgram streaming STT + ElevenLabs streaming TTS (no Twilio SDK). It requires:

DEEPGRAM_API_KEY
PUBLIC_BASE_URL (for audio URLs and Twilio callback parity)

Open the homepage and click Start live voice.

9. Twilio Browser Calls (Optional)

Create a TwiML App in Twilio Console.
Set the TwiML App Voice Request URL to:
- https://your-ngrok-url.ngrok.io/webhooks/voice/incoming
Create a Twilio API Key (not the Auth Token).
Add these to .env:
- TWILIO_API_KEY_SID, TWILIO_API_KEY_SECRET, TWILIO_TWIML_APP_SID
Ensure PUBLIC_BASE_URL is set to the same public URL so Twilio can fetch /audio/....

Metadata stored:

Field	Type	Example	Use
`price`	float	499.99	Budget filtering (`$lte`)
`brand`	string	"Samsung"	Brand filtering
`category`	string	"Laptop"	Category filtering

Call Flow:

📞 Incoming Call
      │
      ▼
┌─────────────────┐
│  POST /         │ ──→ Welcome message (ElevenLabs)
└─────────────────┘
      │
      ▼
┌─────────────────┐
│ Twilio STT      │ ──→ User speaks, transcribed
└─────────────────┘
      │
      ▼
┌─────────────────┐
│ POST /process-  │
│ speech          │ ──→ Retrieve → LLM → ElevenLabs → Play
└─────────────────┘
      │
      ▼
   (Loop until hangup)

🔌 API Endpoints

Endpoint	Method	Description
`/`	POST	Twilio webhook - incoming call
`/incoming-call`	POST	Alias for incoming call
`/process-speech`	POST	Process user speech, return AI response
`/audio/{filename}`	GET	Serve ElevenLabs audio files
`/ws/twilio`	WebSocket	Twilio Media Streams (streaming voice)
`/ws/web-voice`	WebSocket	Browser live voice (streaming)

🧪 Testing

Text-based Testing (without calling)

python test_pipeline.py

This tests:

✅ Retriever with filters
✅ LLM response generation
✅ Memory updates
✅ Interactive chat mode

Live Call Testing

Start server: python app.py
Start ngrok: ngrok http 8000
Update Twilio webhook
Call your Twilio number

📊 Performance

Component	Latency
Twilio STT	~1s
ChromaDB Retrieval	~100ms
Groq LLM	~200ms
ElevenLabs TTS	~500ms
Total Response Time	~2s

📝 License

MIT License

🤝 Contributing

PRs welcome! Please read contributing guidelines first.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
DataCleaning		DataCleaning
Ingestion		Ingestion
SessionMemory		SessionMemory
audio_cache		audio_cache
postman		postman
scripts		scripts
static		static
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
a.py		a.py
app.py		app.py
llm.py		llm.py
prompt.txt		prompt.txt
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
retriever.py		retriever.py
test_pipeline.py		test_pipeline.py
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

🛒 Voice-Based E-commerce Agent

📖 Project Overview

🔄 Complete Pipeline

1️⃣ Data Collection

2️⃣ Data Cleaning & Preprocessing

3️⃣ Vector Database Setup (ChromaDB)

4️⃣ Retriever with Smart Filtering

5️⃣ Session Memory

6️⃣ LLM Engine (Groq)

7️⃣ Voice & Web UI Integration

🏗 Architecture

✨ Features

🛠 Tech Stack

🚀 Setup Guide

1. Clone & Install

2. Environment Variables

3. Data Pipeline

Step 1: Data Cleaning

Step 2: Ingest to ChromaDB

4. Run the Server

5. Expose with Ngrok

6. Configure Twilio

7. Streaming Voice (Phone Calls)

8. Browser Voice (Web UI, no Twilio SDK)

9. Twilio Browser Calls (Optional)

🔌 API Endpoints

🧪 Testing

Text-based Testing (without calling)

Live Call Testing

📊 Performance

📝 License

🤝 Contributing

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages