A web application for creating and interacting with custom talking avatars powered by Azure Cognitive Services and RAG (Retrieval Augmented Generation) technology.
This project allows users to create and interact with AI-powered talking avatars. Users can upload their own knowledge base documents (PDF, TXT), select avatar appearances, customize backgrounds, and define prompts to create personalized conversational agents. The avatars use speech synthesis and natural language processing to provide dynamic, informative responses based on the uploaded knowledge base.
YouTube demo: https://youtu.be/tZ5aoUfyKgM
- Custom Avatar Creation: Create personalized avatars backed by an uploaded knowledge base
- Document Processing: Support for PDF and TXT files
- Real-time Avatar Interaction: Web-based interface for conversing with avatars
- Azure Cognitive Services Integration: Text-to-speech and talking avatar capabilities
- Retrieval Augmented Generation (RAG): Uses Azure Cosmos DB and vector search for knowledge retrieval
- WebRTC Streaming: Real-time audio and video streaming for avatar interactions
- Responsive Design: Works across different device sizes
Backend:
- Flask (Python)
- Azure OpenAI for text generation
- Azure Cosmos DB for document storage and vector search
- Pinecone for vector indexing
- Azure Speech Services for TTS
Frontend:
- HTML/CSS/JavaScript
- WebRTC for real-time communication
- Azure Speech SDK for browser integration
Storage:
- Azure Cosmos DB for document storage
- Pinecone for vector embeddings
- Cloudflare R2 for chat history storage
- Python 3.8+
- Azure account with:
  - Azure OpenAI API access
  - Azure Cognitive Services (Speech)
  - Azure Cosmos DB
- Pinecone account
- Cloudflare R2 storage (optional, for chat history)
Create a .env file with the following variables:
AZURE_OPENAI_VARE_KEY=your_azure_openai_key
AZURE_ENDPOINT=your_azure_endpoint
PINECONE_API_KEY=your_pinecone_key
PINECONE_API_KEY2=your_second_pinecone_key
AZURE_SPEECH_KEY=your_azure_speech_key
AZURE_SPEECH_REGION=eastus
COSMOS_HOST=your_cosmos_db_host
COSMOS_KEY=your_cosmos_db_key
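A quick way to catch a missing setting before starting the app is a small check like the one below. This is only a sketch: the variable names are copied from the listing above, while the `missing_vars` helper is hypothetical and not part of the project. It assumes the values are already present in the process environment (for example, loaded from `.env` by whatever loader the app uses).

```python
import os

# Variable names copied from the .env listing above.
REQUIRED_VARS = [
    "AZURE_OPENAI_VARE_KEY",
    "AZURE_ENDPOINT",
    "PINECONE_API_KEY",
    "PINECONE_API_KEY2",
    "AZURE_SPEECH_KEY",
    "AZURE_SPEECH_REGION",
    "COSMOS_HOST",
    "COSMOS_KEY",
]


def missing_vars(env=None):
    """Return the required variable names that are unset or empty in env.

    Hypothetical helper for illustration; env defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("All required settings are present.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching real credentials.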
- Clone the repository
- Install dependencies:
  pip install -r requirements.txt
- Run the application:
  python app.py
- Navigate to the avatar creation page
- Enter avatar name and description
- Upload knowledge base documents (PDF, TXT)
- Select or generate a QA prompt template
- Choose an avatar character and background
- Click "Create Avatar"
- Navigate to the avatar gallery page
- Select an avatar to start a conversation
- Ask questions via text input or click suggested follow-up questions
- The avatar will respond with synthesized speech and animation
- Chat history can be saved for future reference
- app.py: Main Flask application
- avatar-conv.js: JavaScript for avatar conversation interface
- avatar-conv.html: HTML template for avatar conversation
- avatar-page.html: Gallery of available avatars
- upload.html: Avatar creation interface
- static/: Static files (images, CSS, JS)
- templates/: HTML templates
The system uses Retrieval Augmented Generation with:
- Document chunking and embedding via Azure OpenAI embeddings
- Storage in Cosmos DB with vector capabilities
- Query-time retrieval based on semantic similarity
- Response generation incorporating retrieved knowledge
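The four steps above can be illustrated with a self-contained toy pipeline. This is a sketch under loud assumptions, not the project's code: `embed` is a bag-of-words stand-in for Azure OpenAI embeddings, an in-memory list stands in for Cosmos DB, and all function names are hypothetical. It only shows the shape of chunk, embed, retrieve, and prompt assembly.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=50):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks ranked by similarity to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, context_chunks):
    """Assemble a prompt that asks the model to answer from the retrieved chunks."""
    context = "\n".join("- " + chunk for chunk in context_chunks)
    return "Answer using only this context:\n" + context + "\n\nQuestion: " + query


if __name__ == "__main__":
    chunks = [
        "fluoride strengthens tooth enamel",
        "the clinic opens at nine",
        "brush teeth twice daily",
    ]
    question = "when does the clinic open"
    print(build_prompt(question, retrieve(question, chunks, top_k=1)))
```

In a real deployment the embedding call and the similarity search would be delegated to the embedding model and the vector store; only the final prompt assembly would stay in application code.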
Avatars are synthesized using:
- Azure Speech SDK for text-to-speech
- Azure Talking Avatar service for facial animation
- WebRTC for real-time streaming to the browser
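Speech services of this kind typically accept SSML as input, so response text has to be XML-escaped before it is wrapped in markup. The sketch below shows one way to build such a document with the standard library only. The `build_ssml` helper and the default voice name are assumptions for illustration; the source does not say which voice or markup the project actually sends.

```python
from xml.sax.saxutils import escape

# Minimal SSML skeleton: one <voice> element inside <speak>.
SSML_TEMPLATE = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="{lang}">'
    '<voice name="{voice}">{text}</voice>'
    "</speak>"
)


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US"):
    """Wrap plain text in a minimal SSML document.

    Hypothetical helper; the voice default is an assumption, not the
    project's configured voice. The text is XML-escaped so characters
    such as & and < cannot break the markup.
    """
    return SSML_TEMPLATE.format(lang=lang, voice=voice, text=escape(text))


if __name__ == "__main__":
    print(build_ssml("Brushing & flossing take < 5 minutes."))
```

Escaping matters most for model-generated answers, which can contain arbitrary punctuation.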
- Multiple avatar characters (Dr. David Avenetti, Prof. Zalake, Lisa-Casual, Max-Business)
- Background selection with various themes
- Prompt customization with AI-assisted generation
MIT License
- Azure Cognitive Services Team
- Microsoft Azure OpenAI Service
- Pinecone Vector Database
This project demonstrates integration of multiple Azure services to create interactive, knowledge-grounded conversational avatars for various applications including education, customer service, and information delivery.
Due to the high cost of the models and services involved, we have paused or stopped certain required components on our Azure App Services. The following is how to activate all paused components: