MedRAG Avatar Platform - IVORY

A web application for creating and interacting with custom talking avatars powered by Azure Cognitive Services and RAG (Retrieval Augmented Generation) technology.

Overview

This project allows users to create and interact with AI-powered talking avatars. Users can upload their own knowledge base documents (PDF, TXT), select avatar appearances, customize backgrounds, and define prompts to create personalized conversational agents. The avatars use speech synthesis and natural language processing to provide dynamic, informative responses based on the uploaded knowledge base.

YouTube demo: https://youtu.be/tZ5aoUfyKgM

Features

Custom Avatar Creation: Create personalized avatars backed by an uploaded knowledge base
Document Processing: Support for PDF and TXT files
Real-time Avatar Interaction: Web-based interface for conversing with avatars
Azure Cognitive Services Integration: Text-to-speech and talking avatar capabilities
Retrieval Augmented Generation (RAG): Uses Azure Cosmos DB and vector search for knowledge retrieval
WebRTC Streaming: Real-time audio and video streaming for avatar interactions
Responsive Design: Works across different device sizes

Technology Stack

Backend:
- Flask (Python)
- Azure OpenAI for text generation
- Azure Cosmos DB for document storage and vector search
- Pinecone for vector indexing
- Azure Speech Services for TTS
Frontend:
- HTML/CSS/JavaScript
- WebRTC for real-time communication
- Azure Speech SDK for browser integration
Storage:
- Azure Cosmos DB for document storage
- Pinecone for vector embeddings
- Cloudflare R2 for chat history storage

Getting Started

Prerequisites

Python 3.8+
Azure account with:
- Azure OpenAI API access
- Azure Cognitive Services (Speech)
- Azure Cosmos DB
Pinecone account
Cloudflare R2 storage (optional, for chat history)

Environment Variables

Create a .env file with the following variables:

AZURE_OPENAI_VARE_KEY=your_azure_openai_key
AZURE_ENDPOINT=your_azure_endpoint
PINECONE_API_KEY=your_pinecone_key
PINECONE_API_KEY2=your_second_pinecone_key
AZURE_SPEECH_KEY=your_azure_speech_key
AZURE_SPEECH_REGION=eastus
COSMOS_HOST=your_cosmos_db_host
COSMOS_KEY=your_cosmos_db_key
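
Before starting the application, it can help to confirm that every variable listed above is set. The following is a minimal sketch, not code from app.py: the helper name `missing_env_vars` is hypothetical, and it only inspects the process environment, so it assumes the .env file has already been loaded by whatever mechanism the application uses.

```python
import os

# Variable names copied from the list above.
REQUIRED_VARS = [
    "AZURE_OPENAI_VARE_KEY",
    "AZURE_ENDPOINT",
    "PINECONE_API_KEY",
    "PINECONE_API_KEY2",
    "AZURE_SPEECH_KEY",
    "AZURE_SPEECH_REGION",
    "COSMOS_HOST",
    "COSMOS_KEY",
]


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; pass a dict to check
    an arbitrary mapping instead (hypothetical helper).
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running the check before `python app.py` surfaces a missing key early instead of at the first request to an Azure or Pinecone service.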

Installation

Clone the repository
Install dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
python app.py
```

Usage

Creating a Custom Avatar

Navigate to the avatar creation page
Enter avatar name and description
Upload knowledge base documents (PDF, TXT)
Select or generate a QA prompt template
Choose an avatar character and background
Click "Create Avatar"
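
The upload step above accepts only PDF and TXT files. As a rough illustration of how such a check could look on the server side, here is a small sketch; `ALLOWED_EXTENSIONS` and `is_supported_document` are hypothetical names and are not taken from app.py.

```python
from pathlib import Path

# Extensions named in this README; the set itself is an illustrative assumption.
ALLOWED_EXTENSIONS = {".pdf", ".txt"}


def is_supported_document(filename: str) -> bool:
    """Return True if the file's final extension is .pdf or .txt (case-insensitive)."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

Only the last extension is inspected, so a name such as `report.pdf.exe` is rejected. An extension check does not verify file contents, so a real deployment would likely validate the parsed document as well.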

Interacting with Avatars

Navigate to the avatar gallery page
Select an avatar to start a conversation
Ask questions via text input or click suggested follow-up questions
The avatar will respond with synthesized speech and animation
Chat history can be saved for future reference

Project Structure

app.py: Main Flask application
avatar-conv.js: JavaScript for avatar conversation interface
avatar-conv.html: HTML template for avatar conversation
avatar-page.html: Gallery of available avatars
upload.html: Avatar creation interface
static/: Static files (images, CSS, JS)
templates/: HTML templates

Features in Detail

RAG Implementation

The system uses Retrieval Augmented Generation with:

Document chunking and embedding via Azure OpenAI embeddings
Storage in Cosmos DB with vector capabilities
Query-time retrieval based on semantic similarity
Response generation incorporating retrieved knowledge
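
The four stages above can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the project's implementation: `toy_embed` is a bag-of-words stand-in for Azure OpenAI embeddings, an in-memory list stands in for Cosmos DB, and all function names and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping windows of `chunk_size` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def toy_embed(text):
    """Stand-in for an embedding model: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=2):
    """Return the `top_k` chunks most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, context_chunks):
    """Combine retrieved chunks and the question into one prompt string."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

In a real deployment the embedding call and the similarity search would be delegated to the embedding model and the vector store; the control flow (chunk, embed, rank, build prompt) stays the same.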

Avatar Synthesis

Avatars are synthesized using:

Azure Speech SDK for text-to-speech
Azure Talking Avatar service for facial animation
WebRTC for real-time streaming to the browser
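
Text sent to Azure text-to-speech is commonly wrapped in SSML. The sketch below shows one way to build such a payload safely. It assumes the application uses SSML at all, which this README does not state; the helper name `build_ssml` is hypothetical and the default voice `en-US-JennyNeural` is only an example.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US"):
    """Wrap plain text in a minimal SSML document, escaping XML special characters."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )
```

Escaping matters because generated answers can contain characters such as `<` or `&`, which would otherwise produce malformed XML.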

Customization Options

Multiple avatar characters (Dr. David Avenetti, Prof. Zalake, Lisa-Casual, Max-Business)
Background selection with various themes
Prompt customization with AI-assisted generation

License

MIT License

Acknowledgments

Azure Cognitive Services Team
Microsoft Azure OpenAI Service
Pinecone Vector Database

This project demonstrates integration of multiple Azure services to create interactive, knowledge-grounded conversational avatars for various applications including education, customer service, and information delivery.

How to activate (for developers at V-ARE Lab)

Because the models and services involved are costly, we have paused or stopped certain required components on our Azure App Services. To reactivate them:

Log in to the Microsoft Azure portal home.
Open the web app 'VARELabUICivory' and start the website if it is paused.
Open the Azure Cognitive Services resource 'secondTestFriday' and search for 'Pricing tier'.
On the Pricing tier page, change the tier to 'S0 Standard' and click Apply.
