A containerized AI platform that creates and interacts with digital twin personas using RAG (Retrieval-Augmented Generation), local speech-to-text transcription, voice cloning, and multi-language support.
- AI Personas: Create and interact with customizable AI personas with distinct personalities
- Local Speech-to-Text: Offline Whisper-based transcription (English + Urdu)
- Voice Cloning: Clone voices and generate speech using AllVoiceLab API
- RAG System: Retrieval-Augmented Generation for context-aware responses
- Agentic AI Models: Two intelligent agents for advanced document research and conversational planning
- Multi-turn Chat: Maintain conversation history and context
- Multi-language Support: English and Urdu language detection and processing
- Document Management: Upload and ingest documents for RAG
- Text-to-Speech: Generate speech with cloned voices
- Backend: FastAPI (Python)
- Frontend: Streamlit
- Speech Processing: OpenAI Whisper (base model, CPU-optimized)
- Voice Cloning: AllVoiceLab API
- Vector Store: ChromaDB
- LLM Integration: Groq API (llama-3.3-70b-versatile)
- Database: SQLite
- Containerization: Docker & Docker Compose
- Audio Capture: PyAudio with PortAudio
```
Echo-Persona/
├── echo/                          # Main application
│   ├── app/
│   │   ├── main.py                # FastAPI entry point
│   │   ├── api/                   # API route handlers
│   │   │   ├── chat.py            # Chat endpoints
│   │   │   ├── documents.py       # Document management
│   │   │   ├── personas.py        # Persona CRUD operations
│   │   │   ├── speech.py          # Speech-to-text endpoints
│   │   │   ├── voice.py           # Voice cloning endpoints
│   │   │   └── agents.py          # Agentic AI endpoints
│   │   ├── agents/                # Agentic AI models
│   │   │   ├── base_agent.py      # Base agent class
│   │   │   ├── tools.py           # Agent tools
│   │   │   ├── document_research_agent.py      # Document research agent
│   │   │   └── conversational_planning_agent.py # Conversational planning agent
│   │   ├── core/                  # Core utilities
│   │   │   ├── config.py          # Configuration management
│   │   │   └── logging.py         # Logging setup
│   │   ├── db/                    # Database layer
│   │   │   ├── database.py        # DB connection & session
│   │   │   ├── models.py          # SQLAlchemy models (includes VoiceClone)
│   │   │   └── init_db.py         # Database initialization
│   │   ├── models/
│   │   │   └── schemas.py         # Pydantic models
│   │   ├── rag/                   # RAG pipeline
│   │   │   ├── embeddings.py      # Embedding generation
│   │   │   ├── generation.py      # LLM response generation
│   │   │   ├── ingestion.py       # Document ingestion
│   │   │   ├── pipeline.py        # RAG pipeline orchestration
│   │   │   ├── retrieval.py       # Vector store retrieval
│   │   │   └── vectorstore.py     # ChromaDB integration
│   │   ├── speech/                # Speech processing
│   │   │   ├── transcriber.py     # Whisper transcriber
│   │   │   ├── audio_capture.py   # Microphone audio capture
│   │   │   └── streaming.py       # Real-time streaming
│   │   └── voice/                 # Voice cloning
│   │       └── allvoicelab_client.py # AllVoiceLab API client
│   ├── frontend/                  # Streamlit UI
│   │   ├── app.py                 # Main Streamlit app
│   │   ├── speech_input.py        # Speech input component
│   │   └── voice_cloning.py       # Voice cloning component
│   ├── tests/                     # Unit tests
│   ├── docker-compose.yml         # Container orchestration
│   ├── Dockerfile                 # Multi-stage build
│   ├── requirements.txt           # Python dependencies
│   ├── pytest.ini                 # Pytest configuration
│   ├── demo_voice.py              # Voice cloning demo script
│   └── .env.example               # Environment variables template
├── data/
│   └── chroma/                    # ChromaDB vector store (persistent)
├── logs/                          # Application logs
├── README.md                      # Main documentation
├── VOICE_CLONING_GUIDE.md         # Detailed voice cloning guide
└── VOICE_CLONING_SUMMARY.md       # Voice cloning implementation details
```
- Docker & Docker Compose (v2.0+)
- 4GB+ RAM available
- 10GB+ disk space (for models and containers)
- AllVoiceLab API key (for voice cloning)
- Environment variables configured (see below)
- Clone the repository

```
git clone <repository-url>
cd Echo-Persona
```

- Configure environment variables

Create a `.env` file in the `echo/` directory:
```
# LLM Configuration
LLM_PROVIDER=groq
GROQ_MODEL=llama-3.3-70b-versatile
GROQ_API_KEY=your_groq_api_key_here

# Optional: OpenAI configuration (if using OpenAI instead of Groq)
OPENAI_API_KEY=your_openai_key_here

# Voice Cloning (AllVoiceLab)
ALLVOICELAB_API_KEY=your_allvoicelab_api_key_here

# Google Search API (for document retrieval)
GOOGLE_API_KEY=your_google_api_key_here

# Hugging Face API (for embeddings)
HUGGINGFACE_API_KEY=your_hf_api_key_here
```

- Start the application

```
cd echo/
docker-compose up -d
```

The application will be available at:
- Frontend (Streamlit): http://localhost:8501
- API Documentation: http://localhost:8081/docs
- API: http://localhost:8081
- Access the application
- Open http://localhost:8501 in your browser
- Create a persona
- Start chatting or clone a voice!
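As an illustration of how the environment variables above are typically consumed, here is a minimal sketch of a settings loader. The real `app/core/config.py` may be structured differently; the `Settings` class and defaults shown here are assumptions:

```python
import os

class Settings:
    """Illustrative settings loader -- the real app/core/config.py may differ."""
    def __init__(self, env=None):
        env = os.environ if env is None else env
        self.llm_provider = env.get("LLM_PROVIDER", "groq")
        self.groq_model = env.get("GROQ_MODEL", "llama-3.3-70b-versatile")
        self.groq_api_key = env.get("GROQ_API_KEY", "")
        self.allvoicelab_api_key = env.get("ALLVOICELAB_API_KEY", "")

# Missing keys fall back to defaults, so the app can boot without every key set.
settings = Settings({"GROQ_API_KEY": "gsk_demo"})
print(settings.groq_model)  # llama-3.3-70b-versatile
```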
- Go to the Create page
- Fill in persona details:
  - Name: Persona identifier
  - Description: What they are known for
  - Personality Traits: Key characteristics
  - Speaking Style: How they communicate
  - Background: Biography and context
  - Knowledge Base: Upload documents or provide text
- Click "Create Persona"
- Select a persona from dropdown
- Choose input method:
- Text: Type your message
- Voice: Record audio (English or Urdu)
- Press Enter or click Send
- The AI will respond with context from the knowledge base
- Use the Stop button to interrupt generation
- Get an API key
  - Visit https://allvoicelab.com
  - Sign up and get your free API key
- Configure
  - Add `ALLVOICELAB_API_KEY=your_api_key_here` to `.env`
- Test

```
cd echo
python demo_voice.py
```
- Go to the Voice Cloning page
- Upload voice sample: Provide a 10-30 second clear audio file (WAV, MP3, M4A, OGG)
- Clone voice: Enter voice name and click "Clone Voice"
- Generate speech:
- Enter text you want to speak
- Select the cloned voice
- Adjust settings (speed, stability, similarity)
- Click "Generate Speech"
- Listen & Download: Play audio in browser or download MP3/WAV
- Multiple voice samples per persona
- Adjustable speed (0.5 - 2.0x)
- Stability slider (0 - 1)
- Similarity enhancement
- Multi-format output (MP3, WAV)
- In-browser playback
- Download generated audio
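The speed and stability ranges listed above can be enforced client-side before a TTS request. A hypothetical helper (not part of the app's actual API):

```python
def normalize_tts_params(speed: float, stability: float) -> dict:
    """Clamp TTS parameters to the ranges the UI exposes:
    speed 0.5-2.0x, stability 0-1. Illustrative helper only."""
    return {
        "speed": min(max(speed, 0.5), 2.0),
        "stability": min(max(stability, 0.0), 1.0),
    }

print(normalize_tts_params(3.0, -0.2))  # {'speed': 2.0, 'stability': 0.0}
```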
- Go to the Documents page
- Upload PDFs or text files
- Documents are automatically ingested into ChromaDB
- Context is retrieved during conversations
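Conceptually, ingestion splits each document into overlapping chunks before embedding them into ChromaDB. A minimal sketch of that step (the chunk sizes here are illustrative, not the values `ingestion.py` actually uses):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks, roughly how a RAG
    ingestion step prepares documents before embedding."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlap preserves context across chunk boundaries so a sentence split in two still appears whole in at least one chunk.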
- Microphone Recording: Click to start/stop recording
- Language Selection: Auto-detect or select English/Urdu
- Transcription: Click "Transcribe" to convert speech to text
- Real-time Display: See transcribed text with confidence score
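For the base64 transcription endpoint, a request body can be assembled as below. The field names are assumptions and should be checked against the Swagger UI at `/docs`:

```python
import base64
import json

def build_transcribe_payload(audio_bytes: bytes, language: str = "auto") -> str:
    # Field names are illustrative; verify against the API docs at /docs.
    payload = {
        "audio_base64": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
    }
    return json.dumps(payload)

body = build_transcribe_payload(b"\x00\x01fake-audio", language="ur")
decoded = base64.b64decode(json.loads(body)["audio_base64"])
```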
Echo now includes two powerful agentic AI models that use tools and multi-step reasoning:
Intelligently researches and synthesizes information from documents using:
- Multi-step research: Plans research strategy and executes multiple iterations
- Multi-query search: Uses multiple related queries for comprehensive coverage
- Content analysis: Analyzes document content for specific information
- Synthesis: Combines information from multiple sources into coherent answers
Use Cases:
- Complex questions requiring information from multiple documents
- Research tasks that need thorough investigation
- Questions that benefit from multiple search angles
API Endpoint: POST /api/agents/document-research
Example Request:
```
{
  "persona_id": 1,
  "message": "What are the main themes in my documents about machine learning?",
  "session_id": "optional_session_id"
}
```

Plans multi-step conversations and uses tools strategically:
- Conversation planning: Analyzes context and plans responses
- Intelligent tool selection: Decides when to search for information
- Context-aware responses: Maintains natural conversation flow
- Persona voice: Responds in the persona's authentic style
Use Cases:
- Natural conversations that may need document lookups
- Questions requiring context from previous messages
- Maintaining persona personality while accessing knowledge base
API Endpoint: POST /api/agents/conversational-planning
Example Request:
{
"persona_id": 1,
"message": "Tell me about my favorite hobbies",
"session_id": "conversation_session_123"
}Both agents feature:
- Tool-based execution: Use document search and analysis tools
- Multi-step reasoning: Plan and execute complex tasks
- Transparent reasoning: Provide step-by-step reasoning logs
- Error handling: Gracefully handle failures
- Performance tracking: Execution time and tool usage metrics
```
# Document Research Agent
curl -X POST "http://localhost:8081/api/agents/document-research" \
  -H "Content-Type: application/json" \
  -d '{
    "persona_id": 1,
    "message": "What are the key points about AI in my documents?"
  }'

# Conversational Planning Agent
curl -X POST "http://localhost:8081/api/agents/conversational-planning" \
  -H "Content-Type: application/json" \
  -d '{
    "persona_id": 1,
    "message": "What did I say about my career goals?",
    "session_id": "session_123"
  }'

# List available agents
curl "http://localhost:8081/api/agents/available"
```

```
cd echo/
docker-compose build
```

Image sizes (CPU-optimized):
- `echo-api`: ~3.3GB
- `echo-frontend`: ~3.3GB
- `echo-speech-input`: ~3.3GB
```
# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down
```

- API Service (port 8081)
  - FastAPI backend
  - Health checks every 30s
  - Uvicorn ASGI server
- Frontend Service (port 8501)
  - Streamlit web interface
  - Depends on API health
- Speech Input Service (port 5000)
  - Optional microphone capture
  - Real-time audio streaming
- `GET /health` - Application health status

- `GET /api/personas` - List all personas
- `POST /api/personas` - Create new persona
- `GET /api/personas/{id}` - Get persona details
- `PUT /api/personas/{id}` - Update persona
- `DELETE /api/personas/{id}` - Delete persona

- `POST /api/chat/message` - Send message to persona
- `GET /api/chat/history` - Get conversation history
- `DELETE /api/chat/session` - Clear session

- `GET /api/speech/languages` - Supported languages
- `POST /api/speech/transcribe/file` - Transcribe uploaded audio
- `POST /api/speech/transcribe` - Transcribe base64 audio
- `POST /api/speech/detect-language` - Detect audio language
- `WS /api/speech/ws/transcribe` - WebSocket real-time transcription

- `POST /api/voice/clone` - Clone voice from audio sample
- `POST /api/voice/tts` - Generate speech with cloned voice
- `GET /api/voice/personas/{id}/voices` - List cloned voices for persona
- `DELETE /api/voice/voices/{id}` - Delete cloned voice
- `GET /api/voice/health` - Voice service health check

- `POST /api/documents/upload` - Upload document
- `GET /api/documents` - List documents
- `DELETE /api/documents/{id}` - Delete document

- `POST /api/agents/document-research` - Document Research Agent for intelligent document research
- `POST /api/agents/conversational-planning` - Conversational Planning Agent for context-aware conversations
- `GET /api/agents/available` - List all available agentic AI models
- Interactive Swagger UI: http://localhost:8081/docs
- ReDoc: http://localhost:8081/redoc
```
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run API locally
cd echo
uvicorn app.main:app --reload

# In another terminal, run Streamlit
streamlit run frontend/app.py
```

```
cd echo/
pytest
pytest --cov=app tests/  # With coverage
```

```
cd echo/
python demo_voice.py
```

- `requirements.txt`: All Python dependencies
- `Dockerfile`: Multi-stage build for optimized images
- `docker-compose.yml`: Service orchestration
- `app/core/config.py`: Configuration management
- `app/rag/pipeline.py`: RAG orchestration
- `app/voice/allvoicelab_client.py`: Voice cloning client
- `app/agents/`: Agentic AI models (Document Research, Conversational Planning)
- `app/agents/tools.py`: Reusable agent tools
- `frontend/voice_cloning.py`: Voice UI component
- Docker Hub account (free at https://hub.docker.com)
- Docker login configured
- Log in to Docker Hub

```
docker login
# Enter your Docker Hub username and password
```

- Tag images (replace `<username>` with your Docker Hub username)

```
cd echo/

# Tag API image
docker tag echo-api <username>/echo-persona-api:latest
docker tag echo-api <username>/echo-persona-api:1.0.0

# Tag Frontend image
docker tag echo-frontend <username>/echo-persona-frontend:latest
docker tag echo-frontend <username>/echo-persona-frontend:1.0.0

# Tag Speech service image
docker tag echo-speech-input <username>/echo-persona-speech:latest
docker tag echo-speech-input <username>/echo-persona-speech:1.0.0
```

- Push to Docker Hub

```
# Push API
docker push <username>/echo-persona-api:latest
docker push <username>/echo-persona-api:1.0.0

# Push Frontend
docker push <username>/echo-persona-frontend:latest
docker push <username>/echo-persona-frontend:1.0.0

# Push Speech service
docker push <username>/echo-persona-speech:latest
docker push <username>/echo-persona-speech:1.0.0
```

- Go to https://hub.docker.com/r/<username>/
- Your images should be listed as public repositories

Your team members can pull and run the images:

```
# Pull images
docker pull <username>/echo-persona-api:latest
docker pull <username>/echo-persona-frontend:latest
docker pull <username>/echo-persona-speech:latest

# Create docker-compose.yml with pulled images
# (modify the image names in docker-compose.yml to point to your Docker Hub repos)

# Run the application
docker-compose up -d
```

- Size: Base model (~139MB)
- Languages: 99 languages including English & Urdu
- Accuracy: ~80-90% depending on audio quality
- Speed: CPU ~30-60 seconds per minute of audio
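From the figures above, expected CPU processing time can be estimated before submitting a long recording. A trivial illustration:

```python
def estimate_transcription_seconds(audio_minutes: float) -> tuple[float, float]:
    """Rough CPU estimate from the figures above: ~30-60 s of processing
    per minute of audio with the Whisper base model."""
    return (30.0 * audio_minutes, 60.0 * audio_minutes)

# A 2.5-minute clip should take roughly 75-150 seconds on CPU.
low, high = estimate_transcription_seconds(2.5)
```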
- Provider: Groq API (free tier available)
- Model: llama-3.3-70b-versatile
- Context Window: 8K tokens
- Response Time: ~2-5 seconds (via Groq API)
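With a bounded context window, older turns eventually have to be dropped from the conversation history. A sketch of one common trimming strategy, using a words-times-1.3 token approximation (a rule of thumb, not the app's actual accounting):

```python
def trim_history(messages: list[str], max_tokens: int = 8000) -> list[str]:
    """Keep the most recent messages that fit a token budget, walking
    backward from the newest turn. Token counts are approximated."""
    kept, used = [], 0.0
    for msg in reversed(messages):
        cost = len(msg.split()) * 1.3  # rough tokens-per-word heuristic
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```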
- Provider: AllVoiceLab
- Voice Quality: High-quality natural speech
- Audio Sample Required: 10-30 seconds of clear audio
- Supported Formats: WAV, MP3, M4A, OGG
- Output Formats: MP3, WAV
- Processing Time: ~5-15 seconds per request
- Model: Hugging Face `sentence-transformers`
- Embedding dimension: 384-768
- Vector Store: ChromaDB with persistent storage
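Retrieval ranks stored chunks by vector similarity between the query embedding and each chunk embedding; cosine similarity is the standard metric:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors: 1.0 means
    identical direction, 0.0 means orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```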
- Store in `.env` file (not committed to git)
- Use environment variables in production
- Rotate keys regularly
- Keep AllVoiceLab API key confidential
- SQLite used for development (not production-ready)
- For production, migrate to PostgreSQL
- Currently no authentication (add as needed)
- For production, implement JWT or OAuth2
- Ensure compliance with voice cloning regulations
- Get consent before cloning someone's voice
- Use for authorized purposes only
- CPU-only PyTorch (reduced size from ~8.5GB to ~3.3GB)
- Multi-stage Docker builds
- Whisper base model (good speed/accuracy trade-off on CPU)
- ChromaDB in-memory caching
- AllVoiceLab cloud processing for voice cloning
- Use PostgreSQL instead of SQLite
- Add Redis caching layer
- Implement request rate limiting
- Add API authentication
- Use GPU for faster transcription/inference
- Cache generated voice files
```
# Restart frontend container
docker-compose restart frontend

# Check logs
docker-compose logs frontend
```

- Increase timeout in `frontend/speech_input.py` (currently 180s)
- Whisper model takes time to load on first use
- Subsequent requests are faster
```
# Check API health
curl http://localhost:8081/health

# Check if API is running
docker-compose ps
```

- Check AllVoiceLab API key in `.env`
- Use 10-30 second clear audio samples
- Supported formats: WAV, MP3, M4A, OGG
- Verify audio quality (minimize background noise)
- Set `ALLVOICELAB_API_KEY` in the `.env` file
- Restart backend: `docker-compose restart`
- Verify key is active on the AllVoiceLab website
- Reduce Whisper model size or use the `tiny` model
- Close other applications
- Increase Docker memory allocation
- Voice cloning uses cloud processing (minimal local memory impact)
```
# LLM Provider (groq or openai)
LLM_PROVIDER=groq

# Groq Configuration
GROQ_MODEL=llama-3.3-70b-versatile
GROQ_API_KEY=gsk_xxxxxxxxxxxxx

# OpenAI Configuration (optional)
OPENAI_API_KEY=sk-xxxxxxxxxxxxx

# Voice Cloning (AllVoiceLab)
ALLVOICELAB_API_KEY=your_api_key_here

# Google API (for search)
GOOGLE_API_KEY=xxxxxxxxxxxxx

# Hugging Face (for embeddings)
HUGGINGFACE_API_KEY=hf_xxxxxxxxxxxxx

# Database paths (in containers)
SQLITE_DATABASE_PATH=/app/data/echo.db
CHROMA_PERSIST_DIRECTORY=/app/data/chroma
UPLOAD_DIRECTORY=/app/data/uploads
```
- Create a feature branch: `git checkout -b feature/feature-name`
- Commit changes: `git commit -am 'Add feature'`
- Push to branch: `git push origin feature/feature-name`
- Submit a pull request
For issues or questions:
- Check the troubleshooting section
- Review API documentation at http://localhost:8081/docs
- Check container logs: `docker-compose logs`
- For voice cloning issues, see VOICE_CLONING_GUIDE.md
- Agentic AI Models: Two intelligent agents for document research and conversational planning
- Document Research Agent: Multi-step research with tool-based execution
- Conversational Planning Agent: Context-aware conversations with intelligent tool selection
- Agent Tools: Reusable tools for document search, analysis, and multi-query search
- Agent API Endpoints: RESTful API for agent execution
- Transparent Reasoning: Step-by-step reasoning logs from agents
- Voice cloning with AllVoiceLab API
- Text-to-speech with cloned voices
- Voice management per persona
- Multiple output formats (MP3, WAV)
- Adjustable speech parameters (speed, stability, similarity)
- Audio file download functionality
- Basic persona creation and chat
- Local Whisper speech-to-text
- RAG with ChromaDB
- Multi-language support (English, Urdu)
- Docker containerization
- Streamlit UI with stop button
- CPU-optimized images (~3.3GB each)
Last Updated: December 8, 2025
