mmujtaba0085/EchoPersona

Echo-Persona: Digital Twin AI Platform

A containerized AI platform that creates and interacts with digital twin personas using RAG (Retrieval-Augmented Generation), local speech-to-text transcription, voice cloning, and multi-language support.

Features

Core Capabilities

  • AI Personas: Create and interact with customizable AI personas with distinct personalities
  • Local Speech-to-Text: Offline Whisper-based transcription (English + Urdu)
  • Voice Cloning: Clone voices and generate speech using AllVoiceLab API
  • RAG System: Retrieval-Augmented Generation for context-aware responses
  • Agentic AI Models: Two intelligent agents for advanced document research and conversational planning
  • Multi-turn Chat: Maintain conversation history and context
  • Multi-language Support: English and Urdu language detection and processing
  • Document Management: Upload and ingest documents for RAG
  • Text-to-Speech: Generate speech with cloned voices

Technical Stack

  • Backend: FastAPI (Python)
  • Frontend: Streamlit
  • Speech Processing: OpenAI Whisper (base model, CPU-optimized)
  • Voice Cloning: AllVoiceLab API
  • Vector Store: ChromaDB
  • LLM Integration: Groq API (llama-3.3-70b-versatile)
  • Database: SQLite
  • Containerization: Docker & Docker Compose
  • Audio Capture: PyAudio with PortAudio

Project Structure

Echo-Persona/
├── echo/                           # Main application
│   ├── app/
│   │   ├── main.py                # FastAPI entry point
│   │   ├── api/                   # API route handlers
│   │   │   ├── chat.py            # Chat endpoints
│   │   │   ├── documents.py       # Document management
│   │   │   ├── personas.py        # Persona CRUD operations
│   │   │   ├── speech.py          # Speech-to-text endpoints
│   │   │   ├── voice.py           # Voice cloning endpoints
│   │   │   └── agents.py          # Agentic AI endpoints
│   │   ├── agents/                # Agentic AI models
│   │   │   ├── base_agent.py      # Base agent class
│   │   │   ├── tools.py           # Agent tools
│   │   │   ├── document_research_agent.py  # Document research agent
│   │   │   └── conversational_planning_agent.py  # Conversational planning agent
│   │   ├── core/                  # Core utilities
│   │   │   ├── config.py          # Configuration management
│   │   │   └── logging.py         # Logging setup
│   │   ├── db/                    # Database layer
│   │   │   ├── database.py        # DB connection & session
│   │   │   ├── models.py          # SQLAlchemy models (includes VoiceClone)
│   │   │   └── init_db.py         # Database initialization
│   │   ├── models/
│   │   │   └── schemas.py         # Pydantic models
│   │   ├── rag/                   # RAG pipeline
│   │   │   ├── embeddings.py      # Embedding generation
│   │   │   ├── generation.py      # LLM response generation
│   │   │   ├── ingestion.py       # Document ingestion
│   │   │   ├── pipeline.py        # RAG pipeline orchestration
│   │   │   ├── retrieval.py       # Vector store retrieval
│   │   │   └── vectorstore.py     # ChromaDB integration
│   │   ├── speech/                # Speech processing
│   │   │   ├── transcriber.py     # Whisper transcriber
│   │   │   ├── audio_capture.py   # Microphone audio capture
│   │   │   └── streaming.py       # Real-time streaming
│   │   └── voice/                 # Voice cloning
│   │       └── allvoicelab_client.py  # AllVoiceLab API client
│   ├── frontend/                  # Streamlit UI
│   │   ├── app.py                 # Main Streamlit app
│   │   ├── speech_input.py        # Speech input component
│   │   └── voice_cloning.py       # Voice cloning component
│   ├── tests/                     # Unit tests
│   ├── docker-compose.yml         # Container orchestration
│   ├── Dockerfile                 # Multi-stage build
│   ├── requirements.txt           # Python dependencies
│   ├── pytest.ini                 # Pytest configuration
│   ├── demo_voice.py              # Voice cloning demo script
│   └── .env.example               # Environment variables template
├── data/
│   └── chroma/                    # ChromaDB vector store (persistent)
├── logs/                          # Application logs
├── README.md                      # Main documentation
├── VOICE_CLONING_GUIDE.md         # Detailed voice cloning guide
└── VOICE_CLONING_SUMMARY.md       # Voice cloning implementation details

Architecture Diagram

Echo-Persona Architecture Diagram

Quick Start

Prerequisites

  • Docker & Docker Compose (v2.0+)
  • 4GB+ RAM available
  • 10GB+ disk space (for models and containers)
  • AllVoiceLab API key (for voice cloning)
  • Environment variables configured (see below)

Setup

  1. Clone the repository
git clone <repository-url>
cd Echo-Persona
  2. Configure environment variables

Create a .env file in the echo/ directory:

# LLM Configuration
LLM_PROVIDER=groq
GROQ_MODEL=llama-3.3-70b-versatile
GROQ_API_KEY=your_groq_api_key_here

# Optional: OpenAI configuration (if using OpenAI instead of Groq)
OPENAI_API_KEY=your_openai_key_here

# Voice Cloning (AllVoiceLab)
ALLVOICELAB_API_KEY=your_allvoicelab_api_key_here

# Google Search API (for document retrieval)
GOOGLE_API_KEY=your_google_api_key_here

# Hugging Face API (for embeddings)
HUGGINGFACE_API_KEY=your_hf_api_key_here
  3. Start the application
cd echo/
docker-compose up -d
  4. Access the application

The application will be available at:

  • Frontend (Streamlit): http://localhost:8501
  • API (FastAPI): http://localhost:8081
  • API docs (Swagger UI): http://localhost:8081/docs
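The variables in .env are read at startup by app/core/config.py. A minimal stdlib sketch of what that loader might look like; the Settings class and its field names here are assumptions, not the actual implementation:

```python
# Hypothetical sketch of app/core/config.py: read the .env-provided
# environment variables with safe defaults. Field names are assumptions.
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    llm_provider: str = field(default_factory=lambda: os.getenv("LLM_PROVIDER", "groq"))
    groq_model: str = field(default_factory=lambda: os.getenv("GROQ_MODEL", "llama-3.3-70b-versatile"))
    groq_api_key: str = field(default_factory=lambda: os.getenv("GROQ_API_KEY", ""))
    allvoicelab_api_key: str = field(default_factory=lambda: os.getenv("ALLVOICELAB_API_KEY", ""))
    sqlite_database_path: str = field(default_factory=lambda: os.getenv("SQLITE_DATABASE_PATH", "/app/data/echo.db"))


settings = Settings()  # instantiated once; imported by the rest of the app
```

Because each field has a default, missing optional keys (e.g. OPENAI_API_KEY when using Groq) do not break startup.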

Usage Guide

Creating a Persona

  1. Go to Create page

  2. Fill in persona details:

    • Name: Persona identifier
    • Description: What they are known for
    • Personality Traits: Key characteristics
    • Speaking Style: How they communicate
    • Background: Biography and context
    • Knowledge Base: Upload documents or provide text
  3. Click "Create Persona"

Chatting with Personas

  1. Select a persona from dropdown
  2. Choose input method:
    • Text: Type your message
    • Voice: Record audio (English or Urdu)
  3. Press Enter or click Send
  4. The AI will respond with context from the knowledge base
  5. Use the Stop button to interrupt generation
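The same chat flow is available over the API via POST /api/chat/message. A hedged stdlib example; the payload field names are assumptions to verify against the interactive docs:

```python
# Hypothetical client for POST /api/chat/message; the payload fields
# (persona_id, message, session_id) are assumptions - check /docs.
import json
from typing import Optional
from urllib import request

API_BASE = "http://localhost:8081"


def build_chat_request(persona_id: int, message: str,
                       session_id: Optional[str] = None) -> request.Request:
    payload = {"persona_id": persona_id, "message": message}
    if session_id:
        payload["session_id"] = session_id  # reuse to keep multi-turn context
    return request.Request(
        f"{API_BASE}/api/chat/message",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    with request.urlopen(build_chat_request(1, "Hello!"), timeout=60) as resp:
        print(json.load(resp))
```

Passing the same session_id on subsequent calls is what preserves multi-turn context on the server side.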

Voice Cloning

Getting Started

  1. Get API Key

    • Sign up for an AllVoiceLab account and generate an API key

  2. Configure

    • Add ALLVOICELAB_API_KEY=your_api_key_here to .env
  3. Test

    cd echo
    python demo_voice.py

Using Voice Cloning in Frontend

  1. Go to the Voice Cloning page
  2. Upload voice sample: Provide a 10-30 second clear audio file (WAV, MP3, M4A, OGG)
  3. Clone voice: Enter voice name and click "Clone Voice"
  4. Generate speech:
    • Enter text you want to speak
    • Select the cloned voice
    • Adjust settings (speed, stability, similarity)
    • Click "Generate Speech"
  5. Listen & Download: Play audio in browser or download MP3/WAV
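Steps 3-5 can also be scripted against POST /api/voice/tts. A sketch that enforces the parameter ranges above client-side; the JSON field names (voice_id, text, speed, stability) are assumptions to verify in /docs:

```python
# Hypothetical payload builder for POST /api/voice/tts. Field names are
# assumptions; the range checks mirror the sliders in the frontend.
import json
from urllib import request


def build_tts_payload(voice_id: str, text: str,
                      speed: float = 1.0, stability: float = 0.5) -> dict:
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within the supported 0.5-2.0x range")
    if not 0.0 <= stability <= 1.0:
        raise ValueError("stability must be between 0 and 1")
    return {"voice_id": voice_id, "text": text, "speed": speed, "stability": stability}


if __name__ == "__main__":
    body = json.dumps(build_tts_payload("my_voice", "Hello from Echo-Persona")).encode()
    req = request.Request("http://localhost:8081/api/voice/tts", data=body,
                          headers={"Content-Type": "application/json"}, method="POST")
    with request.urlopen(req, timeout=60) as resp:
        open("output.mp3", "wb").write(resp.read())  # assumes audio bytes are returned directly
```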

Voice Cloning Features

  • Multiple voice samples per persona
  • Adjustable speed (0.5 - 2.0x)
  • Stability slider (0 - 1)
  • Similarity enhancement
  • Multi-format output (MP3, WAV)
  • In-browser playback
  • Download generated audio

Document Management

  1. Go to the Documents page
  2. Upload PDFs or text files
  3. Documents are automatically ingested into ChromaDB
  4. Context is retrieved during conversations
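Uploads can be scripted as well. A stdlib-only multipart sketch for POST /api/documents/upload; the form field name "file" is an assumption:

```python
# Hand-rolled multipart/form-data body for POST /api/documents/upload,
# using only the standard library. The "file" field name is an assumption.
import json
import uuid
from urllib import request


def build_multipart(filename: str, content: bytes, field: str = "file"):
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


if __name__ == "__main__":
    with open("notes.pdf", "rb") as f:
        body, ctype = build_multipart("notes.pdf", f.read())
    req = request.Request("http://localhost:8081/api/documents/upload",
                          data=body, headers={"Content-Type": ctype}, method="POST")
    print(json.load(request.urlopen(req, timeout=60)))
```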

Speech Features

  • Microphone Recording: Click to start/stop recording
  • Language Selection: Auto-detect or select English/Urdu
  • Transcription: Click "Transcribe" to convert speech to text
  • Real-time Display: See transcribed text with confidence score
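The base64 endpoint (POST /api/speech/transcribe) can be exercised directly. A sketch with assumed field names (audio_base64, language); verify them against /docs:

```python
# Hypothetical client for POST /api/speech/transcribe; the field names
# (audio_base64, language) are assumptions - verify against /docs.
import base64
import json
from typing import Optional
from urllib import request


def build_transcribe_payload(audio_bytes: bytes, language: Optional[str] = None) -> dict:
    payload = {"audio_base64": base64.b64encode(audio_bytes).decode("ascii")}
    if language:
        payload["language"] = language  # "en" or "ur"; omit for auto-detect
    return payload


if __name__ == "__main__":
    with open("sample.wav", "rb") as f:
        body = json.dumps(build_transcribe_payload(f.read(), language="ur")).encode()
    req = request.Request("http://localhost:8081/api/speech/transcribe", data=body,
                          headers={"Content-Type": "application/json"}, method="POST")
    with request.urlopen(req, timeout=180) as resp:  # first request loads the model
        print(json.load(resp))
```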

Agentic AI Models

Echo now includes two powerful agentic AI models that use tools and multi-step reasoning:

1. Document Research Agent

Intelligently researches and synthesizes information from documents using:

  • Multi-step research: Plans research strategy and executes multiple iterations
  • Multi-query search: Uses multiple related queries for comprehensive coverage
  • Content analysis: Analyzes document content for specific information
  • Synthesis: Combines information from multiple sources into coherent answers

Use Cases:

  • Complex questions requiring information from multiple documents
  • Research tasks that need thorough investigation
  • Questions that benefit from multiple search angles

API Endpoint: POST /api/agents/document-research

Example Request:

{
  "persona_id": 1,
  "message": "What are the main themes in my documents about machine learning?",
  "session_id": "optional_session_id"
}

2. Conversational Planning Agent

Plans multi-step conversations and uses tools strategically:

  • Conversation planning: Analyzes context and plans responses
  • Intelligent tool selection: Decides when to search for information
  • Context-aware responses: Maintains natural conversation flow
  • Persona voice: Responds in the persona's authentic style

Use Cases:

  • Natural conversations that may need document lookups
  • Questions requiring context from previous messages
  • Maintaining persona personality while accessing knowledge base

API Endpoint: POST /api/agents/conversational-planning

Example Request:

{
  "persona_id": 1,
  "message": "Tell me about my favorite hobbies",
  "session_id": "conversation_session_123"
}

Agent Capabilities

Both agents feature:

  • Tool-based execution: Use document search and analysis tools
  • Multi-step reasoning: Plan and execute complex tasks
  • Transparent reasoning: Provide step-by-step reasoning logs
  • Error handling: Gracefully handle failures
  • Performance tracking: Execution time and tool usage metrics

Using Agents via API

# Document Research Agent
curl -X POST "http://localhost:8081/api/agents/document-research" \
  -H "Content-Type: application/json" \
  -d '{
    "persona_id": 1,
    "message": "What are the key points about AI in my documents?"
  }'

# Conversational Planning Agent
curl -X POST "http://localhost:8081/api/agents/conversational-planning" \
  -H "Content-Type: application/json" \
  -d '{
    "persona_id": 1,
    "message": "What did I say about my career goals?",
    "session_id": "session_123"
  }'

# List available agents
curl "http://localhost:8081/api/agents/available"

Docker Deployment

Build Images

cd echo/
docker-compose build

Image Sizes (CPU-optimized):

  • echo-api: ~3.3GB
  • echo-frontend: ~3.3GB
  • echo-speech-input: ~3.3GB

Run Services

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down

Services

  1. API Service (port 8081)

    • FastAPI backend
    • Health checks every 30s
    • Uvicorn ASGI server
  2. Frontend Service (port 8501)

    • Streamlit web interface
    • Depends on API health
  3. Speech Input Service (port 5000)

    • Optional microphone capture
    • Real-time audio streaming

API Endpoints

Health

  • GET /health - Application health status

Personas

  • GET /api/personas - List all personas
  • POST /api/personas - Create new persona
  • GET /api/personas/{id} - Get persona details
  • PUT /api/personas/{id} - Update persona
  • DELETE /api/personas/{id} - Delete persona
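Personas can be created programmatically through POST /api/personas. A hedged sketch; the payload fields mirror the creation form in the Usage Guide but are assumptions to verify against /docs:

```python
# Hypothetical client for POST /api/personas; field names mirror the
# persona creation form but are assumptions - verify against /docs.
import json
from urllib import request


def build_persona_payload(name: str, description: str = "", personality_traits: str = "",
                          speaking_style: str = "", background: str = "") -> dict:
    if not name.strip():
        raise ValueError("persona name is required")
    return {"name": name, "description": description,
            "personality_traits": personality_traits,
            "speaking_style": speaking_style, "background": background}


if __name__ == "__main__":
    body = json.dumps(build_persona_payload("Ada", description="Pioneering mathematician")).encode()
    req = request.Request("http://localhost:8081/api/personas", data=body,
                          headers={"Content-Type": "application/json"}, method="POST")
    print(json.load(request.urlopen(req, timeout=30)))
```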

Chat

  • POST /api/chat/message - Send message to persona
  • GET /api/chat/history - Get conversation history
  • DELETE /api/chat/session - Clear session

Speech-to-Text

  • GET /api/speech/languages - Supported languages
  • POST /api/speech/transcribe/file - Transcribe uploaded audio
  • POST /api/speech/transcribe - Transcribe base64 audio
  • POST /api/speech/detect-language - Detect audio language
  • WS /api/speech/ws/transcribe - WebSocket real-time transcription

Voice Cloning

  • POST /api/voice/clone - Clone voice from audio sample
  • POST /api/voice/tts - Generate speech with cloned voice
  • GET /api/voice/personas/{id}/voices - List cloned voices for persona
  • DELETE /api/voice/voices/{id} - Delete cloned voice
  • GET /api/voice/health - Voice service health check

Documents

  • POST /api/documents/upload - Upload document
  • GET /api/documents - List documents
  • DELETE /api/documents/{id} - Delete document

Agentic AI Models

  • POST /api/agents/document-research - Use Document Research Agent for intelligent document research
  • POST /api/agents/conversational-planning - Use Conversational Planning Agent for context-aware conversations
  • GET /api/agents/available - List all available agentic AI models

API Documentation

FastAPI serves interactive documentation once the stack is running:

  • Swagger UI: http://localhost:8081/docs
  • ReDoc: http://localhost:8081/redoc

Development

Local Development Setup

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run API locally
cd echo
uvicorn app.main:app --reload

# In another terminal, run Streamlit
streamlit run frontend/app.py

Running Tests

cd echo/
pytest
pytest --cov=app tests/  # With coverage

Voice Cloning Demo

cd echo/
python demo_voice.py

Key Files

  • requirements.txt: All Python dependencies
  • Dockerfile: Multi-stage build for optimized images
  • docker-compose.yml: Service orchestration
  • app/core/config.py: Configuration management
  • app/rag/pipeline.py: RAG orchestration
  • app/voice/allvoicelab_client.py: Voice cloning client
  • app/agents/: Agentic AI models (Document Research, Conversational Planning)
  • app/agents/tools.py: Reusable agent tools
  • frontend/voice_cloning.py: Voice UI component

Pushing to Docker Hub

Prerequisites

  • A Docker Hub account
  • Locally built images (see Build Images above)

Steps

  1. Login to Docker Hub
docker login
# Enter your Docker Hub username and password
  2. Tag images (replace <username> with your Docker Hub username)
cd echo/

# Tag API image
docker tag echo-api <username>/echo-persona-api:latest
docker tag echo-api <username>/echo-persona-api:1.0.0

# Tag Frontend image
docker tag echo-frontend <username>/echo-persona-frontend:latest
docker tag echo-frontend <username>/echo-persona-frontend:1.0.0

# Tag Speech service image
docker tag echo-speech-input <username>/echo-persona-speech:latest
docker tag echo-speech-input <username>/echo-persona-speech:1.0.0
  3. Push to Docker Hub
# Push API
docker push <username>/echo-persona-api:latest
docker push <username>/echo-persona-api:1.0.0

# Push Frontend
docker push <username>/echo-persona-frontend:latest
docker push <username>/echo-persona-frontend:1.0.0

# Push Speech service
docker push <username>/echo-persona-speech:latest
docker push <username>/echo-persona-speech:1.0.0

Verify on Docker Hub

Log in to Docker Hub and confirm that the three repositories and their latest and 1.0.0 tags appear under your account.

Pulling Images for Team

Your team members can pull and run the images:

# Pull images
docker pull <username>/echo-persona-api:latest
docker pull <username>/echo-persona-frontend:latest
docker pull <username>/echo-persona-speech:latest

# Create docker-compose.yml with pulled images
# (modify the image names in docker-compose.yml to point to your Docker Hub repos)

# Run the application
docker-compose up -d

Model Information

Whisper Model

  • Size: Base model (~139MB)
  • Languages: 99 languages including English & Urdu
  • Accuracy: ~80-90% depending on audio quality
  • Speed: CPU ~30-60 seconds per minute of audio
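For reference, local transcription with the openai-whisper package looks roughly like this; the language-code helper is illustrative, and passing None lets Whisper auto-detect:

```python
# Local transcription sketch with openai-whisper (pip install openai-whisper).
# The helper maps the UI language names to Whisper codes; None = auto-detect.
from typing import Optional

LANG_CODES = {"english": "en", "urdu": "ur"}


def whisper_language(name: str) -> Optional[str]:
    return LANG_CODES.get(name.lower())


if __name__ == "__main__":
    import whisper

    model = whisper.load_model("base")  # ~139 MB download on first use
    result = model.transcribe("sample.wav", language=whisper_language("Urdu"))
    print(result["text"])
```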

LLM

  • Provider: Groq API (free tier available)
  • Model: llama-3.3-70b-versatile
  • Context Window: 128K tokens
  • Response Time: ~2-5 seconds (via Groq API)

Voice Cloning

  • Provider: AllVoiceLab
  • Voice Quality: High-quality natural speech
  • Audio Sample Required: 10-30 seconds of clear audio
  • Supported Formats: WAV, MP3, M4A, OGG
  • Output Formats: MP3, WAV
  • Processing Time: ~5-15 seconds per request

Embeddings

  • Model: Hugging Face sentence-transformers
  • Dimension: 384-768 dimensions
  • Vector Store: ChromaDB with persistent storage
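Conceptually the ingestion path chunks text, embeds it, and stores it in ChromaDB. A sketch under the assumption that the model is all-MiniLM-L6-v2 (384-dimensional output, within the range stated above); the chunking parameters are illustrative:

```python
# Ingestion sketch: chunk -> embed -> store. The model name below is an
# assumption; chunk size and overlap are illustrative defaults.
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list:
    chunks, start = [], 0
    step = size - overlap  # must stay positive to make progress
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks


if __name__ == "__main__":
    import chromadb
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    client = chromadb.PersistentClient(path="data/chroma")
    collection = client.get_or_create_collection("documents")
    chunks = chunk_text(open("doc.txt").read())
    collection.add(
        ids=[f"doc-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=model.encode(chunks).tolist(),
    )
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.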

Security Considerations

API Keys

  • Store in .env file (not committed to git)
  • Use environment variables in production
  • Rotate keys regularly
  • Keep AllVoiceLab API key confidential

Database

  • SQLite used for development (not production-ready)
  • For production, migrate to PostgreSQL

Authentication

  • Currently no authentication (add as needed)
  • For production, implement JWT or OAuth2

Voice Cloning

  • Ensure compliance with voice cloning regulations
  • Get consent before cloning someone's voice
  • Use for authorized purposes only

Performance Optimization

Current Optimizations

  • CPU-only PyTorch (reduced size from ~8.5GB to ~3.3GB)
  • Multi-stage Docker builds
  • Whisper base model (good speed/accuracy balance; only the tiny model is faster)
  • ChromaDB in-memory caching
  • AllVoiceLab cloud processing for voice cloning

For Production

  • Use PostgreSQL instead of SQLite
  • Add Redis caching layer
  • Implement request rate limiting
  • Add API authentication
  • Use GPU for faster transcription/inference
  • Cache generated voice files
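For the rate-limiting item, the core idea is a token bucket per client. A minimal in-process sketch; production deployments would usually enforce this in Redis or at a gateway instead:

```python
# Minimal token-bucket rate limiter (illustrative; per-process only).
# Each request costs one token; tokens refill at `rate` per second.
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = float(capacity), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A middleware would keep one bucket per API key or client IP and return HTTP 429 when allow() is False.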

Troubleshooting

Frontend won't load

# Restart frontend container
docker-compose restart frontend

# Check logs
docker-compose logs frontend

Transcription timeout

  • Increase timeout in frontend/speech_input.py (currently 180s)
  • Whisper model takes time to load on first use
  • Subsequent requests are faster

API connection error

# Check API health
curl http://localhost:8081/health

# Check if API is running
docker-compose ps

Voice cloning fails

  • Check AllVoiceLab API key in .env
  • Use 10-30 second clear audio samples
  • Supported formats: WAV, MP3, M4A, OGG
  • Verify audio quality (minimize background noise)

"API key not configured" error

  • Set ALLVOICELAB_API_KEY in .env file
  • Restart backend: docker-compose restart
  • Verify key is active on AllVoiceLab website

Out of memory

  • Reduce Whisper model size or use tiny model
  • Close other applications
  • Increase Docker memory allocation
  • Voice cloning uses cloud processing (minimal local memory impact)

Environment Variables Reference

# LLM Provider (groq or openai)
LLM_PROVIDER=groq

# Groq Configuration
GROQ_MODEL=llama-3.3-70b-versatile
GROQ_API_KEY=gsk_xxxxxxxxxxxxx

# OpenAI Configuration (optional)
OPENAI_API_KEY=sk-xxxxxxxxxxxxx

# Voice Cloning (AllVoiceLab)
ALLVOICELAB_API_KEY=your_api_key_here

# Google API (for search)
GOOGLE_API_KEY=xxxxxxxxxxxxx

# Hugging Face (for embeddings)
HUGGINGFACE_API_KEY=hf_xxxxxxxxxxxxx

# Database paths (in containers)
SQLITE_DATABASE_PATH=/app/data/echo.db
CHROMA_PERSIST_DIRECTORY=/app/data/chroma
UPLOAD_DIRECTORY=/app/data/uploads

Contributing

  1. Create a feature branch: git checkout -b feature/feature-name
  2. Commit changes: git commit -am 'Add feature'
  3. Push to branch: git push origin feature/feature-name
  4. Submit pull request

Support

For issues or questions:

  1. Check the troubleshooting section
  2. Review API documentation at http://localhost:8081/docs
  3. Check container logs: docker-compose logs
  4. For voice cloning issues, see VOICE_CLONING_GUIDE.md

Version History

v1.2.0 (Current - December 2025)

  • Agentic AI Models: Two intelligent agents for document research and conversational planning
  • Document Research Agent: Multi-step research with tool-based execution
  • Conversational Planning Agent: Context-aware conversations with intelligent tool selection
  • Agent Tools: Reusable tools for document search, analysis, and multi-query search
  • Agent API Endpoints: RESTful API for agent execution
  • Transparent Reasoning: Step-by-step reasoning logs from agents

v1.1.0 (December 8, 2025)

  • Voice cloning with AllVoiceLab API
  • Text-to-speech with cloned voices
  • Voice management per persona
  • Multiple output formats (MP3, WAV)
  • Adjustable speech parameters (speed, stability, similarity)
  • Audio file download functionality

v1.0.0

  • Basic persona creation and chat
  • Local Whisper speech-to-text
  • RAG with ChromaDB
  • Multi-language support (English, Urdu)
  • Docker containerization
  • Streamlit UI with stop button
  • CPU-optimized images (~3.3GB each)

Last Updated: December 8, 2025

About

Echo-Persona is a full-stack Digital Twin AI platform. It allows users to create and interact with customizable personas using Retrieval-Augmented Generation (RAG), local Whisper-based Speech-to-Text (English & Urdu), AI voice cloning, and specialized Agentic AI models for document research.
