
Backend MANJU - Real-time Voice Chatbot

MANJU (Multi-agent AI for Natural Just-in-Time Understanding) is a high-performance, multi-agent voice chatbot backend designed for Thai-language call centers. It combines speech recognition, intelligent conversation handling, and a fast-path architecture that keeps response times low.

🚀 Key Features

⚡ Performance Optimizations

  • Ultra-Fast Intent Classification: An agent-based intent classifier makes direct tool calls, bypassing full CrewAI orchestration for 70-80% faster responses
  • Parallel Processing: Concurrent product search and Google Sheets queries reduce latency by up to 60%
  • Smart Caching: LRU caches for product data, RAG results, and frequent queries (see the sketch after this list)
  • Hierarchical Processing: Optimized CrewAI process with timeout fallbacks (2.5s limit)
  • Lazy Loading: Models are loaded on demand to reduce startup time
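
A minimal sketch of the smart-caching idea: functools.lru_cache keeps hot product lookups in memory so repeated SKU queries skip the slow external source. Both function names here are hypothetical, not the actual MultiAgent_New.py code.

from functools import lru_cache

# Hypothetical stand-in for the real Google Sheets / database fetch.
def fetch_product_from_sheet(sku: str) -> dict:
    return {"sku": sku, "price": 990, "stock": 12}

@lru_cache(maxsize=256)
def lookup_product(sku: str) -> dict:
    # Repeat queries for the same SKU are answered from memory,
    # skipping the external lookup entirely.
    return fetch_product_from_sheet(sku)

print(lookup_product("TEL001"))  # first call fetches; later calls hit the cache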

🎯 Multi-Agent Architecture

  • Supervisor Agent: Fast intent classification (PRODUCT/KNOWLEDGE/GENERAL)
  • Product Agent: SKU lookup, inventory management, pricing queries
  • Knowledge Agent: RAG-powered document search and policy retrieval
  • Response Agent: Natural language response generation
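
To make the division of labour concrete, here is a toy CrewAI setup in the same spirit (illustrative roles and task text using CrewAI's standard Agent/Task/Crew API; the real agents live in MultiAgent_New.py, and an LLM key must be configured in the environment):

from crewai import Agent, Crew, Process, Task

supervisor = Agent(
    role="Supervisor",
    goal="Classify the user query as PRODUCT, KNOWLEDGE, or GENERAL",
    backstory="First-line router for a Thai call-center assistant.",
)
product_agent = Agent(
    role="Product Agent",
    goal="Answer SKU, pricing, and inventory questions",
    backstory="Knows the product catalogue inside out.",
)

classify = Task(
    description="Classify this query: {query}",
    expected_output="Exactly one label: PRODUCT, KNOWLEDGE, or GENERAL",
    agent=supervisor,
)

crew = Crew(agents=[supervisor, product_agent], tasks=[classify], process=Process.sequential)
print(crew.kickoff(inputs={"query": "สินค้า TEL001 ราคาเท่าไหร่"}))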

🗣️ Speech Processing

  • Thai ASR: TyphoonASR with faster-whisper backend (2-4x faster than standard Whisper; see the sketch after this list)
  • Text-to-Speech: F5-TTS-THAI integration for natural Thai voice synthesis
  • Multi-format Support: WAV, MP3, M4A, FLAC, OGG, WMA
  • Real-time Processing: Streaming audio support with voice activity detection
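
Transcription through the faster-whisper backend looks roughly like this; "large-v3-turbo" matches the WHISPER_MODEL default shown in the configuration section (recent faster-whisper releases resolve this name), and the file path is illustrative:

from faster_whisper import WhisperModel

model = WhisperModel("large-v3-turbo", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.wav", language="th")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")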

🔧 Technical Features

  • RESTful API: FastAPI-based with automatic OpenAPI documentation
  • Google Sheets Integration: Real-time product data synchronization
  • RAG System: Document-based knowledge retrieval with FAISS vector search (see the sketch after this list)
  • Fallback Mechanisms: Graceful degradation when services are unavailable
  • Health Monitoring: Comprehensive system status and performance metrics
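
At its core, the RAG step is a nearest-neighbour search over document embeddings. A minimal FAISS sketch, with random vectors standing in for real embeddings and an illustrative dimension:

import faiss
import numpy as np

dim = 384                                                     # embedding size (illustrative)
doc_vectors = np.random.random((100, dim)).astype("float32")  # stand-in embeddings

index = faiss.IndexFlatL2(dim)  # exact L2 search, no training step required
index.add(doc_vectors)

query = np.random.random((1, dim)).astype("float32")
distances, ids = index.search(query, 3)  # top-3 nearest documents
print(ids[0])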

🏗️ Architecture

graph TD
    A[Voice Input] --> B[TyphoonASR]
    B --> C[Text Processing]
    C --> D[Supervisor Agent]
    D --> E{Intent Classification}

    E -->|PRODUCT| F[Product Agent]
    E -->|KNOWLEDGE| G[Knowledge Agent]
    E -->|GENERAL| H[Response Agent]

    F --> I[Product Data Lookup]
    G --> J[RAG Document Search]
    H --> K[Canned Response]

    I --> L[Response Agent]
    J --> L
    K --> L

    L --> M[Final Response]
    M --> N[F5-TTS-THAI]
    N --> O[Voice Output]

    subgraph "Fast-Path Bypass"
        P[Intent → Direct Tool] --> Q[Cache/SKU Lookup]
        Q --> R[Immediate Response]
        R --> M
    end

    subgraph "Data Sources"
        S[Google Sheets] --> F
        T[Document Store] --> G
        U[Product Database] --> F
    end

    style D fill:#e3f2fd
    style L fill:#f3e5f5
    style P fill:#e8f5e8

Hierarchical Flow

  1. Supervisor Agent → Classifies user intent (PRODUCT/KNOWLEDGE/GENERAL)
  2. Specialized Agents → Handle specific domains:
    • Product Agent: SKU lookup, pricing, inventory
    • Knowledge Agent: RAG search, policies, procedures
    • Response Agent: General conversation, greetings
  3. Response Agent → Synthesizes final response with proper formatting

Fast-Path Optimization

For speed, simple queries bypass the full hierarchy:

  • Direct tool calls for SKU/product queries
  • Cached responses for frequent questions
  • Parallel processing for multiple data sources
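
A sketch of that routing decision, with hypothetical helper names (the real logic lives in MultiAgent_New.py); anything the fast path cannot answer falls through to full CrewAI orchestration:

import re

# Hypothetical canned-response cache for frequent queries.
GREETING_CACHE = {"สวัสดีครับ": "สวัสดีค่ะ มีอะไรให้ช่วยไหมคะ"}

def lookup_product_tool(text: str) -> str:
    return f"(direct product lookup for: {text})"  # hypothetical stand-in

def handle_fast_path(text: str):
    # SKU-like tokens (e.g. "TEL001") skip the crew and call the tool directly.
    if re.search(r"\b[A-Z]{2,4}\d{3,}\b", text):
        return lookup_product_tool(text)
    # Frequent greetings are served straight from the cache.
    if text.strip() in GREETING_CACHE:
        return GREETING_CACHE[text.strip()]
    return None  # None -> escalate to the full agent hierarchy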

📊 Performance Metrics

Operation            Fast Path   Full CrewAI   Improvement
Product Query (SKU)  ~0.3s       ~2.8s         89% faster
Knowledge Search     ~0.5s       ~3.2s         84% faster
General Greeting     ~0.1s       ~2.1s         95% faster
Average Response     ~0.4s       ~2.9s         86% faster

Response Time Breakdown

  • Intent Classification: 0.1-0.2s
  • Cache Lookup: 0.05-0.1s
  • Tool Execution: 0.2-0.5s
  • Response Generation: 0.1-0.3s

🛠️ Installation

Prerequisites

  • Python 3.8+
  • CUDA GPU (recommended for ASR performance)
  • FFmpeg for audio processing

1. Clone Repository

git clone https://github.com/Celesca/MANJU.git
cd MANJU

2. Install Dependencies

pip install -r backend/requirements.txt

3. Install F5-TTS-THAI Submodule

git submodule update --init --recursive
cd backend/F5-TTS-THAI-API
pip install -e .

4. Configure Environment

# Copy environment template
cp .env.example .env

# Edit with your API keys
OPENROUTER_API_KEY=your_openrouter_key
TOGETHER_API_KEY=your_together_key
GOOGLE_SHEETS_CREDENTIALS=path/to/credentials.json
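
At startup these values can be picked up with python-dotenv (assuming the project loads its .env this way; a minimal sketch):

import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
openrouter_key = os.getenv("OPENROUTER_API_KEY")
sheets_credentials = os.getenv("GOOGLE_SHEETS_CREDENTIALS")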

🚀 Quick Start

Start Backend Server

cd backend
uvicorn new_server:app --host 0.0.0.0 --port 8000 --reload

Health Check

curl http://localhost:8000/health

Test Voice Input

curl -X POST "http://localhost:8000/api/voice" \
  -H "Content-Type: multipart/form-data" \
  -F "[email protected]" \
  -F "text=สวัสดีครับ สินค้า TEL001 ราคาเท่าไหร่"

📡 API Endpoints

Core Endpoints

POST /api/voice

Process voice input with multi-agent orchestration

{
  "audio": "audio_file.wav",
  "text": "optional_text",
  "history": [{"role": "user", "content": "previous message"}]
}

GET /health

System health and status information

{
  "status": "healthy",
  "asr_model_loaded": true,
  "llm_ready": true,
  "uptime": 3600.5
}
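
A handler that produces a response of this shape is a few lines of FastAPI; the following is a sketch, not the actual new_server.py code (in the real server the flags reflect live model state):

import time

from fastapi import FastAPI

app = FastAPI()
START_TIME = time.time()

@app.get("/health")
def health() -> dict:
    return {
        "status": "healthy",
        "asr_model_loaded": True,  # placeholder: report real model state here
        "llm_ready": True,
        "uptime": round(time.time() - START_TIME, 1),
    }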

POST /llm

Direct LLM interaction (bypasses voice processing)

{
  "text": "Hello, how can I help you?",
  "history": []
}

TTS Endpoints

POST /tts

Text-to-speech generation

{
  "ref_audio": "reference.wav",
  "ref_text": "reference text",
  "gen_text": "text to generate"
}

POST /stt

Speech-to-text transcription

{
  "audio": "input.wav",
  "translate": false,
  "target_lg": "th"
}

⚙️ Configuration

Environment Variables

# LLM Configuration
OPENROUTER_API_KEY=your_key
TOGETHER_API_KEY=your_key
LLM_MODEL=openrouter/qwen/qwen3-4b:free

# Performance Settings
SPEED_MODE=true
MAX_RPM=200
CREW_TIMEOUT=2.5

# ASR Configuration
WHISPER_MODEL=large-v3-turbo
DEVICE=cuda

# TTS Configuration
TTS_MODEL=f5-tts-thai
VOCODER_DEVICE=cuda

Model Selection

The system automatically selects the best available LLM:

  1. OpenRouter (preferred for quality)
  2. Together AI (fallback for speed)
  3. Ollama (local fallback when no API keys)
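
A sketch of that selection order (the function name is hypothetical; the environment-variable names follow the configuration section above):

import os

def select_llm_backend() -> str:
    if os.getenv("OPENROUTER_API_KEY"):
        return "openrouter"  # preferred for quality
    if os.getenv("TOGETHER_API_KEY"):
        return "together"    # faster fallback
    return "ollama"          # local fallback when no API keys are set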

🔍 Monitoring & Debugging

Health Endpoints

  • /health - Overall system status, including the llm_ready flag for LLM availability
  • /debug/model_info - Model loading status

Logging

import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("manju_backend")

Performance Profiling

# Enable detailed timing (call from inside an async function or running event loop)
response = await process_voice_input(text, speed_mode=True)
print(f"Processing time: {response['processing_time_seconds']}s")
print(f"Agent path: {response.get('agent_path', 'full_crew')}")

🧪 Testing

Unit Tests

cd backend
python -m pytest tests/ -v

Integration Tests

# Test full pipeline
python -c "
from MultiAgent_New import VoiceCallCenterMultiAgent
agent = VoiceCallCenterMultiAgent()
result = agent.process_voice_input('สวัสดีครับ')
print(result)
"

Load Testing

# Simulate concurrent requests with ApacheBench (100 requests, 10 at a time)
ab -n 100 -c 10 http://localhost:8000/health

📁 Project Structure

backend/
├── MultiAgent_New.py          # Core multi-agent orchestration
├── new_server.py             # FastAPI server
├── typhoon_asr.py            # Thai ASR wrapper
├── requirements.txt           # Python dependencies
├── F5-TTS-THAI-API/          # TTS submodule
│   └── src/f5_tts/
│       ├── f5_api_new_integrate.py  # TTS API router
│       └── ...
├── audio_uploads/             # Temporary audio files
├── temp/                      # Processing temporary files
└── documents/                 # Knowledge base documents

frontend/
├── simple_chatbot.py          # Basic chat interface
└── voice_chatbot.py           # Voice-enabled interface

tests/
└── test_*.py                  # Unit and integration tests

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

Development Guidelines

  • Use type hints for all function parameters
  • Add docstrings to all public methods
  • Follow PEP 8 style guidelines
  • Add performance benchmarks for new features

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • TyphoonASR: High-performance Thai speech recognition
  • F5-TTS-THAI: Advanced Thai text-to-speech synthesis
  • CrewAI: Multi-agent orchestration framework
  • Faster-Whisper: Optimized speech recognition
  • OpenRouter: LLM API aggregation service

📞 Support

For support and questions:

  • Create an issue on GitHub
  • Check the documentation at /docs
  • Review logs with tail -f backend/logs/app.log

Built with ❤️ for Thai language AI applications
