A high-performance, multi-agent voice chatbot backend for Thai-language call centers. It features advanced speech recognition, intelligent conversation handling, and optimized response times through an innovative fast-path architecture.
- Ultra-Fast Intent Classification: Agent-based intent classifier with direct tool calls bypasses full CrewAI orchestration for 70-80% faster responses
- Parallel Processing: Concurrent product search and Google Sheets queries reduce latency by up to 60%
- Smart Caching: LRU caches for product data, RAG results, and frequent queries
- Hierarchical Processing: Optimized CrewAI process with timeout fallbacks (2.5s limit)
- Lazy Loading: Models loaded on-demand to reduce startup time
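The caching, parallel-query, and timeout ideas above can be sketched as follows. This is a minimal illustration, not the project's actual code: `fetch_product` and `fetch_sheet_row` are hypothetical stand-ins for the product database and Google Sheets queries.

```python
import asyncio
from functools import lru_cache


@lru_cache(maxsize=512)
def normalize_sku(raw: str) -> str:
    # Cached normalization of frequent SKU queries
    return raw.strip().upper()


async def fetch_product(sku: str) -> dict:
    # Hypothetical stand-in for the product-database lookup
    await asyncio.sleep(0.05)
    return {"sku": sku, "price": 199}


async def fetch_sheet_row(sku: str) -> dict:
    # Hypothetical stand-in for the Google Sheets query
    await asyncio.sleep(0.05)
    return {"sku": sku, "stock": 12}


async def lookup(sku: str, timeout: float = 2.5) -> dict:
    sku = normalize_sku(sku)
    try:
        # Run both data-source queries concurrently under a hard time budget
        product, sheet = await asyncio.wait_for(
            asyncio.gather(fetch_product(sku), fetch_sheet_row(sku)),
            timeout=timeout,
        )
        return {**product, **sheet}
    except asyncio.TimeoutError:
        # Graceful fallback when the 2.5 s budget is exceeded
        return {"sku": sku, "error": "timeout"}


result = asyncio.run(lookup("tel001"))
```

Because both queries run concurrently, total latency is bounded by the slower of the two rather than their sum, which is where the "up to 60%" reduction comes from.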
- Supervisor Agent: Fast intent classification (PRODUCT/KNOWLEDGE/GENERAL)
- Product Agent: SKU lookup, inventory management, pricing queries
- Knowledge Agent: RAG-powered document search and policy retrieval
- Response Agent: Natural language response generation
- Thai ASR: TyphoonASR with faster-whisper backend (2-4x faster than standard Whisper)
- Text-to-Speech: F5-TTS-THAI integration for natural Thai voice synthesis
- Multi-format Support: WAV, MP3, M4A, FLAC, OGG, WMA
- Real-time Processing: Streaming audio support with voice activity detection
- RESTful API: FastAPI-based with automatic OpenAPI documentation
- Google Sheets Integration: Real-time product data synchronization
- RAG System: Document-based knowledge retrieval with FAISS vector search
- Fallback Mechanisms: Graceful degradation when services unavailable
- Health Monitoring: Comprehensive system status and performance metrics
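The fallback mechanism can be sketched as a simple provider chain; `rag_search` and `keyword_search` here are hypothetical stand-ins, not the system's real services.

```python
def rag_search(query: str) -> str:
    # Stand-in primary service that is currently unavailable
    raise ConnectionError("vector store down")


def keyword_search(query: str) -> str:
    # Stand-in degraded-mode search
    return f"keyword hit for {query!r}"


def search_with_fallback(query: str, providers) -> str:
    # Try each provider in order; degrade gracefully on failure
    for provider in providers:
        try:
            return provider(query)
        except Exception:
            continue
    # Last-resort canned answer ("Sorry, the system is temporarily unavailable.")
    return "ขออภัยค่ะ ระบบไม่พร้อมใช้งานชั่วคราว"


answer = search_with_fallback("return policy", [rag_search, keyword_search])
```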
```mermaid
graph TD
    A[Voice Input] --> B[TyphoonASR]
    B --> C[Text Processing]
    C --> D[Supervisor Agent]
    D --> E{Intent Classification}
    E -->|PRODUCT| F[Product Agent]
    E -->|KNOWLEDGE| G[Knowledge Agent]
    E -->|GENERAL| H[Response Agent]
    F --> I[Product Data Lookup]
    G --> J[RAG Document Search]
    H --> K[Canned Response]
    I --> L[Response Agent]
    J --> L
    K --> L
    L --> M[Final Response]
    M --> N[F5-TTS-THAI]
    N --> O[Voice Output]

    subgraph "Fast-Path Bypass"
        P[Intent → Direct Tool] --> Q[Cache/SKU Lookup]
        Q --> R[Immediate Response]
        R --> M
    end

    subgraph "Data Sources"
        S[Google Sheets] --> F
        T[Document Store] --> G
        U[Product Database] --> F
    end

    style D fill:#e3f2fd
    style L fill:#f3e5f5
    style P fill:#e8f5e8
```
- Supervisor Agent → Classifies user intent (PRODUCT/KNOWLEDGE/GENERAL)
- Specialized Agents → Handle specific domains:
- Product Agent: SKU lookup, pricing, inventory
- Knowledge Agent: RAG search, policies, procedures
- Response Agent: General conversation, greetings
- Response Agent → Synthesizes final response with proper formatting
For speed, simple queries bypass the full hierarchy:
- Direct tool calls for SKU/product queries
- Cached responses for frequent questions
- Parallel processing for multiple data sources
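The bypass above amounts to a cheap classifier plus direct tool dispatch. A hypothetical sketch (the regex, tool table, and keyword lists are illustrative, not the project's actual classifier):

```python
import re
from typing import Optional, Tuple

# Illustrative SKU pattern (e.g. TEL001) and tool table; the real system
# uses an agent-based classifier rather than plain regex.
SKU_PATTERN = re.compile(r"\b([A-Z]{3}\d{3})\b")

TOOLS = {
    "PRODUCT": lambda sku: f"lookup:{sku}",  # stand-in for the SKU tool
    "GENERAL": lambda _: "สวัสดีค่ะ",          # canned Thai greeting ("Hello")
}


def classify(text: str) -> Tuple[str, Optional[str]]:
    match = SKU_PATTERN.search(text)
    if match:
        return "PRODUCT", match.group(1)
    if any(word in text.lower() for word in ("สวัสดี", "hello")):
        return "GENERAL", None
    return "KNOWLEDGE", None  # no fast path: escalate to the full crew


def fast_path(text: str) -> Optional[str]:
    # Returns an immediate response, or None to fall back to full CrewAI
    intent, arg = classify(text)
    tool = TOOLS.get(intent)
    return tool(arg) if tool else None


reply = fast_path("สินค้า TEL001 ราคาเท่าไหร่")  # "How much is product TEL001?"
```

Returning `None` for KNOWLEDGE intents is the escalation signal: only queries the cheap path can answer skip the orchestrator.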
| Operation | Fast Path | Full CrewAI | Improvement |
|---|---|---|---|
| Product Query (SKU) | ~0.3s | ~2.8s | 89% faster |
| Knowledge Search | ~0.5s | ~3.2s | 84% faster |
| General Greeting | ~0.1s | ~2.1s | 95% faster |
| Average Response | ~0.4s | ~2.9s | 86% faster |
- Intent Classification: 0.1-0.2s
- Cache Lookup: 0.05-0.1s
- Tool Execution: 0.2-0.5s
- Response Generation: 0.1-0.3s
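Per-stage numbers like those above can be collected with a small timing context manager; the stage names and `time.sleep` stand-ins below are illustrative.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def stage(name: str):
    # Record wall-clock time per pipeline stage
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


# Stand-ins for the real pipeline stages
with stage("intent_classification"):
    time.sleep(0.01)
with stage("cache_lookup"):
    time.sleep(0.005)

total = sum(timings.values())
```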
- Python 3.8+
- CUDA GPU (recommended for ASR performance)
- FFmpeg for audio processing
```bash
# Clone the repository
git clone https://github.com/your-org/manju-backend.git
cd manju-backend

# Install Python dependencies
pip install -r backend/requirements.txt

# Pull submodules and install the TTS package
git submodule update --init --recursive
cd backend/F5-TTS-THAI-API
pip install -e .
```

```bash
# Copy environment template
cp .env.example .env
```

```
# Edit with your API keys
OPENROUTER_API_KEY=your_openrouter_key
TOGETHER_API_KEY=your_together_key
GOOGLE_SHEETS_CREDENTIALS=path/to/credentials.json
```

```bash
# Start the server
cd backend
uvicorn new_server:app --host 0.0.0.0 --port 8000 --reload
```

```bash
# Check system health
curl http://localhost:8000/health
```

```bash
# Send a voice request (the Thai text asks: "Hello, how much does product TEL001 cost?")
curl -X POST "http://localhost:8000/api/voice" \
  -H "Content-Type: multipart/form-data" \
  -F "[email protected]" \
  -F "text=สวัสดีครับ สินค้า TEL001 ราคาเท่าไหร่"
```

`POST /api/voice` - Process voice input with multi-agent orchestration
```json
{
  "audio": "audio_file.wav",
  "text": "optional_text",
  "history": [{"role": "user", "content": "previous message"}]
}
```

`GET /health` - System health and status information
```json
{
  "status": "healthy",
  "asr_model_loaded": true,
  "llm_ready": true,
  "uptime": 3600.5
}
```

Direct LLM interaction (bypasses voice processing)
```json
{
  "text": "Hello, how can I help you?",
  "history": []
}
```

Text-to-speech generation
```json
{
  "ref_audio": "reference.wav",
  "ref_text": "reference text",
  "gen_text": "text to generate"
}
```

Speech-to-text transcription
```json
{
  "audio": "input.wav",
  "translate": false,
  "target_lg": "th"
}
```

```
# LLM Configuration
OPENROUTER_API_KEY=your_key
TOGETHER_API_KEY=your_key
LLM_MODEL=openrouter/qwen/qwen3-4b:free

# Performance Settings
SPEED_MODE=true
MAX_RPM=200
CREW_TIMEOUT=2.5

# ASR Configuration
WHISPER_MODEL=large-v3-turbo
DEVICE=cuda

# TTS Configuration
TTS_MODEL=f5-tts-thai
VOCODER_DEVICE=cuda
```

The system automatically selects the best available LLM:
- OpenRouter (preferred for quality)
- Together AI (fallback for speed)
- Ollama (local fallback when no API keys)
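The selection logic reduces to checking which API keys are configured; a minimal sketch, assuming the preference order above (the function name and env handling are illustrative, not the project's actual code):

```python
import os


def select_llm_provider(env=None) -> str:
    # Preference order: OpenRouter (quality), Together AI (speed),
    # local Ollama when no API keys are configured.
    env = os.environ if env is None else env
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter"
    if env.get("TOGETHER_API_KEY"):
        return "together"
    return "ollama"


provider = select_llm_provider({"TOGETHER_API_KEY": "xyz"})
```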
- `/health` - Overall system status
- `/debug/model_info` - Model loading status
- `/health` with `llm_ready` flag - LLM availability
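A monitoring script mostly needs to reduce the `/health` payload (shape shown in the API section above) to a pass/fail line; a small hypothetical helper:

```python
import json


def summarize_health(payload: str) -> str:
    # Condense the /health JSON into a one-line status string
    info = json.loads(payload)
    ready = bool(info.get("asr_model_loaded") and info.get("llm_ready"))
    status = info.get("status", "unknown")
    return f"{status} (ready={ready}, uptime={info.get('uptime', 0):.1f}s)"


line = summarize_health(
    '{"status": "healthy", "asr_model_loaded": true, "llm_ready": true, "uptime": 3600.5}'
)
```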
```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("manju_backend")
```

```python
# Enable detailed timing
response = await process_voice_input(text, speed_mode=True)
print(f"Processing time: {response['processing_time_seconds']}s")
print(f"Agent path: {response.get('agent_path', 'full_crew')}")
```

```bash
cd backend
python -m pytest tests/ -v
```

```bash
# Test full pipeline
python -c "
from MultiAgent_New import VoiceCallCenterMultiAgent
agent = VoiceCallCenterMultiAgent()
result = agent.process_voice_input('สวัสดีครับ')
print(result)
"
```

```bash
# Simulate concurrent requests
ab -n 100 -c 10 http://localhost:8000/health
```

```
backend/
├── MultiAgent_New.py        # Core multi-agent orchestration
├── new_server.py            # FastAPI server
├── typhoon_asr.py           # Thai ASR wrapper
├── requirements.txt         # Python dependencies
├── F5-TTS-THAI-API/         # TTS submodule
│   └── src/f5_tts/
│       ├── f5_api_new_integrate.py  # TTS API router
│       └── ...
├── audio_uploads/           # Temporary audio files
├── temp/                    # Processing temporary files
└── documents/               # Knowledge base documents
frontend/
├── simple_chatbot.py        # Basic chat interface
└── voice_chatbot.py         # Voice-enabled interface
tests/
└── test_*.py                # Unit and integration tests
```
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
- Use type hints for all function parameters
- Add docstrings to all public methods
- Follow PEP 8 style guidelines
- Add performance benchmarks for new features
This project is licensed under the MIT License - see the LICENSE file for details.
- TyphoonASR: High-performance Thai speech recognition
- F5-TTS-THAI: Advanced Thai text-to-speech synthesis
- CrewAI: Multi-agent orchestration framework
- Faster-Whisper: Optimized speech recognition
- OpenRouter: LLM API aggregation service
For support and questions:
- Create an issue on GitHub
- Check the documentation at `/docs`
- Review logs with `tail -f backend/logs/app.log`
Built with ❤️ for Thai language AI applications