Digital Human AI Agent Platform — Arabic-First, Open-Source AI Stack
SmartTalker is an end-to-end platform for building real-time digital human AI agents. It takes speech or text input and produces a talking avatar video response — powered entirely by open-source AI models. The platform is designed with Arabic as the primary language, targeting MENA markets, but supports multilingual use cases out of the box.
- Full Speech Pipeline — ASR, LLM reasoning, TTS, and talking-head video generation in a single API call
- Arabic-First — Native Arabic support across all pipeline layers (ASR, LLM, TTS)
- Real-Time Communication — REST API, WebSocket, and WebRTC interfaces for flexible integration
- WhatsApp Integration — Built-in WhatsApp Business API client for conversational AI over messaging
- Voice Cloning — Clone voices from 3–10 second reference audio samples
- Emotion-Aware — Detects and applies emotion to both speech synthesis and avatar animation
- Production-Ready — Redis rate limiting, API key auth, Prometheus metrics, Docker deployment, and structured JSON logging
- Cost-Efficient — Runs on a single GPU server at $50–150/month using fully open-source models
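
The voice cloning feature above expects reference audio of 3–10 seconds. A client could check a sample locally before uploading it. The sketch below assumes the sample is an uncompressed WAV file and uses only the Python standard library; the function names are hypothetical and are not part of SmartTalker.

```python
import wave

# Duration bounds taken from the feature list (3-10 second reference samples).
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(path: str) -> bool:
    """True if the WAV file length falls inside the 3-10 second window."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

Whether the server enforces the same bounds, or accepts formats other than WAV, is not stated here, so treat this only as a client-side convenience check.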
First Client: BusTickets Pro, a WhatsApp bus booking assistant

Cost Target: $50–150/month operational
SmartTalker uses a 6-layer pipeline architecture:
┌───────────────────────────────────────────────────────────────────────┐
│                         SmartTalker Pipeline                          │
│                                                                       │
│  🎤 Audio In                                            🎬 Video Out  │
│       │                                                       ▲       │
│       ▼                                                       │       │
│  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐  │
│  │   ASR   │──▶│   LLM   │──▶│   TTS   │──▶│  Video  │──▶│ Upscale │  │
│  │ Fun-ASR │   │Qwen 2.5 │   │CosyVoice│   │EchoMimic│   │ RealESR │  │
│  │  Nano   │   │   14B   │   │   3.0   │   │   V2    │   │   GAN   │  │
│  └─────────┘   └─────────┘   └─────────┘   └─────────┘   └─────────┘  │
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │            Orchestrator: FastAPI + WebSocket + Redis            │  │
│  └─────────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────────┘
| Layer | Tool | Purpose |
|---|---|---|
| 1. ASR | Fun-ASR Nano | Speech → Text |
| 2. LLM | Qwen 2.5 14B via Ollama | Reasoning & Response |
| 3. TTS | CosyVoice 3.0 | Text → Speech |
| 4. Video | EchoMimicV2 | Audio → Talking Head |
| 5. Upscale | RealESRGAN + CodeFormer | Quality Enhancement |
| 6. Orchestrator | FastAPI + WebSocket + Redis | Coordination |
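
The table above describes a linear chain in which each layer's output feeds the next, with the orchestrator coordinating the hand-offs. A minimal sketch of that flow in Python follows. The `Stage` type, `run_pipeline`, and the stub stages are illustrative assumptions; they are not taken from the SmartTalker source, and the real engines exchange audio and video data rather than strings.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class Stage:
    """One pipeline layer: a name plus a function from input to output."""
    name: str
    run: Callable[[Any], Any]


def run_pipeline(stages: List[Stage], payload: Any) -> Tuple[Any, List[str]]:
    """Pass the payload through each stage in order.

    Returns the final output and the list of stage names that ran.
    """
    trace: List[str] = []
    for stage in stages:
        payload = stage.run(payload)
        trace.append(stage.name)
    return payload, trace


# Stub stages standing in for the real engines (hypothetical placeholders).
STAGES = [
    Stage("asr", lambda audio: f"text({audio})"),
    Stage("llm", lambda text: f"reply({text})"),
    Stage("tts", lambda reply: f"speech({reply})"),
    Stage("video", lambda speech: f"video({speech})"),
    Stage("upscale", lambda video: f"upscaled({video})"),
]

result, trace = run_pipeline(STAGES, "audio.wav")
print(result)
print(trace)
```

Keeping each layer behind the same call shape makes it straightforward to skip a stage, for example entering at the LLM stage when the input is already text.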
- OS: Ubuntu 22.04 LTS
- GPU: NVIDIA RTX 4090 (24GB VRAM) or equivalent
- NVIDIA Driver: 545+
- Docker: 24.0+
- Python: 3.10+
git clone https://github.com/ali-ibnouf/SmartTalker.git
cd SmartTalker
chmod +x setup.sh
sudo ./setup.sh

# Clone the repo
git clone https://github.com/ali-ibnouf/SmartTalker.git
cd SmartTalker
# Configure environment
cp .env.example .env
# Edit .env with your settings
# Build and run
docker compose up -d
# Pull the LLM model
docker exec smarttalker-ollama ollama pull qwen2.5:14b
# Download AI models
bash scripts/download_models.sh

# Create virtual environment
python3.10 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Configure
cp .env.example .env
# Download models
bash scripts/download_models.sh
# Start Ollama (separate terminal)
ollama serve
# Run the app
make dev

# Health check
curl http://localhost:8000/api/v1/health
# Test text-to-speech
curl -X POST http://localhost:8000/api/v1/text-to-speech \
-H "Content-Type: application/json" \
-d '{"text": "مرحباً بكم في سمارت توكر", "language": "ar"}'

SmartTalker/
├── src/
│ ├── config.py # Pydantic Settings
│ ├── main.py # FastAPI application
│ ├── pipeline/ # AI processing engines
│ │ ├── orchestrator.py # Pipeline coordinator
│ │ ├── asr.py # Fun-ASR Nano
│ │ ├── llm.py # Qwen 2.5 via Ollama
│ │ ├── tts.py # CosyVoice 3.0
│ │ ├── video.py # EchoMimicV2
│ │ ├── upscale.py # RealESRGAN + CodeFormer
│ │ └── emotions.py # Emotion detection
│ ├── api/ # REST + WebSocket API
│ ├── integrations/ # WhatsApp, WebRTC, Storage
│ └── utils/ # Audio, video, logging
├── tests/ # Test suite
├── scripts/ # Setup & maintenance scripts
├── avatars/ # Avatar reference images
├── voices/ # Voice reference audio
├── docs/ # Documentation
├── docker-compose.yml # 3-service stack
├── Dockerfile # Multi-stage build
├── Makefile # Build targets
└── requirements.txt # Pinned dependencies
make setup # Initial setup (Linux)
make setup-win # Initial setup (Windows)
make build # Build Docker images
make run # Start all services
make dev # Run locally with hot reload
make test # Run test suite
make lint # Run linters
make format # Format code
make download-models # Download AI models
make clean # Clean generated files
make help # Show all targets

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/text-to-speech | Text → Audio |
| POST | /api/v1/audio-chat | Audio → Audio |
| POST | /api/v1/text-to-video | Text → Video |
| POST | /api/v1/voice-clone | Clone a voice |
| GET | /api/v1/voices | List voices |
| GET | /api/v1/health | System health |
| WS | /ws/chat/{avatar_id} | Real-time chat |
Full API docs: http://localhost:8000/docs
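
For callers who prefer Python to curl, the sketch below builds the same text-to-speech request shown earlier and derives the WebSocket URL for the chat endpoint. It uses only the standard library. The helper names are hypothetical, the response is treated as opaque bytes because its format is not described here, and no API key header is set because the header name is not specified in this README.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_tts_request(text: str, language: str = "ar",
                      base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/v1/text-to-speech."""
    body = json.dumps({"text": text, "language": language},
                      ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/text-to-speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat_ws_url(avatar_id: str, base_url: str = BASE_URL) -> str:
    """Derive the WebSocket URL for /ws/chat/{avatar_id} from the HTTP base."""
    ws_base = base_url.replace("http", "ws", 1)  # http -> ws, https -> wss
    return f"{ws_base}/ws/chat/{quote(avatar_id, safe='')}"


def send(request: urllib.request.Request, timeout: float = 60.0) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With the server running, `send(build_tts_request("مرحباً بكم في سمارت توكر"))` would issue the request; the interactive docs at `/docs` are the place to confirm the exact request and response schema.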
MIT License — see LICENSE for details.
SmartTalker is a digital human AI agent platform designed specifically for Arabic-speaking markets in the Middle East and North Africa (MENA).
- 🎤 Speech recognition: full Arabic support using Fun-ASR
- 🧠 Artificial intelligence: natural Arabic conversation with Qwen 2.5
- 🗣️ Text-to-speech: natural Arabic voice with CosyVoice
- 🎬 Smart video: realistic animated avatar with EchoMimicV2
- 📱 WhatsApp: direct integration with WhatsApp Business

A WhatsApp bus ticket booking system that speaks fluent Arabic and offers a quick, easy booking experience.
# Clone the project
git clone https://github.com/ali-ibnouf/SmartTalker.git
cd SmartTalker
# Automated setup
chmod +x setup.sh
sudo ./setup.sh
# Start the services
docker compose up -d

Target: $50–150 per month, using fully open-source tools.