An open-source, AI-powered interview agent that acts as your digital clone for Zoom job interviews. It uses your voice, face, and expertise to answer questions in real time.
- 🧠 Multi-LLM Support: Ollama (local), OpenAI, Claude
- 📚 RAG Pipeline: Upload your resume/experience for context-aware answers
- 🎤 Voice Cloning: ElevenLabs integration for natural speech
- 👤 Live Avatar: HeyGen streaming for real-time video
- 🎥 Zoom Integration: OBS virtual camera + audio routing
- ⚡ Real-time: <5s end-to-end latency (question → video response)
- Node.js 18+
- (Optional) Ollama for local LLM
- (Optional) OBS Studio for Zoom integration
# Clone the repository
git clone https://github.com/yourusername/interview-avatar.git
cd interview-avatar
# Install dependencies
npm install
# Configure environment
cp .env.example .env.local
# Edit .env.local with your API keys
# Run development server
npm run dev
# Or start all services with Docker Compose
docker-compose up -d
# Access at http://localhost:3000
- Go to Settings → Select your LLM provider
- Local: Install Ollama + DeepSeek R1
- Cloud: Add an OpenAI or Anthropic API key (see the config sketch below)
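How the provider choice maps to configuration, as a rough sketch — the environment variable names (`LLM_PROVIDER`, `OLLAMA_BASE_URL`, etc.) and default models here are illustrative; the keys the app actually reads are listed in `.env.example`:

```typescript
// Sketch only — env variable names and default models are illustrative;
// check .env.example for the real keys.
type LLMProvider = "ollama" | "openai" | "anthropic";

interface LLMConfig {
  provider: LLMProvider;
  model: string;
  apiKey?: string;  // not needed for a local Ollama instance
  baseUrl?: string; // e.g. http://localhost:11434 for Ollama
}

export function loadLLMConfig(): LLMConfig {
  const provider = (process.env.LLM_PROVIDER ?? "ollama") as LLMProvider;
  switch (provider) {
    case "openai":
      return { provider, model: "gpt-4o", apiKey: process.env.OPENAI_API_KEY };
    case "anthropic":
      return {
        provider,
        model: "claude-3-5-sonnet-latest",
        apiKey: process.env.ANTHROPIC_API_KEY,
      };
    default: // local Ollama + DeepSeek R1
      return {
        provider: "ollama",
        model: "deepseek-r1",
        baseUrl: process.env.OLLAMA_BASE_URL ?? "http://localhost:11434",
      };
  }
}
```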
- Go to Setup → RAG
- Upload resume, past interview answers, project descriptions
- The system uses this context when generating answers (see the retrieval sketch below)
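What happens to those uploads, in minimal form: the current store is in-memory (ChromaDB support is planned), so a sketch like the following captures the idea. The names and the naive keyword scoring are illustrative, not the project's exact implementation:

```typescript
// Minimal in-memory retrieval sketch; names and scoring are illustrative.
interface Chunk {
  source: string;
  text: string;
}

const store: Chunk[] = [];

// Split an uploaded document (resume, past answers, ...) into chunks.
export function addDocument(source: string, text: string, chunkSize = 500): void {
  for (let i = 0; i < text.length; i += chunkSize) {
    store.push({ source, text: text.slice(i, i + chunkSize) });
  }
}

// Return the k chunks that best match the interview question. An
// embedding-based similarity search would slot in here without
// changing the interface.
export function retrieve(question: string, k = 3): Chunk[] {
  const terms = question.toLowerCase().split(/\W+/).filter(Boolean);
  return store
    .map((chunk) => ({
      chunk,
      score: terms.filter((t) => chunk.text.toLowerCase().includes(t)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(({ chunk }) => chunk);
}
```

The retrieved chunks are prepended to the interview question as context before the prompt is sent to the selected LLM.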
- Go to Practice
- Select a question
- Generate AI response
- Click "Speak Response" to hear it
- Go to Live
- Start avatar session
- Type or generate responses
- The avatar speaks with lip-sync (see the sketch below)
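A rough sketch of the Live tab's send path. The `/api/avatar/speak` route name is hypothetical; the real handler forwards the text to the active HeyGen streaming session, which returns lip-synced video over WebRTC:

```typescript
// Rough sketch; the internal route name is hypothetical.
export async function speakThroughAvatar(sessionId: string, text: string): Promise<void> {
  const res = await fetch("/api/avatar/speak", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ sessionId, text }),
  });
  if (!res.ok) throw new Error(`Avatar task failed: ${res.status}`);
  // Nothing else to do here: the WebRTC <video> element attached when the
  // session started keeps playing, and HeyGen lip-syncs server-side.
}
```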
- Install OBS Studio
- Follow OBS Setup Guide
- Configure audio routing (see Audio Setup)
- Select "OBS Virtual Camera" as your camera in Zoom (see the sketch below)
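The OBS side can also be driven programmatically through the OBS WebSocket server (Tools → WebSocket Server Settings). A sketch using obs-websocket-js; the port, the `OBS_WS_PASSWORD` variable name, and the function name are illustrative:

```typescript
// Sketch against OBS WebSocket v5 via obs-websocket-js.
import OBSWebSocket from "obs-websocket-js";

export async function startVirtualCamera(): Promise<OBSWebSocket> {
  const obs = new OBSWebSocket();
  await obs.connect("ws://127.0.0.1:4455", process.env.OBS_WS_PASSWORD);
  await obs.call("StartVirtualCam"); // shows up in Zoom as "OBS Virtual Camera"
  return obs;
}
```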
┌─────────────────────────────────────────────┐
│ User Question                               │
└──────────────────────┬──────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│ LLM (Ollama/OpenAI/Claude) + RAG            │
│ → Generates contextual answer               │
└──────────────────────┬──────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│ Voice Synthesis (ElevenLabs)                │
│ → Converts text to speech                   │
└──────────────────────┬──────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│ Video Avatar (HeyGen Streaming)             │
│ → Lip-synced video via WebRTC               │
└──────────────────────┬──────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│ OBS Virtual Camera → Zoom                   │
│ → Appears as your camera in meetings        │
└─────────────────────────────────────────────┘
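The same pipeline expressed as code, roughly. The stage functions are passed in so the sketch stays independent of the project's actual module layout; all names here are illustrative:

```typescript
// The diagram above as a rough async pipeline; names are illustrative.
interface Stages {
  retrieve: (question: string) => Promise<string[]>;              // RAG context
  generate: (question: string, ctx: string[]) => Promise<string>; // LLM answer
  synthesize: (answer: string) => Promise<ArrayBuffer>;           // ElevenLabs audio
  present: (answer: string) => Promise<void>;                     // HeyGen avatar → OBS → Zoom
}

export async function answerQuestion(question: string, s: Stages): Promise<string> {
  const ctx = await s.retrieve(question);
  const answer = await s.generate(question, ctx);
  // Audio and video are kicked off together to keep total latency low.
  await Promise.all([s.synthesize(answer), s.present(answer)]);
  return answer;
}
```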
- Testing Guide - How to test each component
- OBS Setup - Configure virtual camera
- Audio Setup - Route audio to Zoom
- Deployment - Production deployment
- API Reference - REST API documentation
- Configuration - Environment variables
- Frontend: Next.js 16, TypeScript, Tailwind CSS
- LLM: Ollama (DeepSeek R1), OpenAI, Anthropic
- Voice: ElevenLabs
- Video: HeyGen Streaming API
- RAG: In-memory (ChromaDB planned)
- Integration: OBS WebSocket, BlackHole (audio)
Target latencies:
- LLM Response: <3s
- Voice Synthesis: <2s
- Video Streaming: <1s
- Total: <5s (question → video)
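To check these targets locally, each stage can be wrapped in a small timer; the stage label and the example call are illustrative:

```typescript
// Log how long a pipeline stage takes, in milliseconds.
export async function timed<T>(stage: string, fn: () => Promise<T>): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    console.log(`${stage}: ${(performance.now() - start).toFixed(0)} ms`);
  }
}

// e.g. const answer = await timed("llm", () => generate(question, ctx));
```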
- Phase 1: Foundation (Next.js, UI, Config)
- Phase 2: Text Generation (LLM + RAG)
- Phase 3: Voice Synthesis
- Phase 4: Video Avatar
- Phase 5: OBS Integration
- Phase 6: Deployment & Polish
- ChromaDB integration
- Local voice (OpenVoice)
- Local video (LivePortrait)
- Real-time transcription (Whisper)
- Session recording & playback
- Performance analytics
Contributions welcome! See CONTRIBUTING.md
MIT License - see LICENSE
This tool is for educational and practice purposes. Always disclose the use of AI assistance in actual job interviews where required by the employer.
- 📧 Email: [email protected]
- 💬 Discord: Join our community
- 🐛 Issues: GitHub Issues
- HeyGen - Streaming avatar API
- ElevenLabs - Voice synthesis
- Ollama - Local LLM runtime
- OBS Studio - Virtual camera
Star ⭐ this repo if you find it useful!