Your adaptive AI assistant with personality, presence awareness, and desktop integration.
Version: 3.0
Alisa is a fully local AI companion that combines an animated avatar, natural voice conversation, presence detection, and intelligent desktop integration. Everything runs on your machine - your conversations and data stay private.
Key Features:
- 🎭 Animated avatar with emotional expressions
- 🗣️ Voice input/output with multiple languages
- 👁️ Webcam presence detection and attention tracking
- 🖥️ Desktop understanding (knows what you're working on)
- 🎮 Safe desktop automation (app control, browser, keyboard/mouse)
- 🧠 Adaptive learning (remembers your habits and preferences)
- 🌙 Idle companion mode (thoughtful presence during breaks)
- Animated Avatar - 6 emotions, blinking, talking animations
- Voice I/O - Edge TTS (40+ voices), Faster Whisper STT, optional RVC
- Emotion System - Expression changes based on conversation context
- Presence Detection - Face tracking, attention monitoring
- Phase 10A: Desktop Understanding - App/file/task detection, error detection
- Smart Help - Context-aware assistance with a 5-minute cooldown
- Phase 10B: Desktop Actions - App control, browser automation, safe commands
- Phase 10C: Habit Learning - Work schedule, app patterns, adaptive behavior
- Safety First - Whitelists, blacklists, rate limits, confirmation prompts
- Memory System - Short-term buffer + persistent SQLite storage
- Idle Companion - Spontaneous thoughts during breaks
- Conversation Modes - Teasing, calm, and serious personalities
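The "safety first" design above can be pictured as a gate that every requested desktop action passes through before execution. The following is an illustrative sketch only: the names (`WHITELIST`, `check_action`) and the 10-actions-per-minute limit are assumptions for the example, not Alisa's actual API or policy values.

```python
import time
from collections import deque

# Illustrative policy tables -- the real lists live in the backend config.
WHITELIST = {"notepad.exe", "chrome.exe"}
BLACKLIST = {"format", "shutdown"}
RATE_LIMIT = 10        # hypothetical: max actions per window
WINDOW_SECONDS = 60

_recent: deque = deque()  # timestamps of recently allowed actions


def check_action(command: str) -> bool:
    """Allow a command only if it is whitelisted, contains no
    blacklisted word, and stays under the rate limit."""
    now = time.monotonic()
    # Drop timestamps that have aged out of the window.
    while _recent and now - _recent[0] > WINDOW_SECONDS:
        _recent.popleft()
    if len(_recent) >= RATE_LIMIT:
        return False
    if any(word in command for word in BLACKLIST):
        return False
    if command.split()[0] not in WHITELIST:
        return False
    _recent.append(now)
    return True
```

In this shape, the whitelist is the only way an action can be approved, so a new capability is opt-in rather than opt-out.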
- Python 3.10+
- Windows 10/11
- Local LLM server (llama.cpp recommended)
- Optional: Webcam, Microphone, GPU
1. Clone the repository

```powershell
git clone https://github.com/Kush05Bhardwaj/Nexus-Alisa-AI-Assistant.git
cd "Alisa-AI Assistant"
```

2. Start the LLM server (separate terminal)

```powershell
# Example with llama.cpp
.\llama-server.exe -m .\models\Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -c 4096 -ngl 33
```

3. Launch Alisa

```powershell
# One command starts everything (backend, overlay, vision, chat)
.\scripts\start_phase10c.ps1
```

Done! 🎉 Alisa is ready.
```powershell
# Minimal (text only)
.\scripts\start_backend.ps1      # Terminal 1
.\scripts\start_text_chat.ps1    # Terminal 2

# Voice conversation
.\scripts\start_backend.ps1      # Terminal 1
.\scripts\start_voice.ps1        # Terminal 2

# Custom combinations
.\scripts\start_overlay.ps1      # Add avatar
.\scripts\start_vision.ps1       # Add presence detection
```

For complete control, start each component manually:
```powershell
# Terminal 1: LLM Server
cd F:\llama
.\llama-server.exe `
    -m .\models\Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf `
    -c 4096 `
    -ngl 33 `
    --split-mode layer

# Terminal 2: Backend
cd "F:\Projects\Alisa\Alisa-AI Assistant\backend"
.\venv\Scripts\Activate.ps1
cd ..
uvicorn backend.app.main:app --reload

# Terminal 3: Overlay (Avatar)
cd "F:\Projects\Alisa\Alisa-AI Assistant\overlay"
.\venv\Scripts\Activate.ps1
python main.py

# Terminal 4: Webcam Vision
cd "F:\Projects\Alisa\Alisa-AI Assistant\vision"
.\venv\Scripts\Activate.ps1
python vision_client.py

# Terminal 5: Desktop Understanding (Screen Vision)
cd "F:\Projects\Alisa\Alisa-AI Assistant\vision"
.\venv\Scripts\Activate.ps1
python vision_client_screen.py

# Terminal 6: Voice Chat
cd "F:\Projects\Alisa\Alisa-AI Assistant\voice"
.\venv\Scripts\Activate.ps1
python voice_chat_optimized.py

# Terminal 7: Text Chat (alternative to voice)
cd "F:\Projects\Alisa\Alisa-AI Assistant\voice"
.\venv\Scripts\Activate.ps1
python text_chat.py
```

Note: Start the terminals in order. The backend must be running before any other component is started.
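Because every other component depends on the backend, a small readiness check can save confusion when scripting the startup yourself. This is a generic stdlib-only sketch, not part of the project; the URL is the backend default shown elsewhere in this README.

```python
import time
import urllib.error
import urllib.request


def wait_for_backend(url: str = "http://127.0.0.1:8000/",
                     timeout: float = 30.0) -> bool:
    """Poll the backend root endpoint until it responds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status < 500:
                    return True
        except (urllib.error.URLError, OSError):
            time.sleep(1)  # backend not up yet; retry
    return False


if __name__ == "__main__":
    print("backend ready" if wait_for_backend() else "backend did not start")
```

Run it between Terminal 2 and the rest, and only continue once it reports the backend as ready.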
Alisa-AI-Assistant/
├── backend/ # FastAPI server, LLM integration, all Phase features
├── overlay/ # Animated avatar window (Tkinter)
├── voice/ # Voice I/O (Edge TTS, Faster Whisper, RVC)
├── vision/ # Presence detection, desktop understanding
├── scripts/ # PowerShell startup scripts & utilities
└── docs/ # Complete documentation (12,400+ lines)
```python
# voice/voice_config.py
SELECTED_VOICE = "nanami"  # Japanese anime-style
SPEECH_RATE = "+20%"
PITCH_SHIFT = "+15Hz"
```

```python
# backend/app/prompt.py
SYSTEM_PROMPT = """Your name is Alisa..."""
```

```python
# vision/vision_config.py
apply_preset("ultra_light")  # Low CPU
apply_preset("enhanced")     # Better accuracy
```

```python
# backend/app/desktop_actions.py
app_paths = {
    "myapp": "C:\\Path\\To\\App.exe"
}
```

Each module has detailed documentation covering setup, API, features, and troubleshooting:
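The `app_paths` mapping above is consumed by the desktop-action layer when you ask Alisa to open an app. A minimal sketch of how such a launcher might resolve and start a registered app — the `launch_app` helper is hypothetical, not Alisa's actual function:

```python
import subprocess
from pathlib import Path

# Hypothetical copy of the mapping from backend/app/desktop_actions.py.
app_paths = {
    "myapp": "C:\\Path\\To\\App.exe",
}


def launch_app(name: str) -> bool:
    """Launch a registered app by its short name; unknown or missing
    executables are refused rather than guessed."""
    exe = app_paths.get(name.lower())
    if exe is None or not Path(exe).exists():
        return False
    subprocess.Popen([exe])  # non-blocking start
    return True
```

The key design point is that only names present in the mapping can ever be launched, which keeps app control inside the whitelist-based safety model.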
| Module | Documentation |
|---|---|
| Backend | backend/README.md |
| Overlay | overlay/README.md |
| Voice | voice/README.md |
| Vision | vision/README.md |
| Scripts | scripts/README.md |
- Complete system architecture
- 4-layer design (Presentation, Communication, Core Logic, Data)
- Data flow diagrams
- Component interactions
- Technology stack breakdown
- Deployment architecture
- Performance characteristics
- Security model
- File-by-file documentation (all 67 files)
- Purpose, key components, dependencies
- Lines of code statistics
- Quick lookup tables
- Import patterns and conventions
- OS: Windows 10/11 (64-bit)
- Python: 3.10 or higher
- RAM: 4GB (backend + LLM)
- Storage: 2GB (models + dependencies)
- CPU: 4 cores (for concurrent processing)
- OS: Windows 11
- Python: 3.11
- RAM: 8GB+ (for smooth operation)
- Storage: 10GB+ (multiple models)
- CPU: 6+ cores
- GPU: NVIDIA GPU with CUDA (for faster LLM inference)
- Webcam: 720p or higher
- Microphone: Any USB/built-in microphone
- Tesseract OCR - For screen text extraction
- CUDA Toolkit - For GPU acceleration
- RVC Models - For custom voice conversion
- Core chat functionality with LLM streaming
- Animated avatar overlay (6 emotions)
- Voice output (Edge TTS) and input (Faster Whisper)
- Emotion detection and expression system
- Conversation modes (teasing, calm, serious)
- Memory system (short & long-term SQLite)
- Idle companion system
- Desktop understanding
- Application detection
- File type recognition
- Task inference
- Error detection
- Smart help offers
- Desktop actions
- App management
- Browser control
- Keyboard/mouse automation
- File operations
- Safety system (whitelist, blacklist, rate limits)
- Task memory & habit learning
- Work schedule detection
- App usage pattern tracking
- Silence preference learning
- Repeated task recognition
- Adaptive behavior
- Settings UI panel (web-based dashboard)
- System tray integration
- Multi-language support enhancements
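Among the features listed above, the memory system pairs a short-term buffer with persistent SQLite storage. One way to sketch that combination — the class name, table name, and schema here are assumptions for illustration, not Alisa's actual code:

```python
import sqlite3
from collections import deque


class Memory:
    """Short-term ring buffer plus a persistent SQLite log (assumed schema)."""

    def __init__(self, db_path: str = ":memory:", short_term: int = 20):
        self.buffer = deque(maxlen=short_term)  # recent turns only
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages (role TEXT, content TEXT)"
        )

    def remember(self, role: str, content: str) -> None:
        self.buffer.append((role, content))  # fast context for the next prompt
        self.db.execute("INSERT INTO messages VALUES (?, ?)", (role, content))
        self.db.commit()

    def recent(self):
        return list(self.buffer)


mem = Memory()
mem.remember("user", "hello")
mem.remember("assistant", "hi there")
```

The buffer feeds the LLM's context window cheaply, while the SQLite table survives restarts and supports long-term habit learning.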
- Emotional Intelligence
- Advanced emotion detection from text
- Context-aware emotional responses
- Emotional state tracking over time
- Mood-based interaction patterns
- Creative Assistance
- Code generation and refactoring
- Writing assistance and editing
- Brainstorming and idea generation
- Project planning and task breakdown
- Multi-Modal Learning
- Document analysis and summarization
- Image understanding and description
- Video content analysis
- Multi-document synthesis
- Multiple avatar themes and character designs
- Plugin system for community extensions
- Cross-platform support (Linux, macOS)
- Mobile companion app (Android/iOS)
- Voice activity detection (no push-to-talk)
- Advanced RVC voice training pipeline
- Multi-user support with profiles
- Cloud sync for settings (optional)
- Integration with productivity tools (calendar, todo lists)
- Advanced context awareness (git status, running processes)
We welcome contributions! Areas where you can help:
- New avatar expressions and themes
- Voice model training and sharing
- Personality preset configurations
- User interface improvements
- Performance optimizations
- Bug fixes and stability improvements
- New conversation modes
- Platform support (Linux, macOS)
- Plugin system development
- Tutorial videos and guides
- Translation to other languages
- Usage examples and case studies
- API documentation improvements
- Bug reporting with detailed steps
- Feature testing on different systems
- Performance benchmarking
- User experience feedback
1. Fork the repository

```powershell
git clone https://github.com/YOUR_USERNAME/Nexus-Alisa-AI-Assistant-.git
cd "Alisa-AI Assistant"
```

2. Create a feature branch

```powershell
git checkout -b feature/AmazingFeature
```

3. Make your changes
   - Follow code style guidelines (see DEVELOPMENT.md)
   - Add tests if applicable
   - Update documentation

4. Commit your changes

```powershell
git commit -m 'Add some AmazingFeature'
```

5. Push to your fork

```powershell
git push origin feature/AmazingFeature
```

6. Open a Pull Request
   - Describe your changes clearly
   - Reference any related issues
   - Include screenshots/videos for UI changes
```
<type>: <subject>

<body>

<footer>
```

Types:
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- style: Code style changes (formatting)
- refactor: Code refactoring
- test: Adding or updating tests
- chore: Maintenance tasks
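The format above follows the Conventional Commits style. As a quick sketch, a `commit-msg` hook could validate the first line against the listed types like this (the hook wiring itself is left out; only the check is shown):

```python
import re

# The types listed in this README's commit convention.
ALLOWED_TYPES = {"feat", "fix", "docs", "style", "refactor", "test", "chore"}


def valid_subject(line: str) -> bool:
    """Check that the first commit line matches '<type>: <subject>'."""
    m = re.match(r"^(\w+): .+", line)
    return bool(m) and m.group(1) in ALLOWED_TYPES
```

For example, `valid_subject("feat: add new avatar theme")` passes, while a bare `"added stuff"` does not.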
MIT License - see LICENSE file for details.
- ✅ Commercial use allowed
- ✅ Modification allowed
- ✅ Distribution allowed
- ✅ Private use allowed
- ℹ️ License and copyright notice must be included
- ⚠️ No warranty provided
- LLM Integration: llama.cpp - Fast CPU/GPU inference
- Voice Synthesis: Microsoft Edge TTS - High-quality text-to-speech
- Speech Recognition: faster-whisper - Optimized Whisper implementation
- Voice Conversion: RVC - Real-time voice conversion
- Computer Vision: OpenCV - Image processing and face detection
- Web Framework: FastAPI - Modern Python web framework
- Database: SQLAlchemy - SQL toolkit and ORM
- VTuber Culture - Avatar animation and personality design
- Anime Characters - Tsundere personality archetype
- AI Assistants - Siri, Alexa, Google Assistant concepts
- Desktop Companions - Clippy (but actually helpful!)
- Open source community for amazing tools
- Beta testers for valuable feedback
- Contributors for improvements and bug fixes
Documentation First:
- Check the docs/ folder for detailed guides
- Read module-specific READMEs for troubleshooting
GitHub Issues:
- Report bugs: GitHub Issues
- Request features: Use "enhancement" label
- Ask questions: Use "question" label
When reporting issues, please include:
- Your system specs (OS, Python version, RAM, GPU)
- Steps to reproduce the problem
- Error messages (full traceback)
- Relevant logs from terminal
- What you've already tried
Common issues:
- Port 8000 in use → run `netstat -ano | findstr :8000`, then kill the process
- LLM not connecting → verify that `http://127.0.0.1:8080/health` responds
- Webcam not found → check Device Manager, close other camera apps
- High CPU usage → switch to the `ultra_light` vision preset
- Module not found → ensure the venv is activated, reinstall requirements
Check system status:

```powershell
# Python version
python --version

# Check if backend is running
Invoke-WebRequest -Uri "http://127.0.0.1:8000/"

# Check if LLM is running
Invoke-WebRequest -Uri "http://127.0.0.1:8080/health"

# List audio devices
python -c "import sounddevice; print(sounddevice.query_devices())"

# Test webcam
python -c "import cv2; cap = cv2.VideoCapture(0); print('Webcam:', cap.isOpened())"
```

View logs:

```powershell
# Backend logs (check terminal running start_backend.ps1)
# Look for errors in red text

# Database inspection
python .\scripts\view_history.py
```

Current Version: 3.0
Stability: Production Ready
Last Updated: January 17, 2026
v3.0 (January 2026)
- Task memory and habit learning system
- Adaptive behavioral adjustments
- Work schedule detection
- Complete documentation (12,400+ lines)
- System architecture documentation
- Codebase structure documentation
v2.5 (January 2026)
- Desktop actions and automation
- Safety system implementation
- Permission-based execution
v2.0 (January 2026)
- Desktop understanding system
- Screen analysis and OCR
- Context-aware assistance
v1.5 (December 2025)
- Idle companion system
- Spontaneous behavior
- Presence awareness
v1.0 (December 2025) - Core Release
- Basic chat functionality
- Avatar overlay
- Voice I/O
- Emotion system
Made with ❤️ by Kushagra Bhardwaj
Repository: Nexus-Alisa-AI-Assistant-
Alisa is more than just an AI assistant - she's your companion, understanding your work, adapting to your habits, and growing with you over time. Welcome to the future of personal AI assistance! 💙