Elda is an intelligent voice assistant specifically designed to help elderly users navigate technology with confidence. Using advanced AI and natural language processing, Elda provides step-by-step visual tutorials triggered by simple voice commands, making technology more accessible for seniors.
- Wake Word Detection: Uses Porcupine wake word engine with custom "Hey Elda" trigger
- Speech-to-Text: Powered by OpenAI Whisper for accurate voice transcription (see the sketch after this list)
- Intent Recognition: AI-powered command understanding using Google Gemini
- Text-to-Speech: ElevenLabs integration for natural voice responses
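The speech-to-text step can be tried on its own. Below is a minimal sketch assuming the open-source whisper package and a pre-recorded WAV file; the project may instead call the hosted OpenAI API, and the model size here is an assumption:

```python
import whisper  # pip install openai-whisper

# Load a small model once at startup; "base" trades some accuracy for speed.
model = whisper.load_model("base")

def transcribe(audio_path: str) -> str:
    """Return the transcribed text for a recorded voice command."""
    result = model.transcribe(audio_path)
    return result["text"].strip()

if __name__ == "__main__":
    print(transcribe("command.wav"))  # example path, not part of the repo
```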
- Volume Control: Increase/decrease system volume by 50% or custom amounts (see the sketch after this list)
- Brightness Control: Adjust screen brightness with voice commands
- Screen Zoom: Control macOS accessibility zoom features
- Smart Defaults: All commands default to 50% adjustments for better user experience
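volume.py is not reproduced here; as a hedged illustration, the 50% default adjustment on macOS can be done with AppleScript through osascript (the actual implementation may differ):

```python
import subprocess

def get_volume() -> int:
    """Read the current macOS output volume (0-100) via AppleScript."""
    out = subprocess.run(
        ["osascript", "-e", "output volume of (get volume settings)"],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())

def change_volume(delta: int = 50) -> None:
    """Raise or lower the output volume by `delta` percent, clamped to 0-100."""
    level = max(0, min(100, get_volume() + delta))
    subprocess.run(["osascript", "-e", f"set volume output volume {level}"], check=True)

if __name__ == "__main__":
    change_volume(50)  # mirrors the 50% default described above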
- AI-Generated Guides: Creates step-by-step tutorials for any topic (see the example after this list)
- Visual Interface: Beautiful Electron-based tutorial cards
- Progress Tracking: Save and resume tutorial progress
- Detailed Help: Expandable help sections for each step
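The real schema of a generated tutorial is defined by howto_generator.py and the React frontend; purely as an illustration, one step might carry fields like these (all field names below are made up):

```python
# Hypothetical shape of a single tutorial step as the frontend could consume it.
example_step = {
    "title": "Open the Mail app",
    "instruction": "Click the blue envelope icon in the Dock.",
    "details": "If you cannot find it, click the magnifying glass and type 'Mail'.",
    "completed": False,
}
```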
- React Frontend: Responsive tutorial interface with smooth animations
- Always-on-Top Window: Non-intrusive popup that stays accessible
- Visual States: Different Elda avatars for listening, thinking, and tutorial modes
- Real-time Updates: Live WebSocket communication between components
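The message format between the Python backend and the Electron window lives in websocket_client.py and main.js; the snippet below is only a sketch of the idea, assuming a local WebSocket server, an assumed port, and made-up JSON fields (state, payload):

```python
import asyncio
import json
from typing import Optional

import websockets  # pip install websockets

async def push_state(state: str, payload: Optional[dict] = None,
                     uri: str = "ws://localhost:8765") -> None:
    """Send a UI state update (e.g. 'listening', 'thinking', 'tutorial') to the frontend."""
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({"state": state, "payload": payload or {}}))

if __name__ == "__main__":
    asyncio.run(push_state("thinking"))
```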
```
Elda/
├── voice.py               # Main wake word detection and orchestration
├── speech2text/
│   ├── stt_capture.py     # Speech processing and intent handling
│   └── howto_generator.py # Flask API for tutorial generation
├── tts_announcer.py       # Text-to-speech with ElevenLabs
├── volume.py              # System volume control
├── brightness.py          # Screen brightness control
├── zoom_controller/       # macOS zoom accessibility features
├── websocket_client.py    # Electron communication
└── elda-app/              # Electron frontend
    ├── main.js            # Electron main process
    ├── renderer/          # React frontend
    │   ├── App.jsx        # Main tutorial interface
    │   ├── components/    # React components
    │   └── styles.css     # Styling
    └── package.json       # Node.js dependencies
```
- Python 3.9+
- Node.js 16+
- macOS (for system controls)
- Microphone access
```bash
git clone <repository-url>
cd Elda
```

```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

```bash
cd elda-app
npm install
cd ..
```

Create a .env file in the root directory:
```
# Wake Word Detection
ACCESS_KEY=your_porcupine_access_key

# AI Services
OPENAI_API_KEY=your_openai_api_key
GEMINI_API_KEY=your_gemini_api_key

# Text-to-Speech
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=your_voice_id
```

Terminal 1 - Main Voice Assistant:
```bash
source .venv/bin/activate
python voice.py
```

Terminal 2 - Tutorial Generator API:

```bash
source .venv/bin/activate
python speech2text/howto_generator.py
```

Terminal 3 - Electron Frontend:

```bash
cd elda-app
npm start
```

- "Hey Elda, increase volume" - Increase volume by 50%
- "Hey Elda, decrease volume" - Decrease volume by 50%
- "Hey Elda, turn up volume by 25" - Custom volume adjustment
- "Hey Elda, make it brighter" - Increase screen brightness
- "Hey Elda, zoom in" - Zoom into screen
- "Hey Elda, zoom out" - Zoom out of screen
- "Hey Elda, introduce yourself" - Get Elda's introduction
- "Hey Elda, show me how to [topic]" - Generate interactive tutorial
- "Hey Elda, how do I [action]" - Get step-by-step guidance
- "Hey Elda, how do I send an email?"
- "Hey Elda, can you make this video louder?"
- "Hey Elda, zoom in on the screen please?"
Replace hello_elda.ppn with your custom Porcupine wake word model.
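For reference, a minimal wake-word loop around a custom .ppn file might look like this (a sketch assuming the pvporcupine and pvrecorder packages; voice.py itself may be structured differently):

```python
import os

import pvporcupine                 # pip install pvporcupine
from pvrecorder import PvRecorder  # pip install pvrecorder

porcupine = pvporcupine.create(
    access_key=os.environ["ACCESS_KEY"],
    keyword_paths=["hello_elda.ppn"],  # your custom wake word model
)
recorder = PvRecorder(frame_length=porcupine.frame_length, device_index=-1)
recorder.start()

try:
    while True:
        pcm = recorder.read()
        if porcupine.process(pcm) >= 0:  # >= 0 means the wake word was heard
            print("Wake word detected - start listening for a command")
except KeyboardInterrupt:
    pass
finally:
    recorder.delete()
    porcupine.delete()
```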
Modify voice settings in tts_announcer.py:

```python
self.voice_id = os.getenv("ELEVENLABS_VOICE_ID", "your_default_voice")
```
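To test a voice ID outside the app, ElevenLabs' public text-to-speech REST endpoint can be called directly (a sketch using requests; tts_announcer.py may use the official SDK and different settings):

```python
import os

import requests

def synthesize(text: str, voice_id: str) -> bytes:
    """Request MP3 audio for `text` from the ElevenLabs text-to-speech endpoint."""
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"],
                 "Content-Type": "application/json"},
        json={"text": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.content  # MP3 bytes by default

if __name__ == "__main__":
    with open("elda_test.mp3", "wb") as f:
        f.write(synthesize("Hello, I am Elda.", os.environ["ELEVENLABS_VOICE_ID"]))
```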
Adjust tutorial generation in howto_generator.py:

```python
HOWTO_SYSTEM_PROMPT = """Your custom prompt here..."""
```

- Add Intent Detection in speech2text/stt_capture.py:
```python
# In detect_intent_keywords()
if any(word in text for word in ["your", "keywords"]):
    return "your_new_intent"
```

- Handle the Command:
```python
elif intent == "your_new_intent":
    print("New command detected!")
    # Your implementation here
```

- Update Gemini Prompt to recognize the new intent.
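For context, the Gemini side of intent recognition could look roughly like the following (a sketch using the google-generativeai package; the model name and prompt wording are assumptions, and the real prompt lives in speech2text/stt_capture.py):

```python
import os

import google.generativeai as genai  # pip install google-generativeai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # model choice is an assumption

def detect_intent_gemini(text: str) -> str:
    """Ask Gemini to map a transcribed phrase to one known intent name."""
    prompt = (
        "Classify the user request into exactly one of: volume_up, volume_down, "
        "brightness_up, zoom_in, zoom_out, tutorial, your_new_intent, unknown.\n"
        f"Request: {text}\n"
        "Answer with the intent name only."
    )
    return model.generate_content(prompt).text.strip().lower()
```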
Create a new module (e.g., new_feature.py) and integrate it:
```python
from new_feature import your_function
# Add to handle_command()
```

- Listening State: Shows Elda avatar when ready to listen
- Thinking State: Animated thinking gif during processing
- Tutorial State: Interactive tutorial cards
- Step Navigation: Next/Previous buttons
- Progress Tracking: Visual progress indicators
- Detailed Help: Expandable help sections
- Completion Tracking: Mark steps as completed
For zoom and brightness controls, grant Terminal accessibility permissions:
- System Preferences → Security & Privacy → Privacy → Accessibility
- Add Terminal (or your terminal app)
- Enable the checkbox
Grant microphone permissions for wake word detection and speech recognition.
"Wake word not detected"
- Check microphone permissions
- Verify Porcupine access key
- Ensure quiet environment
"Intent detection fails"
- Check Gemini API quota (50 requests/day free tier)
- System falls back to keyword matching automatically
- Verify API keys in .env
"Tutorial not showing"
- Ensure the Flask server is running (python speech2text/howto_generator.py)
- Check that port 3000 is available
- Verify Electron frontend is running
"Volume/Brightness not working"
- Grant accessibility permissions
- Try running from terminal with elevated permissions
- Check macOS version compatibility
Enable debug mode by setting environment variables:
```bash
export DEBUG=1
python voice.py
```

- Fork the repository
- Create a feature branch
- Add your improvements
- Test thoroughly
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.