A modern voice-controlled home assistant device that can play music, generate images, create audio, and more. Supports both cloud-based and local language models for enhanced flexibility and privacy.
- Dual-mode language model support (OpenAI API or local Phi-2 model)
- Voice activation with customizable wake word
- Music playback from YouTube
- Image generation using Stable Diffusion
- Audio generation using Meta's MusicGen
- Speech recognition and text-to-speech capabilities
- Clone the repository
- Install dependencies:

  ```
  pip install -r requirements.txt
  ```
- Copy `.env.example` to `.env` and fill in your API keys:
  - OpenAI API key (if using OpenAI mode)
  - Replicate API key
- Run the assistant:

  ```
  python main.py
  ```
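The `.env` file from the setup steps above might look like the following. The variable names are assumptions based on the services listed (the standard names used by the OpenAI and Replicate Python clients); check `.env.example` for the exact keys your copy expects:

```shell
# Only needed when MODEL_TYPE is 'openai'
OPENAI_API_KEY=your-openai-key
# Used by the Replicate-backed image/audio generation
REPLICATE_API_TOKEN=your-replicate-token
```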
- Say the wake word (default: "hey")
- Give a command:
- "play song [song name]"
- "generate image [description]"
- "generate audio [description]"
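Internally, routing a command like the ones above comes down to matching the transcribed text against known prefixes. A minimal sketch (function and action names here are illustrative, not the repository's actual API):

```python
def parse_command(text: str):
    """Map transcribed speech to an (action, argument) pair.

    Returns (None, text) when no known command prefix matches.
    """
    text = text.lower().strip()
    for prefix, action in (
        ("play song ", "play_song"),
        ("generate image ", "generate_image"),
        ("generate audio ", "generate_audio"),
    ):
        if text.startswith(prefix):
            # Everything after the prefix is the command's argument,
            # e.g. the song name or the image description.
            return action, text[len(prefix):]
    return None, text
```

For example, `parse_command("play song bohemian rhapsody")` yields `("play_song", "bohemian rhapsody")`, while unrecognized speech falls through unchanged so the assistant can ask the user to repeat it.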
Adjust settings in `config.py`:
- MODEL_TYPE: Choose between 'openai' (cloud-based) or 'phi2' (local model)
- Wake word
- Audio recording parameters
- Image generation settings
- Logging preferences
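A `config.py` covering the settings above might look like this. The variable names follow the options listed, but the defaults shown are assumptions, not the repository's actual values:

```python
# config.py -- illustrative defaults only
MODEL_TYPE = "openai"      # 'openai' (cloud-based) or 'phi2' (local)
WAKE_WORD = "hey"          # word that activates listening
SAMPLE_RATE = 16000        # audio recording sample rate, Hz
RECORD_SECONDS = 5         # length of each recording window, seconds
IMAGE_SIZE = (512, 512)    # Stable Diffusion output resolution
LOG_LEVEL = "INFO"         # logging verbosity
```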
For local model setup:
- PHI2_MODEL_PATH: Path to the Phi-2 model (default: 'microsoft/phi-2')
- No API key required for local model mode
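The dual-mode design described above can be sketched as a small factory that picks a backend from `MODEL_TYPE`. This is a hypothetical illustration (the function name and prompt-handling details are not the repository's actual code); it uses the OpenAI Python client for cloud mode and a Hugging Face `transformers` pipeline for local mode, with imports deferred so the unused backend's package need not be installed:

```python
def make_llm(model_type: str, phi2_model_path: str = "microsoft/phi-2"):
    """Return a generate(prompt) -> str callable for the chosen backend."""
    if model_type == "openai":
        # Cloud mode: requires OPENAI_API_KEY in the environment.
        from openai import OpenAI
        client = OpenAI()

        def generate(prompt: str) -> str:
            resp = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content

        return generate
    elif model_type == "phi2":
        # Local mode: no API key, but downloads the model weights on
        # first use and benefits greatly from a GPU.
        from transformers import pipeline
        pipe = pipeline("text-generation", model=phi2_model_path)

        def generate(prompt: str) -> str:
            return pipe(prompt, max_new_tokens=128)[0]["generated_text"]

        return generate
    raise ValueError(f"unknown MODEL_TYPE: {model_type}")
```

Keeping backend selection in one place means the rest of the assistant only ever calls `generate(prompt)` and stays unaware of which model is answering.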
- Uses modern Python practices
- Includes logging for debugging
- Configurable settings
- Error handling
- Multi-threading support
- Python 3.8+
- See requirements.txt for package dependencies