
Spotty: Intelligent Interface for Boston Dynamics Spot Robot

Spotty is a multimodal system that enhances Boston Dynamics' Spot robot with natural language interaction, contextual awareness, and advanced navigation capabilities. It combines computer vision, speech recognition, and language understanding to create a more intuitive and helpful robotic assistant.

🌟 Features

🗣️ Voice Interaction

  • Wake Word Detection: Activate Spot with "Hey Spot" using the Porcupine wake word detector
  • Speech-to-Text: Convert voice commands to text using OpenAI Whisper
  • Text-to-Speech: Natural voice responses with OpenAI TTS
  • Conversational Memory: Maintain context across interactions
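
A minimal sketch of how these pieces chain together, assuming the pvporcupine, pyaudio, and openai packages; record_command() is a hypothetical helper that captures the user's utterance to a WAV file after the wake word fires:

    # Sketch only: wake word -> Whisper STT -> GPT reply -> TTS.
    import struct

    import pvporcupine
    import pyaudio
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    porcupine = pvporcupine.create(access_key="YOUR_PICOVOICE_ACCESS_KEY",
                                   keyword_paths=["hey-spot.ppn"])  # illustrative path
    mic = pyaudio.PyAudio().open(rate=porcupine.sample_rate, channels=1,
                                 format=pyaudio.paInt16, input=True,
                                 frames_per_buffer=porcupine.frame_length)

    while True:
        pcm = struct.unpack_from("h" * porcupine.frame_length,
                                 mic.read(porcupine.frame_length))
        if porcupine.process(pcm) >= 0:  # "Hey Spot" detected
            wav_path = record_command()  # hypothetical helper, not shown
            text = client.audio.transcriptions.create(
                model="whisper-1", file=open(wav_path, "rb")).text
            reply = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": text}],
            ).choices[0].message.content
            client.audio.speech.create(model="tts-1", voice="alloy",
                                       input=reply).stream_to_file("reply.mp3")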

🧠 Contextual Understanding

  • Multimodal RAG System: Retrieve and generate answers based on the robot's location and visual context
  • Vector Database: Store and query spatial information efficiently
  • Location-Based Responses: Provide context-aware information relevant to the robot's current position
  • Object and Scene Understanding: Recognize objects and environments using GPT vision models
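
On the vector-database side, the idea is to index one embedding per waypoint and search by similarity at query time. A rough sketch with FAISS, assuming a hypothetical embed() function that returns normalized 512-dimensional CLIP embeddings:

    import faiss
    import numpy as np

    DIM = 512                       # assumed CLIP embedding size
    index = faiss.IndexFlatIP(DIM)  # inner product = cosine on normalized vectors
    metadata = []                   # row i of the index -> waypoint info

    def add_waypoint(waypoint_id, label, description, embedding):
        """Store a waypoint's embedding plus human-readable metadata."""
        index.add(np.asarray([embedding], dtype="float32"))
        metadata.append({"waypoint_id": waypoint_id, "label": label,
                         "description": description})

    def query(embedding, k=3):
        """Return the k most similar waypoints with their scores."""
        scores, rows = index.search(np.asarray([embedding], dtype="float32"), k)
        return [(metadata[r], float(s)) for r, s in zip(rows[0], scores[0]) if r >= 0]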

🗺️ Enhanced Navigation

  • GraphNav Integration: Navigate complex environments using Boston Dynamics' GraphNav system
  • Waypoint Labeling: Automatically or manually label waypoints (e.g., "kitchen", "office")
  • Location Queries: Navigate to locations by name (e.g., "Go to the kitchen")
  • Object Search: Find and navigate to objects (e.g., "Find the coffee mug")
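
Resolving a spoken location name to a GraphNav command might look like the heavily simplified sketch below; the label-to-waypoint mapping is illustrative, and lease management, localization, and feedback polling are omitted:

    from bosdyn.client.graph_nav import GraphNavClient

    # Illustrative mapping produced by the waypoint-labeling step.
    LABEL_TO_WAYPOINT = {"kitchen": "waypoint-id-1", "office": "waypoint-id-2"}

    def go_to(robot, location_name):
        graph_nav = robot.ensure_client(GraphNavClient.default_service_name)
        waypoint_id = LABEL_TO_WAYPOINT[location_name.lower()]
        # navigate_to issues one command valid for cmd_duration seconds; a real
        # loop would re-issue it and poll feedback until the robot arrives.
        graph_nav.navigate_to(waypoint_id, 1.0)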

👁️ Visual Intelligence

  • Scene Description: Describe what the robot sees using vision-language models
  • Visual Question Answering: Answer questions about the robot's surroundings
  • Object Detection: Identify objects in the environment
  • Environment Mapping: Build and maintain a semantic map of the environment
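
A minimal sketch of a scene-description call, assuming the openai client and a JPEG frame grabbed from one of Spot's cameras:

    import base64

    from openai import OpenAI

    client = OpenAI()

    def describe_scene(image_path, question="What do you see?"):
        """Send one camera frame to GPT-4o-mini and return its answer."""
        with open(image_path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode()
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ]}],
        )
        return response.choices[0].message.content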

🏗️ System Architecture

Spotty consists of several integrated components that work together to provide a cohesive interaction experience:

  1. Unified Spot Interface: Core component that orchestrates all subsystems
  2. GraphNav Interface: Handles map recording, localization, and navigation
  3. Audio Interface: Manages wake word detection, speech recognition, and audio output
  4. RAG Annotation: Maintains knowledge base about locations and objects
  5. Vision System: Processes camera feeds and interprets visual information

All components leverage modern AI services:

  • OpenAI GPT-4o-mini: Natural language understanding and generation
  • OpenAI Whisper & TTS: Speech processing
  • CLIP: Visual-language understanding
  • FAISS: Vector database for efficient similarity search
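
The control flow that ties these together looks roughly like the skeleton below; the method names are illustrative, and the real orchestration lives in UnifiedSpotInterface in main_interface.py:

    # Illustrative skeleton only -- see main_interface.py for the actual class.
    class UnifiedSpotInterface:
        def __init__(self, graph_nav, audio, rag, vision):
            self.graph_nav = graph_nav  # map recording, localization, navigation
            self.audio = audio          # wake word, STT, TTS
            self.rag = rag              # waypoint knowledge base
            self.vision = vision        # camera feeds + vision-language queries

        def run(self):
            while True:
                self.audio.wait_for_wake_word()
                command = self.audio.transcribe_next_utterance()
                if "go to" in command.lower():
                    self.graph_nav.navigate_to_label(command)
                elif "see" in command.lower():
                    self.audio.speak(self.vision.describe_current_view())
                else:
                    self.audio.speak(self.rag.answer(command))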

🚀 Getting Started

Prerequisites

  • Boston Dynamics Spot robot
  • Python 3.8+
  • Boston Dynamics SDK
  • API keys for OpenAI and Picovoice

Installation

  1. Clone the repository:

    git clone https://github.com/vocdex/SpottyAI.git
    cd SpottyAI
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate
  3. Install the package and dependencies:

    pip install -e .
  4. Set up environment variables:

    export OPENAI_API_KEY="your_openai_api_key"
    export PICOVOICE_ACCESS_KEY="your_picovoice_access_key"

Map Creation and Navigation

Before using Spotty's navigation features, you'll need to create a map of your environment:

  1. Record a map:

    python scripts/recording_command_line.py --download-filepath /path/to/save/map ROBOT_IP

    Follow the command-line prompts to start recording, move the robot around the space, and stop recording. Camera images are recorded as waypoint snapshots along the way. If you have a pre-recorded map, you can skip this step.

  2. Label waypoints:

    python scripts/label_waypoints.py --map-path /path/to/map --label-method clip --prompts kitchen hallway office

    This uses CLIP to automatically label waypoints based on the visual content of their recorded snapshot images (see the scoring sketch after this list). Provide a list of location names that matches your environment.

  3. Create a RAG database:

    python scripts/label_with_rag.py --map-path /path/to/map --vector-db-path /path/to/database --maybe-label

    This creates a vector database for efficient waypoint search and retrieval: it detects the visible objects in each waypoint snapshot and generates a short scene description with GPT-4o-mini.

  4. View the map:

    python scripts/view_map.py --map-path /path/to/map
  5. Visualize the map with waypoint snapshot information:

    python scripts/visualize_map.py --map-path /path/to/map --rag-path /path/to/database
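
For reference, the CLIP scoring behind step 2 amounts to something like the following sketch, written against the Hugging Face transformers port of CLIP (the actual script may differ):

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def label_snapshot(image_path, prompts=("kitchen", "hallway", "office")):
        """Return the prompt that best matches a waypoint snapshot image."""
        image = Image.open(image_path)
        inputs = processor(text=list(prompts), images=image,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            logits = model(**inputs).logits_per_image  # shape (1, len(prompts))
        return prompts[int(logits.softmax(dim=1)[0].argmax())]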

Running Spotty

Run the main interface:

python main_interface.py

📚 Usage Examples

Basic Voice Interaction

  1. Say "Hey Spot" to activate wake word detection
  2. Ask a question or give a command:
    • "What do you see around you?"
    • "Go to the kitchen"
    • "Find a chair"
    • "Tell me about this room"

Navigation Commands

  • Go to a labeled location: "Take me to the kitchen"
  • Find an object: "Can you find the coffee machine?"
  • Return to base: "Go back to your charging station"
  • Stand/Sit: "Stand up" or "Sit down"

Visual Queries

  • Scene description: "What can you see?"
  • Object identification: "What objects are on the table?"
  • Environment questions: "Is there anyone in the hallway?"
  • Spatial questions: "How many chairs are in this room?"

🔧 Advanced Configuration

System Customization

Modify the following files to customize behavior:

  • spotty/audio/system_prompts.py: Change the assistant's personality and capabilities
  • spotty/vision/vision_assistant.py: Adjust vision system configuration
  • spotty/utils/robot_utils.py: Configure robot connection settings
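
For context, a bare-bones Spot connection with the Boston Dynamics SDK looks roughly like this; robot_utils.py may wrap these calls differently:

    import bosdyn.client
    import bosdyn.client.util

    sdk = bosdyn.client.create_standard_sdk("SpottyClient")
    robot = sdk.create_robot("ROBOT_IP")    # hostname or IP of your Spot
    bosdyn.client.util.authenticate(robot)  # uses BOSDYN_CLIENT_USERNAME/PASSWORD if set
    robot.time_sync.wait_for_sync()         # required before sending commands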

Creating Custom Wake Words

You can create custom wake words using Picovoice Console:

  1. Visit https://console.picovoice.ai/
  2. Create a new wake word
  3. Download the .ppn file
  4. Update the KEYWORD_PATH in spotty/__init__.py
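
For example (the value below is illustrative; point it at wherever you saved the file):

    # In spotty/__init__.py -- replace with the path to your downloaded .ppn file.
    KEYWORD_PATH = "assets/wake_words/hey-spot_en.ppn"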

🛠️ Development

Project Structure

spotty/
├── assets/             # Wake words, maps, and vector databases
├── scripts/            # Command-line utilities
├── spotty/             # Main package
│   ├── annotation/     # Waypoint annotation tools
│   ├── audio/          # Audio processing components
│   ├── mapping/        # Navigation components
│   ├── utils/          # Utility functions
│   ├── vision/         # Vision components
│   └── __init__.py     # Package initialization
├── README.md           # This file
└── setup.py            # Package installation

Pre-recorded Maps and Databases

To use pre-recorded maps and databases, download the assets folder (Spotty Assets) from Google Drive and place it in the root directory of the project.

Adding New Capabilities

To extend Spotty with new features:

  1. Develop your component in the appropriate subdirectory
  2. Integrate it with the UnifiedSpotInterface in main_interface.py
  3. Update prompts and command handlers as needed
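
A hypothetical routing pattern for step 2 could look like this; the names are illustrative, not the actual UnifiedSpotInterface API:

    # Hypothetical command routing -- adapt to the real handlers in main_interface.py.
    COMMAND_HANDLERS = {}

    def register_handler(phrase, handler):
        """Map a trigger phrase to a callable implementing the new capability."""
        COMMAND_HANDLERS[phrase] = handler

    def route(text):
        for phrase, handler in COMMAND_HANDLERS.items():
            if phrase in text.lower():
                return handler(text)
        return None  # fall through to the LLM for open-ended queries

    register_handler("patrol", lambda text: print("Starting patrol route..."))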

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Boston Dynamics for the Spot robot and SDK
  • OpenAI for GPT-4o-mini, Whisper, and TTS
  • Picovoice for wake word detection
  • The open-source community for various libraries and tools

📧 Contact

For questions, suggestions, or collaborations, please open an issue or contact the maintainers directly.