🦉 ATHENA — AI Glasses for Sight & Sound

ATHENA is a DIY AI-powered smart glasses project built on the ESP32-C3, Google Gemini, and Azure Speech Synthesis. It is designed as a wearable assistant for elderly and visually impaired users: at the press of a button, it describes aloud what is in front of the wearer.

🌟 Overview

ATHENA combines edge hardware (ESP32-C3 + camera + speaker) with cloud AI services to deliver real-time scene understanding and natural speech output.

When the user presses the side button:

📷 ESP32-C3 captures an image from the camera.
🌐 The image is sent to a custom backend API.
🧠 Gemini analyzes the scene and produces a clear, simple description.
🗣 Azure Speech Synthesis (Neural TTS) converts the text into lifelike audio.
🔊 The audio is sent back to the ESP32-C3 and played through a small speaker.

This makes ATHENA a real-time sight-to-speech system for those who need help identifying objects, navigating environments, or receiving quick guidance.
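
Seen from the backend, the steps above reduce to "image in, audio out". The following is a minimal sketch of that flow with the two cloud calls injected as functions, so it can be exercised without network access. The names `imageToSpeech`, `PipelineDeps`, and `FALLBACK_TEXT` are hypothetical and not taken from the ATHENA backend.

```typescript
// Hypothetical stage signatures; the real backend may be structured differently.
type DescribeScene = (jpeg: Uint8Array) => Promise<string>;
type SynthesizeSpeech = (text: string) => Promise<Uint8Array>;

interface PipelineDeps {
  describeScene: DescribeScene; // e.g. a wrapper around a Gemini call
  synthesizeSpeech: SynthesizeSpeech; // e.g. a wrapper around Azure Neural TTS
}

// Phrase spoken when the vision stage returns no usable text (assumption).
const FALLBACK_TEXT = "Sorry, I could not describe the scene.";

// Runs one button press worth of work: image bytes in, audio bytes out.
async function imageToSpeech(
  jpeg: Uint8Array,
  deps: PipelineDeps
): Promise<Uint8Array> {
  if (jpeg.length === 0) {
    throw new Error("empty image upload");
  }
  const description = (await deps.describeScene(jpeg)).trim();
  const text = description.length > 0 ? description : FALLBACK_TEXT;
  return deps.synthesizeSpeech(text);
}
```

Injecting the stages keeps the flow testable with stubs and makes it straightforward to swap either cloud service later.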

🛠 Tech Stack

Hardware

ESP32-C3 Dev Board
OV2640 Camera Module (or similar, supported by ESP32-CAM drivers)
I²S DAC/Speaker for audio playback
Physical Button (GPIO input trigger)
Li-Po Battery Pack for mobility (with charging circuit)
(Optional) I²S Microphone for future voice command features

Software

ESP32 Arduino Firmware
- Captures images on button press
- Sends images via HTTPS (multipart upload)
- Receives audio file (WAV/MP3) and plays via I²S DAC
Backend (Node.js / TypeScript)
- Endpoint for receiving images
- Google Gemini → Vision understanding (object + scene analysis)
- Azure Speech Service (Neural TTS) → Converts description into natural audio
- Streams WAV/MP3 audio back to device
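
The Gemini step of the backend could be sketched as two pure helpers: one that builds a `generateContent` REST request carrying the JPEG as inline base64 data, and one that pulls the text out of the response. The prompt wording, the helper names, and the choice of the REST endpoint over an SDK are assumptions; the model name is left as a parameter because the source does not say which model ATHENA uses.

```typescript
interface GeminiRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Prompt wording is an assumption, not taken from the ATHENA backend.
const SCENE_PROMPT =
  "Describe this scene in one or two short, simple sentences " +
  "for a person who cannot see it.";

// Builds a generateContent request; jpegBase64 is the image encoded as base64
// (in Node, Buffer.from(bytes).toString("base64") would produce it).
function buildGeminiRequest(
  apiKey: string,
  model: string,
  jpegBase64: string
): GeminiRequest {
  const base = "https://generativelanguage.googleapis.com/v1beta/models/";
  return {
    url: `${base}${encodeURIComponent(model)}:generateContent`,
    headers: {
      "Content-Type": "application/json",
      "x-goog-api-key": apiKey,
    },
    body: JSON.stringify({
      contents: [
        {
          parts: [
            { text: SCENE_PROMPT },
            { inline_data: { mime_type: "image/jpeg", data: jpegBase64 } },
          ],
        },
      ],
    }),
  };
}

// Joins the text parts of the first candidate; returns "" if none are present.
function extractDescription(response: unknown): string {
  const r = response as {
    candidates?: { content?: { parts?: { text?: string }[] } }[];
  };
  const parts = r?.candidates?.[0]?.content?.parts ?? [];
  return parts
    .map((p) => p.text ?? "")
    .join("")
    .trim();
}
```

The returned `url`, `headers`, and `body` can be passed to any HTTP client, for example `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`.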

⚙️ System Architecture

flowchart TD
    A[Button Press] --> B[ESP32-C3 Camera Captures Image]
    B --> C[Send Image to Backend API]
    C --> D[Gemini Vision Model → Scene Description]
    D --> E[Azure Speech Service → Neural Voice Audio]
    E --> F[Backend sends audio file]
    F --> G[ESP32-C3 plays audio via I²S Speaker]
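
The step from scene description to audio (D to E in the flow above) could be sketched as a request builder for the Azure Speech text-to-speech REST endpoint. The voice name, the WAV output format, the `User-Agent` value, and the helper names are assumptions chosen for illustration; the description is XML-escaped because it is embedded in SSML.

```typescript
interface TtsRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Escapes characters that are not allowed to appear raw inside SSML.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Builds a text-to-speech request that asks for 16 kHz mono WAV audio.
// The default voice is an assumption; any Neural voice name can be passed.
function buildTtsRequest(
  key: string,
  region: string,
  text: string,
  voice: string = "en-US-JennyNeural"
): TtsRequest {
  const ssml =
    `<speak version="1.0" xml:lang="en-US">` +
    `<voice name="${escapeXml(voice)}">${escapeXml(text)}</voice>` +
    `</speak>`;
  return {
    url: `https://${region}.tts.speech.microsoft.com/cognitiveservices/v1`,
    headers: {
      "Ocp-Apim-Subscription-Key": key,
      "Content-Type": "application/ssml+xml",
      "X-Microsoft-OutputFormat": "riff-16khz-16bit-mono-pcm",
      "User-Agent": "athena-backend-sketch", // hypothetical client name
    },
    body: ssml,
  };
}
```

A low sample rate, mono, uncompressed format keeps decoding on the microcontroller simple; an MP3 output format would shrink the download at the cost of on-device decoding.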

🚀 Getting Started

1. Hardware Setup

Connect camera to ESP32-C3 board.
Wire a momentary push button to GPIO (e.g., GPIO5).
Connect a small I²S speaker/DAC to audio pins.
Power the device using a battery pack or USB.

2. Firmware

Flash the ESP32 firmware via Arduino IDE or PlatformIO.
Configure WiFi credentials (ssid, password) in the code.
Update POST_URL to point to your backend endpoint.

3. Backend

Clone the Node.js/TypeScript backend repo.
Set environment variables:
- GEMINI_API_KEY (API key for the Google Gemini API)
- AZURE_SPEECH_KEY (key for your Azure Speech resource)
- AZURE_SPEECH_REGION (region of your Azure resource, e.g. eastus)
Run server with:
```
npm install
npm run dev
```
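
A missing key otherwise only shows up on the first button press, so it can help to validate the three environment variables at startup. This is a minimal sketch; `loadConfig` and `AthenaConfig` are hypothetical names, and the actual backend may load its configuration differently.

```typescript
interface AthenaConfig {
  geminiApiKey: string;
  azureSpeechKey: string;
  azureSpeechRegion: string;
}

const REQUIRED_VARS = [
  "GEMINI_API_KEY",
  "AZURE_SPEECH_KEY",
  "AZURE_SPEECH_REGION",
] as const;

// Reads the required variables from an env-like object and fails fast,
// naming every variable that is unset or blank.
function loadConfig(env: Record<string, string | undefined>): AthenaConfig {
  const read = (name: string): string => (env[name] ?? "").trim();
  const missing = REQUIRED_VARS.filter((name) => read(name) === "");
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    geminiApiKey: read("GEMINI_API_KEY"),
    azureSpeechKey: read("AZURE_SPEECH_KEY"),
    azureSpeechRegion: read("AZURE_SPEECH_REGION"),
  };
}
```

In a Node.js server this would typically be called once at startup as `loadConfig(process.env)`.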

4. Usage

Power on ATHENA.
Press the side button.
The glasses speak a description of what is in front of the wearer.

📌 Roadmap

- [x] Image capture & upload
- [x] Scene description via Gemini
- [x] Audio feedback via Azure TTS
- [ ] Add offline fallback mode (basic object detection on-device)
- [ ] Support continuous audio streaming
- [ ] Add microphone for voice commands (e.g., “What’s in front of me?”)
- [ ] Optimize low-power modes for extended battery life

💡 Use Cases

For visually impaired users: understand surroundings and hear objects and signs described aloud.
For elderly care: help identify items or locations with a button press.
For DIY experimenters: build your own wearable AI assistant.

⚠️ Disclaimer

ATHENA is a proof-of-concept DIY project created for experimentation, learning, and prototyping purposes only. It is not a medical device and should not be relied upon for critical navigation, healthcare, or safety decisions. The hardware and software are provided as-is, and real-world use should be limited to testing, research, and educational exploration.
