# Wisp: AI-Powered Voice-First Idea Capture System 🎙️✨

> "Capture fleeting thoughts before they vanish"


## 🎓 About This Project

This repository hosts the source code for Wisp, our final group project for SE 101 in the Software Engineering program at the University of Waterloo.

Originally developed on GitLab, the project has been ported to GitHub to share our work with the community. Wisp was designed to eliminate the friction of capturing ideas on mobile devices by leveraging voice-first interaction and AI agents.

## 🚀 Overview

Wisp is a mobile-first application that enables users to capture, refine, and organize ideas in under 5 seconds using only their voice. Unlike traditional voice recorders, Wisp uses real-time AI to transcribe speech and intelligently parse it into structured tasks—automatically extracting titles, priorities, deadlines, and context—before syncing them directly to your productivity tools (such as Notion).

### Key Features

- ⚡ **Instant Voice Capture**: Launch and record in under 2 seconds.
- 🗣️ **Real-time Transcription**: See your words appear instantly as you speak via WebSocket streaming.
- 🧠 **AI-Powered Parsing**: Automatically converts "Remind me to call Mom tomorrow" into a structured task:
  - **Title**: Call Mom
  - **Deadline**: Tomorrow, 9:00 AM
  - **Priority**: Medium
- 🔄 **Multi-Interpretation**: If your intent is ambiguous, the AI generates multiple confidence-weighted interpretations for you to choose from.
- 🔗 **Notion Integration**: Seamlessly syncs parsed Wisps to your Notion databases using the Model Context Protocol (MCP).
- 📱 **Modern Mobile UI**: Built with React Native, Expo, and NativeWind for a polished iOS/Android experience.

## 📸 Screenshots

*(Voice Capture · Review & Refine)*

## 🛠️ Tech Stack

### Mobile Client

- **Framework**: React Native (Expo SDK 52)
- **Styling**: NativeWind (Tailwind CSS)
- **Audio**: Expo Audio / WebSocket Streaming
- **State**: React Context / Hooks

### Backend API

- **Framework**: FastAPI (Python 3.11+)
- **AI Service**: OpenAI Realtime API / Google Gemini Live
- **Database**: Supabase (PostgreSQL)
- **Integration**: Model Context Protocol (MCP) SDK
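The core of the real-time transcription path is a loop that accumulates audio chunks and pushes a growing partial transcript to the client. This sketch shows that loop with stdlib `asyncio` only; `fake_microphone` and `fake_recognize` are stand-ins for the mobile client's audio stream and the OpenAI/Gemini recognizer:

```python
import asyncio

async def stream_partials(audio_chunks, recognize):
    """Hypothetical relay loop: feed each incoming audio chunk to a
    recognizer and yield the growing partial transcript, the way the
    FastAPI gateway would push updates down the client WebSocket."""
    buffer = bytearray()
    async for chunk in audio_chunks:
        buffer.extend(chunk)
        yield recognize(bytes(buffer))  # partial transcript so far

# --- demo with stubbed audio and a fake recognizer ------------------
async def fake_microphone():
    for chunk in (b"remind ", b"me to ", b"call Mom"):
        await asyncio.sleep(0)  # stand-in for real capture latency
        yield chunk

def fake_recognize(audio: bytes) -> str:
    # Pretend the "audio" bytes are already text.
    return audio.decode()

async def main():
    partials = []
    async for text in stream_partials(fake_microphone(), fake_recognize):
        partials.append(text)
    return partials

partials = asyncio.run(main())
print(partials[-1])  # remind me to call Mom
```

In production the recognizer call is a round trip to the Realtime API rather than a local function, but the accumulate-and-yield shape is the same.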

## 🏗️ Architecture

Wisp follows a modern service-oriented architecture:

1. **Mobile Client**: Handles audio recording, chunking, and real-time UI updates.
2. **Backend Gateway**: A FastAPI server that orchestrates WebSocket streams and manages state.
3. **AI Service Layer**: Processes raw audio into transcripts and structured JSON objects.
4. **MCP Layer**: Acts as a bridge to external productivity platforms (Notion, Jira, etc.).

For a deep dive into the system design, check out our Domain Model and Architecture Docs.
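The four layers can be read as a pipeline. The sketch below wires them together with stub functions—every body here is a placeholder for the real service, and the function names only echo the layer names above:

```python
def mobile_client(recording: bytes) -> list[bytes]:
    """Chunk raw audio for streaming (stubbed)."""
    size = 4
    return [recording[i:i + size] for i in range(0, len(recording), size)]

def ai_service(chunks: list[bytes]) -> dict:
    """Turn audio into a transcript and a structured task (stubbed)."""
    transcript = b"".join(chunks).decode()
    return {"title": transcript.title(), "priority": "Medium"}

def mcp_layer(task: dict) -> str:
    """Bridge to an external tool such as Notion (stubbed)."""
    return f"synced '{task['title']}' to Notion"

def backend_gateway(recording: bytes) -> str:
    """Orchestrate the layers, as the FastAPI server would."""
    chunks = mobile_client(recording)
    task = ai_service(chunks)
    return mcp_layer(task)

result = backend_gateway(b"call mom")
print(result)  # synced 'Call Mom' to Notion
```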

## 🏃‍♂️ Getting Started

To run Wisp locally, please follow the detailed Developer Setup Guide.

### Quick Summary

#### 1. Backend Setup

```shell
cd Project/src/backend
cp .env.example .env
# Add your OPENAI_API_KEY and SUPABASE credentials
docker-compose up -d
```
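For reference, a `.env` for this stack might look like the fragment below. The Supabase key names are assumptions based on common Supabase client conventions—the authoritative names are in `.env.example`:

```shell
# Hypothetical values; consult .env.example for the real key names
OPENAI_API_KEY=sk-...
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-service-role-key
```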

#### 2. Frontend Setup

```shell
cd Project/src/frontend
bun install
bun start
# Scan the QR code with Expo Go on your phone
```

## 📄 Documentation

We have included extensive documentation from our development process in the `Project/docs/` directory.

## 👥 The Team

University of Waterloo - SE 101 - Team 2

