
Rehearse AI

🏆 Hackathon Project: Built for the Gemini x Pipecat Virtual Hackathon: Build Adaptive Agents with Real-Time Intelligence

Practice real-life conversations with AI and get constructive feedback on your communication skills.

Overview

Rehearse AI is a web application where users can practice conversations with AI personas simulating real people (recruiters, managers, dates, customers, etc.). After each conversation, the AI provides detailed feedback on clarity, empathy, conciseness, and filler words, and suggests alternative phrasings.

Note: Currently only the web application is functional. The mobile app is a work in progress - see apps/mobile/README.md for details.

Architecture

```mermaid
flowchart TB
    User[User Browser] --> WebApp[React Web App]
    WebApp --> Daily[Daily.co WebRTC]
    Daily <--> Bot[Pipecat Bot]
    Bot --> STT[Speechmatics/Deepgram STT]
    Bot --> LLM[OpenAI/Gemini LLM]
    Bot --> TTS[OpenAI/Gemini TTS]
    Bot -.Optional.-> Tavus[Tavus Video Avatar]
    Bot -.Optional.-> Mem0[Mem0 Memory]
```

Flow: User speaks → Daily.co → Pipecat Bot → STT → LLM → TTS → Audio Response → User
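The flow above can be modeled as a chain of processors that each transform a frame and pass it on. The sketch below is plain Python to illustrate the data flow only; the stage names and stub functions are assumptions, not the actual Pipecat API, and the real bot wires real services (Daily transport, STT, LLM, TTS) into its pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable

# Illustrative model of the stage ordering; not the Pipecat API.
@dataclass
class Pipeline:
    stages: list[tuple[str, Callable]] = field(default_factory=list)

    def add_stage(self, name: str, fn: Callable) -> "Pipeline":
        self.stages.append((name, fn))
        return self

    def run(self, frame):
        # Each stage consumes the previous stage's output
        for _name, fn in self.stages:
            frame = fn(frame)
        return frame

# Stub stages standing in for the real STT/LLM/TTS services
pipeline = (
    Pipeline()
    .add_stage("stt", lambda audio: f"transcript({audio})")
    .add_stage("llm", lambda text: f"reply({text})")
    .add_stage("tts", lambda text: f"audio({text})")
)

print(pipeline.run("user_audio"))  # → audio(reply(transcript(user_audio)))
```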

Pipecat Pipeline

The Pipecat bot processes conversations through a real-time pipeline:

  1. Audio Input → Daily.co receives audio from the user's browser via WebRTC
  2. Speech-to-Text (STT) → Speechmatics or Deepgram transcribes audio to text
    • Detects filler words ("um", "uh", "like") for feedback analysis
    • Provides real-time transcription with low latency
  3. Language Model (LLM) → OpenAI GPT-4o-mini or Google Gemini generates responses
    • Maintains conversation context and persona characteristics
    • Accesses Mem0 memory (optional) for personalized feedback
    • Adapts to the scenario (job interview, customer interaction, etc.)
  4. Text-to-Speech (TTS) → OpenAI or Google Gemini converts response to natural speech
    • Tavus video avatar (optional) provides synchronized video
    • Low-latency streaming for natural conversation flow
  5. Audio Output → Daily.co streams audio (and video) back to user's browser

The pipeline runs continuously during the conversation, enabling natural back-and-forth dialogue with minimal delay.
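The filler-word detection mentioned in the STT step could be done with a simple word-boundary scan over the transcript. This is a hypothetical sketch; the filler list and tokenization here are assumptions, not the bot's actual implementation.

```python
import re

# Assumed filler list for illustration; the real bot may track more.
FILLER_WORDS = {"um", "uh", "like", "you know"}

def count_fillers(transcript: str) -> dict[str, int]:
    """Count filler-word occurrences in an STT transcript."""
    text = transcript.lower()
    counts: dict[str, int] = {}
    for filler in FILLER_WORDS:
        # \b word boundaries so e.g. "likely" does not match "like"
        n = len(re.findall(rf"\b{re.escape(filler)}\b", text))
        if n:
            counts[filler] = n
    return counts

print(count_fillers("Um, I like, uh, think that's likely fine"))
```

Counts like these can then be folded into the post-session feedback report alongside the LLM's qualitative analysis.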

Features

  • 🎙️ Voice Input/Output: Natural speech-to-text and text-to-speech conversation flow
  • 🎭 Multiple Personas: Practice with different characters (recruiter, manager, date, customer)
  • 📊 Detailed Feedback: Get constructive analysis on tone, content, and areas for improvement
  • 📱 Responsive Design: Clean, minimal UI that works well on desktop and mobile browsers
  • 💾 Session History: Review your last 3 conversation sessions
  • 🔄 Try Again: Retry scenarios to improve your approach
  • 🎥 Video Avatars (Optional): Realistic video avatars via Tavus
  • 🧠 AI Memory (Optional): Personalized feedback tracking via Mem0
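The "last 3 sessions" history behaves like a bounded buffer that evicts the oldest entry. A minimal in-memory sketch follows; in the app the sessions are actually persisted (the Supabase directory suggests a database-backed store), so this is illustrative only.

```python
from collections import deque

# Keep only the three most recent session IDs; older ones are
# evicted automatically when a fourth is appended.
history: deque[str] = deque(maxlen=3)

for session_id in ["s1", "s2", "s3", "s4"]:
    history.append(session_id)

print(list(history))  # → ['s2', 's3', 's4']
```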

Tech Stack

  • Frontend: React 18, TypeScript, Vite, TailwindCSS
  • Backend: Daily.co (WebRTC), Pipecat (Python bot)
  • AI Services: OpenAI GPT-4o-mini or Google Gemini (LLM), Speechmatics/Deepgram (STT), OpenAI/Gemini (TTS)
  • Optional: Tavus (video avatars), Mem0 (AI memory)

Quick Start

  1. Clone and Install:

    git clone <your-repo-url>
    cd rehearse-ai
    npm install

  2. Configure Environment Variables:

    cp apps/web/env.example .env
    cd pipecat-agent && cp .env.example .env

    Edit both .env files with your API keys. See docs/API_KEYS.md for details.

  3. Install Backend Dependencies:

    cd pipecat-agent
    uv sync

  4. Start Backend and Frontend:

    # Terminal 1: Start backend
    cd pipecat-agent && python local_api.py

    # Terminal 2: Start frontend
    npm run dev --workspace=web

  5. Open Browser: Navigate to http://localhost:5173

For detailed setup instructions, see SETUP.md.

Documentation

Project Structure

rehearse-ai/
├── apps/
│   ├── web/                 # React + Vite web application
│   └── mobile/              # React Native app (Work in Progress)
├── packages/
│   └── shared/              # Shared business logic and types
├── pipecat-agent/           # Python bot backend
├── supabase/                # Database and edge functions
└── docs/                    # Documentation

Usage

  1. Start a Session: Enter who you want to talk to and the scenario
  2. Conversation: Use the microphone button to speak - the AI will respond with voice
  3. End Session: Stop the conversation when ready
  4. Get Feedback: Review AI-generated feedback on your performance
  5. Try Again: Retry the same scenario or start a new one

Building for Production

npm run build:web
# Output: apps/web/dist/

Note: Production deployment is not yet implemented. The application currently runs in local development mode only.

Future Enhancements

  • PWA support for mobile installation
  • Additional personas and scenarios
  • Advanced feedback metrics and scoring
  • Export conversation transcripts
  • Support for multiple languages
  • React Native mobile app version
