nareshis21/SuperBryn-task-frontend

AI Voice Agent Frontend

This is the frontend application for the AI Voice Agent, built with Next.js. It provides the user interface for the video call, renders either the 3D Avatar (using React Three Fiber) or the Video Avatar, and manages the connection to the LiveKit-backed agent.

🚀 Tech Stack

🛠️ Prerequisites

  • Node.js: Version 18.17 or higher.
  • Backend: The voice-agent-backend must be running (or valid LiveKit tokens must be generated) for the frontend to connect.

📦 Installation & Setup

  1. Navigate to the frontend directory:

    cd frontend
  2. Install dependencies:

    npm install
  3. Configure Environment Variables:

    • Create a .env.local file in the root of the frontend directory.
    • Add the public LiveKit variables the app needs. Tokens are typically fetched from a secure API rather than generated client-side, so keep API keys and secrets out of NEXT_PUBLIC_ variables.
    • Example:
      NEXT_PUBLIC_LIVEKIT_URL=wss://your-project.livekit.cloud
  4. Run Development Server:

    npm run dev
  5. Open in Browser: Navigate to http://localhost:3000.
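
A missing or malformed LiveKit URL usually only shows up as a failed connection at call time. The sketch below checks the value from step 3 up front. The helper name resolveLiveKitUrl is hypothetical and not part of this repository; it assumes the URL is the only public LiveKit variable, as in the example above.

```typescript
// Hypothetical helper (not from this repo): validates the LiveKit URL
// taken from the environment before the app tries to connect.
function resolveLiveKitUrl(env: Record<string, string | undefined>): string {
  const raw = env["NEXT_PUBLIC_LIVEKIT_URL"]?.trim();
  if (!raw) {
    throw new Error("NEXT_PUBLIC_LIVEKIT_URL is not set; add it to .env.local");
  }
  // LiveKit connections go over WebSocket, so expect ws:// or wss://.
  if (!/^wss?:\/\//i.test(raw)) {
    throw new Error(
      `NEXT_PUBLIC_LIVEKIT_URL must start with ws:// or wss://, got "${raw}"`
    );
  }
  return raw;
}
```

In a Next.js component this could be called as resolveLiveKitUrl({ NEXT_PUBLIC_LIVEKIT_URL: process.env.NEXT_PUBLIC_LIVEKIT_URL }). The variable is referenced by its full name because Next.js inlines NEXT_PUBLIC_ values at build time only for direct references.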

🧩 Key Components

  • ActiveRoom: The main container for the active call session.
  • SimpleVoiceChat: Manages the chat interface and interaction state.
  • Visualizer: Renders the audio frequency visualizer.
  • AvatarContext: Manages the state of the chosen avatar (3D vs Video).

⚠️ System Limitations & Known Issues

1. Browser Audio Policy

  • Constraint: Modern browsers (Safari, Chrome) block auto-playing audio without a user gesture.
  • Impact: Users must interact with the page (click "Start Call" or "Unmute") before they can hear the agent. The UI handles this, but it prevents a purely "hands-free" start.
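
As a minimal sketch of the gesture requirement, the helper below resumes a suspended audio context and is meant to be called from a click or tap handler. The name unlockAudio and the ResumableAudio interface are illustrative assumptions, not this project's code; the interface mirrors only the state and resume() members of the browser's AudioContext.

```typescript
// Structural type covering the parts of the Web Audio AudioContext used here.
interface ResumableAudio {
  state: string;
  resume(): Promise<void>;
}

// Hypothetical helper: resume audio if the browser has suspended it.
// Call this from a user-gesture handler (e.g. the "Start Call" click).
async function unlockAudio(ctx: ResumableAudio): Promise<boolean> {
  if (ctx.state === "suspended") {
    await ctx.resume();
  }
  // True once the context is allowed to produce sound.
  return ctx.state === "running";
}

// Browser usage (assumes a button element exists on the page):
//   const ctx = new AudioContext();
//   startButton.addEventListener("click", () => { void unlockAudio(ctx); });
```

Calling the helper outside a gesture handler is harmless, but the browser may leave the context suspended, in which case it returns false.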

2. 3D Avatar Lip Sync

  • Constraint: The 3D avatar uses audio-energy-based modulation via wawa-lipsync, not phoneme-based animation.
  • Impact: The avatar's mouth opens and closes in proportion to audio loudness. It looks responsive but does not form realistic shapes for specific sounds (like 'O', 'M', 'P').
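
To illustrate what energy-based modulation means in general, the sketch below maps the loudness of one audio frame to a mouth-openness value. It is not wawa-lipsync's API or algorithm; the function names and the gain and smoothing defaults are assumptions chosen for the example.

```typescript
// Root-mean-square energy of one audio frame (samples expected in [-1, 1]).
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) {
    const s = samples[i] ?? 0;
    sum += s * s;
  }
  return Math.sqrt(sum / samples.length);
}

// Map frame energy to a mouth-openness value in [0, 1].
// `gain` scales quiet speech up; `smoothing` (0..1) controls how quickly
// the mouth follows the target, which reduces jitter between frames.
function nextMouthOpen(
  prev: number,
  samples: Float32Array,
  gain = 4,
  smoothing = 0.5
): number {
  const target = Math.min(1, rms(samples) * gain);
  return prev + (target - prev) * smoothing;
}
```

Because only loudness drives the value, an 'O' and an 'M' spoken at the same volume produce the same mouth position, which is the limitation described above.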

3. Mobile Performance

  • Constraint: High-fidelity 3D rendering in the browser can be resource-intensive.
  • Impact: Older mobile devices might experience lower frame rates (FPS) or higher battery drain when rendering the 3D avatar scene.

4. Video Avatar Latency

  • Constraint: The Video Avatar (Beyond Presence) may take slightly longer to initialize than the locally rendered 3D model.

About

Frontend for a real-time AI Voice Agent built with Next.js, LiveKit, and React. It enables seamless voice conversations with either a 3D audio-reactive avatar (React Three Fiber) or a video avatar, supporting live streaming, lip sync visualization, and interactive UI.
