AI Voice Agent Frontend

This is the frontend application for the AI Voice Agent, built with Next.js. It provides the user interface for the video call, renders either the 3D Avatar (using React Three Fiber) or the Video Avatar, and manages the connection to the LiveKit-backed agent.

🚀 Tech Stack

Framework: Next.js 16 (App Router)
UI Library: React 19
Styling: Tailwind CSS 4
Real-time Comms: LiveKit Client & LiveKit Components
3D Graphics: React Three Fiber & Drei
Lip Sync: wawa-lipsync (audio-reactive mouth movements for the 3D avatar)
Animations: Framer Motion

🛠️ Prerequisites

Node.js: Version 20.9 or higher (the minimum supported by Next.js 16).
Backend: The voice-agent-backend must be running, or valid LiveKit tokens must be available from another source, for the frontend to connect.

📦 Installation & Setup

Navigate to the frontend directory:
```
cd frontend
```
Install dependencies:
```
npm install
```
Configure Environment Variables:
- Create a .env.local file in the root of the frontend directory.
- Add the public LiveKit variables the client needs. Tokens are typically fetched from a secure API rather than generated client-side.
- Example:
```
NEXT_PUBLIC_LIVEKIT_URL=wss://your-project.livekit.cloud
```
Run Development Server:
```
npm run dev
```
Open in Browser: Navigate to http://localhost:3000.
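
A missing or malformed NEXT_PUBLIC_LIVEKIT_URL tends to surface only as a failed connection, so it can help to validate the value once at startup. The following is a minimal sketch, not code from this repository: the helper name resolveLiveKitUrl is hypothetical, and the check assumes a WebSocket-style URL like the example above.

```typescript
// Hypothetical helper: validates the LiveKit server URL read from the environment.
// Assumption: the URL uses the ws:// or wss:// scheme, as in the .env.local example.
function resolveLiveKitUrl(raw: string | undefined): string {
  const url = (raw ?? "").trim();
  if (url === "") {
    throw new Error("NEXT_PUBLIC_LIVEKIT_URL is not set. Add it to .env.local.");
  }
  if (!/^wss?:\/\//.test(url)) {
    throw new Error(`NEXT_PUBLIC_LIVEKIT_URL must start with ws:// or wss://, got "${url}"`);
  }
  return url;
}

// Possible call site inside the Next.js app:
//   const serverUrl = resolveLiveKitUrl(process.env.NEXT_PUBLIC_LIVEKIT_URL);
```

Passing the raw value in as an argument keeps the helper independent of process.env, which makes it easy to exercise in isolation.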

🧩 Key Components

ActiveRoom: The main container for the active call session.
SimpleVoiceChat: Manages the chat interface and interaction state.
Visualizer: Renders the audio frequency visualizer.
AvatarContext: Manages the state of the chosen avatar (3D vs Video).

⚠️ System Limitations & Known Issues

1. Browser Audio Policy

Constraint: Modern browsers (Safari, Chrome) block auto-playing audio without a user gesture.
Impact: Users must interact with the page (click "Start Call" or "Unmute") before they can hear the agent. The UI handles this, but it prevents a purely "hands-free" start.
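
The general pattern behind this constraint can be sketched as follows: audio stays suspended until the first user gesture, and a one-time click handler resumes it. This is an illustrative sketch only, not the UI's actual implementation. The function name and the two small interfaces are hypothetical; they mirror the `state` and `resume()` members of the Web Audio `AudioContext` and the standard `addEventListener` signature so the logic can run without a browser.

```typescript
// Minimal shapes, modeled on AudioContext and EventTarget, so the sketch
// does not depend on DOM typings. Both interface names are hypothetical.
interface ResumableAudio {
  state: string;
  resume(): Promise<void>;
}

interface GestureTarget {
  addEventListener(type: string, listener: () => void, options?: { once?: boolean }): void;
}

// Registers a one-time click handler that resumes audio if the browser
// has suspended it under its autoplay policy.
function unlockAudioOnGesture(audio: ResumableAudio, target: GestureTarget): void {
  const onFirstClick = (): void => {
    if (audio.state === "suspended") {
      // resume() is asynchronous; this sketch does not wait for it.
      void audio.resume();
    }
  };
  target.addEventListener("click", onFirstClick, { once: true });
}
```

In a browser, the first argument would typically be an AudioContext and the second a "Start Call" or "Unmute" button element.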

2. 3D Avatar Lip Sync

Constraint: The 3D avatar uses audio-energy based modulation via wawa-lipsync, not phoneme-based animation.
Impact: The avatar's mouth opens and closes in proportion to audio loudness. It looks responsive but does not form realistic shapes for specific sounds (like 'O', 'M', 'P').
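
To make the energy-based idea concrete, here is a minimal sketch of mapping loudness to a mouth-open value. It is not the wawa-lipsync implementation or its API; the function names, the `gain` and `smoothing` parameters, and their defaults are assumptions chosen for illustration.

```typescript
// Root-mean-square level of one audio frame (samples expected in [-1, 1]).
function rmsLevel(samples: ArrayLike<number>): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const s = samples[i] ?? 0;
    sumSquares += s * s;
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Moves the mouth-open value (0 = closed, 1 = fully open) toward a target
// derived from loudness. `gain` and `smoothing` are illustrative assumptions.
function nextMouthOpen(
  previous: number,
  samples: ArrayLike<number>,
  gain: number = 4,
  smoothing: number = 0.5,
): number {
  const target = Math.min(1, rmsLevel(samples) * gain);
  return previous + (target - previous) * smoothing;
}
```

Because only the overall level feeds the animation, a loud 'M' and a loud 'O' produce the same mouth opening, which is the limitation described above.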

3. Mobile Performance

Constraint: High-fidelity 3D rendering in the browser can be resource-intensive.
Impact: Older mobile devices might experience lower frame rates (FPS) or higher battery drain when rendering the 3D avatar scene.
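
One common mitigation is to lower the render resolution when the frame rate drops. The sketch below shows that idea as a pure function; it is not part of this repository, and the function name, thresholds, and step size are assumptions. In React Three Fiber, the resulting value could be passed to the `dpr` prop of `Canvas`.

```typescript
// Illustrative tuning knobs; all values are assumptions, not project settings.
interface DprOptions {
  minDpr: number;
  maxDpr: number;
  targetFps: number;
  step: number;
}

const DEFAULT_DPR_OPTIONS: DprOptions = { minDpr: 0.5, maxDpr: 2, targetFps: 30, step: 0.25 };

// Returns the next device pixel ratio: lower it when FPS falls below the
// target, raise it when there is clear headroom, and clamp to the allowed range.
function adjustPixelRatio(
  currentDpr: number,
  measuredFps: number,
  options: DprOptions = DEFAULT_DPR_OPTIONS,
): number {
  const { minDpr, maxDpr, targetFps, step } = options;
  let next = currentDpr;
  if (measuredFps < targetFps) {
    next = currentDpr - step;
  } else if (measuredFps > targetFps * 1.5) {
    next = currentDpr + step;
  }
  return Math.min(maxDpr, Math.max(minDpr, next));
}
```

The gap between the lower and upper thresholds keeps the value from oscillating when the frame rate hovers near the target.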

4. Video Avatar Latency

Constraint: When using the Video Avatar (Beyond Presence), initialization may take slightly longer than with the local 3D model.
