This is the frontend application for the AI Voice Agent, built with Next.js. It provides the user interface for the video call, renders the 3D avatar (using React Three Fiber) or the video avatar, and manages the connection to the LiveKit-backed agent.
- Framework: Next.js 16 (App Router)
- UI Library: React 19
- Styling: Tailwind CSS 4
- Real-time Comms: LiveKit Client & LiveKit Components
- 3D Graphics: React Three Fiber & Drei
- Lip Sync: wawa-lipsync (audio-reactive mouth movements for the 3D avatar)
- Animations: Framer Motion
- Node.js: Version 20.9 or higher (the minimum supported by Next.js 16).
- Backend: The `voice-agent-backend` must be running (or valid LiveKit tokens must be generated) for the frontend to connect.
- Navigate to the frontend directory: `cd frontend`
- Install dependencies: `npm install`
- Configure Environment Variables:
  - Create a `.env.local` file in the root of the `frontend` directory.
  - Add the necessary LiveKit public variables (if handling token generation client-side, though typically tokens are fetched from a secure API).
  - Example: `NEXT_PUBLIC_LIVEKIT_URL=wss://your-project.livekit.cloud`
- Run Development Server: `npm run dev`
- Open in Browser: Navigate to http://localhost:3000.
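A misconfigured `NEXT_PUBLIC_LIVEKIT_URL` tends to surface only as a failed connection at call time, so it can help to check the value early. The following is a minimal sketch of such a check; `resolveLiveKitUrl` is a hypothetical helper, not something this project is known to contain.

```typescript
// Hypothetical helper (an assumption for illustration, not project code).
// Validates the LiveKit URL taken from the environment before it is used.
function resolveLiveKitUrl(env: Record<string, string | undefined>): string {
  const raw = env.NEXT_PUBLIC_LIVEKIT_URL?.trim();
  if (!raw) {
    throw new Error("NEXT_PUBLIC_LIVEKIT_URL is not set; add it to .env.local");
  }

  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    throw new Error(`NEXT_PUBLIC_LIVEKIT_URL is not a valid URL: ${raw}`);
  }

  // LiveKit connections are made over WebSocket, so only ws:// and wss:// are accepted here.
  if (parsed.protocol !== "wss:" && parsed.protocol !== "ws:") {
    throw new Error(
      `NEXT_PUBLIC_LIVEKIT_URL must use ws:// or wss://, got ${parsed.protocol}`
    );
  }
  return raw;
}
```

In client-side code, the value would be passed explicitly, for example `resolveLiveKitUrl({ NEXT_PUBLIC_LIVEKIT_URL: process.env.NEXT_PUBLIC_LIVEKIT_URL })`, because Next.js inlines `NEXT_PUBLIC_` variables only where they are referenced by their full name.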
- `ActiveRoom`: The main container for the active call session.
- `SimpleVoiceChat`: Manages the chat interface and interaction state.
- `Visualizer`: Renders the audio frequency visualizer.
- `AvatarContext`: Manages the state of the chosen avatar (3D vs Video).
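As an illustration of the kind of state `AvatarContext` manages, the 3D-versus-video choice can be modeled as a small reducer. This is a sketch under assumptions: every name below (`AvatarMode`, `avatarReducer`, and so on) is hypothetical, and the actual context in this project may be structured differently.

```typescript
// Hypothetical names throughout; the real AvatarContext may differ.
type AvatarMode = "3d" | "video";

interface AvatarState {
  mode: AvatarMode;
}

type AvatarAction =
  | { type: "select"; mode: AvatarMode } // pick a specific avatar
  | { type: "toggle" };                  // switch to the other avatar

// Assumption: the 3D avatar is the default choice.
const initialAvatarState: AvatarState = { mode: "3d" };

function avatarReducer(state: AvatarState, action: AvatarAction): AvatarState {
  switch (action.type) {
    case "select":
      // Return the same object when nothing changes, so consumers can skip re-rendering.
      return state.mode === action.mode ? state : { mode: action.mode };
    case "toggle":
      return { mode: state.mode === "3d" ? "video" : "3d" };
  }
}
```

A reducer like this could be passed to React's `useReducer` and exposed through a context provider, so that any component in the call UI can read or change the selected avatar.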
- Constraint: Modern browsers (Safari, Chrome) block auto-playing audio without a user gesture.
- Impact: Users must interact with the page (click "Start Call" or "Unmute") before they can hear the agent. The UI handles this, but it prevents a purely "hands-free" start.
- Constraint: The 3D avatar uses audio-energy-based modulation via `wawa-lipsync`, not phoneme-based animation.
- Impact: The avatar's mouth opens and closes according to audio volume. It looks responsive but does not form realistic shapes for specific sounds (like 'O', 'M', 'P').
- Constraint: High-fidelity 3D rendering in the browser can be resource-intensive.
- Impact: Older mobile devices might experience lower frame rates (FPS) or higher battery drain when rendering the 3D avatar scene.
- Constraint: If using the Video Avatar (Beyond Presence), initialization may take slightly longer than with the local 3D model.
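The audio-energy approach to lip sync listed among the constraints above can be illustrated with a short sketch. This is not `wawa-lipsync`'s API or implementation; the function names and the threshold values are assumptions chosen for the example. It shows why the mouth tracks loudness only: two different sounds with the same energy produce the same mouth opening.

```typescript
// Illustrative only: not wawa-lipsync's API. All names and defaults are assumptions.

/** Root-mean-square energy of one frame of audio samples in the range [-1, 1]. */
function rmsEnergy(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

interface MouthOptions {
  noiseFloor: number; // energy at or below this keeps the mouth closed
  fullOpenAt: number; // energy at or above this opens the mouth fully
  smoothing: number;  // 0 = jump to target immediately, closer to 1 = slower movement
}

const defaultMouthOptions: MouthOptions = { noiseFloor: 0.02, fullOpenAt: 0.3, smoothing: 0.5 };

/** Next mouth openness in [0, 1], eased from the previous value toward the energy-derived target. */
function nextMouthOpenness(
  previous: number,
  samples: Float32Array,
  options: MouthOptions = defaultMouthOptions
): number {
  const { noiseFloor, fullOpenAt, smoothing } = options;
  const normalized = (rmsEnergy(samples) - noiseFloor) / (fullOpenAt - noiseFloor);
  const target = Math.min(1, Math.max(0, normalized));
  return previous + (target - previous) * (1 - smoothing);
}
```

Called once per animation frame with the latest audio samples, the returned value could drive a single "mouth open" morph target. Forming distinct shapes for sounds such as 'O' or 'M' would require phoneme or viseme detection, which this energy-based approach does not attempt.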