michel-adelino/openai-realtime-avatar

Real-Time Avatar Interaction (Next.js)

Welcome to the Real-Time Avatar Interaction project! This application lets a user hold a live conversation with a streaming avatar. It combines HeyGen's streaming avatar technology with OpenAI's models: Whisper transcribes the user's speech to text, and a GPT model generates the response that the avatar delivers in real time.

Features

  • Real-Time Avatar Interaction: Engage in live, face-to-face conversations with a virtual avatar
  • Multiple Input Methods:
    • Voice Input: Speak naturally - the app converts your speech to text using OpenAI Whisper
    • Text Chat: Type messages directly in the chat panel
    • Quick Prompts: Use pre-defined badges for instant conversation starters
  • Multi-Language Support: Choose from 12 languages: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese, Arabic, Russian, and Hindi
  • Natural Conversations: AI responds in a casual, conversational style with natural filler words and expressions
  • Modern UI:
    • Full-screen video display for immersive experience
    • Glassmorphism design with backdrop blur effects
    • Full-width bottom control bar
    • Responsive chat panel with scrollable message history
    • Beautiful landing page with feature showcase

Technologies Used

  • Next.js 14: React framework with App Router
  • React 18: For building the user interface
  • TypeScript: For type safety and better developer experience
  • Tailwind CSS: For modern, responsive styling
  • Shadcn UI: For beautiful, accessible UI components
  • OpenAI API:
    • GPT-4.1-mini for natural language generation
    • Whisper for speech-to-text transcription
  • HeyGen Streaming Avatar API: For real-time avatar video streaming
  • Radix UI: For accessible component primitives

Getting Started

Prerequisites

  • Node.js 18+ and npm
  • OpenAI API key
  • HeyGen API key, Avatar ID, and Voice ID

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd openai-realtime-avatar
  2. Install the dependencies:

    npm install
  3. Set up environment variables:

    Create a .env.local file in the root directory with the following variables:

    NEXT_PUBLIC_OPENAI_API_KEY=your_openai_api_key_here
    NEXT_PUBLIC_HEYGEN_API_KEY=your_heygen_api_key_here
    NEXT_PUBLIC_HEYGEN_AVATARID=your_avatar_id_here
    NEXT_PUBLIC_HEYGEN_VOICEID=your_voice_id_here

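    Important:

    • Do not wrap the values in quotes
    • Do not put spaces around the = sign
    • Restart the dev server after changing environment variables
    • Next.js inlines variables prefixed with NEXT_PUBLIC_ into the browser bundle, so these keys are readable by anyone who loads the app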
  4. Start the development server:

    npm run dev
  5. Open your browser:

    Navigate to http://localhost:3000 to see the application in action.
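
If the app fails to start after step 3, a small check can report which variables are missing or malformed. The following is a minimal sketch, not code from this repository: the helper name findInvalidEnvVars and the EnvSource type are hypothetical, and only the four variable names come from the .env.local template above.

```typescript
// Variable names taken from the .env.local template in step 3.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_OPENAI_API_KEY",
  "NEXT_PUBLIC_HEYGEN_API_KEY",
  "NEXT_PUBLIC_HEYGEN_AVATARID",
  "NEXT_PUBLIC_HEYGEN_VOICEID",
] as const;

// Hypothetical type: any object that maps variable names to optional strings.
type EnvSource = Record<string, string | undefined>;

// Hypothetical helper (not part of the repository).
// Returns the names of required variables that are missing, empty,
// or still wrapped in quotes.
function findInvalidEnvVars(env: EnvSource): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name]?.trim();
    if (!value) {
      return true; // undefined, empty, or whitespace only
    }
    return /^["'].*["']$/.test(value); // value is wrapped in quotes
  });
}
```

In browser code, Next.js only inlines literal references such as process.env.NEXT_PUBLIC_OPENAI_API_KEY, so pass an object built from those literal references instead of process.env itself. On the server, passing process.env directly works.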

Usage

  1. Start a Conversation:

    • Click "Start Conversation" on the landing page
    • Wait for the avatar to initialize
  2. Interact with the Avatar:

    • Voice: Click the microphone button and speak naturally
    • Text: Open the chat panel (bottom right) and type your message
    • Quick Prompts: Click on badge prompts in the chat panel
  3. Change Language:

    • Use the language selector in the top-left corner (video view) or chat panel header
    • Select from 12 available languages
    • The avatar will respond in the selected language
  4. End Conversation:

    • Click the red phone button to stop the avatar

Building for Production

To build the application for production:

npm run build
npm start

npm run build creates the optimized production build, and npm start then serves that build.

Project Structure

src/
├── app/
│   ├── layout.tsx          # Root layout with metadata
│   ├── page.tsx            # Main page component with avatar logic
│   └── globals.css         # Global styles and Tailwind directives
├── components/
│   ├── reusable/
│   │   ├── Badges.tsx      # Quick prompt badges
│   │   ├── ChatMessage.tsx # Individual chat message component
│   │   ├── LandingComponent.tsx # Landing page component
│   │   ├── LanguageSelector.tsx # Language selection dropdown
│   │   ├── MicButton.tsx   # Microphone and phone controls
│   │   └── Video.tsx       # Video player component
│   └── ui/                 # Shadcn UI components
│       ├── avatar.tsx
│       ├── badge.tsx
│       ├── button.tsx
│       ├── card.tsx
│       ├── dropdown-menu.tsx
│       ├── toast.tsx
│       └── toaster.tsx
├── hooks/
│   └── use-toast.ts        # Toast notification hook
├── lib/
│   └── utils.ts            # Utility functions (cn helper)
└── services/
    └── api.ts               # HeyGen API service

Features in Detail

Language Selection

  • Supports 12 languages; the avatar replies in whichever language is selected
  • System prompts adapt to each language for natural conversation
  • Language selector available in both video view and chat panel
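
One way a per-language system prompt could be assembled is sketched below. This is an illustration under assumptions, not the project's actual prompt logic: the ISO 639-1 codes, the prompt wording, and the names LANGUAGES, buildSystemPrompt, and buildMessages are all hypothetical. Only the list of 12 languages and the casual, conversational tone come from this README.

```typescript
// The 12 languages listed under Features. The two-letter codes are an assumption.
const LANGUAGES = {
  en: "English",
  es: "Spanish",
  fr: "French",
  de: "German",
  it: "Italian",
  pt: "Portuguese",
  ja: "Japanese",
  ko: "Korean",
  zh: "Chinese",
  ar: "Arabic",
  ru: "Russian",
  hi: "Hindi",
} as const;

type LanguageCode = keyof typeof LANGUAGES;

// Message shape used by chat-style language model APIs.
interface PromptMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical prompt wording: instructs the model to answer in the selected language.
function buildSystemPrompt(code: LanguageCode): string {
  return (
    `You are a friendly avatar. Reply only in ${LANGUAGES[code]}, ` +
    "in a casual, conversational tone."
  );
}

// Puts the language-specific system prompt first, then prior turns, then the new user turn.
function buildMessages(
  code: LanguageCode,
  history: PromptMessage[],
  userText: string
): PromptMessage[] {
  return [
    { role: "system", content: buildSystemPrompt(code) },
    ...history,
    { role: "user", content: userText },
  ];
}
```

Rebuilding the system message on every request means a language change takes effect on the next turn without clearing the chat history.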

Chat Interface

  • Scrollable message history
  • Large message area to minimize scrolling
  • Real-time message updates
  • Quick prompt badges for common topics

Voice Interaction

  • Automatic silence detection (stops recording after 2 seconds of silence)
  • Real-time audio transcription
  • Visual feedback with button state changes
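
The 2-second silence rule can be sketched as a small state machine that is fed audio levels over time. This is a minimal sketch under assumptions, not the project's recorder code: the SilenceDetector class, the rms helper, and the 0.02 loudness threshold are hypothetical, and only the 2-second window comes from the description above.

```typescript
// Root-mean-square amplitude of one audio frame (samples in the range -1..1).
function rms(samples: number[]): number {
  if (samples.length === 0) {
    return 0;
  }
  const sumOfSquares = samples.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumOfSquares / samples.length);
}

// Hypothetical detector: reports when the level has stayed below `threshold`
// for at least `silenceMs` milliseconds.
class SilenceDetector {
  private readonly threshold: number;
  private readonly silenceMs: number;
  private lastLoudAt: number | null = null;

  constructor(threshold: number = 0.02, silenceMs: number = 2000) {
    this.threshold = threshold;
    this.silenceMs = silenceMs;
  }

  // level: current RMS amplitude; nowMs: timestamp of the frame in milliseconds.
  // Returns true once the silence window has elapsed.
  update(level: number, nowMs: number): boolean {
    // The first frame starts the clock; any loud frame resets it.
    if (this.lastLoudAt === null || level >= this.threshold) {
      this.lastLoudAt = nowMs;
      return false;
    }
    return nowMs - this.lastLoudAt >= this.silenceMs;
  }
}
```

In a browser, the levels would typically come from sampling the microphone stream at a fixed interval, and recording would stop the first time update returns true.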

UI/UX

  • Full-screen video for immersive experience
  • Glassmorphism effects with backdrop blur
  • Smooth animations and transitions
  • Responsive design for all screen sizes
  • Modern dark theme with accent colors

Troubleshooting

Common Issues

  1. 400 Bad Request Error:

    • Verify your environment variables are set correctly
    • Check that Avatar ID and Voice ID are valid in your HeyGen account
    • Ensure your API key has proper permissions
  2. Avatar Not Starting:

    • Check browser console for detailed error messages
    • Verify all environment variables are loaded (check console logs)
    • Restart the dev server after changing .env.local
  3. Audio Not Working:

    • Grant microphone permissions in your browser
    • Check browser console for permission errors
    • Try refreshing the page
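
When debugging audio problems, it helps to turn the browser's error name into a readable hint. The sketch below is illustrative and not taken from the repository: describeMicError is a hypothetical helper, while the three error names are ones that navigator.mediaDevices.getUserMedia can reject with.

```typescript
// Hypothetical helper: maps a getUserMedia rejection name to a troubleshooting hint.
function describeMicError(errorName: string): string {
  switch (errorName) {
    case "NotAllowedError":
      return "Microphone permission was denied. Allow access in the browser's site settings and reload the page.";
    case "NotFoundError":
      return "No microphone was found. Connect an input device and try again.";
    case "NotReadableError":
      return "The microphone could not be read. Another application may be using it.";
    default:
      return `Unexpected microphone error: ${errorName}`;
  }
}

// Example wiring in browser code (shown as a comment because it needs a browser):
//
//   navigator.mediaDevices
//     .getUserMedia({ audio: true })
//     .catch((err: DOMException) => console.error(describeMicError(err.name)));
```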

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is private and proprietary.

Acknowledgments

  • HeyGen for the streaming avatar technology
  • OpenAI for GPT and Whisper APIs
  • Shadcn for the beautiful UI components
