
Gemini acts as a conversation facilitator and moderator, working with a panelist to explore and engage in a conversation that includes audio and video.


mohan-ganesh/knowledge-synthesizer-studio


Gemini Live API React Demo that acts as a knowledge synthesizer studio

A React-based client for Google's Gemini Live API, featuring real-time audio/video streaming and a WebSocket proxy for secure authentication.

🚀 Full Guide & Architectural Breakdown: https://www.garvik.dev/ai/gemini-live-api-streaming

Quick Start

1. Backend Setup

Install Python dependencies and start the proxy server:

# Install dependencies
pip install -r requirements.txt

# Authenticate with Google Cloud
gcloud auth application-default login

# Start the proxy server
python server.py

2. Frontend Setup

Ensure you have Node.js and npm installed; if not, download and install them from nodejs.org.

In a new terminal, start the React application:

# Install Node modules
npm install

# Start development server
npm run dev

Open http://localhost:5173 to view the app.

Features

  • Real-time Streaming: Audio and video streaming to Gemini.
  • React Components: Modular UI with LiveAPIDemo.jsx.
  • Secure Proxy: Python backend handles Google Cloud authentication.
  • Custom Tools: Support for defining client-side tools.
  • Media Handling: Dedicated audio capture and playback processors.

Project Structure

/
|
|--server-api           # WebSocket proxy & auth handler
|
|--web                  # React application

Core APIs

GeminiLiveAPI

Located in src/utils/gemini-api.js, this class manages the WebSocket connection.

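import { GeminiLiveAPI } from "./utils/gemini-api";

// Arguments, in order: proxy WebSocket URL, Google Cloud project ID, model name.
const client = new GeminiLiveAPI(
  "ws://localhost:8080",
  "your-project-id",
  "gemini-2.0-flash-exp"
);

// Open the WebSocket connection, then send a text message.
client.connect();
client.sendText("Hello Gemini");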

Media Integration

The app uses AudioWorklets for low-latency audio processing:

  • capture.worklet.js: Handles microphone input.
  • playback.worklet.js: Handles PCM audio output.
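As a rough illustration of the kind of work a capture path does, the sketch below converts Web Audio's Float32 samples (range -1 to 1) into 16-bit PCM. The function name `floatTo16BitPCM` is hypothetical and is not taken from capture.worklet.js; it assumes the audio is sent on as 16-bit PCM, which the actual worklets may handle differently.

```javascript
// Hypothetical helper, not part of this repository's worklets.
// Converts Float32 audio samples in [-1, 1] to signed 16-bit PCM.
function floatTo16BitPCM(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // int16 is asymmetric: negatives scale by 32768, positives by 32767.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Example: silence, full-scale positive, full-scale negative, and a clipped sample.
const pcm = floatTo16BitPCM(new Float32Array([0, 1, -1, 2]));
console.log(Array.from(pcm));
```

Clamping before scaling is a deliberate choice here: a microphone signal that briefly exceeds full scale then clips cleanly instead of producing a loud wrap-around artifact.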

Configuration

  • Model: Defaults to gemini-live-2.5-flash-native-audio
  • Voice: Configurable in LiveAPIDemo.jsx (Puck, Charon, etc.)
  • Proxy Port: Default 8080 (set in server-api/server.py)

Environment Variables

Create a .env file in the root directory and add the following variables:

VITE_PROXY_URL=web-socket proxy url
VITE_PROJECT_ID=your project id
GCS_BUCKET_NAME=your gcs bucket name
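Vite exposes only variables prefixed with VITE_ to client code, through `import.meta.env`, so GCS_BUCKET_NAME is presumably read on the server side. The sketch below shows one way the frontend could read and validate the two client-side values. `readClientConfig` is a hypothetical helper, not something in this repository, and it takes the env object as a parameter so it can be exercised outside Vite.

```javascript
// Hypothetical helper, not part of this repository.
// Reads the client-side settings and fails fast if any are missing.
function readClientConfig(env) {
  const proxyUrl = env.VITE_PROXY_URL;
  const projectId = env.VITE_PROJECT_ID;

  // Collect every missing name so the error lists them all at once.
  const missing = [];
  if (!proxyUrl) missing.push("VITE_PROXY_URL");
  if (!projectId) missing.push("VITE_PROJECT_ID");
  if (missing.length > 0) {
    throw new Error("Missing environment variables: " + missing.join(", "));
  }

  return { proxyUrl, projectId };
}

// Inside the Vite app this would be called as: readClientConfig(import.meta.env)
const config = readClientConfig({
  VITE_PROXY_URL: "ws://localhost:8080",
  VITE_PROJECT_ID: "your-project-id",
});
console.log(config.proxyUrl, config.projectId);
```

The returned values could then be passed to the GeminiLiveAPI constructor in place of hard-coded strings.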

About the Project

Create a new virtual room, then join it using the room ID.
