
feat: Add Gemini Realtime provider implementing IRealtimeClient/IRealtimeClientSession #256

Open

tarekgh wants to merge 2 commits into googleapis:main from tarekgh:feature/gemini-realtime-provider

Conversation


@tarekgh tarekgh commented Mar 19, 2026

Summary

Adds a Gemini Live API provider implementing the Microsoft.Extensions.AI Realtime abstractions (IRealtimeClient / IRealtimeClientSession), enabling real-time audio, text, and function-calling conversations with Gemini models through the standardized MEAI interface.

This PR also updates the repository to depend on the official Microsoft.Extensions.AI.Abstractions 10.4.1 NuGet package (replacing the private 10.5.0-dev builds).

What's Included

New Files

  • GoogleGenAIRealtimeClient.cs — IRealtimeClient implementation that wraps a Google.GenAI.Client and creates realtime sessions via the Gemini Live API.
  • GoogleGenAIRealtimeSession.cs — IRealtimeClientSession implementation that manages the WebSocket connection, audio buffering, message mapping, and function call orchestration.
  • GoogleGenAIRealtimeTest.cs — 118 unit tests covering the full surface area.

Modified Files

  • GoogleGenAIExtensions.cs — Added AsIRealtimeClient() extension method.
  • Directory.Packages.props — Updated Microsoft.Extensions.AI.Abstractions from 10.5.0-dev to 10.4.1.
  • Live.cs — Minor adjustment to expose AsyncSession for the realtime provider.
  • All packages.lock.json files regenerated.

Features

  • Audio streaming — Append/commit pattern with automatic frame splitting (32KB max), ActivityStart/ActivityEnd framing
  • Voice Activity Detection (VAD) — Configurable server-side VAD or manual client-controlled boundaries
  • Text conversations — Send text messages and receive text/audio responses
  • Function calling — Full tool invocation support with the FunctionInvokingRealtimeSession middleware; tool responses are batched into a single SendToolResponseAsync call
  • Transcription — Input and output audio transcription
  • Thread-safe sends — A SemaphoreSlim serializes all WebSocket sends, safe for concurrent middleware + caller usage
  • Graceful disposal — Race-safe dispose with proper exception handling for in-flight operations
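
The append/commit audio path with 32 KB frame splitting described above can be sketched roughly as follows. The type, method, and constant names here are illustrative, not the provider's actual internals:

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch: split an appended audio buffer into frames of at
// most 32 KB before sending each one over the WebSocket.
// AudioFraming, MaxFrameBytes, and SplitIntoFrames are hypothetical names.
static class AudioFraming
{
    public const int MaxFrameBytes = 32 * 1024;

    public static IEnumerable<ReadOnlyMemory<byte>> SplitIntoFrames(ReadOnlyMemory<byte> audio)
    {
        for (int offset = 0; offset < audio.Length; offset += MaxFrameBytes)
        {
            // The last frame may be shorter than MaxFrameBytes.
            int length = Math.Min(MaxFrameBytes, audio.Length - offset);
            yield return audio.Slice(offset, length);
        }
    }
}
```

When VAD is disabled, the provider would emit ActivityStart before the first frame and ActivityEnd after the last on commit, per the VAD feature above.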

Usage Example

using Google.GenAI;
using Microsoft.Extensions.AI;

// Create the Gemini client and wrap it as an IRealtimeClient
var geminiClient = new Client(apiKey: "YOUR_API_KEY");
IRealtimeClient realtimeClient = new GoogleGenAIRealtimeClient(
    geminiClient, "gemini-2.5-flash-native-audio-preview-12-2025");

// Define a tool for function calling
AIFunction getWeather = AIFunctionFactory.Create(
    (string location) => location switch
    {
        "Seattle"       => $"The weather in {location} is rainy, 55°F",
        "New York"      => $"The weather in {location} is cloudy, 70°F",
        "San Francisco" => $"The weather in {location} is foggy, 60°F",
        _               => $"Sorry, I don't have weather data for {location}."
    },
    "GetWeather",
    "Gets the current weather for a given location");

// Wrap with middleware (function invocation, logging, OpenTelemetry)
var wrappedClient = new RealtimeClientBuilder(realtimeClient)
    .UseFunctionInvocation(configure: session =>
    {
        session.AdditionalTools = [getWeather];
        session.MaximumIterationsPerRequest = 10;
    })
    .UseLogging()
    .Build(serviceProvider);

// Configure session options
var sessionOptions = new RealtimeSessionOptions
{
    OutputModalities = ["audio"],
    Instructions = "You are a helpful assistant.",
    Voice = "Puck",
    TranscriptionOptions = new TranscriptionOptions(),
    Tools = [getWeather],
    VoiceActivityDetection = new VoiceActivityDetectionOptions
    {
        Enabled = true,
        AllowInterruption = true,
    },
};

// Create a session and start streaming
await using var session = await wrappedClient.CreateSessionAsync(sessionOptions);

// Start listening for server messages in the background
_ = Task.Run(async () =>
{
    await foreach (var message in session.GetStreamingResponseAsync(cancellationToken))
    {
        switch (message)
        {
            case OutputTextAudioRealtimeServerMessage audio
                when audio.Type == RealtimeServerMessageType.OutputAudioDelta:
                PlayAudio(audio.Audio);
                break;

            case OutputTextAudioRealtimeServerMessage text
                when text.Type == RealtimeServerMessageType.OutputTextDelta:
                Console.Write(text.Text);
                break;

            case InputAudioTranscriptionRealtimeServerMessage transcription:
                Console.WriteLine($"You said: {transcription.Transcription}");
                break;
        }
    }
});

// Send a text message (function calls are handled automatically by middleware)
var item = new RealtimeConversationItem(
    [new TextContent("What is the weather in New York?")],
    role: ChatRole.User);
await session.SendAsync(new CreateConversationItemRealtimeClientMessage(item: item));
await session.SendAsync(new CreateResponseRealtimeClientMessage());

Key Design Decisions

  1. Tool response batching — The MEAI FunctionInvokingRealtimeSession middleware sends separate CreateConversationItem per function result. Gemini expects all results in one SendToolResponseAsync call. The provider buffers results and flushes them as a single batch when CreateResponse arrives.

  2. TurnComplete suppression after tool responses — After SendToolResponseAsync, Gemini automatically continues generating. Sending client_content with turn_complete: true causes the server to close the WebSocket. The provider tracks this via _lastSendWasToolResponse and skips TurnComplete accordingly.

  3. VAD handling — When VAD is disabled (default), the provider wraps audio commits with explicit ActivityStart/ActivityEnd framing. When enabled, the server handles speech boundary detection automatically.

  4. Audio buffer cap — Audio appends are capped at 10 MB to prevent unbounded memory growth. Frames exceeding 32 KB are automatically split.
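
Decisions 1 and 2 can be sketched together as a small stateful helper: buffer function results as they arrive, flush them as one batch when CreateResponse comes in, and remember that the last send was a tool response so the next turn can skip TurnComplete. All names below are hypothetical, not the provider's actual code:

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch of tool-response batching plus the
// "_lastSendWasToolResponse" flag described in decisions 1 and 2.
sealed class ToolResponseBatcher
{
    private readonly List<string> _pendingResults = new();

    // True when the most recent flush actually sent tool results; the
    // session would then skip turn_complete on the next client_content.
    public bool LastSendWasToolResponse { get; private set; }

    // Called once per function result produced by the middleware.
    public void Buffer(string functionResultJson) => _pendingResults.Add(functionResultJson);

    // Called when CreateResponse arrives: returns the batched results to
    // pass to a single SendToolResponseAsync call, then clears the buffer.
    public IReadOnlyList<string> Flush()
    {
        string[] batch = _pendingResults.ToArray();
        _pendingResults.Clear();
        LastSendWasToolResponse = batch.Length > 0;
        return batch;
    }
}
```

The key point is that the batch boundary is driven by the CreateResponse message, not by the arrival of each individual function result.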

Test Coverage

118 unit tests covering:

  • Client and session lifecycle (construction, disposal, idempotent dispose)
  • All message types (audio, text, function calls, transcription, errors)
  • Edge cases (null args, empty buffers, concurrent dispose, exception swallowing)
  • Function call flow (single/multiple results, batching, flag reset after tool cycle)
  • VAD modes (enabled, disabled, default)
  • BuildLiveConnectConfig mapping (all option combinations)


google-cla bot commented Mar 19, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@tarekgh tarekgh force-pushed the feature/gemini-realtime-provider branch 3 times, most recently from 32fa581 to 1d54288 Compare March 19, 2026 01:27

tarekgh commented Mar 19, 2026

CC @stephentoub

@tarekgh tarekgh force-pushed the feature/gemini-realtime-provider branch from 1d54288 to a5345ce Compare March 19, 2026 21:58
@shivvaam0001 shivvaam0001 self-assigned this Mar 20, 2026
@tarekgh tarekgh force-pushed the feature/gemini-realtime-provider branch from a5345ce to dd1b649 Compare March 26, 2026 23:07
- Use SendRealtimeInputAsync for all input types (text, image, audio)
  to avoid interleaving with SendClientContentAsync which causes
  WebSocket close
- Fix VAD handling: use ActivityStart/ActivityEnd framing when VAD is
  disabled, AudioStreamEnd when VAD is enabled for push-to-talk
- Fix image input: send as Video blob without activity framing, use
  minimal text trigger in CreateResponse since Gemini treats images
  as streaming context
- Fix function calling: convert MEAI JsonSchema to Google Schema type
  with proper uppercase type names (STRING, OBJECT, etc.)
- Text input auto-triggers model response without framing
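
The schema-type conversion mentioned in the commit message above (MEAI JsonSchema to Google Schema with uppercase type names) might look roughly like this. The mapping table is an assumption inferred from the commit message, not the provider's actual code:

```csharp
using System;

// Illustrative sketch: map JSON Schema type names ("string", "object", ...)
// to the uppercase names Gemini's Schema type expects (STRING, OBJECT, ...).
// SchemaTypeMapper and ToGeminiType are hypothetical names.
static class SchemaTypeMapper
{
    public static string ToGeminiType(string jsonSchemaType) => jsonSchemaType switch
    {
        "string"  => "STRING",
        "number"  => "NUMBER",
        "integer" => "INTEGER",
        "boolean" => "BOOLEAN",
        "array"   => "ARRAY",
        "object"  => "OBJECT",
        _ => throw new ArgumentException($"Unsupported JSON Schema type: {jsonSchemaType}")
    };
}
```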
@tarekgh tarekgh force-pushed the feature/gemini-realtime-provider branch from dd1b649 to 6121fcf Compare March 26, 2026 23:12