Platform: Node.js (ws library, raw WebSocket, no SDK)
Scale: ~600 production phone calls over the past 14 days
Bug Description
The model randomly stops producing audio output mid-conversation. The WebSocket connection remains open and no error is returned; serverContent messages simply stop arriving. The model goes silent.
This is the single biggest issue affecting our production voice agent. It happens on both inbound and outbound calls, at any point in the conversation — during the greeting, mid-sentence, or after processing caller input.
Reproduction
This is non-deterministic and cannot be reliably reproduced. It happens across different callers, different times of day, and different conversation topics. The only consistent pattern is:
WebSocket is open and healthy
Audio input is flowing from the caller (we can see raw audio chunks arriving)
inputTranscription events are still arriving (caller speech is being transcribed)
But serverContent with audio data stops completely
No turnComplete, no generationComplete, no error — just silence
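The pattern above can be turned into a per-call check. The following is a minimal sketch, not our production code: the `FreezeDetector` class, its `stallMs` threshold, and the reason strings are hypothetical, and it assumes audio arrives as `serverContent.modelTurn.parts[].inlineData` and that transcription arrives as `serverContent.inputTranscription`.

```javascript
// Hypothetical helper: flags the freeze signature described above.
// Assumed message shapes: serverContent.modelTurn.parts[].inlineData for audio,
// serverContent.inputTranscription for caller speech.
class FreezeDetector {
  constructor({ stallMs = 5000 } = {}) {
    this.stallMs = stallMs;        // hypothetical threshold, tune per deployment
    this.lastModelAudioAt = null;  // last time a model audio chunk arrived
    this.lastTranscriptAt = null;  // last time caller speech was transcribed
    this.turnOpen = false;         // audio started, no turnComplete/interrupted yet
  }

  // Feed every parsed server message; `now` is injectable for testing.
  onServerMessage(msg, now = Date.now()) {
    const sc = msg.serverContent;
    if (!sc) return;
    if (sc.inputTranscription) this.lastTranscriptAt = now;
    const parts = (sc.modelTurn && sc.modelTurn.parts) || [];
    if (parts.some((p) => p.inlineData)) {
      this.lastModelAudioAt = now;
      this.turnOpen = true;
    }
    if (sc.turnComplete || sc.interrupted) this.turnOpen = false;
  }

  // Returns a reason string when the call looks frozen, otherwise null.
  check(now = Date.now()) {
    // Case 1: the model started a turn and went quiet without closing it.
    if (this.turnOpen && now - this.lastModelAudioAt >= this.stallMs) {
      return "stalled-mid-turn";
    }
    // Case 2: caller speech was transcribed after the last model audio,
    // and nothing has come back since.
    const unanswered =
      this.lastTranscriptAt !== null &&
      (this.lastModelAudioAt === null || this.lastTranscriptAt > this.lastModelAudioAt);
    if (!this.turnOpen && unanswered && now - this.lastTranscriptAt >= this.stallMs) {
      return "unanswered-caller-speech";
    }
    return null;
  }
}
```

Calling `check()` on a short interval per call, and logging the reason alongside the call ID, would at least separate mid-sentence stalls from freezes that follow caller input.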
Impact
Over a 14-day production window (603 total calls):
66 calls ended with outcome: unknown — the majority caused by this audio freeze
65 of those were under 15 seconds — the model froze before any meaningful conversation could happen
Our server-side watchdog kills frozen calls after 10 seconds of mutual silence
Real example ([Caller], May 9 2026): The caller phoned in and the AI agent greeted them: "Hi, thanks for calling [Company]." The caller responded: "I I I you guys sent me a message because I have some selling my house at this." Then Gemini froze, and the call died at 12 seconds. The caller had to call back 4.5 hours later to get through.
Workarounds Attempted
Dead call watchdog: a 10-second silence timer kills frozen calls and attempts voicemail delivery. This works but loses the live conversation.
Text nudge injection: when a freeze is detected and caller speech was recent, we inject realtimeInput.text telling the model to respond. This works ~30% of the time.
Post-interruption nudge: a 4-second timer starts after each serverContent.interrupted event, since freezes often follow interruptions. When it fires, we nudge the model to resume.
Pre-warm WebSocket: we open the Gemini WebSocket and send the setup message during the Twilio TwiML fetch (before the call connects) to eliminate cold start. This helps with first-turn latency but does not prevent mid-conversation freezes.
None of these fix the root cause. The model simply stops generating audio and no amount of text injection or waiting recovers it reliably.
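For reference, the way the watchdog and the two nudges interact can be sketched as a single decision function. This is an illustrative sketch only: the `decide` function, the state field names, `nudgeAfterMs`, `maxNudges`, and the nudge text are all hypothetical. Only the 10-second watchdog, the 4-second post-interruption timer, and the use of realtimeInput.text come from the description above.

```javascript
// Hypothetical escalation policy combining the workarounds described above.
const DEFAULT_POLICY = {
  watchdogMs: 10000,          // from the report: kill after 10 s of mutual silence
  postInterruptNudgeMs: 4000, // from the report: nudge 4 s after an interruption
  nudgeAfterMs: 3000,         // hypothetical: how long caller speech may go unanswered
  maxNudges: 1,               // hypothetical: avoid spamming the model with nudges
};

// Hypothetical nudge payload; the wording is an assumption.
function nudge(reason) {
  return {
    type: "nudge",
    reason,
    payload: { realtimeInput: { text: "The caller is waiting. Please respond now." } },
  };
}

// state: { lastModelAudioAt, lastCallerSpeechAt, lastInterruptedAt (or null), nudgesSent }
// All timestamps are milliseconds on the same clock as `now`.
function decide(state, now, policy = DEFAULT_POLICY) {
  // Mutual silence: neither side has produced audio for watchdogMs.
  const lastAnyAudio = Math.max(state.lastModelAudioAt, state.lastCallerSpeechAt);
  if (now - lastAnyAudio >= policy.watchdogMs) return { type: "hangup" };

  if (state.nudgesSent >= policy.maxNudges) return { type: "wait" };

  // Post-interruption: the model was interrupted and has not spoken since.
  const interruptedPending =
    state.lastInterruptedAt !== null && state.lastInterruptedAt >= state.lastModelAudioAt;
  if (interruptedPending && now - state.lastInterruptedAt >= policy.postInterruptNudgeMs) {
    return nudge("post-interruption");
  }

  // Unanswered caller speech: the caller spoke after the model's last audio
  // and has been waiting for at least nudgeAfterMs.
  const callerUnanswered = state.lastCallerSpeechAt > state.lastModelAudioAt;
  if (callerUnanswered && now - state.lastCallerSpeechAt >= policy.nudgeAfterMs) {
    return nudge("unanswered");
  }
  return { type: "wait" };
}
```

The caller of `decide` would send `payload` over the open WebSocket on a `nudge` result and increment `nudgesSent`, then fall through to the watchdog if the model still produces no audio.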
Questions for the Team
Is there a known issue with audio generation stalling on gemini-3.1-flash-live-preview?
Does contextWindowCompression: { slidingWindow: {} } affect audio output stability? GitHub issue #117 in google-gemini/live-api-web-console suggests a correlation.
Is there a recommended recovery mechanism when the model stops producing audio but the WebSocket remains open?
Are there diagnostic signals we should be monitoring that would predict or explain these freezes?
Environment
Model: gemini-3.1-flash-live-preview
Voice: Leda
Endpoint: wss://generativelanguage.googleapis.com/ws/.../BidiGenerateContent
VAD: automaticActivityDetection (START_SENSITIVITY_HIGH, END_SENSITIVITY_HIGH, prefixPaddingMs: 150, silenceDurationMs: 700)
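To make the environment values concrete, here is a sketch of what a setup message carrying them might look like. It is not a copy of our actual configuration: the `buildSetupMessage` function is hypothetical, treating Leda as the prebuilt voice name is an inference, and the field names reflect my understanding of the Live API setup message and should be checked against the current API reference.

```javascript
// Hypothetical builder: illustrates where the environment values would sit
// in a Live API setup message. Field names are assumptions to verify against
// the API reference; this is not the reporter's actual configuration.
function buildSetupMessage() {
  return {
    setup: {
      model: "models/gemini-3.1-flash-live-preview",
      generationConfig: {
        responseModalities: ["AUDIO"],
        speechConfig: {
          // Assumption: "Leda" is the prebuilt voice in use.
          voiceConfig: { prebuiltVoiceConfig: { voiceName: "Leda" } },
        },
      },
      realtimeInputConfig: {
        automaticActivityDetection: {
          startOfSpeechSensitivity: "START_SENSITIVITY_HIGH",
          endOfSpeechSensitivity: "END_SENSITIVITY_HIGH",
          prefixPaddingMs: 150,
          silenceDurationMs: 700,
        },
      },
      // Needed for the inputTranscription events mentioned in the report.
      inputAudioTranscription: {},
      // The setting asked about in the questions.
      contextWindowCompression: { slidingWindow: {} },
    },
  };
}

// The message would be sent as the first frame after the socket opens, e.g.
// socket.send(JSON.stringify(buildSetupMessage()));
```

Toggling `contextWindowCompression` off in a builder like this for a fraction of calls would be one way to test whether it correlates with the freezes.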
Configuration
Related Issues