Environment
- Model: gemini-3.1-flash-live-preview
- Voice: Leda
- Transport: Raw WebSocket v1beta (wss://generativelanguage.googleapis.com/ws/.../BidiGenerateContent)
- Audio pipeline: Twilio MediaStreams (mulaw 8kHz) → PCM16 16kHz → Gemini → PCM16 24kHz → mulaw 8kHz
- Platform: Node.js (ws library, raw WebSocket, no SDK)
- Scale: ~600 production phone calls over the past 14 days
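For readers unfamiliar with the inbound half of the audio pipeline, the following is a minimal sketch of the mulaw 8kHz → PCM16 16kHz step. It is not the production code from this report: the function names are hypothetical, the decoder is the standard G.711 mu-law expansion, and the 2x upsampler uses naive linear interpolation where a real pipeline would likely use a proper resampling filter.

```javascript
// Decode one G.711 mu-law byte into a signed 16-bit linear PCM sample.
function muLawToPcm16(byte) {
  const u = ~byte & 0xff;            // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Naive 8 kHz -> 16 kHz upsampling: keep each sample and insert the
// midpoint between it and the next one (assumption: good enough for a sketch).
function upsample2x(samples) {
  const out = new Int16Array(samples.length * 2);
  for (let i = 0; i < samples.length; i++) {
    const cur = samples[i];
    const next = i + 1 < samples.length ? samples[i + 1] : cur;
    out[2 * i] = cur;
    out[2 * i + 1] = (cur + next) >> 1;
  }
  return out;
}

// Convert a base64 mu-law payload (as carried in a Twilio media frame)
// into 16 kHz PCM16 samples.
function twilioPayloadToPcm16k(base64Payload) {
  const muLaw = Buffer.from(base64Payload, "base64");
  const pcm8k = new Int16Array(muLaw.length);
  for (let i = 0; i < muLaw.length; i++) {
    pcm8k[i] = muLawToPcm16(muLaw[i]);
  }
  return upsample2x(pcm8k);
}
```

The outbound direction (PCM16 24kHz → mulaw 8kHz) needs the inverse encoder plus a 3:1 downsampler and is omitted here.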
Bug Description
When we send realtimeInput.text with a script for the model to speak (e.g., a voicemail message), the send succeeds (WebSocket send() returns without error), but the model sometimes never produces audio output. It goes completely silent: no serverContent with audio data, no turnComplete, no error.
This is critical for our voicemail delivery workflow. When our system detects a voicemail greeting, we inject a text instruction via realtimeInput.text telling the model to speak a specific voicemail message. Approximately 5% of the time, the model receives this injection but never speaks.
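To make "no audio, no turnComplete" concrete, here is a rough sketch of how a client might classify incoming server frames. The message shape (serverContent.modelTurn.parts[].inlineData.data for audio and serverContent.turnComplete) is an assumption about the Live API wire format and should be checked against the API reference; classifyServerMessage is a hypothetical helper, not part of any SDK.

```javascript
// Inspect one raw WebSocket frame and report whether it carries model audio
// or a turn-complete marker. Field names are assumed, not confirmed.
function classifyServerMessage(raw) {
  let msg;
  try {
    msg = JSON.parse(raw.toString());
  } catch {
    return { hasAudio: false, turnComplete: false, parseError: true };
  }
  const serverContent = msg?.serverContent ?? {};
  const parts = serverContent.modelTurn?.parts ?? [];
  const hasAudio = parts.some(
    (p) => typeof p?.inlineData?.data === "string" && p.inlineData.data.length > 0
  );
  return {
    hasAudio,
    turnComplete: serverContent.turnComplete === true,
    parseError: false,
  };
}
```

In the failure described here, no frame after the injection would classify as hasAudio or turnComplete, even though the socket stays open.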
Reproduction Steps
- Establish a Live API session with audio response modality
- Stream audio input (caller audio flowing normally)
- Send realtimeInput.text with a multi-sentence script (100-200 words)
- Wait for audio output — none arrives
- WebSocket remains open and healthy
- No error messages, no close codes
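The injection step above can be sketched roughly as follows. The payload shape { realtimeInput: { text } } is inferred from the field name used in this report, and injectScript is a hypothetical helper. Note that a successful send() only means the frame was handed to the socket, not that the model accepted it.

```javascript
// Send a script as realtimeInput.text and return some bookkeeping that can
// later be correlated with the first audio frame (or its absence).
// `ws` is any object with a send(data) method, e.g. a `ws` WebSocket.
function injectScript(ws, script, now = Date.now) {
  const payload = JSON.stringify({ realtimeInput: { text: script } });
  ws.send(payload);
  return {
    sentAt: now(),                              // timestamp of the injection
    bytes: Buffer.byteLength(payload),          // size of the frame on the wire
    words: script.trim().split(/\s+/).length,   // rough script length
  };
}
```

Logging sentAt, bytes, and words for every injection makes it possible to check whether frozen deliveries correlate with script length.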
Impact
Over a 14-day production window:
- 21 voicemail calls had no transcript or a transcript under 50 characters (failed VM delivery)
- 401 voicemail calls delivered successfully
- ~5% VM delivery failure rate (21 of 422 voicemail calls) from this specific issue
- Failed deliveries mean the caller never hears our voicemail — a wasted call and a missed lead contact
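The failure rate follows directly from the two counts above. A small sketch of the classification rule and the arithmetic, with hypothetical function names and the 50-character threshold taken from this report:

```javascript
// A delivery counts as failed when the transcript is missing or shorter
// than the threshold (50 characters in this report).
function isFailedDelivery(transcript, minChars = 50) {
  return !transcript || transcript.trim().length < minChars;
}

// Fraction of voicemail calls that failed.
function failureRate(failed, delivered) {
  const total = failed + delivered;
  return total === 0 ? 0 : failed / total;
}

const rate = failureRate(21, 401); // 21 / 422
console.log((rate * 100).toFixed(1) + "%"); // prints "5.0%"
```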
Workarounds Attempted
- Nudge timer (4 seconds): After injecting the VM script, we start a 4-second timer. If no audio output arrives, we re-inject the script with stronger language ("You MUST speak now."). This recovers ~50% of frozen deliveries.
- 35-second safety timeout: If Gemini still hasn't spoken after 35 seconds, we hang up and log the failure.
- Multiple nudge attempts: Up to 2 re-injection attempts before giving up.
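The three workarounds combine into a single watchdog, sketched below. This is an illustrative reconstruction from the description above, not the production code: createDeliveryWatchdog, inject, hangUp, and the injectable timers object are all hypothetical names, and the defaults mirror the 4-second nudge, 2 re-injections, and 35-second timeout.

```javascript
// Watchdog for one voicemail delivery: inject the script, re-inject
// ("nudge") if no audio arrives, and hang up after a hard timeout.
function createDeliveryWatchdog({
  inject,                 // inject(attempt): sends the script; attempt 0 is the original
  hangUp,                 // called once if the hard timeout fires
  nudgeMs = 4000,
  maxNudges = 2,
  hardTimeoutMs = 35000,
  timers = { setTimeout, clearTimeout }, // injectable for testing
}) {
  let nudges = 0;
  let done = false;
  let nudgeTimer = null;
  let hardTimer = null;

  function armNudge() {
    nudgeTimer = timers.setTimeout(() => {
      if (done || nudges >= maxNudges) return;
      nudges += 1;
      inject(nudges);                       // e.g. with stronger wording
      if (nudges < maxNudges) armNudge();
    }, nudgeMs);
  }

  function stop() {
    done = true;
    timers.clearTimeout(nudgeTimer);
    timers.clearTimeout(hardTimer);
  }

  return {
    start() {
      inject(0);
      armNudge();
      hardTimer = timers.setTimeout(() => {
        if (done) return;
        stop();
        hangUp();                           // log the failure and end the call
      }, hardTimeoutMs);
    },
    onAudio() { stop(); },                  // call on the first audio frame
    get nudges() { return nudges; },
  };
}
```

A caller would invoke start() when the voicemail greeting is detected and onAudio() as soon as the first audio frame from the model arrives.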
Expected Behavior
When realtimeInput.text is sent with a script, the model should produce audio output speaking the provided text. If the model cannot process the text for any reason, it should return an error or status signal — not silent failure.
Questions for the Team
- Is there a known issue with realtimeInput.text being silently dropped?
- Is there a maximum text length that realtimeInput.text reliably handles?
- Does realtimeInput.text conflict with ongoing audio input processing? (We continue streaming caller audio while injecting text.)
- Is there a signal we can monitor to confirm the model received and is processing the text injection?
Related Issues