Environment
- Model: gemini-3.1-flash-live-preview
- Voice: Leda
- Transport: Raw WebSocket, v1beta endpoint wss://generativelanguage.googleapis.com/ws/.../BidiGenerateContent
- Audio pipeline: Twilio Media Streams (mulaw 8kHz) → PCM16 16kHz → Gemini → PCM16 24kHz → mulaw 8kHz
- Platform: Node.js (ws library, raw WebSocket, no SDK)
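For reference, the conversion steps in the audio pipeline above can be sketched as follows. This is a minimal illustration and not our production code. It assumes standard G.711 μ-law, which is the mulaw format Twilio Media Streams carries, and plain linear interpolation for resampling, with no anti-aliasing filter on the 24kHz → 8kHz step. The function names (mulawDecode, mulawEncode, resampleLinear, twilioToGemini, geminiToTwilio) are hypothetical, and base64 handling of the WebSocket payloads is left out.

```typescript
// Sketch of the Twilio <-> Gemini audio conversion (assumed G.711 mu-law, linear resampling).
const MULAW_BIAS = 0x84; // 132, standard G.711 bias
const MULAW_CLIP = 32635; // largest magnitude that still encodes without overflow

// Decode one G.711 mu-law byte to a signed 16-bit PCM sample.
function mulawDecode(byte: number): number {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS;
  return sign ? -magnitude : magnitude;
}

// Encode one signed 16-bit PCM sample as a G.711 mu-law byte.
function mulawEncode(sample: number): number {
  const sign = sample < 0 ? 0x80 : 0;
  const magnitude = Math.min(Math.abs(sample), MULAW_CLIP) + MULAW_BIAS;
  let exponent = 7;
  // Segment = position of the highest set bit among bits 14..7.
  for (let mask = 0x4000; (magnitude & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Linear-interpolation resampler (no low-pass filter, so downsampling can alias).
function resampleLinear(input: Int16Array, fromRate: number, toRate: number): Int16Array {
  if (input.length === 0) return new Int16Array(0);
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Int16Array(outLength);
  const step = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1); // clamp at the chunk edge
    const frac = pos - i0;
    out[i] = Math.round(input[i0] * (1 - frac) + input[i1] * frac);
  }
  return out;
}

// Twilio media payload (mulaw bytes, 8 kHz) -> PCM16 samples at 16 kHz for Gemini.
function twilioToGemini(mulaw: Uint8Array): Int16Array {
  const pcm8k = Int16Array.from(mulaw, mulawDecode);
  return resampleLinear(pcm8k, 8000, 16000);
}

// Gemini output (PCM16 samples, 24 kHz) -> mulaw bytes at 8 kHz for Twilio.
function geminiToTwilio(pcm24k: Int16Array): Uint8Array {
  const pcm8k = resampleLinear(pcm24k, 24000, 8000);
  return Uint8Array.from(pcm8k, mulawEncode);
}
```

As a size check, a 20 ms frame of 160 mulaw bytes at 8kHz becomes 320 samples at 16kHz, and 480 samples of 24kHz model audio map back to 160 bytes. Resampling each chunk independently like this leaves small discontinuities at chunk edges; the sketch is only meant to show the data flow.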
Bug Description
When the caller interrupts the model (barge-in) and serverContent.interrupted fires, the model sometimes fails to generate a new audio response. The caller finishes speaking and waits, but the model produces no audio output. The WebSocket remains open and no error is returned.
Barge-in sequence:
- Model is speaking
- Caller talks over the model → serverContent.interrupted fires
- Model stops speaking (correct)
- Caller finishes their turn
- Model should respond with audio, but sometimes it never does
The model enters a state where it has acknowledged the interruption (stopped its own audio) but never starts generating a new response. The caller is left in silence.
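To make this stuck state easier to spot in logs, a tracker along these lines can follow each session's turn state from the raw server messages. This is a hypothetical diagnostic sketch. It assumes the v1beta JSON shape in which model audio arrives as serverContent.modelTurn.parts[].inlineData and barge-in as serverContent.interrupted, and the class and state names are our own. A session that remains "interrupted" long after the caller has stopped talking matches the failure described above.

```typescript
// Minimal subset of server message fields this sketch reads (assumed v1beta shape).
interface ServerMessage {
  serverContent?: {
    modelTurn?: { parts?: Array<{ inlineData?: { mimeType?: string; data?: string } }> };
    interrupted?: boolean;
    turnComplete?: boolean;
  };
}

type TurnState = "idle" | "speaking" | "interrupted";

// Hypothetical per-session tracker; feed it every parsed server message in order.
class TurnTracker {
  state: TurnState = "idle";
  private interruptedAtMs: number | null = null;

  handle(msg: ServerMessage, nowMs: number): void {
    const sc = msg.serverContent;
    if (!sc) return; // setupComplete and other messages do not change turn state here

    if (sc.interrupted) {
      // Barge-in acknowledged: the model has stopped its own audio.
      this.state = "interrupted";
      this.interruptedAtMs = nowMs;
    }

    const hasAudio = sc.modelTurn?.parts?.some((p) => Boolean(p.inlineData?.data)) ?? false;
    if (hasAudio) {
      // Any new model audio means generation has resumed.
      this.state = "speaking";
      this.interruptedAtMs = null;
    }

    // turnComplete only ends a turn that produced audio; after an interruption
    // with no new audio we deliberately stay in "interrupted".
    if (sc.turnComplete && this.state === "speaking") {
      this.state = "idle";
    }
  }

  // Milliseconds of model silence since the last interruption, or null if none is pending.
  silentSinceInterruptMs(nowMs: number): number | null {
    return this.interruptedAtMs === null ? null : nowMs - this.interruptedAtMs;
  }
}
```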
Impact
This is particularly damaging on phone calls because:
- The caller spoke (they're engaged and waiting for a response)
- Silence after someone speaks is unnatural and causes callers to hang up
- These are live sales conversations — losing them means losing business
Workarounds Implemented
- Post-interruption nudge timer (4 seconds): After every serverContent.interrupted event, we start a 4-second timer. If the model hasn't produced audio within 4 seconds, we inject realtimeInput.text: "Your previous response was interrupted. Respond now with a SHORT reply." This recovers the model in some cases.
- Dead call watchdog (10 seconds): If mutual silence persists for 10 seconds (including after nudge attempts), we kill the call.
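The two workarounds can be expressed as one small controller. The sketch below is hypothetical, with class, method, and constant names of our own choosing. It is deliberately timer-free: the host calls tick(nowMs), for example from a setInterval, which keeps the timing logic testable. The nudge payload follows the realtimeInput.text message described above. How caller speech is detected (onCallerSpeech) is left open and is an assumption, for example local energy detection on the Twilio audio.

```typescript
// Hypothetical controller combining the nudge timer and the dead-call watchdog.
const NUDGE_AFTER_MS = 4_000; // post-interruption nudge delay
const DEAD_CALL_AFTER_MS = 10_000; // mutual-silence limit before hanging up
const NUDGE_TEXT = "Your previous response was interrupted. Respond now with a SHORT reply.";

type RecoveryAction =
  | { kind: "nudge"; payload: string } // JSON to send on the Gemini WebSocket
  | { kind: "hangup" }; // end the Twilio call

class BargeInRecovery {
  private interruptedAtMs: number | null = null;
  private nudged = false;
  private lastActivityMs: number;
  private ended = false;

  constructor(startMs: number) {
    this.lastActivityMs = startMs;
  }

  // serverContent.interrupted: the caller is talking over the model.
  onInterrupted(nowMs: number): void {
    this.interruptedAtMs = nowMs;
    this.nudged = false;
    this.lastActivityMs = nowMs;
  }

  // Any model audio chunk cancels a pending nudge and counts as activity.
  onModelAudio(nowMs: number): void {
    this.interruptedAtMs = null;
    this.lastActivityMs = nowMs;
  }

  // Caller speech detected (detection method is an assumption, not shown here).
  onCallerSpeech(nowMs: number): void {
    this.lastActivityMs = nowMs;
  }

  // Call periodically; returns the actions that are now due.
  tick(nowMs: number): RecoveryAction[] {
    if (this.ended) return [];
    const actions: RecoveryAction[] = [];
    if (
      this.interruptedAtMs !== null &&
      !this.nudged &&
      nowMs - this.interruptedAtMs >= NUDGE_AFTER_MS
    ) {
      this.nudged = true; // at most one nudge per interruption
      actions.push({
        kind: "nudge",
        payload: JSON.stringify({ realtimeInput: { text: NUDGE_TEXT } }),
      });
    }
    // A nudge does not count as activity, so silence after it still ends the call.
    if (nowMs - this.lastActivityMs >= DEAD_CALL_AFTER_MS) {
      this.ended = true;
      actions.push({ kind: "hangup" });
    }
    return actions;
  }
}
```

In the WebSocket handlers, onInterrupted would be called for messages carrying serverContent.interrupted, onModelAudio for each model audio part, and a short polling loop would send any nudge payload and end the call on hangup.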
Expected Behavior
After serverContent.interrupted fires and the caller finishes speaking, the model should process the caller's speech and generate an audio response — the same as it would for any turn transition. Interruption should not put the model into a permanently silent state.
Questions for the Team
- Is there a known state machine issue where serverContent.interrupted prevents the model from starting a new generation?
- Does activityHandling: 'START_OF_ACTIVITY_INTERRUPTS' interact poorly with certain conversation patterns?
- Is there a recommended way to "reset" the model's turn state after an interruption?
- Would switching activityHandling to 'NO_INTERRUPTIONS' while the model is speaking, and back to 'START_OF_ACTIVITY_INTERRUPTS' afterward, help avoid this?
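For context on the activityHandling questions, this is roughly where those settings live in a raw WebSocket setup message. It is a sketch that assumes the v1beta field names as we understand them: realtimeInputConfig.activityHandling, realtimeInputConfig.automaticActivityDetection.disabled, and realtimeInput.activityStart / activityEnd for manual VAD. buildSetupMessage and SetupOptions are hypothetical helpers. As far as we can tell, setup is only accepted as the first client message, so switching these values mid-call would likely need a new session rather than an in-place update.

```typescript
// Sketch of the raw v1beta setup message (field names assumed; verify against the API reference).
type ActivityHandling = "START_OF_ACTIVITY_INTERRUPTS" | "NO_INTERRUPTIONS";

interface SetupOptions {
  model: string; // e.g. "gemini-3.1-flash-live-preview"
  voiceName: string; // e.g. "Leda"
  activityHandling: ActivityHandling;
  manualVad: boolean; // true = disable server-side automatic activity detection
}

// Builds the setup message sent as the first client message of a session.
function buildSetupMessage(opts: SetupOptions): string {
  return JSON.stringify({
    setup: {
      model: `models/${opts.model}`,
      generationConfig: {
        responseModalities: ["AUDIO"],
        speechConfig: {
          voiceConfig: { prebuiltVoiceConfig: { voiceName: opts.voiceName } },
        },
      },
      realtimeInputConfig: {
        automaticActivityDetection: { disabled: opts.manualVad },
        activityHandling: opts.activityHandling,
      },
    },
  });
}

// With manualVad = true, the client marks the caller's turn boundaries itself.
const ACTIVITY_START = JSON.stringify({ realtimeInput: { activityStart: {} } });
const ACTIVITY_END = JSON.stringify({ realtimeInput: { activityEnd: {} } });

// Example: the configuration this report uses (automatic VAD, interrupts on caller speech).
const setupJson = buildSetupMessage({
  model: "gemini-3.1-flash-live-preview",
  voiceName: "Leda",
  activityHandling: "START_OF_ACTIVITY_INTERRUPTS",
  manualVad: false,
});
```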
Related Issues