[BidiGenerateContent] Model freezes after serverContent.interrupted — no audio output after barge-in

## Environment
- **Model:** `gemini-3.1-flash-live-preview`
- **Voice:** `Leda`
- **Transport:** Raw WebSocket v1beta `wss://generativelanguage.googleapis.com/ws/.../BidiGenerateContent`
- **Audio pipeline:** Twilio MediaStreams (mulaw 8kHz) → PCM16 16kHz → Gemini → PCM16 24kHz → mulaw 8kHz
- **Platform:** Node.js (`ws` library, raw WebSocket, no SDK)
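
For reference, the inbound leg of the pipeline (mulaw 8kHz → PCM16 16kHz) can be sketched roughly as below. This is an illustration, not our production code: `muLawToPcm16` and `muLaw8kToPcm16k` are hypothetical names, the decode is the standard G.711 mu-law expansion, and the 2x upsampling uses plain linear interpolation, which a real resampler would replace with a filtered one.

```javascript
// Decode one G.711 mu-law byte to a signed 16-bit PCM sample.
function muLawToPcm16(byte) {
  const u = ~byte & 0xff;             // mu-law bytes are stored bit-inverted
  const exponent = (u >> 4) & 0x07;   // 3-bit segment number
  const mantissa = u & 0x0f;          // 4-bit step within the segment
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return (u & 0x80) ? -magnitude : magnitude;
}

// Decode a mu-law frame (8 kHz) and upsample 2x to 16 kHz.
// Assumption: linear interpolation is good enough for illustration.
function muLaw8kToPcm16k(muLawBytes) {
  const out = new Int16Array(muLawBytes.length * 2);
  for (let i = 0; i < muLawBytes.length; i++) {
    const cur = muLawToPcm16(muLawBytes[i]);
    // At the end of the frame there is no next sample, so repeat the last one.
    const next = i + 1 < muLawBytes.length ? muLawToPcm16(muLawBytes[i + 1]) : cur;
    out[2 * i] = cur;
    out[2 * i + 1] = (cur + next) >> 1; // midpoint between neighbours
  }
  return out;
}
```

The resulting `Int16Array` would then be base64-encoded and sent as `realtimeInput.audio` with a 16 kHz PCM MIME type.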

## Bug Description

When the caller interrupts the model (barge-in) and `serverContent.interrupted` fires, the model sometimes fails to resume generating audio after the interruption. The caller finishes speaking and waits for a response, but the model produces no audio output. The WebSocket remains open and no error is returned.

Normal barge-in behavior:
1. Model is speaking
2. Caller talks over the model → `serverContent.interrupted` fires
3. Model stops speaking (correct)
4. Caller finishes their turn
5. Model should respond with audio → **but sometimes it never does**

The model enters a state where it has acknowledged the interruption (stopped its own audio) but never starts generating a new response. The caller is left in silence.
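
A minimal sketch of how the stuck state can be observed from the client side, in case it helps with reproduction. `classifyServerMessage` and `InterruptionTracker` are hypothetical names, and the message shapes (`serverContent.interrupted`, `serverContent.modelTurn.parts[].inlineData`, `serverContent.turnComplete`) are the ones we see on the raw WebSocket; treat them as assumptions to check against the API reference.

```javascript
// Classify one parsed server message from the Live API WebSocket.
function classifyServerMessage(msg) {
  const sc = msg && msg.serverContent;
  if (!sc) return 'other';
  if (sc.interrupted) return 'interrupted';
  const parts = (sc.modelTurn && sc.modelTurn.parts) || [];
  if (parts.some((p) => p.inlineData && p.inlineData.data)) return 'audio';
  if (sc.turnComplete) return 'turnComplete';
  return 'other';
}

// Tracks whether the model has produced audio since the last interruption.
class InterruptionTracker {
  constructor() {
    this.interruptedAt = null; // timestamp (ms) of the unresolved interruption
  }

  onMessage(msg, now = Date.now()) {
    const kind = classifyServerMessage(msg);
    if (kind === 'interrupted') this.interruptedAt = now;
    else if (kind === 'audio') this.interruptedAt = null; // model recovered
    return kind;
  }

  // Milliseconds without model audio since the last interruption, or 0 if none is pending.
  silentForMs(now = Date.now()) {
    return this.interruptedAt === null ? 0 : now - this.interruptedAt;
  }
}
```

In a failing call, `silentForMs()` keeps growing after the caller's turn ends, while the socket stays open and no further `serverContent` messages carrying audio arrive.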

## Impact

This is particularly damaging on phone calls because:
- The caller spoke (they're engaged and waiting for a response)
- Silence after someone speaks is unnatural and causes callers to hang up
- These are live sales conversations — losing them means losing business

## Workarounds Implemented

1. **Post-interruption nudge timer (4 seconds):** After every `serverContent.interrupted` event, we start a 4-second timer. If the model hasn't produced audio within 4 seconds, we inject `realtimeInput.text: "Your previous response was interrupted. Respond now with a SHORT reply."` This recovers the model in some cases.
2. **Dead call watchdog (10 seconds):** If mutual silence persists for 10 seconds (including after nudge attempts), we kill the call.
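
The two workarounds above can be sketched as follows. This is a simplified illustration rather than our exact code: `createRecovery`, `send`, `hangUp`, and the `timers` option are hypothetical names, and `onCallerAudio` is assumed to be called only when caller speech is detected, not for every Twilio media frame.

```javascript
const NUDGE_TEXT = 'Your previous response was interrupted. Respond now with a SHORT reply.';

// `send` writes a string to the Gemini WebSocket; `hangUp` ends the phone call.
// `timers` is injectable only so the logic can be tested without real delays.
function createRecovery({
  send,
  hangUp,
  nudgeMs = 4000,
  deadCallMs = 10000,
  timers = { setTimeout, clearTimeout },
}) {
  let nudgeTimer = null;
  let watchdogTimer = null;

  const clearNudge = () => {
    timers.clearTimeout(nudgeTimer);
    nudgeTimer = null;
  };

  // Any audio activity from either side restarts the mutual-silence countdown.
  const armWatchdog = () => {
    timers.clearTimeout(watchdogTimer);
    watchdogTimer = timers.setTimeout(hangUp, deadCallMs);
  };

  return {
    // Call when serverContent.interrupted arrives.
    onInterrupted() {
      clearNudge();
      nudgeTimer = timers.setTimeout(() => {
        nudgeTimer = null;
        send(JSON.stringify({ realtimeInput: { text: NUDGE_TEXT } }));
      }, nudgeMs);
    },
    // Call when the model emits an audio chunk: the nudge is no longer needed.
    onModelAudio() {
      clearNudge();
      armWatchdog();
    },
    // Call when caller speech is detected.
    onCallerAudio() {
      armWatchdog();
    },
    // Call on hangup or socket close.
    stop() {
      clearNudge();
      timers.clearTimeout(watchdogTimer);
      watchdogTimer = null;
    },
  };
}
```

Sending the nudge does not reset the watchdog, so a call where the nudge fails to recover the model is still terminated once the silence window expires.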

## Expected Behavior

After `serverContent.interrupted` fires and the caller finishes speaking, the model should process the caller's speech and generate an audio response — the same as it would for any turn transition. Interruption should not put the model into a permanently silent state.

## Questions for the Team

1. Is there a known state machine issue where `interrupted` prevents the model from starting a new generation?
2. Does `activityHandling: 'START_OF_ACTIVITY_INTERRUPTS'` interact poorly with certain conversation patterns?
3. Is there a recommended way to "reset" the model's turn state after an interruption?
4. Would suppressing barge-in (`activityHandling: 'NO_INTERRUPTION'`) or switching to manual VAD (`automaticActivityDetection.disabled: true`) during model output, and back to automatic handling afterward, help avoid this?
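
For context on questions 2 and 4, this is roughly how the relevant options appear in the setup message. The sketch is based on my reading of the v1beta `BidiGenerateContentSetup` schema, so the field names and the `NO_INTERRUPTION` enum value should be verified against the API reference; `buildSetupMessage`, `activityStart`, and `activityEnd` are hypothetical helper names.

```javascript
// Builds the first message sent on the socket after it opens.
function buildSetupMessage({
  activityHandling = 'START_OF_ACTIVITY_INTERRUPTS', // or 'NO_INTERRUPTION'
  manualVad = false,
} = {}) {
  return {
    setup: {
      model: 'models/gemini-3.1-flash-live-preview',
      generationConfig: {
        responseModalities: ['AUDIO'],
        speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName: 'Leda' } } },
      },
      realtimeInputConfig: {
        activityHandling,
        // disabled: true turns off server-side VAD; the client must then
        // mark turn boundaries with activityStart / activityEnd.
        automaticActivityDetection: { disabled: manualVad },
      },
    },
  };
}

// Manual VAD turn markers, sent as realtimeInput messages.
const activityStart = () => ({ realtimeInput: { activityStart: {} } });
const activityEnd = () => ({ realtimeInput: { activityEnd: {} } });
```

As far as I can tell, the setup message is accepted only once per connection, so toggling these options mid-call would likely mean opening a new session. That is part of why guidance from the team on question 4 would be useful.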

## Related Issues
- google-gemini/cookbook#1225 (audio output freeze — may be same root cause)
- google-gemini/cookbook#1197 (our previous report — 13 issues)