Models Affected: gemini-2.5-flash-native-audio-preview-12-2025 and gemini-3.1-flash-live-preview
Language: Turkish (tr-TR)
Description:
I am experiencing two distinct issues across the Gemini Live API native audio models while generating Turkish audio:
1. Speaker Drift / Robotic Voice (gemini-2.5-flash-native-audio-preview-12-2025):
The initial audio quality in Turkish is highly natural and accurate. However, after generating about 3-4 sentences, the voice suffers from severe speaker drift and turns into a mechanical, robotic, and raspy tone.
2. Phonetic Errors and Word Skipping (gemini-3.1-flash-live-preview):
While the new gemini-3.1-flash-live-preview model fixes the robotic degradation and maintains a stable persona over time, it introduces new phonetic bugs in Turkish. Specifically:
It mispronounces common Turkish names (e.g., it reads the name "Emre" as "Emir").
It occasionally skips words entirely in the audio output. For example, when reading "Emre Bey", it drops the name completely and only pronounces the word "Bey".
Expected Behavior:
The 12-2025 model should keep its natural Turkish voice profile for the entire response instead of degrading into a robotic, synthetic-sounding voice.
The 3.1-preview model should pronounce Turkish names correctly, without substituting different vowels or consonants, and must not drop or skip words during audio generation.
Steps to Reproduce:
1. Start a Live API session in Turkish (tr-TR).
2. Prompt gemini-2.5-flash-native-audio-preview-12-2025 to speak for more than 3-4 sentences to observe the robotic degradation.
3. Switch the model to gemini-3.1-flash-live-preview and prompt it to read sentences containing names like "Emre" or "Emre Bey" to observe the mispronunciation and word-skipping behavior.
Are there any known client-side workarounds for the word-skipping and pronunciation bugs in the 3.1 preview, or a timeline for a fix?