The previous fix used WAV format, but the Gemini documentation states that raw 16-bit PCM at 16 kHz is the optimal input format.
Reverted to raw PCM format with the correct MIME type specification.
- Correct format: `audio/pcm;rate=16000`
- The `rate=16000` parameter is crucial: it tells Gemini the sample rate
- Sending raw PCM data (Int16Array buffer)
- Little-endian byte order (typed arrays use the platform's native order, which is little-endian on virtually all current hardware)
- 16-bit samples
- Mono channel
- 16kHz sample rate
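
The format requirements above can be sketched as a small conversion step. This is a minimal sketch, not the extension's actual code: `floatTo16BitPCM`, `bytesToBase64`, and `buildAudioPart` are hypothetical names, and it assumes the captured audio is already mono Float32 samples at 16 kHz. It writes each sample with an explicit little-endian flag instead of relying on the platform's byte order, and the `{ mimeType, data }` shape is an assumption about how the audio part is attached to the request.

```javascript
// Hypothetical helper: convert Float32 samples (range -1..1) into
// raw 16-bit little-endian PCM bytes, with no WAV header.
function floatTo16BitPCM(float32Samples) {
  const buffer = new ArrayBuffer(float32Samples.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp so out-of-range samples cannot overflow the 16-bit range.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    // Scale so -1 maps to -32768 and +1 maps to 32767.
    const int16 = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
    view.setInt16(i * 2, int16, true); // true = little-endian
  }
  return new Uint8Array(buffer);
}

// Hypothetical helper: base64-encode raw bytes for a JSON payload.
function bytesToBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Assumed payload shape: the MIME type carries the sample rate,
// and the data is the base64-encoded raw PCM.
function buildAudioPart(float32Samples) {
  return {
    mimeType: "audio/pcm;rate=16000",
    data: bytesToBase64(floatTo16BitPCM(float32Samples)),
  };
}
```

Calling `buildAudioPart(samples)` on a chunk of captured samples yields an object that can be placed wherever the request expects the audio data; the exact wrapper depends on which Gemini endpoint the extension calls.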
- Removed `createWavBuffer()` method (not needed)
- Removed `writeString()` helper (not needed)
- Sending raw PCM is simpler and optimal per Gemini docs
According to official documentation:
- Optimal format: Raw 16-bit PCM, little-endian, mono, 16kHz
- MIME type: `audio/pcm;rate=16000`
- Token usage: 32 tokens per second of audio
- Downsampling: Gemini downsamples to 16 Kbps automatically
- Max length: 9.5 hours per prompt
- Using the exact format Gemini recommends
- Proper MIME type with rate parameter
- No unnecessary WAV wrapper overhead
- 2-second chunks for better context
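
The figures above imply a fixed size and token cost per chunk, which the following sketch works out. The constant and function names are hypothetical; the numbers come straight from the format listed here (16 kHz, 16-bit, mono, 2-second chunks, 32 tokens per second).

```javascript
// Values taken from the notes above; names are illustrative only.
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2; // 16-bit samples
const CHANNELS = 1; // mono
const TOKENS_PER_SECOND = 32;

// Compute the sample count, raw byte size, and token cost
// for a chunk of the given duration.
function chunkStats(seconds) {
  const samples = SAMPLE_RATE_HZ * CHANNELS * seconds;
  return {
    samples,
    bytes: samples * BYTES_PER_SAMPLE,
    tokens: TOKENS_PER_SECOND * seconds,
  };
}

const twoSecondChunk = chunkStats(2);
console.log(twoSecondChunk); // { samples: 32000, bytes: 64000, tokens: 64 }
```

Each 2-second chunk is therefore 64,000 bytes of raw PCM before base64 encoding, with no header bytes added on top.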
- Reload the extension
- Try dictating
- Should now get accurate transcriptions
The raw PCM format is what Gemini is optimized for!