What is the current behavior?
Using Aura via REST or Websocket, I ask it to generate the phrase "The word 'Clinic' is spelled C-L-I-N-I-C."
No matter which voice I use, and whether I'm on REST or WebSocket (calling Flush at the correct time on WebSocket), the audio consistently gets cut off after 2-4 letters. I occasionally see a milder version of this on short sentences like "Got it" or "Thank you", but that is inconsistent, whereas spelling anything out triggers it every time.
I've seen this issue both on my laptop and in our production environment. We're using the Deepgram SDK. I've also seen it both in direct audio TTS and in voice-to-voice (we use them in two different projects).
I created a test file exclusively to exercise Aura TTS and rule out a problem on my end. The steps to reproduce follow that test file.
Steps to reproduce
I've got these options:
const TEXT = 'The word clinic is spelled C-L-I-N-I-C.';
const MODEL = 'aura-2-andromeda-en'; // I have also tried with 4 other en voices
const SAMPLE_RATE = 24000;
const REST_OPTIONS = {
  model: MODEL,
  encoding: 'linear16' as const,
  sample_rate: SAMPLE_RATE,
  container: 'wav' as const,
};
const WS_OPTIONS = {
  model: MODEL,
  encoding: 'linear16' as const,
  sample_rate: SAMPLE_RATE,
  container: 'none' as const,
};
This is the code for generating and consuming the REST response (the real file also has a bunch of console logging and file-saving lines):
const response = await deepgram.speak.request({ text: TEXT }, REST_OPTIONS);
const stream = await response.getStream();
if (!stream) throw new Error('No stream in REST response');
const reader = stream.getReader();
const chunks: Uint8Array[] = [];
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}
And this is the WebSocket version (again, with logging and file saving removed for brevity):
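return new Promise<void>((resolve, reject) => {
  const chunks: Buffer[] = [];
  const connection = deepgram.speak.live(WS_OPTIONS);
  connection.on(LiveTTSEvents.Open, () => {
    // Send the full text, then flush so everything buffered is synthesized.
    connection.sendText(TEXT);
    connection.flush();
  });
  connection.on(LiveTTSEvents.Audio, (data: ArrayBuffer) => {
    chunks.push(Buffer.from(data));
  });
  connection.on(LiveTTSEvents.Flushed, () => {
    // Close only after the server reports the flush as complete.
    connection.requestClose?.();
  });
  // Surface connection errors instead of leaving the promise pending.
  connection.on(LiveTTSEvents.Error, reject);
  connection.on(LiveTTSEvents.Close, () => {
    // logging
    resolve();
  });
});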
Expected behavior
I should get audio that doesn't cut off. Instead, the audio is cut off on both REST and WebSocket.
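To put a number on the cutoff, the collected chunks from either path can be turned into a playback duration and compared across runs. The following is a minimal sketch, not part of the Deepgram SDK: `pcmDurationSeconds` is a hypothetical helper, and it assumes mono 16-bit PCM (`linear16`) and, for the REST `wav` container, a canonical 44-byte header.

```typescript
// Hypothetical helper (not from the Deepgram SDK) for measuring how much
// audio was actually received.
const BYTES_PER_SAMPLE = 2; // linear16 = 16-bit PCM
const WAV_HEADER_BYTES = 44; // assumption: canonical 44-byte RIFF/WAVE header

function pcmDurationSeconds(
  chunks: Uint8Array[], // Buffer is a Uint8Array subclass, so WS chunks work too
  sampleRate: number,
  hasWavHeader: boolean,
  channels = 1, // assumption: mono output
): number {
  // Sum the byte length of every received chunk.
  const totalBytes = chunks.reduce((sum, c) => sum + c.byteLength, 0);
  // Exclude the container header so only raw samples are counted.
  const headerBytes = hasWavHeader ? WAV_HEADER_BYTES : 0;
  const pcmBytes = Math.max(0, totalBytes - headerBytes);
  return pcmBytes / (BYTES_PER_SAMPLE * channels * sampleRate);
}
```

With the options above, this would be called as `pcmDurationSeconds(chunks, SAMPLE_RATE, true)` for REST and `pcmDurationSeconds(chunks, SAMPLE_RATE, false)` for WebSocket. A duration clearly shorter than the time needed to speak the full sentence would indicate that the truncation is in the returned bytes rather than in playback.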
Please tell us about your environment
- Operating System/Version: macOS Tahoe 26.4
- Language: TypeScript
- Browser: Chrome (not relevant here, since I'm going through the SDK rather than a browser)
Other information
Here is a set of request IDs for some of my test runs:
REST:
- 019d78c1-75c1-7283-9607-4f0147059307
- 019d78c1-8346-71c2-87cf-a2c135573681
- 019d78c1-911b-7a50-94fa-b2a0be9198bf
- 019d78c1-9e91-7991-8818-0b0c931afec4
- 019d78c1-aaf5-7e81-9adc-3fc9cb88a4cc
WebSocket:
- 019d78c1-7c8b-7aa2-8ae5-f6bb4d3afeda
- 019d78c1-8a1c-7171-a9e3-470f1d25a5ad
- 019d78c1-978b-7a70-8e3b-a8cb24494bab
- 019d78c1-a5c6-7b21-9de2-a77a6219ad44
- 019d78c1-b106-7742-a8c6-05007e4c03a9