reference.md
36 additions & 0 deletions
@@ -120,11 +120,20 @@ This setting affects how the `snippets` array is structured in the response, whi
 <dl>
 <dd>
 
+**strip_headers:** `typing.Optional[bool]` - If enabled, the audio for all the chunks of a generation, once concatenated together, will constitute a single audio file. Otherwise, if disabled, each chunk's audio will be its own audio file, each with its own headers (if applicable).
+
+</dd>
+</dl>
+
+<dl>
+<dd>
+
 **instant_mode:** `typing.Optional[bool]`
 
 Enables ultra-low latency streaming, significantly reducing the time until the first audio chunk is received. Recommended for real-time applications requiring immediate audio playback. For further details, see our documentation on [instant mode](/docs/text-to-speech-tts/overview#ultra-low-latency-streaming-instant-mode).
 - Dynamic voice generation is not supported with this mode; a predefined [voice](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.utterances.voice) must be specified in your request.
 - This mode is only supported for streaming endpoints (e.g., [/v0/tts/stream/json](/reference/text-to-speech-tts/synthesize-json-streaming), [/v0/tts/stream/file](/reference/text-to-speech-tts/synthesize-file-streaming)).
+- Ensure only a single generation is requested ([num_generations](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.num_generations) must be `1` or omitted).
 - With `instant_mode` enabled, **requests incur a 10% higher cost** due to increased compute and resource requirements.
 
 </dd>
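
As a rough illustration of the `strip_headers` behavior this hunk documents, the sketch below shows how a client might group a generation's audio chunks into files. It assumes the chunks have already been decoded to raw bytes and arrive in order; `assemble_audio` and the placeholder byte strings are hypothetical and not part of the SDK.

```python
from typing import List


def assemble_audio(chunks: List[bytes], strip_headers: bool) -> List[bytes]:
    """Group decoded audio chunks into the files a client would write.

    Hypothetical helper, not part of the SDK. `chunks` are assumed to be
    the raw audio bytes of one generation, in arrival order.
    """
    if strip_headers:
        # Per the reference text: the concatenated chunks form one audio file.
        return [b"".join(chunks)]
    # Otherwise each chunk is its own audio file (with its own headers,
    # if the format has any), so the chunks are kept separate.
    return list(chunks)


# Placeholder bytes standing in for real audio data.
chunks = [b"chunk-1", b"chunk-2", b"chunk-3"]
single_file = assemble_audio(chunks, strip_headers=True)
separate_files = assemble_audio(chunks, strip_headers=False)
```

With `strip_headers` enabled the client writes one file. With it disabled, naive concatenation would embed repeated headers in the middle of the stream, so each chunk is better saved or played on its own.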
@@ -260,11 +269,20 @@ This setting affects how the `snippets` array is structured in the response, whi
 <dl>
 <dd>
 
+**strip_headers:** `typing.Optional[bool]` - If enabled, the audio for all the chunks of a generation, once concatenated together, will constitute a single audio file. Otherwise, if disabled, each chunk's audio will be its own audio file, each with its own headers (if applicable).
+
+</dd>
+</dl>
+
+<dl>
+<dd>
+
 **instant_mode:** `typing.Optional[bool]`
 
 Enables ultra-low latency streaming, significantly reducing the time until the first audio chunk is received. Recommended for real-time applications requiring immediate audio playback. For further details, see our documentation on [instant mode](/docs/text-to-speech-tts/overview#ultra-low-latency-streaming-instant-mode).
 - Dynamic voice generation is not supported with this mode; a predefined [voice](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.utterances.voice) must be specified in your request.
 - This mode is only supported for streaming endpoints (e.g., [/v0/tts/stream/json](/reference/text-to-speech-tts/synthesize-json-streaming), [/v0/tts/stream/file](/reference/text-to-speech-tts/synthesize-file-streaming)).
+- Ensure only a single generation is requested ([num_generations](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.num_generations) must be `1` or omitted).
 - With `instant_mode` enabled, **requests incur a 10% higher cost** due to increased compute and resource requirements.
 
 </dd>
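
The `instant_mode` constraints listed in this hunk can be sketched as a client-side pre-flight check. This is a minimal sketch, not the SDK's own validation: `instant_mode_problems` is a hypothetical helper, the field names `utterances`, `voice`, `num_generations`, and `instant_mode` follow the anchors in the reference links, and the shape of the `voice` value (`{"name": ...}`) is an assumption made for illustration.

```python
from typing import Any, Dict, List


def instant_mode_problems(body: Dict[str, Any]) -> List[str]:
    """Return reasons a request body conflicts with instant mode.

    Hypothetical helper, not part of the SDK.
    """
    if not body.get("instant_mode"):
        return []  # the constraints only apply when instant mode is enabled

    problems: List[str] = []

    # A predefined voice must be specified for every utterance, because
    # dynamic voice generation is not supported in this mode.
    for i, utterance in enumerate(body.get("utterances", [])):
        if not utterance.get("voice"):
            problems.append(f"utterances[{i}] has no voice")

    # num_generations must be 1 or omitted.
    n = body.get("num_generations")
    if n is not None and n != 1:
        problems.append("num_generations must be 1 or omitted")

    return problems


# Placeholder request bodies; the voice name is made up.
ok_body = {
    "instant_mode": True,
    "utterances": [{"text": "Hello", "voice": {"name": "example-voice"}}],
}
bad_body = {
    "instant_mode": True,
    "utterances": [{"text": "Hello"}],
    "num_generations": 2,
}
```

The check covers only the request body. The restriction to streaming endpoints depends on which endpoint the client calls, so it would have to be enforced at the call site.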
@@ -398,11 +416,20 @@ This setting affects how the `snippets` array is structured in the response, whi
 <dl>
 <dd>
 
+**strip_headers:** `typing.Optional[bool]` - If enabled, the audio for all the chunks of a generation, once concatenated together, will constitute a single audio file. Otherwise, if disabled, each chunk's audio will be its own audio file, each with its own headers (if applicable).
+
+</dd>
+</dl>
+
+<dl>
+<dd>
+
 **instant_mode:** `typing.Optional[bool]`
 
 Enables ultra-low latency streaming, significantly reducing the time until the first audio chunk is received. Recommended for real-time applications requiring immediate audio playback. For further details, see our documentation on [instant mode](/docs/text-to-speech-tts/overview#ultra-low-latency-streaming-instant-mode).
 - Dynamic voice generation is not supported with this mode; a predefined [voice](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.utterances.voice) must be specified in your request.
 - This mode is only supported for streaming endpoints (e.g., [/v0/tts/stream/json](/reference/text-to-speech-tts/synthesize-json-streaming), [/v0/tts/stream/file](/reference/text-to-speech-tts/synthesize-file-streaming)).
+- Ensure only a single generation is requested ([num_generations](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.num_generations) must be `1` or omitted).
 - With `instant_mode` enabled, **requests incur a 10% higher cost** due to increased compute and resource requirements.
 
 </dd>
@@ -544,11 +571,20 @@ This setting affects how the `snippets` array is structured in the response, whi
 <dl>
 <dd>
 
+**strip_headers:** `typing.Optional[bool]` - If enabled, the audio for all the chunks of a generation, once concatenated together, will constitute a single audio file. Otherwise, if disabled, each chunk's audio will be its own audio file, each with its own headers (if applicable).
+
+</dd>
+</dl>
+
+<dl>
+<dd>
+
 **instant_mode:** `typing.Optional[bool]`
 
 Enables ultra-low latency streaming, significantly reducing the time until the first audio chunk is received. Recommended for real-time applications requiring immediate audio playback. For further details, see our documentation on [instant mode](/docs/text-to-speech-tts/overview#ultra-low-latency-streaming-instant-mode).
 - Dynamic voice generation is not supported with this mode; a predefined [voice](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.utterances.voice) must be specified in your request.
 - This mode is only supported for streaming endpoints (e.g., [/v0/tts/stream/json](/reference/text-to-speech-tts/synthesize-json-streaming), [/v0/tts/stream/file](/reference/text-to-speech-tts/synthesize-file-streaming)).
+- Ensure only a single generation is requested ([num_generations](/reference/text-to-speech-tts/synthesize-json-streaming#request.body.num_generations) must be `1` or omitted).
 - With `instant_mode` enabled, **requests incur a 10% higher cost** due to increased compute and resource requirements.