
Commit 51af345

update google tts options (#104)
* update google tts options
* wip
1 parent 9377471 commit 51af345

5 files changed: +48 -4 lines changed

fern/docs/pages/features/tts-streaming.mdx

Lines changed: 1 addition & 0 deletions

@@ -26,6 +26,7 @@ Finally, as of release 0.9.3 we support the following TTS vendors for streaming:
 - Elevenlabs
 - Cartesia
 - Rimelabs
+- Google

 We are adding additional vendors all the time, so check back with us if you are looking for support from a different vendor.
fern/docs/pages/features/using-openai-stt.mdx

Lines changed: 16 additions & 2 deletions

@@ -28,8 +28,9 @@ To begin with, here are the possible options that you use with OpenAI STT:
   prompt: 'string',
   turn_detection: {
     type: 'server_vad', // or 'semantic_vad' or 'none'
-    prefix_padding_ms: 300,
-    silence_duration_ms: 800
+    eagerness: 'medium', // only for semantic_vad: 'low', 'medium', 'high', or 'auto'
+    prefix_padding_ms: 300, // only for server_vad
+    silence_duration_ms: 800 // only for server_vad
   },
   promptTemplates: {
     hintsTemplate: 'string',
@@ -39,6 +40,19 @@ To begin with, here are the possible options that you use with OpenAI STT:
   }
 ```
 
+### Turn detection options
+
+The `turn_detection` object controls how OpenAI detects when a speaker has finished talking.
+
+**For `semantic_vad` type:**
+- `eagerness`: Controls how eager the model is to determine the end of an utterance. Possible values:
+  - `auto` (default): Equivalent to `medium`
+  - `low`: Allows the user more time to speak, resulting in larger transcript chunks
+  - `medium`: Balanced approach
+  - `high`: Returns transcription events faster with smaller chunks
+
+The `eagerness` setting affects how audio is chunked even in transcription mode. Use `high` if you want faster transcription events, or `low` if you prefer larger, more complete transcript chunks.
+
 In this article we want to explore the various ways to construct a prompt for OpenAI STT.
 
 ## Providing hints
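Put together, the new `turn_detection` options added above might be used as follows. This is a minimal sketch: the enclosing `gather` verb and `recognizer` wrapper are illustrative assumptions, while the `turn_detection` fields themselves come from the snippet in the diff.

```javascript
// Sketch only: the gather/recognizer wrapper is an assumed shape;
// the turn_detection fields are those documented in the diff above.
const gather = {
  verb: 'gather',
  input: ['speech'],
  recognizer: {
    vendor: 'openai',
    turn_detection: {
      type: 'semantic_vad',
      eagerness: 'high'        // faster transcription events, smaller chunks
    }
  }
};

// With server_vad, eagerness does not apply; use the timing knobs instead.
const serverVadAlternative = {
  type: 'server_vad',
  prefix_padding_ms: 300,      // padding before detected speech, in ms
  silence_duration_ms: 800     // silence that ends the utterance, in ms
};
```

Per the notes above, `eagerness` only applies to `semantic_vad`, and the two `_ms` timing options only apply to `server_vad`.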

fern/docs/pages/verbs/recognizer.mdx

Lines changed: 8 additions & 0 deletions

@@ -1108,6 +1108,14 @@ subtitle: A **property** that can be used in verbs like [`gather`](./gather) and
 <ParamField path="transcription_config.punctuation_overrides.sensitivity" type="number" required={false}>
 </ParamField>
 
+<ParamField path="transcription_config.conversation_config" type="object" required={false}>
+Configuration for conversation-based transcription features.
+</ParamField>
+
+<ParamField path="transcription_config.conversation_config.end_of_utterance_silence_trigger" type="number" required={false}>
+Duration of silence (in seconds) that triggers an end-of-utterance event. This controls how long the system waits after the speaker stops talking before determining that the utterance is complete. See [Speechmatics turn detection docs](https://docs.speechmatics.com/speech-to-text/realtime/turn-detection#end-of-utterance) for details.
+</ParamField>
+
 <ParamField path="sm_audioFilteringConfig" type="object" required={false}>
 Audio filtering configuration.
 </ParamField>
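The new Speechmatics property might be used like this. A minimal sketch: the surrounding verb shape is an assumption for illustration, while the property path follows the ParamFields added above.

```javascript
// Illustrative sketch: the gather/recognizer wrapper is assumed;
// the transcription_config path matches the ParamFields above.
const gather = {
  verb: 'gather',
  input: ['speech'],
  recognizer: {
    vendor: 'speechmatics',
    transcription_config: {
      conversation_config: {
        // seconds of silence before an end-of-utterance event fires
        end_of_utterance_silence_trigger: 0.8
      }
    }
  }
};
```

A lower value ends utterances sooner (snappier turns); a higher value tolerates longer pauses mid-sentence.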

fern/docs/pages/verbs/say.mdx

Lines changed: 6 additions & 2 deletions

@@ -45,11 +45,15 @@ subtitle: Generate text-to-speech audio.
 </ParamField>
 
 <ParamField path="synthesizer.voice" type="string" required={false}>
-Voice to use.
-Note that the voice list differs depending on whether you are using AWS or Google.
+Voice to use.
+Note that the voice list differs depending on whether you are using AWS or Google.
 Defaults to application setting, if provided.
 </ParamField>
 
+<ParamField path="instructions" type="string" required={false}>
+A prompt sent to the TTS vendor to guide how the audio should be generated. Use this to specify the desired tone, emotion, speaking style, or context for the synthesized speech. This parameter is only supported by vendors that offer prompt-based TTS generation (e.g., Google Gemini TTS).
+</ParamField>
+
 <ParamField path="text" type="string" required={false}>
 Text to speak; may contain SSML tags.
 </ParamField>
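The new `instructions` property added above might appear in a `say` verb like this. A sketch only: the text, instructions wording, and voice value are illustrative placeholders, not from this diff.

```javascript
// Illustrative sketch of the instructions property documented above.
// The voice value is an assumed placeholder; voice lists differ by vendor.
const say = {
  verb: 'say',
  text: 'Your appointment is confirmed for tomorrow at 3pm.',
  instructions: 'Speak warmly and reassuringly, like a friendly receptionist.',
  synthesizer: {
    vendor: 'google',
    voice: 'some-voice-name' // placeholder
  }
};
```

As noted above, vendors that do not support prompt-based TTS generation will ignore `instructions`.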

fern/docs/pages/verbs/synthesizer.mdx

Lines changed: 17 additions & 0 deletions

@@ -37,6 +37,23 @@ subtitle: A **property** that can be used in a `say` verb to override the applic
 
 <AccordionGroup>
 
+<Accordion title="google">
+<ParamField path="model" type="string" required={false}>
+The model to use for text-to-speech synthesis. When specified, this enables Gemini TTS. Example: `gemini-2.5-flash-preview-tts`.
+</ParamField>
+<ParamField path="apiMode" type="string" required={false}>
+Controls which Google TTS API mode to use. Possible values:
+- `tts`: Standard Google Cloud TTS voices (default).
+- `live`: HD voices using streaming mode for higher-quality output.
+- `gemini`: Gemini TTS for AI-powered speech synthesis.
+
+The mode is selected automatically based on configuration: if `options.model` is specified or a `model_id` is configured in the speech credentials, Gemini TTS is used; if an HD voice is selected, `live` mode is used; otherwise, standard `tts` mode is used.
+</ParamField>
+<ParamField path="prompt" type="string" required={false}>
+A prompt sent to the TTS model to guide how the audio should be generated. Use this to specify the desired tone, emotion, speaking style, or context for the synthesized speech. This parameter is only applicable when using Gemini TTS (`apiMode: "gemini"` or when `model` is specified).
+</ParamField>
+</Accordion>
+
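The google options above might be passed through a `say` verb's synthesizer like this. A sketch under stated assumptions: the surrounding verb shape and the `options` nesting are illustrative; the `model` and `prompt` names come from the ParamFields above.

```javascript
// Illustrative sketch of the google synthesizer options documented above;
// the enclosing say verb and options nesting are assumed shapes.
const say = {
  verb: 'say',
  text: 'Thanks for calling! How can I help you today?',
  synthesizer: {
    vendor: 'google',
    options: {
      // per the docs above, specifying a model enables Gemini TTS,
      // so apiMode need not be set explicitly here
      model: 'gemini-2.5-flash-preview-tts',
      prompt: 'Read this in an upbeat, welcoming tone.'
    }
  }
};
```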
 <Accordion title="cartesia">
 <ParamField path="voice_mode" type="string" required={false}>
 `embedding` or `id` (see [Cartesia docs](https://docs.cartesia.ai/api-reference/tts/bytes#request.body.voice))
