An updated version of the stable-video-diffusion-img2vid-xt model with enhanced performance ([limited-commercial use license](https://stability.ai/license)).
ai/pipelines/text-to-speech.mdx (+32 -13)
@@ -4,17 +4,22 @@ title: Text-to-Speech
## Overview
-The text-to-speech endpoint in Livepeer utilizes [Parler-TTS](https://github.com/huggingface/parler-tts), specifically `parler-tts/parler-tts-large-v1`. This model can generate speech with customizable characteristics such as voice type, speaking style, and audio quality.
+`parler-tts/parler-tts-large-v1`. This model can generate speech with
+customizable characteristics such as voice type, speaking style, and audio
+quality.
## Basic Usage Instructions
<Tip>
-For a detailed understanding of the `text-to-speech` endpoint and to experiment
-with the API, see the [Livepeer AI API
+For a detailed understanding of the `text-to-speech` endpoint and to
+experiment with the API, see the [Livepeer AI API
Reference](/ai/api-reference/text-to-speech).
</Tip>
-To use the text-to-speech feature, submit a POST request to the `/text-to-speech` endpoint. Here's an example of how to structure your request:
+To use the text-to-speech feature, submit a POST request to the
+`/text-to-speech` endpoint. Here's an example of how to structure your request:
```bash
curl -X POST "http://<GATEWAY_IP>/text-to-speech" \
@@ -28,29 +33,43 @@ curl -X POST "http://<GATEWAY_IP>/text-to-speech" \
### Request Parameters
-- `model_id`: The ID of the text-to-speech model to use. Currently, this should be set to `"parler-tts/parler-tts-large-v1"`.
+- `model_id`: The ID of the text-to-speech model to use. Currently, this should
+be set to `"parler-tts/parler-tts-large-v1"`.
 - `text`: The text you want to convert to speech.
-- `description`: A description of the desired voice characteristics. This can include details about the speaker's voice, speaking style, and audio quality.
+- `description`: A description of the desired voice characteristics. This can
+include details about the speaker's voice, speaking style, and audio quality.
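As an illustration only (not part of the diff), the three documented parameters could be assembled into a POST request roughly as follows. This is a minimal Python sketch: the helper name `build_tts_request` is hypothetical, and the JSON content type is an assumption, since the headers of the curl example are not visible in this hunk. Any authentication a gateway may require is not covered.

```python
import json
import urllib.request

DEFAULT_MODEL_ID = "parler-tts/parler-tts-large-v1"  # model named in the docs


def build_tts_request(url, text, description, model_id=DEFAULT_MODEL_ID):
    """Build (but do not send) a POST request for the /text-to-speech endpoint.

    Hypothetical helper: the field names come from the parameter list above;
    the JSON encoding and Content-Type header are assumptions.
    """
    payload = {
        "model_id": model_id,
        "text": text,
        "description": description,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, see lead-in
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(req)` against a real gateway URL; the response format is not described in this section, so it is left out here.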
### Voice Customization
-You can customize the generated voice by adjusting the `description` parameter. Some aspects you can control include:
+You can customize the generated voice by adjusting the `description` parameter.
+Some aspects you can control include:
 - Speaker identity (e.g., "Jon's voice")
 - Speaking style (e.g., "monotone", "expressive")
 - Speaking speed (e.g., "slightly fast")
 - Audio quality (e.g., "very close recording", "no background noise")
-The checkpoint was trained on 34 speakers. The full list of available speakers includes: Laura, Gary, Jon, Lea, Karen, Rick, Brenda, David, Eileen, Jordan, Mike, Yann, Joy, James, Eric, Lauren, Rose, Will, Jason, Aaron, Naomie, Alisa, Patrick, Jerry, Tina, Jenna, Bill, Tom, Carol, Barbara, Rebecca, Anna, Bruce, and Emily.
+The checkpoint was trained on 34 speakers. The full list of available speakers
-However, the models performed better with certain speakers. A list of the top 20 speakers for each model variant, ranked by their average speaker similarity scores can be found [here](https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md#speaker-consistency)
+However, the models performed better with certain speakers. A list of the top 20
+speakers for each model variant, ranked by their average speaker similarity
- The maximum length of the input text may be limited. For long-form content, you will need to split your text into smaller chunks. The training default configuration in parler-tts is max 30sec, max text length 600 characters.
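One way to split long-form text before calling the endpoint is sketched below in Python. It is illustrative only: the 600-character limit is taken from the training default quoted above, while the function names and the greedy sentence-packing strategy are assumptions, not something the docs or parler-tts prescribe.

```python
import re

MAX_CHARS = 600  # parler-tts training default quoted above


def _split_long(sentence, max_chars):
    """Break one over-long sentence on word boundaries (hard-cut huge tokens)."""
    pieces, current = [], ""
    for word in sentence.split():
        while len(word) > max_chars:
            # A single token longer than the limit is cut at the limit.
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def split_text(text, max_chars=MAX_CHARS):
    """Greedily pack whole sentences into chunks of at most max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        if len(sentence) <= max_chars:
            pieces = [sentence]
        else:
            pieces = _split_long(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be submitted as its own request and the resulting audio concatenated by the caller. Splitting on sentence boundaries is a design choice intended to avoid cutting speech mid-sentence; whether the 30-second audio limit is also respected depends on speaking rate and is not checked here.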