Commit d06c6dd

Merge pull request #233 from pipecat-ai/mb/add-minimax-docs
Add MiniMax TTS docs
2 parents 844816f + f4f01bc commit d06c6dd

3 files changed

+238
-0
lines changed

mint.json

Lines changed: 1 addition & 0 deletions
@@ -204,6 +204,7 @@
 "server/services/tts/google",
 "server/services/tts/groq",
 "server/services/tts/lmnt",
+"server/services/tts/minimax",
 "server/services/tts/neuphonic",
 "server/services/tts/riva",
 "server/services/tts/openai",

server/services/supported-services.mdx

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ description: "AI services integrated with Pipecat and their setup requirements"
 | [Google](/server/services/tts/google) | `pip install "pipecat-ai[google]"` |
 | [Groq](/server/services/tts/groq) | `pip install "pipecat-ai[groq]"` |
 | [LMNT](/server/services/tts/lmnt) | `pip install "pipecat-ai[lmnt]"` |
+| [MiniMax](/server/services/tts/minimax) | No dependencies required |
 | [Neuphonic](/server/services/tts/neuphonic) | `pip install "pipecat-ai[neuphonic]"` |
 | [NVIDIA Riva](/server/services/tts/riva) | `pip install "pipecat-ai[riva]"` |
 | [OpenAI](/server/services/tts/openai) | `pip install "pipecat-ai[openai]"` |

server/services/tts/minimax.mdx

Lines changed: 236 additions & 0 deletions
@@ -0,0 +1,236 @@
1+
---
2+
title: "MiniMax"
3+
description: "Text-to-speech service implementation using MiniMax T2A API"
4+
---
5+
6+
## Overview
7+
8+
`MiniMaxHttpTTSService` provides text-to-speech capabilities using MiniMax's T2A (Text-to-Audio) API. It supports multiple voices, emotions, languages, and speech customization options.
9+
10+
## Installation
11+
12+
To use `MiniMaxHttpTTSService`, no additional dependencies are required.
13+
14+
You will need MiniMax API credentials (an API key and a Group ID).
15+
16+
## Configuration
17+
18+
### Constructor Parameters
19+
20+
<ParamField path="api_key" type="str" required>
21+
MiniMax API key for authentication
22+
</ParamField>
23+
24+
<ParamField path="group_id" type="str" required>
25+
MiniMax Group ID to identify your project
26+
</ParamField>
27+
28+
<ParamField path="model" type="str" default="speech-02-turbo">
29+
30+
MiniMax TTS model to use. Available options include:
31+
32+
- `speech-02-hd`: HD model with superior rhythm and stability
33+
- `speech-02-turbo`: Turbo model with enhanced multilingual capabilities
34+
- `speech-01-hd`: Rich voices with expressive emotions
35+
- `speech-01-turbo`: Low-latency model with regular updates
36+
37+
</ParamField>
38+
39+
<ParamField path="voice_id" type="str" default="Calm_Woman">
40+
41+
MiniMax voice identifier. Options include:
42+
43+
- `Wise_Woman`
44+
- `Friendly_Person`
45+
- `Inspirational_girl`
46+
- `Deep_Voice_Man`
47+
- `Calm_Woman`
48+
- `Casual_Guy`
49+
- `Lively_Girl`
50+
- `Patient_Man`
51+
- `Young_Knight`
52+
- `Determined_Man`
53+
- `Lovely_Girl`
54+
- `Decent_Boy`
55+
- `Imposing_Manner`
56+
- `Elegant_Man`
57+
- `Abbess`
58+
- `Sweet_Girl_2`
59+
- `Exuberant_Girl`
60+
61+
See the [MiniMax documentation](https://www.minimax.io/platform/document/T2A%20V2?key=66719005a427f0c8a5701643) for a complete list of available voices.
62+
63+
</ParamField>
64+
65+
<ParamField path="aiohttp_session" type="aiohttp.ClientSession" required>
66+
Aiohttp session for API communication
67+
</ParamField>
68+
69+
<ParamField path="sample_rate" type="int" default="None">
70+
Output audio sample rate in Hz
71+
</ParamField>
72+
73+
<ParamField path="params" type="InputParams" optional>
74+
TTS configuration parameters
75+
</ParamField>
76+
77+
### Input Parameters
78+
79+
<ParamField path="language" type="Language" default="Language.EN" optional>
80+
Language for TTS generation
81+
</ParamField>
82+
83+
<ParamField path="speed" type="float" default="1.0" optional>
84+
Speech speed (range: 0.5 to 2.0). Values greater than 1.0 increase speed, less
85+
than 1.0 decrease speed.
86+
</ParamField>
87+
88+
<ParamField path="volume" type="float" default="1.0" optional>
89+
Speech volume (range: 0 to 10). Values greater than 1.0 increase volume.
90+
</ParamField>
91+
92+
<ParamField path="pitch" type="float" default="0" optional>
93+
Pitch adjustment (range: -12 to 12). Positive values raise pitch, negative
94+
values lower pitch.
95+
</ParamField>
96+
97+
<ParamField path="emotion" type="str" optional>
98+
Emotional tone of the speech. Options include: "happy", "sad", "angry",
99+
"fearful", "disgusted", "surprised", and "neutral".
100+
</ParamField>
101+
102+
<ParamField path="english_normalization" type="bool" optional>
103+
Whether to apply English text normalization, which improves performance in
104+
number-reading scenarios at the cost of slightly increased latency.
105+
</ParamField>
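
The ranges listed above can be checked before the values are passed to `InputParams`. The following is a minimal sketch, not part of Pipecat: `validate_voice_params` is a hypothetical helper, and it assumes the documented ranges are inclusive at both ends.

```python
from typing import Optional

# Ranges and emotion names copied from the parameter descriptions above.
# Assumption: both ends of each range are inclusive.
SPEED_RANGE = (0.5, 2.0)
VOLUME_RANGE = (0.0, 10.0)
PITCH_RANGE = (-12.0, 12.0)
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}


def validate_voice_params(
    speed: float = 1.0,
    volume: float = 1.0,
    pitch: float = 0.0,
    emotion: Optional[str] = None,
) -> list:
    """Hypothetical helper: return a list of problems (empty if all values are valid)."""
    problems = []
    for name, value, (low, high) in (
        ("speed", speed, SPEED_RANGE),
        ("volume", volume, VOLUME_RANGE),
        ("pitch", pitch, PITCH_RANGE),
    ):
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low} to {high}")
    if emotion is not None and emotion not in EMOTIONS:
        problems.append(f"emotion={emotion!r} is not a documented option")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every out-of-range value at once.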
106+
107+
## Output Frames
108+
109+
### Control Frames
110+
111+
<ParamField path="TTSStartedFrame" type="Frame">
112+
Signals start of speech synthesis
113+
</ParamField>
114+
115+
<ParamField path="TTSStoppedFrame" type="Frame">
116+
Signals completion of speech synthesis
117+
</ParamField>
118+
119+
### Audio Frames
120+
121+
<ParamField path="TTSAudioRawFrame" type="Frame">
122+
123+
Contains generated audio data with:
124+
125+
- PCM audio format
126+
- Sample rate as specified
127+
- Single channel (mono)
128+
129+
</ParamField>
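
When buffering or scheduling playback, it can help to know how much audio a chunk represents. The sketch below is a hypothetical helper, not a Pipecat API, and it assumes 16-bit samples; the description above states only that the audio is mono PCM at the configured sample rate.

```python
BYTES_PER_SAMPLE = 2  # Assumption: 16-bit PCM samples


def pcm_duration_seconds(num_bytes: int, sample_rate: int, channels: int = 1) -> float:
    """Hypothetical helper: duration in seconds of a raw PCM buffer."""
    if sample_rate <= 0 or channels <= 0:
        raise ValueError("sample_rate and channels must be positive")
    return num_bytes / (sample_rate * channels * BYTES_PER_SAMPLE)
```

Under these assumptions, a 48,000-byte mono chunk at 24,000 Hz corresponds to one second of audio.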
130+
131+
### Error Frames
132+
133+
<ParamField path="ErrorFrame" type="Frame">
134+
Contains MiniMax API error information
135+
</ParamField>
136+
137+
## Methods
138+
139+
See the [TTS base class methods](/server/base-classes/speech#ttsservice) for additional functionality.
140+
141+
## Language Support
142+
143+
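Supports a wide range of languages. Set the `language` input parameter to one of the language codes below; the matching service code is the value passed to MiniMax's `language_boost` setting: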
144+
145+
| Language Code | Service Code | Description |
146+
| -------------- | ------------- | ------------------- |
147+
| `Language.AR` | `Arabic` | Arabic |
148+
| `Language.CS` | `Czech` | Czech |
149+
| `Language.DE` | `German` | German |
150+
| `Language.EL` | `Greek` | Greek |
151+
| `Language.EN` | `English` | English |
152+
| `Language.ES` | `Spanish` | Spanish |
153+
| `Language.FI` | `Finnish` | Finnish |
154+
| `Language.FR` | `French` | French |
155+
| `Language.HI` | `Hindi` | Hindi |
156+
| `Language.ID` | `Indonesian` | Indonesian |
157+
| `Language.IT` | `Italian` | Italian |
158+
| `Language.JA` | `Japanese` | Japanese |
159+
| `Language.KO` | `Korean` | Korean |
160+
| `Language.NL` | `Dutch` | Dutch |
161+
| `Language.PL` | `Polish` | Polish |
162+
| `Language.PT` | `Portuguese` | Portuguese |
163+
| `Language.RO` | `Romanian` | Romanian |
164+
| `Language.RU` | `Russian` | Russian |
165+
| `Language.TH` | `Thai` | Thai |
166+
| `Language.TR` | `Turkish` | Turkish |
167+
| `Language.UK` | `Ukrainian` | Ukrainian |
168+
| `Language.VI` | `Vietnamese` | Vietnamese |
169+
| `Language.YUE` | `Chinese,Yue` | Chinese (Cantonese) |
170+
| `Language.ZH` | `Chinese` | Chinese (Mandarin) |
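
The table can be read as a lookup from language code to service code. The sketch below illustrates that mapping for a few rows only. It is not the service's actual implementation: `to_service_language` is a hypothetical helper, the lowercase string keys are an assumed representation of the `Language` values, and the fallback from a regional code to its base language is also an assumption.

```python
from typing import Optional

# A few rows from the table above. Keys are assumed lowercase language codes.
SERVICE_LANGUAGES = {
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "ja": "Japanese",
    "yue": "Chinese,Yue",
    "zh": "Chinese",
}


def to_service_language(code: str) -> Optional[str]:
    """Hypothetical helper: map a language code to a MiniMax service code."""
    code = code.lower()
    if code in SERVICE_LANGUAGES:
        return SERVICE_LANGUAGES[code]
    # Assumed fallback: strip a region suffix such as "en-US" -> "en".
    base = code.split("-")[0]
    return SERVICE_LANGUAGES.get(base)
```

Returning `None` for an unknown code leaves it to the caller to decide whether to fall back to a default language or report an error.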
171+
172+
## Usage Example
173+
174+
```python
import os

import aiohttp

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.minimax.tts import MiniMaxHttpTTSService
from pipecat.transcriptions.language import Language


async def create_tts_service():
    # Create an HTTP session; the caller should close it on shutdown
    session = aiohttp.ClientSession()

    # Configure service with credentials
    tts = MiniMaxHttpTTSService(
        api_key=os.getenv("MINIMAX_API_KEY"),
        group_id=os.getenv("MINIMAX_GROUP_ID"),
        model="speech-02-turbo",
        voice_id="Patient_Man",
        aiohttp_session=session,
        params=MiniMaxHttpTTSService.InputParams(
            language=Language.EN,
            speed=1.1,          # Slightly faster speech
            volume=1.2,         # Slightly louder
            pitch=0,            # Default pitch
            emotion="neutral",  # Neutral emotional tone
        ),
    )

    return tts


# Use in pipeline: `tts` is the service returned by create_tts_service();
# `llm` and `transport` are created elsewhere in your application
pipeline = Pipeline([
    ...,
    llm,
    tts,
    transport.output(),
])
```
210+
211+
## Frame Flow
212+
213+
```mermaid
214+
graph TD
215+
A[TextFrame] --> B[MiniMaxHttpTTSService]
216+
B --> C[TTSStartedFrame]
217+
B --> D[TTSAudioRawFrame]
218+
B --> E[TTSStoppedFrame]
219+
B --> F[ErrorFrame]
220+
```
221+
222+
## Metrics Support
223+
224+
The service collects processing metrics:
225+
226+
- Time to First Byte (TTFB)
227+
- Processing duration
228+
- Character usage
229+
230+
## Notes
231+
232+
- Uses streaming audio generation for faster initial response
233+
- Processes audio in chunks for efficient memory usage
234+
- Supports real-time applications with low latency
235+
- Automatically handles API authentication
236+
- Provides PCM audio compatible with most audio pipelines
