Commit db52e09

fasuizu-br, Claude DevOps Engineer
authored and committed
feat: add Speech AI MCP server for pronunciation, TTS, and STT
1 parent e2a22d1 commit db52e09

3 files changed

Lines changed: 250 additions & 0 deletions

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
# Speech AI MCP Server

An MCP server that provides pronunciation assessment, speech-to-text, and text-to-speech capabilities for AI agents. Built for language learning, accessibility, and voice applications.

## Features

- **Pronunciation Assessment**: Score English pronunciation at phoneme, word, and sentence level (0-100). 17MB model, <300ms latency. Exceeds human expert accuracy.
- **Speech-to-Text (STT)**: Transcribe audio with word-level timestamps and confidence scores.
- **Text-to-Speech (TTS)**: Generate natural speech with 12 English voices (US + UK accents). Ranked #1 on TTS Arena.

Source: [https://github.com/fasuizu-br/speech-ai-examples](https://github.com/fasuizu-br/speech-ai-examples)

Website: [https://brainiall.com](https://brainiall.com)

## Tools

| Tool | Description |
|------|-------------|
| `assess_pronunciation` | Score English pronunciation at phoneme, word, and sentence levels (0-100) |
| `transcribe_audio` | Transcribe audio to text with word-level timestamps |
| `synthesize_speech` | Generate speech from text with 12 English voices |
| `list_tts_voices` | List available TTS voices |
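
MCP clients invoke tools through JSON-RPC 2.0 `tools/call` requests. As a minimal sketch, assuming a client that builds the payload by hand (the helper name `build_tool_call` and the request id are illustrative and not part of this server), a call to one of the tools in the table could be assembled like this:

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request as sent by MCP clients."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# `list_tts_voices` declares no arguments, so the arguments object is empty.
payload = build_tool_call("list_tts_voices", {})
print(json.dumps(payload, indent=2))
```

Most MCP client libraries build this envelope for you; the sketch only shows what travels over the wire.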

# Usage Guide

## Get API Key

1. Visit [Azure Marketplace](https://azuremarketplace.microsoft.com) and search for "Speech AI"
2. Subscribe to a plan (Free tier available)
3. Your API key will be provided after subscription

Or contact fasuizu@brainiall.com for a key.

## Generate SSE URL

On the MCP Server interface, log in and enter the API key to generate the URL.

## Configure MCP Client

Add the generated SSE URL to your MCP client configuration, replacing `{generate_key}` with the key from the generated URL:

```json
{
  "mcpServers": {
    "speech-ai": {
      "url": "https://mcp.higress.ai/mcp-speech-ai/{generate_key}"
    }
  }
}
```
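
If you prefer to generate the configuration rather than edit it by hand, the following sketch builds the same structure in Python. The function name `build_client_config` and the placeholder key are assumptions for illustration; where the resulting JSON has to be saved depends on your MCP client.

```python
import json


def build_client_config(generated_key: str) -> dict:
    """Return an `mcpServers` entry pointing at the generated SSE URL."""
    url = f"https://mcp.higress.ai/mcp-speech-ai/{generated_key}"
    return {"mcpServers": {"speech-ai": {"url": url}}}


# "YOUR_GENERATED_KEY" is a placeholder; substitute the key from the previous step.
config = build_client_config("YOUR_GENERATED_KEY")
print(json.dumps(config, indent=2))
```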

## Example: Pronunciation Assessment

Send base64-encoded audio with the reference text to get detailed pronunciation scores:

- **Overall Score**: 0-100 calibrated score
- **Word Scores**: Individual word pronunciation quality
- **Phoneme Scores**: Granular phoneme-level feedback with IPA notation

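As a rough sketch of the input side, the arguments for `assess_pronunciation` can be prepared as follows. The argument names `audio`, `text`, and `format` come from the tool definition in this commit; the helper name and the placeholder bytes are illustrative only.

```python
import base64


def build_assess_arguments(
    audio_bytes: bytes, reference_text: str, audio_format: str = "wav"
) -> dict:
    """Base64-encode raw audio and pair it with the reference text."""
    return {
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "text": reference_text,
        "format": audio_format,
    }


# Placeholder bytes stand in for a real recording, e.g. open(path, "rb").read().
arguments = build_assess_arguments(b"RIFF....WAVE", "The quick brown fox")
print(sorted(arguments))
```

The resulting dictionary is what an MCP client would pass as the tool's arguments.
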
## Supported Audio Formats

WAV, MP3, OGG, FLAC, WebM
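
A client can check a file against this list before encoding it. The sketch below infers the `format` value from the file extension; this is an assumption for convenience, since an extension does not guarantee the actual container, and the helper name is hypothetical.

```python
from pathlib import Path

# Lowercase identifiers matching the `format` enum in the tool definitions.
SUPPORTED_FORMATS = {"wav", "mp3", "ogg", "flac", "webm"}


def detect_audio_format(filename: str) -> str:
    """Return the format identifier for a file name, or raise if unsupported."""
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported audio format: {extension or '(none)'}")
    return extension


print(detect_audio_format("lesson-01.MP3"))
```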

## Pricing

$0.02 per API call. Free tier available via Azure Marketplace.
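
For budgeting, the per-call price translates into a simple multiplication. The sketch below ignores the free tier, whose quota is not stated here, and the function name is illustrative.

```python
from decimal import Decimal

PRICE_PER_CALL = Decimal("0.02")  # USD per API call, from the pricing line above


def estimate_cost(calls: int) -> Decimal:
    """Estimate the cost in USD for a number of paid API calls."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return PRICE_PER_CALL * calls


print(estimate_cost(1500))  # 1500 calls at $0.02 each
```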
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
# Speech AI MCP Server

An MCP server that provides pronunciation assessment, speech-to-text, and text-to-speech capabilities, designed for AI agents. Suited to language learning, accessibility, and voice application scenarios.

## Features

- **Pronunciation Assessment**: Scores English pronunciation from 0 to 100 at the phoneme, word, and sentence level. 17MB model, <300ms latency, accuracy exceeding human experts.
- **Speech-to-Text (STT)**: Transcribes audio to text with word-level timestamps and confidence scores.
- **Text-to-Speech (TTS)**: Generates natural speech with 12 English voices (US and UK accents). Ranked #1 on TTS Arena.

Source: [https://github.com/fasuizu-br/speech-ai-examples](https://github.com/fasuizu-br/speech-ai-examples)

Website: [https://brainiall.com](https://brainiall.com)

## Tools

| Tool | Description |
|------|------|
| `assess_pronunciation` | Assess English pronunciation at the phoneme, word, and sentence level (0-100) |
| `transcribe_audio` | Transcribe audio to text with word-level timestamps |
| `synthesize_speech` | Generate speech from text with 12 English voices |
| `list_tts_voices` | List available TTS voices |

# Usage Guide

## Get API Key

1. Visit [Azure Marketplace](https://azuremarketplace.microsoft.com) and search for "Speech AI"
2. Subscribe to a plan (free tier available)
3. You will receive an API key after subscribing

Or contact fasuizu@brainiall.com to obtain a key.

## Generate SSE URL

On the MCP Server interface, log in and enter the API key to generate the URL.

## Configure MCP Client

Add the generated SSE URL to your MCP client configuration:

```json
{
  "mcpServers": {
    "speech-ai": {
      "url": "https://mcp.higress.ai/mcp-speech-ai/{generate_key}"
    }
  }
}
```

## Supported Audio Formats

WAV, MP3, OGG, FLAC, WebM

## Pricing

$0.02 per API call. Free tier available via Azure Marketplace.
Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
server:
  name: speech-ai-server
  config:
    apiKey: ""
tools:
  - name: assess_pronunciation
    description: "Evaluate English pronunciation quality by comparing spoken audio against reference text. Returns calibrated 0-100 scores at overall, word, and phoneme levels with IPA notation. 17MB model, <300ms latency."
    args:
      - name: audio
        description: "Base64-encoded audio data (WAV, MP3, OGG, FLAC, or WebM)"
        type: string
        required: true
      - name: text
        description: "Reference text that was spoken in the audio"
        type: string
        required: true
      - name: format
        description: "Audio format"
        type: string
        required: false
        default: "wav"
        enum: ["wav", "mp3", "ogg", "flac", "webm"]
    requestTemplate:
      url: "https://api.brainiall.com/v1/pronunciation/assess/base64"
      method: POST
      headers:
        - key: Content-Type
          value: "application/json"
        - key: Ocp-Apim-Subscription-Key
          value: "{{.config.apiKey}}"
      body: |
        {
          "audio": "{{.args.audio}}",
          "text": "{{.args.text}}",
          "format": "{{.args.format}}"
        }
    responseTemplate:
      body: |
        ## Pronunciation Assessment Result
        - **Overall Score**: {{.overallScore}}/100
        - **Sentence Score**: {{.sentenceScore}}/100
        - **Confidence**: {{.confidence}}
        {{- range $index, $word := .words }}
        ### Word: {{$word.word}} (Score: {{$word.score}})
        {{- range $pi, $ph := $word.phonemes }}
        - {{$ph.phoneme}}: {{$ph.score}}
        {{- end }}
        {{- end }}

  - name: transcribe_audio
    description: "Transcribe audio to text with word-level timestamps and confidence scores. Supports WAV, MP3, OGG, FLAC, and WebM formats."
    args:
      - name: audio
        description: "Base64-encoded audio data"
        type: string
        required: true
      - name: format
        description: "Audio format"
        type: string
        required: false
        default: "wav"
        enum: ["wav", "mp3", "ogg", "flac", "webm"]
    requestTemplate:
      url: "https://api.brainiall.com/v1/stt/transcribe/base64"
      method: POST
      headers:
        - key: Content-Type
          value: "application/json"
        - key: Ocp-Apim-Subscription-Key
          value: "{{.config.apiKey}}"
      body: |
        {
          "audio": "{{.args.audio}}",
          "format": "{{.args.format}}"
        }
    responseTemplate:
      body: |
        ## Transcription Result
        - **Text**: {{.text}}
        {{- range $index, $word := .words }}
        - {{$word.word}} ({{$word.start}}s - {{$word.end}}s, confidence: {{$word.confidence}})
        {{- end }}

  - name: synthesize_speech
    description: "Generate natural speech from text with 12 English voices (US and UK accents). Returns base64-encoded audio. Ranked #1 on TTS Arena."
    args:
      - name: text
        description: "Text to synthesize into speech"
        type: string
        required: true
      - name: voice
        description: "Voice ID to use for synthesis"
        type: string
        required: false
        default: "af_heart"
    requestTemplate:
      url: "https://api.brainiall.com/v1/tts/synthesize"
      method: POST
      headers:
        - key: Content-Type
          value: "application/json"
        - key: Ocp-Apim-Subscription-Key
          value: "{{.config.apiKey}}"
      body: |
        {
          "text": "{{.args.text}}",
          "voice": "{{.args.voice}}"
        }
    responseTemplate:
      body: |
        ## Speech Synthesis Result
        - **Voice**: {{.voice}}
        - **Duration**: {{.duration_ms}}ms
        - **Audio**: {{.audio_base64}}

  - name: list_tts_voices
    description: "List all available text-to-speech voices with their names, genders, and accent information."
    args: []
    requestTemplate:
      url: "https://api.brainiall.com/v1/tts/voices"
      method: GET
      headers:
        - key: Ocp-Apim-Subscription-Key
          value: "{{.config.apiKey}}"
    responseTemplate:
      body: |
        ## Available Voices
        {{- range $index, $voice := .voices }}
        - **{{$voice.id}}**: {{$voice.name}} ({{$voice.gender}}, {{$voice.accent}})
        {{- end }}
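
Each `requestTemplate` in the configuration above describes a plain HTTP call. As a sketch of what the `synthesize_speech` entry amounts to, the following Python builds the equivalent request with the standard library and decodes the `audio_base64` field that the `responseTemplate` reads. The helper names and the placeholder API key are assumptions, and the request is only constructed here, not sent.

```python
import base64
import json
import urllib.request


def build_synthesize_request(
    api_key: str, text: str, voice: str = "af_heart"
) -> urllib.request.Request:
    """Mirror the synthesize_speech requestTemplate as a urllib request."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        "https://api.brainiall.com/v1/tts/synthesize",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": api_key,
        },
    )


def decode_audio(response_json: dict) -> bytes:
    """Decode the base64 audio field referenced by the responseTemplate."""
    return base64.b64decode(response_json["audio_base64"])


# "YOUR_API_KEY" is a placeholder. Sending is left out so the sketch runs offline:
# with urllib.request.urlopen(request) as response:
#     audio = decode_audio(json.load(response))
request = build_synthesize_request("YOUR_API_KEY", 'She said "hello".')
print(request.get_method(), request.full_url)
```

Note that `json.dumps` escapes quotes and newlines in the text. The templates in the configuration substitute arguments directly into a JSON string, so whether such characters are escaped there depends on the gateway's template handling, which this commit does not show.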
