64 changes: 64 additions & 0 deletions plugins/wasm-go/mcp-servers/mcp-speech-ai/README.md
@@ -0,0 +1,64 @@
# Speech AI MCP Server

An MCP server that provides pronunciation assessment, speech-to-text, and text-to-speech capabilities for AI agents. Built for language learning, accessibility, and voice applications.

## Features

- **Pronunciation Assessment**: Score English pronunciation from 0 to 100 at the phoneme, word, and sentence levels. Uses a 17 MB model with under 300 ms latency. Exceeds human expert accuracy.
- **Speech-to-Text (STT)**: Transcribe audio with word-level timestamps and confidence scores.
- **Text-to-Speech (TTS)**: Generate natural speech with 12 English voices (US and UK accents). Ranked #1 on TTS Arena.

Source: [https://github.com/fasuizu-br/speech-ai-examples](https://github.com/fasuizu-br/speech-ai-examples)

Website: [https://brainiall.com](https://brainiall.com)

## Tools

| Tool | Description |
|------|-------------|
| `assess_pronunciation` | Score English pronunciation at phoneme, word, and sentence levels (0-100) |
| `transcribe_audio` | Transcribe audio to text with word-level timestamps |
| `synthesize_speech` | Generate speech from text with 12 English voices |
| `list_tts_voices` | List available TTS voices |
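
As a rough illustration of how a client might invoke one of these tools, the sketch below builds an MCP `tools/call` JSON-RPC request for `list_tts_voices`. The `build_tool_call` helper and the request `id` value are hypothetical; a real MCP client library normally constructs and sends this message for you.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# list_tts_voices takes no arguments, so the arguments object is empty.
request = build_tool_call("list_tts_voices", {})
print(json.dumps(request, indent=2))
```

The same helper would work for the other tools by passing their arguments, for example `{"text": "Hello"}` for `synthesize_speech`.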

# Usage Guide

## Get API Key

1. Visit [Azure Marketplace](https://azuremarketplace.microsoft.com) and search for "Speech AI".
2. Subscribe to a plan (a free tier is available).
3. Your API key is provided after you subscribe.

Alternatively, email fasuizu@brainiall.com to request a key.

## Generate SSE URL

Log in to the MCP Server interface and enter your API key to generate the SSE URL.

## Configure MCP Client

Add the generated SSE URL to your MCP client configuration:

```json
{
  "mcpServers": {
    "speech-ai": {
      "url": "https://mcp.higress.ai/mcp-speech-ai/{generate_key}"
    }
  }
}
```
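
If your client already has a configuration file with other servers, the entry can be merged in programmatically. The following is a minimal sketch, assuming the configuration is a JSON object with a top-level `mcpServers` key; the `add_speech_ai_server` helper, the `other-server` entry, and the `YOUR_GENERATED_KEY` placeholder are illustrative only.

```python
import json


def add_speech_ai_server(config: dict, sse_url: str) -> dict:
    """Return a copy of an MCP client config with a speech-ai entry added."""
    servers = dict(config.get("mcpServers", {}))
    servers["speech-ai"] = {"url": sse_url}
    return {**config, "mcpServers": servers}


# Placeholder URL: substitute the SSE URL generated for your account.
sse_url = "https://mcp.higress.ai/mcp-speech-ai/YOUR_GENERATED_KEY"
existing = {"mcpServers": {"other-server": {"url": "https://example.com/sse"}}}
print(json.dumps(add_speech_ai_server(existing, sse_url), indent=2))
```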

## Example: Pronunciation Assessment

Call `assess_pronunciation` with base64-encoded audio and the reference text to get detailed pronunciation scores:

- **Overall Score**: 0-100 calibrated score
- **Word Scores**: Individual word pronunciation quality
- **Phoneme Scores**: Granular phoneme-level feedback with IPA notation
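
The request side of this example can be sketched as follows. The argument names `audio`, `text`, and `format` follow the tool definition in this PR's `mcp-server.yaml`; the `build_assess_arguments` helper and the stand-in audio bytes are hypothetical, and in practice the bytes would be read from a recording.

```python
import base64


def build_assess_arguments(
    audio_bytes: bytes, reference_text: str, audio_format: str = "wav"
) -> dict:
    """Build the arguments object for the assess_pronunciation tool."""
    return {
        # The tool expects the raw audio as a base64 string.
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "text": reference_text,
        "format": audio_format,
    }


# Stand-in bytes; a real call would use open("recording.wav", "rb").read().
fake_audio = b"RIFF\x00\x00\x00\x00WAVE"
arguments = build_assess_arguments(fake_audio, "The quick brown fox")
print(sorted(arguments))
```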

## Supported Audio Formats

WAV, MP3, OGG, FLAC, WebM
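
A client may want to derive the `format` argument from the file name and reject unsupported files before making a call. This is a small sketch under that assumption; the `audio_format_for` helper is hypothetical, and the lowercase format values mirror the `enum` in `mcp-server.yaml`.

```python
from pathlib import Path

# Lowercase values accepted by the `format` argument.
SUPPORTED_FORMATS = {"wav", "mp3", "ogg", "flac", "webm"}


def audio_format_for(path: str) -> str:
    """Return the format value for a file, based on its extension."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    return ext


print(audio_format_for("lesson-01.MP3"))
```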

## Pricing

Each API call costs $0.02. A free tier is available via Azure Marketplace.
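
For budgeting, the per-call price translates directly into a cost estimate. The sketch below assumes every call is billed at $0.02 and ignores any free-tier allowance, whose size is not stated here; `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

PRICE_PER_CALL = Decimal("0.02")  # USD, from the pricing above


def estimate_cost(calls: int) -> Decimal:
    """Estimate the cost in USD for a number of billed API calls."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return PRICE_PER_CALL * calls


print(estimate_cost(1500))  # prints 30.00
```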
56 changes: 56 additions & 0 deletions plugins/wasm-go/mcp-servers/mcp-speech-ai/README_ZH.md
@@ -0,0 +1,56 @@
# Speech AI MCP Server

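An MCP server that provides pronunciation assessment, speech-to-text, and text-to-speech capabilities, designed for AI agents. Suited to language learning, accessibility, and voice application scenarios.

## Features

- **Pronunciation Assessment**: Score English pronunciation from 0 to 100 at the phoneme, word, and sentence levels. 17 MB model, latency under 300 ms, accuracy exceeding that of human experts.
- **Speech-to-Text (STT)**: Transcribe audio to text with word-level timestamps and confidence scores.
- **Text-to-Speech (TTS)**: Generate natural speech with 12 English voices (US and UK accents). Ranked #1 on TTS Arena.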

Source: [https://github.com/fasuizu-br/speech-ai-examples](https://github.com/fasuizu-br/speech-ai-examples)

Website: [https://brainiall.com](https://brainiall.com)

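## Tools

| Tool | Description |
|------|-------------|
| `assess_pronunciation` | Assess English pronunciation at phoneme, word, and sentence levels (0-100) |
| `transcribe_audio` | Transcribe audio to text with word-level timestamps |
| `synthesize_speech` | Generate speech from text with 12 English voices |
| `list_tts_voices` | List available TTS voices |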

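# Usage Guide

## Get API Key

1. Visit [Azure Marketplace](https://azuremarketplace.microsoft.com) and search for "Speech AI".
2. Subscribe to a plan (a free tier is available).
3. Your API key is provided after you subscribe.

Alternatively, email fasuizu@brainiall.com to request a key.

## Generate SSE URL

Log in to the MCP Server interface and enter your API key to generate the URL.

## Configure MCP Client

Add the generated SSE URL to your MCP client configuration: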

```json
{
  "mcpServers": {
    "speech-ai": {
      "url": "https://mcp.higress.ai/mcp-speech-ai/{generate_key}"
    }
  }
}
```

## Supported Audio Formats

WAV, MP3, OGG, FLAC, WebM

## Pricing

Each API call costs $0.02. A free tier is available via Azure Marketplace.
130 changes: 130 additions & 0 deletions plugins/wasm-go/mcp-servers/mcp-speech-ai/mcp-server.yaml
@@ -0,0 +1,130 @@
server:
  name: speech-ai-server
  config:
    apiKey: ""
tools:
  - name: assess_pronunciation
    description: "Evaluate English pronunciation quality by comparing spoken audio against reference text. Returns calibrated 0-100 scores at overall, word, and phoneme levels with IPA notation. 17MB model, <300ms latency."
    args:
      - name: audio
        description: "Base64-encoded audio data (WAV, MP3, OGG, FLAC, or WebM)"
        type: string
        required: true
      - name: text
        description: "Reference text that was spoken in the audio"
        type: string
        required: true
      - name: format
        description: "Audio format"
        type: string
        required: false
        default: "wav"
        enum: ["wav", "mp3", "ogg", "flac", "webm"]
    requestTemplate:
      url: "https://api.brainiall.com/v1/pronunciation/assess/base64"
      method: POST
      headers:
        - key: Content-Type
          value: "application/json"
        - key: Ocp-Apim-Subscription-Key
          value: "{{.config.apiKey}}"
      body: |
        {
          "audio": "{{.args.audio}}",
          "text": "{{.args.text}}",
          "format": "{{.args.format}}"
        }
    responseTemplate:
      body: |
        ## Pronunciation Assessment Result
        - **Overall Score**: {{.overallScore}}/100
        - **Sentence Score**: {{.sentenceScore}}/100
        - **Confidence**: {{.confidence}}
        {{- range $index, $word := .words }}
        ### Word: {{$word.word}} (Score: {{$word.score}})
        {{- range $pi, $ph := $word.phonemes }}
        - {{$ph.phoneme}}: {{$ph.score}}
        {{- end }}
        {{- end }}

  - name: transcribe_audio
    description: "Transcribe audio to text with word-level timestamps and confidence scores. Supports WAV, MP3, OGG, FLAC, and WebM formats."
    args:
      - name: audio
        description: "Base64-encoded audio data"
        type: string
        required: true
      - name: format
        description: "Audio format"
        type: string
        required: false
        default: "wav"
        enum: ["wav", "mp3", "ogg", "flac", "webm"]
    requestTemplate:
      url: "https://api.brainiall.com/v1/stt/transcribe/base64"
      method: POST
      headers:
        - key: Content-Type
          value: "application/json"
        - key: Ocp-Apim-Subscription-Key
          value: "{{.config.apiKey}}"
      body: |
        {
          "audio": "{{.args.audio}}",
          "format": "{{.args.format}}"
        }
    responseTemplate:
      body: |
        ## Transcription Result
        - **Text**: {{.text}}
        {{- range $index, $word := .words }}
        - {{$word.word}} ({{$word.start}}s - {{$word.end}}s, confidence: {{$word.confidence}})
        {{- end }}

  - name: synthesize_speech
    description: "Generate natural speech from text with 12 English voices (US and UK accents). Returns base64-encoded audio. Ranked #1 on TTS Arena."
    args:
      - name: text
        description: "Text to synthesize into speech"
        type: string
        required: true
      - name: voice
        description: "Voice ID to use for synthesis"
        type: string
        required: false
        default: "af_heart"
    requestTemplate:
      url: "https://api.brainiall.com/v1/tts/synthesize"
      method: POST
      headers:
        - key: Content-Type
          value: "application/json"
        - key: Ocp-Apim-Subscription-Key
          value: "{{.config.apiKey}}"
      body: |
        {
          "text": "{{.args.text}}",
          "voice": "{{.args.voice}}"
        }
    responseTemplate:
      body: |
        ## Speech Synthesis Result
        - **Voice**: {{.voice}}
        - **Duration**: {{.duration_ms}}ms
        - **Audio**: {{.audio_base64}}

  - name: list_tts_voices
    description: "List all available text-to-speech voices with their names, genders, and accent information."
    args: []
    requestTemplate:
      url: "https://api.brainiall.com/v1/tts/voices"
      method: GET
      headers:
        - key: Ocp-Apim-Subscription-Key
          value: "{{.config.apiKey}}"
    responseTemplate:
      body: |
        ## Available Voices
        {{- range $index, $voice := .voices }}
        - **{{$voice.id}}**: {{$voice.name}} ({{$voice.gender}}, {{$voice.accent}})
        {{- end }}