Commit 99357e8

debpalash and claude authored
feat(mcp): MCP server v1 — mount on /mcp, per-agent voice binding, stdio shim (Wave 2.2) (#368)
* feat(mcp): MCP server v1 — mount on /mcp, per-agent voice binding, stdio shim (Wave 2.2)

The FastMCP server (previously dead code, never mounted) is now mounted on the main FastAPI app at /mcp via Streamable HTTP, with its session manager composed into the app lifespan through an AsyncExitStack (best-effort: a missing mcp package or OMNIVOICE_MCP_DISABLE=1 never breaks startup). streamable_http_path set to '/' so the sub-mount lands at /mcp, not /mcp/mcp. Adds the 'mcp' dependency (1.27.x).

Per-agent voice binding (Spec 2 headline): each MCP client sends an X-OmniVoice-Client-Id header; generate_speech resolves the voice as explicit arg > the client's binding > global default > app default. New mcp_client_bindings table (alembic 0004 + _BASE_SCHEMA, additive/idempotent), services/mcp_bindings.py (CRUD + resolve_voice + best-effort last_seen), and a loopback-gated REST router (/api/mcp/bindings) the Settings panel drives.

New transcribe tool (base64 audio in, 200 MB cap).

Stdio shim (backend/mcp_shim, httpx-only, ported from voicebox MIT) proxies stdio clients to the mounted endpoint and forwards OMNIVOICE_CLIENT_ID as the binding header.

Settings → Sharing gains an MCP bindings panel.

Docs: docs/mcp.md (both connection modes + binding REST) and docs/mcp.json updated to the shim form.

Tests: bindings service + resolution precedence + migration up/down (pure, run locally); REST CRUD + mount-not-404 + disable-flag (main-importing, validated in CI). MCP build + mount + initialize handshake verified out-of-band (no torch).

Spec: docs/competitive-analysis.md Spec 2 / parity program Wave 2.2.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(mcp): assert /mcp mount via app.routes, not a lifespan client

The two main-importing mount tests ran the app lifespan, which now starts the FastMCP session manager and binds asyncio queues to the test loop — contaminating later lifespan-running tests ('bound to a different event loop'). The mount happens at import time, so inspecting app.routes for the /mcp Mount is the correct loop-free assertion. Same fix shape as the Wave 0.2 consent tests.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(mcp): stop reload-main poisoning across the MCP test files

Root cause of the CI failure: the bindings REST fixture set OMNIVOICE_MCP_DISABLE=1 and reloaded main but never restored it, so a later 'from main import app' in test_mcp_mount saw /mcp un-mounted ({'/audio','/voice_audio'}). Reloading main mutates the shared module for every subsequent test.

- REST fixture: drop the disable flag (the mount is harmless without a lifespan), yield the client, and restore main (+ core.config/db) to the default data dir in teardown so the global module is clean again.
- test_main_mounts_mcp_route: reload main with the disable flag cleared so the assertion is independent of any earlier reload.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
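The resolution order named in the commit message (explicit arg > the client's binding > global default > app default) can be sketched as follows. This is only an illustration of the precedence, not the actual `services.mcp_bindings.resolve_voice`: the in-memory `bindings` dict, the `global_default` parameter, the `source` field, and `APP_DEFAULT_PROFILE` are all assumptions made for the example.

```python
from __future__ import annotations

# Hypothetical app-level fallback; None is assumed to mean "no saved profile".
APP_DEFAULT_PROFILE = None


def resolve_voice_sketch(
    client_id: str | None,
    explicit_profile_id: str | None,
    bindings: dict[str, str | None],
    global_default: str | None = None,
) -> dict:
    """Return the chosen profile id and the rule that selected it."""
    # 1. An explicit argument from the tool call always wins.
    if explicit_profile_id:
        return {"profile_id": explicit_profile_id, "source": "explicit"}
    # 2. Otherwise use the profile bound to this client id, if any.
    bound = bindings.get(client_id) if client_id else None
    if bound:
        return {"profile_id": bound, "source": "binding"}
    # 3. Then the global default, if one is configured.
    if global_default:
        return {"profile_id": global_default, "source": "global_default"}
    # 4. Finally fall back to the app default.
    return {"profile_id": APP_DEFAULT_PROFILE, "source": "app_default"}
```

A client with no header (for example a stdio client without the shim) passes `client_id=None` and skips straight to the defaults.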
1 parent c8fdcb6 commit 99357e8

16 files changed

Lines changed: 1094 additions & 4 deletions

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+"""REST CRUD for per-agent MCP voice bindings (Wave 2.2 / Spec 2).
+
+Loopback-gated — the Settings UI manages bindings here. The MCP tools
+themselves resolve voices via ``services.mcp_bindings.resolve_voice``.
+"""
+
+from __future__ import annotations
+
+from fastapi import APIRouter, Depends, HTTPException
+from pydantic import BaseModel, Field
+
+from api.dependencies import require_loopback
+from services import mcp_bindings
+
+router = APIRouter(
+    prefix="/api/mcp",
+    tags=["mcp"],
+    dependencies=[Depends(require_loopback)],
+)
+
+
+class _BindingBody(BaseModel):
+    client_id: str = Field(..., min_length=1, max_length=128)
+    label: str | None = None
+    profile_id: str | None = None
+    default_engine: str | None = None
+
+
+@router.get("/bindings")
+def list_bindings():
+    """All per-agent voice bindings, most-recently-seen first."""
+    return mcp_bindings.list_bindings()
+
+
+@router.put("/bindings")
+def upsert_binding(body: _BindingBody):
+    """Create or update the binding for an MCP client id."""
+    try:
+        return mcp_bindings.upsert_binding(
+            body.client_id,
+            label=body.label,
+            profile_id=body.profile_id,
+            default_engine=body.default_engine,
+        )
+    except ValueError as e:
+        raise HTTPException(status_code=400, detail=str(e))
+
+
+@router.delete("/bindings/{client_id}")
+def delete_binding(client_id: str):
+    if not mcp_bindings.delete_binding(client_id):
+        raise HTTPException(status_code=404, detail="No binding for that client id")
+    return {"deleted": client_id}
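The router above is gated by `require_loopback`, whose implementation is not part of this commit. As a rough idea of what a loopback check can look like, here is a stdlib-only sketch; the helper name `is_loopback_host` and the decision to accept the literal `localhost` are assumptions, and the real dependency may behave differently.

```python
from __future__ import annotations

import ipaddress


def is_loopback_host(host: str | None) -> bool:
    """True when the peer address is a loopback address (127.0.0.0/8 or ::1)."""
    if not host:
        return False
    if host == "localhost":  # assumption: treat the literal name as local
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal at all; refuse rather than guess.
        return False
```

In a FastAPI dependency, a check like this would typically be applied to `request.client.host` and answered with a 403 when it fails.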

backend/core/db.py

Lines changed: 13 additions & 0 deletions
@@ -142,6 +142,19 @@ def db_conn():
     value TEXT NOT NULL,
     updated_at REAL NOT NULL
 );
+
+-- Wave 2.2: per-agent MCP voice bindings. An MCP client (Claude Code,
+-- Cursor, …) identified by the X-OmniVoice-Client-Id header it sends is
+-- bound to a default voice profile / engine. Fresh installs create it
+-- here; v0.3.x upgrades get it via alembic 0004.
+CREATE TABLE IF NOT EXISTS mcp_client_bindings (
+    client_id TEXT PRIMARY KEY,
+    label TEXT NOT NULL DEFAULT '',
+    profile_id TEXT,
+    default_engine TEXT,
+    last_seen_at REAL,
+    created_at REAL
+);
 """
 
 # Only tables/columns this module is allowed to ALTER. Prevents SQL injection via
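To see how the table above supports an idempotent schema step plus a create-or-update write, the following sketch runs the same DDL against an in-memory SQLite database. The `upsert_binding_sketch` helper and its `ON CONFLICT` statement are assumptions for illustration; the commit does not show how `services/mcp_bindings.py` actually writes rows.

```python
import sqlite3
import time

DDL = """
CREATE TABLE IF NOT EXISTS mcp_client_bindings (
    client_id TEXT PRIMARY KEY,
    label TEXT NOT NULL DEFAULT '',
    profile_id TEXT,
    default_engine TEXT,
    last_seen_at REAL,
    created_at REAL
);
"""


def upsert_binding_sketch(conn, client_id, label="", profile_id=None, default_engine=None):
    """Insert a binding, or update the mutable columns if the client id exists."""
    conn.execute(
        """
        INSERT INTO mcp_client_bindings
            (client_id, label, profile_id, default_engine, created_at)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(client_id) DO UPDATE SET
            label = excluded.label,
            profile_id = excluded.profile_id,
            default_engine = excluded.default_engine
        """,
        (client_id, label, profile_id, default_engine, time.time()),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executescript(DDL)  # IF NOT EXISTS makes re-running the schema a no-op
upsert_binding_sketch(conn, "claude-code", profile_id="narrator")
upsert_binding_sketch(conn, "claude-code", label="Claude Code", profile_id="whisper")
rows = conn.execute(
    "SELECT client_id, label, profile_id FROM mcp_client_bindings"
).fetchall()
```

Because `client_id` is the primary key, the second call updates the existing row instead of adding a new one, and `created_at` keeps its original value.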

backend/main.py

Lines changed: 38 additions & 1 deletion
@@ -428,7 +428,23 @@ def _warm():
         capture_preload_task = asyncio.create_task(_preload_capture_asr())
     else:
         logger.info("Capture ASR preload disabled; dictation ASR will load on first use.")
-    yield
+
+    # ── MCP session manager (Wave 2.2) ────────────────────────────────────
+    # FastMCP's Streamable-HTTP transport needs its session manager running
+    # for the lifetime of the app. It's created lazily by streamable_http_app()
+    # (called in mount_mcp below), so we stack its `run()` context into ours
+    # via AsyncExitStack rather than replacing this lifespan. Best-effort: a
+    # missing/broken MCP layer must never stop the rest of the backend.
+    from contextlib import AsyncExitStack
+    async with AsyncExitStack() as _mcp_stack:
+        _sm = getattr(app.state, "mcp_session_manager", None)
+        if _sm is not None:
+            try:
+                await _mcp_stack.enter_async_context(_sm.run())
+                logger.info("MCP server mounted at /mcp")
+            except Exception as e:
+                logger.warning("MCP session manager failed to start: %s", e)
+        yield
     # ── Graceful shutdown (SIGTERM from Tauri, Ctrl+C, etc.) ────────────
     logger.info("Shutdown: cleaning up…")
     idle_task.cancel()
@@ -749,6 +765,27 @@ def health():
 app.include_router(marketplace.router)
 app.include_router(sonitranslate.router)
 app.include_router(settings_router.router) # Phase 1 AUTH-03 endpoints
+from api.routers import mcp_bindings as _mcp_bindings_router # noqa: E402
+app.include_router(_mcp_bindings_router.router) # Wave 2.2 per-agent voice bindings
+
+# ── Mount the MCP server (Wave 2.2) ───────────────────────────────────────
+# FastMCP's Streamable-HTTP app is sub-mounted at /mcp; its session manager is
+# stashed on app.state for the lifespan above to run. Opt-out via
+# OMNIVOICE_MCP_DISABLE=1; best-effort so a missing mcp package or a build
+# without it never breaks startup.
+if os.environ.get("OMNIVOICE_MCP_DISABLE", "").strip().lower() not in ("1", "true", "yes", "on"):
+    try:
+        from mcp_server import create_mcp_server
+
+        _mcp = create_mcp_server()
+        _mcp_app = _mcp.streamable_http_app()
+        app.state.mcp_session_manager = _mcp.session_manager
+        app.mount("/mcp", _mcp_app)
+        logging.getLogger("omnivoice.api").info("MCP app mounted at /mcp")
+    except Exception as _mcp_err: # noqa: BLE001
+        logging.getLogger("omnivoice.api").info(
+            "MCP server not mounted (%s); /mcp disabled.", _mcp_err
+        )
 
 frontend_path = os.path.join(os.path.dirname(__file__), "..", "frontend", "dist")
 if os.path.exists(frontend_path):
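The lifespan change above composes an optional context manager into an existing one through `AsyncExitStack`, and swallows a startup failure so the host app keeps running. The sketch below reproduces that shape with the standard library only; `fake_session_manager`, `lifespan_sketch`, and the `events` log are stand-ins invented for the example and are not FastMCP or FastAPI APIs.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

events = []  # records the order in which things start and stop


@asynccontextmanager
async def fake_session_manager(fail=False):
    """Stand-in for an optional component's run() context."""
    if fail:
        raise RuntimeError("session manager unavailable")
    events.append("mcp:start")
    try:
        yield
    finally:
        events.append("mcp:stop")


@asynccontextmanager
async def lifespan_sketch(optional_cm=None):
    events.append("app:start")
    async with AsyncExitStack() as stack:
        if optional_cm is not None:
            try:
                # Entered contexts are closed when the stack exits.
                await stack.enter_async_context(optional_cm)
            except Exception as exc:
                # Best-effort: log and continue without the component.
                events.append(f"mcp:skipped ({exc})")
        yield
    events.append("app:stop")


async def demo(fail):
    events.clear()
    async with lifespan_sketch(fake_session_manager(fail=fail)):
        events.append("serving")
    return list(events)


ok_run = asyncio.run(demo(fail=False))
failed_run = asyncio.run(demo(fail=True))
```

In the successful run the optional component stops before the app's own shutdown step, since the stack unwinds as soon as control returns past the `yield`.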

backend/mcp_server.py

Lines changed: 64 additions & 1 deletion
@@ -53,6 +53,14 @@ def create_mcp_server():
             "voice design, and video dubbing in 646 languages."
         ),
     )
+    # Serve the Streamable-HTTP transport at the app root so mounting the whole
+    # app at "/mcp" on the main FastAPI yields the endpoint at "/mcp". FastMCP's
+    # default path is "/mcp", which would double-prefix to "/mcp/mcp" when
+    # sub-mounted. Harmless for the standalone CLI run() path.
+    try:
+        mcp.settings.streamable_http_path = "/"
+    except Exception:
+        pass
 
     # ── Helpers ─────────────────────────────────────────────────────────
 
@@ -75,6 +83,21 @@ async def _api_post_form(path: str, data: dict, files: dict | None = None):
 
     # ── Tools ───────────────────────────────────────────────────────────
 
+    def _current_client_id() -> str | None:
+        """The X-OmniVoice-Client-Id of the calling MCP client, if any.
+
+        FastMCP exposes the HTTP request via its request context on the
+        Streamable-HTTP transport; stdio clients (and any version where the
+        accessor differs) simply resolve to None and fall back to the
+        global default voice."""
+        try:
+            req = mcp.get_context().request_context.request
+            if req is not None:
+                return req.headers.get("x-omnivoice-client-id")
+        except Exception:
+            pass
+        return None
+
     @mcp.tool()
     async def generate_speech(
         text: str,
@@ -89,7 +112,8 @@ async def generate_speech(
         Args:
             text: The text to synthesize into speech.
             language: Target language (ISO code or 'Auto'). 646 languages supported.
-            profile_id: ID of a saved voice profile to clone. Omit for voice design mode.
+            profile_id: ID of a saved voice profile to clone. Omit to use this
+                agent's bound voice (Settings → MCP), else the global default.
             instruct: Style instruction (e.g. 'whisper', 'excited', 'narrator').
             speed: Speech speed multiplier (0.5–2.0, default 1.0).
             steps: Diffusion steps (8=fast/draft, 16=balanced, 32=quality).
@@ -98,6 +122,17 @@ async def generate_speech(
             JSON with audio_id, generation_time, audio_duration, and
             base64-encoded WAV data.
         """
+        # Per-agent voice binding (Wave 2.2): explicit arg wins; otherwise
+        # resolve this client's bound profile, then the global default.
+        client_id = _current_client_id()
+        try:
+            from services import mcp_bindings
+            resolved = mcp_bindings.resolve_voice(client_id, profile_id)
+            profile_id = resolved.get("profile_id")
+            mcp_bindings.touch_last_seen(client_id) if client_id else None
+        except Exception:
+            pass # binding layer unavailable — use whatever was passed
+
         form = {
             "text": text,
             "language": language,
@@ -159,6 +194,34 @@ async def list_languages() -> str:
             '],"note":"Pass any ISO 639 code or set language=Auto for detection."}'
         )
 
+    @mcp.tool()
+    async def transcribe(audio_base64: str, language: str | None = None) -> str:
+        """Transcribe spoken audio to text.
+
+        Args:
+            audio_base64: Base64-encoded audio bytes (wav/mp3/webm/m4a).
+            language: Optional language hint; omit for auto-detect.
+
+        Returns:
+            JSON with the recognized text, language, and duration.
+        """
+        try:
+            raw = base64.b64decode(audio_base64, validate=True)
+        except Exception:
+            return '{"error":"audio_base64 is not valid base64"}'
+        # 200 MB cap — same spirit as voicebox's transcribe gate. Keeps a
+        # buggy/hostile agent from posting an unbounded blob.
+        if len(raw) > 200 * 1024 * 1024:
+            return '{"error":"audio exceeds 200 MB limit"}'
+        data = {}
+        if language:
+            data["language"] = language
+        r = await _api_post_form(
+            "/transcribe", data=data,
+            files={"audio": ("audio.wav", raw, "application/octet-stream")},
+        )
+        return str(r.json())
+
     @mcp.tool()
     async def check_health() -> str:
         """Check if the OmniVoice backend is running and what GPU device is active."""

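The input gate at the top of the `transcribe` tool (strict base64 decode, then a size cap) can be exercised in isolation. The helper below is a sketch of that gate only; the name `decode_audio_sketch`, the tuple return shape, and the configurable `max_bytes` parameter are assumptions added so the behavior can be tested without a 200 MB payload.

```python
import base64
import binascii

MAX_AUDIO_BYTES = 200 * 1024 * 1024  # same 200 MB figure as the tool above


def decode_audio_sketch(audio_base64, max_bytes=MAX_AUDIO_BYTES):
    """Return (raw_bytes, None) on success or (None, error_message) on failure."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        raw = base64.b64decode(audio_base64, validate=True)
    except (binascii.Error, ValueError):
        return None, "audio_base64 is not valid base64"
    if len(raw) > max_bytes:
        return None, "audio exceeds size limit"
    return raw, None
```

Note that this checks the size after decoding, as the tool does, so the encoded string is already fully in memory by then. Checking `len(audio_base64)` first (roughly 4 characters per 3 bytes) would be one way to refuse oversized input earlier.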
backend/mcp_shim/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+"""omnivoice-mcp — stdio MCP shim for clients that only speak stdio."""

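Per the commit message, the shim forwards `OMNIVOICE_CLIENT_ID` as the binding header when it proxies a stdio client to the mounted endpoint. The shim's source is not shown on this page, so the following is only a guess at the header-building step; `binding_headers` and `CLIENT_ID_HEADER` are hypothetical names.

```python
import os

CLIENT_ID_HEADER = "X-OmniVoice-Client-Id"


def binding_headers(env=None):
    """Build the extra HTTP headers for proxied requests from the environment."""
    env = os.environ if env is None else env
    client_id = (env.get("OMNIVOICE_CLIENT_ID") or "").strip()
    # Send no header at all when the variable is unset or blank, so the
    # server falls back to its default voice resolution.
    return {CLIENT_ID_HEADER: client_id} if client_id else {}
```

A proxy built on httpx could pass the result as the `headers` argument of each request to `/mcp`.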