speak api does not produce voice (vexa hosted) #337

@sagebomb

What happened?

POST /bots/{platform}/{native_meeting_id}/speak returns HTTP 202 with body {"message":"Speak command sent","meeting_id":<id>} but the bot never plays any audio in the meeting and the WebSocket at
wss://api.cloud.vexa.ai/ws (subscribed to the same meeting) emits neither speak.started nor speak.completed within 20+ seconds. There is also no error event. chat and transcription work fine on
the same bot — only /speak is silently no-op'd.

Reproduced on three independent meetings / call IDs, across two API keys, all under the same account.

What did you expect?

After POST /speak returns 202:

  • The WebSocket should emit speak.started (and eventually speak.completed or speak.interrupted).
  • The bot should briefly unmute, play the TTS audio in the meeting, and re-mute — per the documented behavior: "The bot unmutes, plays the audio, then re-mutes."

Alternatively, if /speak cannot be fulfilled (account tier, missing provider key, etc.), the response should be a 4xx with a clear error — not a misleading 202.
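To make the expectation above concrete, here is a minimal sketch of how one /speak attempt could be classified from its HTTP status and the WebSocket event types seen afterwards. The event names come from this report; the function name and the category labels ("silent", "ok", and so on) are hypothetical and only for illustration.

```python
from typing import Iterable

# Event names as described in this report; the category labels below are made up.
TERMINAL_EVENTS = {"speak.completed", "speak.interrupted"}


def classify_speak(status: int, event_types: Iterable[str]) -> str:
    """Classify one /speak attempt from its HTTP status and later WS event types."""
    events = list(event_types)
    if 400 <= status < 500:
        return "rejected"            # a clear 4xx would be an acceptable outcome
    if status != 202:
        return "unexpected_status"
    if "speak.started" not in events:
        return "silent"              # the behavior reported in this issue
    started_at = events.index("speak.started")
    if any(e in TERMINAL_EVENTS for e in events[started_at + 1:]):
        return "ok"
    return "started_not_finished"
```

With the observations in this report (202 and no speak.* events), this returns "silent".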

How to reproduce?

  1. export VEXA_API_KEY=vxa_bot_…
  2. Create a bot in a fresh Google Meet:
    curl -X POST -H "X-API-Key: $VEXA_API_KEY" -H "Content-Type: application/json" \
      -d '{"platform":"google_meet","native_meeting_id":"<id>","bot_name":"Juno",
           "language":"en","transcribe_enabled":true,"voice_agent_enabled":true}' \
      https://api.cloud.vexa.ai/bots
    Returns 201 with a call_id. Admit the bot in the Meet host UI. Status transitions to active.
  3. Run this diagnostic (subscribes to /ws, sends one /speak, watches for events):
    python3 scripts/diag_tts.py google_meet <native_meeting_id>
    The script connects to wss://api.cloud.vexa.ai/ws, sends {"action":"subscribe","meetings":[{"platform":"google_meet","native_id":""}]}, gets back {"type":"subscribed",...}, then POSTs /speak with
    {"text":"...","provider":"openai","voice":"alloy"}. Watches WS for 20 s.
  4. Observed every time: POST /speak → 202 OK, but speak.started is never emitted; no audio in the meeting.
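For readers without the attachment, the flow in step 3 can be approximated as below. This is a rough sketch, not the attached diag_tts.py. It assumes that the WebSocket accepts the same X-API-Key header as the REST API, that speak events carry their name in a "type" field like the "subscribed" ack does, and that aiohttp is installed. The helper names and the test phrase are hypothetical.

```python
import asyncio
import json
import os
import time

API = "https://api.cloud.vexa.ai"
WS_URL = "wss://api.cloud.vexa.ai/ws"


def subscribe_message(platform, native_id):
    """Subscribe payload in the shape quoted in step 3."""
    return {"action": "subscribe",
            "meetings": [{"platform": platform, "native_id": native_id}]}


def speak_url(platform, native_id):
    return f"{API}/bots/{platform}/{native_id}/speak"


def speak_events(messages):
    """Return the 'type' of every message whose type starts with 'speak.'."""
    return [m["type"] for m in messages
            if isinstance(m, dict) and str(m.get("type", "")).startswith("speak.")]


async def run(platform, native_id, watch_s=20.0):
    import aiohttp  # third-party; imported lazily so the helpers work without it

    # Assumption: the WS handshake authenticates with the same header as REST.
    headers = {"X-API-Key": os.environ["VEXA_API_KEY"]}
    seen = []
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.ws_connect(WS_URL) as ws:
            await ws.send_json(subscribe_message(platform, native_id))
            print("WS ack:", await ws.receive_json(timeout=5))
            body = {"text": "TTS diagnostic", "provider": "openai", "voice": "alloy"}
            async with session.post(speak_url(platform, native_id), json=body) as resp:
                print("POST /speak HTTP", resp.status, await resp.text())
            # Collect WS messages until the watch window closes.
            deadline = time.monotonic() + watch_s
            while (left := deadline - time.monotonic()) > 0:
                try:
                    msg = await ws.receive(timeout=left)
                except asyncio.TimeoutError:
                    break
                if msg.type != aiohttp.WSMsgType.TEXT:
                    break  # close or error frame
                seen.append(json.loads(msg.data))
    return speak_events(seen)
```

Running `asyncio.run(run("google_meet", "<native_meeting_id>"))` against an active bot should print the 202 response and return the list of speak.* event types seen in the watch window.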

Logs / screenshots?

Reproduction 1 (call_id 12674, native jcr-pnrn-tbw, 2026-05-18):

  • chat worked (chat appeared in Meet UI)
  • 3 /speak calls with voice: nova, voice: alloy, and provider: elevenlabs — all 202, all silent

Reproduction 2 (call_id 12677, native stj-keti-zoz, 2026-05-18) — diag_tts.py output:
WS [ 0.20s] subscribed: subscribed

POST /speak HTTP 202: {"message":"Speak command sent","meeting_id":12677}
speak events: []
result: TTS STILL SILENT

Reproduction 3 (call_id 12752, native gcm-isji-yds, 2026-05-19) — diag_tts.py output:
WS [ 0.17s] subscribed: subscribed

POST /speak HTTP 202: {"message":"Speak command sent","meeting_id":12752}
RESULT: TTS SILENT. /speak returned 202 but no speak.* events fired.

meetings/{call_id} for the affected bots echoes data.transcribe_enabled: true but does not echo any voice_agent_enabled / agent_enabled / TTS-related field — possible hint that the flag is being silently dropped
on input.
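One way to check the meeting record for any TTS-related field is a recursive key scan over the returned JSON, sketched below. The function name and the list of substrings are my own guesses at what such a field might be called, and the sample record is a hypothetical stand-in, not a real response.

```python
def find_flag_paths(obj, needles=("voice_agent", "agent_enabled", "tts", "speak"), prefix=""):
    """Return dotted paths of all keys whose name contains one of the needles."""
    paths = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            if any(n in str(key).lower() for n in needles):
                paths.append(path)
            paths.extend(find_flag_paths(value, needles, path))
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            paths.extend(find_flag_paths(item, needles, f"{prefix}[{i}]"))
    return paths


# Hypothetical record shaped like the one described above: only transcribe_enabled is echoed.
record = {"id": 12677, "status": "active", "data": {"transcribe_enabled": True}}
print(find_flag_paths(record))
```

An empty list for the affected call_ids would be consistent with the flag not being persisted, though it does not prove it.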

Version / env?

  • Vexa: hosted (api.cloud.vexa.ai), accessed via REST + wss://api.cloud.vexa.ai/ws
  • Affected call_ids: 12674, 12677, 12752 (all google_meet, all admitted, all status: active at the time of /speak)
  • Client: Python 3.11, aiohttp 3.x, certifi for TLS; bare-metal macOS (Darwin 24.4)
  • Reproduction script: attached

diag_tts.py

  • Tried voices: alloy, nova (OpenAI). Tried providers: openai, elevenlabs. All 202, all silent.