Speaches compatibility fixes for custom TTS voices and STT transport #223

drascom · 2026-04-04T16:17:13Z

We hit two integration issues while using Dograh with a local Speaches server.

Setup:

STT provider: speaches
STT model: Systran/faster-whisper-small
TTS provider: speaches
TTS model: speaches-ai/piper-tr_TR-fettah-medium
TTS voice: fettah
Speaches base URL: http://localhost:9100/v1

Repro:

STT works directly against Speaches with:

curl -X POST http://localhost:9100/v1/audio/transcriptions \
  -F "model=Systran/faster-whisper-small" \
  -F "file=@audio.wav"

TTS works directly against Speaches with:

curl -X POST http://localhost:9100/v1/audio/speech \
  -H "Content-Type: application/json" \
  --output turkish-fettah.wav \
  -d '{
    "model": "speaches-ai/piper-tr_TR-fettah-medium",
    "voice": "fettah",
    "input": "Merhaba, ben Turkce konusuyorum.",
    "response_format": "wav"
  }'

Issue seen in Dograh:

Pipeline error: Error processing frame: 'fettah'

Root causes:

The Speaches TTS adapter was inheriting OpenAI voice-whitelist behavior, so custom Speaches voice IDs like fettah failed.
The Speaches STT wiring assumed a WebSocket-style transport instead of using Speaches’ OpenAI-compatible HTTP transcription endpoint.
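
To illustrate the first root cause, here is a minimal sketch of how a whitelist lookup could produce exactly the message seen above. This is an assumption about the failure mode, not pipecat's actual code: `OPENAI_VOICES`, `resolve_voice_strict`, `resolve_voice_passthrough`, and `describe_failure` are hypothetical names. In Python, formatting a `KeyError` prints the missing key in quotes, which would explain why the pipeline error is just `'fettah'`.

```python
# Hypothetical sketch; none of these names come from pipecat or Dograh.
# Illustrative subset of OpenAI voice names used as a whitelist.
OPENAI_VOICES = {"alloy": "alloy", "echo": "echo", "nova": "nova"}


def resolve_voice_strict(voice: str) -> str:
    """Whitelist lookup: an unknown voice ID raises KeyError."""
    return OPENAI_VOICES[voice]


def resolve_voice_passthrough(voice: str) -> str:
    """Pass the voice ID through unchanged and let the server validate it."""
    return voice


def describe_failure(voice: str) -> str:
    """Format a lookup failure the way a generic frame handler might."""
    try:
        resolve_voice_strict(voice)
    except KeyError as exc:
        # str(KeyError("fettah")) is "'fettah'", quotes included.
        return f"Error processing frame: {exc}"
    return "ok"
```

With the pass-through variant, a custom voice such as fettah reaches Speaches untouched, and Speaches itself decides whether the voice exists for the selected model.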
Proposed fix:

In pipecat:
  pass Speaches custom TTS voices through unchanged
  use OpenAI-compatible HTTP transcription behavior for Speaches STT

In Dograh:
  stop rewriting the Speaches STT base URL to websocket form
  preserve the configured STT language instead of forcing multi

I have already prepared the code on my forks and can open PRs or share compare links.
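
As a rough sketch of the Dograh-side changes, the helpers below contrast rewriting the base URL to websocket form with keeping the configured HTTP base URL and appending the transcription path used in the curl repro. All function names are hypothetical and the "old" rewrite is an assumption about what the previous wiring did. The `language` form field is part of the OpenAI transcription API; forwarding it only when configured is one possible way to preserve the setting.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit, urlunsplit


def to_websocket_url(base_url: str) -> str:
    """Assumed old behavior: swap http(s) for ws(s) on the configured base URL."""
    parts = urlsplit(base_url)
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))


def transcription_url(base_url: str) -> str:
    """Proposed behavior: keep the HTTP base URL and append the endpoint path."""
    return base_url.rstrip("/") + "/audio/transcriptions"


def stt_form_fields(model: str, language: Optional[str] = None) -> Dict[str, str]:
    """Build the multipart form fields, forwarding the configured language as-is."""
    fields = {"model": model}
    if language:
        fields["language"] = language
    return fields
```

For the setup above, `transcription_url("http://localhost:9100/v1")` yields the same URL as the curl command, and a configured language such as `tr` would be sent unchanged rather than replaced with multi.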

Question for maintainers:

Is HTTP /v1/audio/transcriptions the preferred long-term Speaches STT integration, or would you prefer a Realtime-based adapter instead?

a6kme · 2026-04-06T06:52:02Z
Maintainer

Hello @drascom - Thank you for trying out our Speaches integration. We intend to ship our own Speaches fork, which supports WebSocket streaming for STT: https://github.com/dograh-hq/speaches. The idea is to stream audio chunks to the Speaches server over a full-duplex WebSocket, so that we can implement server events such as turn events and interim transcripts. Along with WebSocket-based streaming for STT and TTS, we are also baking a vLLM-based runtime into our fork of Speaches, which can support locally deployed LLMs.

I realise that this might be confusing for someone who wants to try out the original Speaches, so we might rename the integration from speaches to something else and add proper documentation.

This feature is not yet fully baked and is still evolving. Happy to hear your thoughts on which direction would make more sense for power users of Speaches or folks trying to host models locally.

EDIT: I see that you have already opened PRs. Let me review them and update here. Thanks!

1 reply

a6kme Apr 6, 2026
Maintainer

Hello @drascom - I have merged your PRs into the repository. Thanks for your contribution. This allows Dograh to play nicely with vanilla Speaches.

I will spend some more time thinking about how we can support the custom fork and build of Speaches within Dograh, so that the roadmaps of both projects can evolve hand in hand.

Would love to hear any insights or thoughts you can share on this.
