Hello @drascom, thank you for trying out our Speaches integration. We intend to ship our own Speaches fork, which supports WebSocket streaming for STT: https://github.com/dograh-hq/speaches. The idea is full-duplex, WebSocket-based streaming of audio chunks to the Speaches server, so that we can implement server events such as turn events as well as interim transcripts. Along with WebSocket-based streaming support for STT and TTS, we are also baking a vLLM-based runtime into our fork of Speaches, so that it can support locally deployed LLMs. I realise this might be confusing for someone who wants to try out the original Speaches, so we might rename the integration from speaches to something else and add proper documentation. This feature is not yet fully baked and is still evolving. Happy to hear your thoughts on which direction would make more sense for power users of Speaches or for folks trying to host models locally.

EDIT: I see that you have already opened PRs. Let me review them and update here. Thanks!
We hit two integration issues while using Dograh with a local Speaches server.
Setup:
STT provider: speaches
STT model: Systran/faster-whisper-small
TTS provider: speaches
TTS model: speaches-ai/piper-tr_TR-fettah-medium
TTS voice: fettah
Speaches base URL: http://localhost:9100/v1
Repro:
STT works directly against Speaches with:
TTS works directly against Speaches with:
Issue seen in Dograh:
Pipeline error: Error processing frame: 'fettah'
Root causes:
The Speaches TTS adapter was inheriting OpenAI voice-whitelist behavior, so custom Speaches voice IDs like fettah failed.
The Speaches STT wiring assumed a websocket-style transport instead of using Speaches’ OpenAI-compatible HTTP transcription endpoint.
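To illustrate the first root cause: the error text is consistent with a Python KeyError raised by a lookup against a fixed voice table, although that is my reading of the message and not something confirmed from the adapter source. The sketch below uses hypothetical names (`OPENAI_VOICES`, `resolve_voice_strict`, `resolve_voice_passthrough`); it is not pipecat's actual code, only a minimal reproduction of the pattern and of the pass-through alternative.

```python
# Hypothetical names throughout; this is an illustration, not pipecat's code.
# A small subset of OpenAI's built-in voice names stands in for the whitelist.
OPENAI_VOICES = {"alloy": "alloy", "echo": "echo", "nova": "nova"}


def resolve_voice_strict(voice: str) -> str:
    """Whitelist-only lookup: an unknown voice ID raises KeyError."""
    return OPENAI_VOICES[voice]


def resolve_voice_passthrough(voice: str) -> str:
    """Use the table when the voice is known, otherwise keep the caller's ID."""
    return OPENAI_VOICES.get(voice, voice)


# Reproduce the shape of the reported error with the strict lookup.
try:
    resolve_voice_strict("fettah")
except KeyError as exc:
    message = f"Error processing frame: {exc}"

print(message)
print(resolve_voice_passthrough("fettah"))
```

With the pass-through variant, a custom Speaches voice ID such as fettah reaches the server unchanged, while known OpenAI voice names still resolve as before.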
Proposed fix:
In pipecat:
pass Speaches custom TTS voices through unchanged
use OpenAI-compatible HTTP transcription behavior for Speaches STT
In Dograh:
stop rewriting the Speaches STT base URL to websocket form
preserve the configured STT language instead of forcing multi
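The two Dograh-side changes can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from my forks: the helper names `transcription_url` and `transcription_fields` are hypothetical, and the language code `tr` is only an example value. The `model` and `language` form fields and the `/audio/transcriptions` path follow the OpenAI-compatible transcription API.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def transcription_url(base_url: str) -> str:
    """Build the HTTP transcription endpoint from the configured base URL.

    The scheme is kept as configured; nothing is rewritten to ws:// or wss://.
    """
    scheme = urlsplit(base_url).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) base URL, got {base_url!r}")
    return base_url.rstrip("/") + "/audio/transcriptions"


def transcription_fields(model: str, language: Optional[str] = None) -> Dict[str, str]:
    """Form fields sent alongside the audio file in the multipart request.

    The configured language is passed through as-is; when none is set, the
    field is omitted so the server can auto-detect.
    """
    fields = {"model": model}
    if language:
        fields["language"] = language
    return fields


# Values from the setup above; "tr" is an assumed example language.
url = transcription_url("http://localhost:9100/v1")
fields = transcription_fields("Systran/faster-whisper-small", language="tr")
print(url)
print(fields)
```

The audio itself would then be sent as the `file` part of a multipart POST to that URL, using whichever HTTP or OpenAI-compatible client the integration already depends on.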
I already prepared the code on my forks and can open PRs / share compare links.
Question for maintainers:
Is HTTP /v1/audio/transcriptions the preferred long-term Speaches STT integration, or would you prefer a Realtime-based adapter instead?