feat(stt): multi-provider STT with TranscriptionProvider trait #3614
Merged
theonlyhennygod merged 3 commits into zeroclaw-labs:master on Mar 17, 2026
Refactors single-endpoint transcription to support multiple providers: Groq (existing), OpenAI Whisper, Deepgram, AssemblyAI, and Google Cloud Speech-to-Text. Adds TranscriptionManager for provider routing with backward-compatible config fields.
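The routing described above can be pictured with a minimal sketch. Only the names `TranscriptionProvider`, `TranscriptionManager`, `transcribe_with_provider()` and `validate_audio()` come from this PR; the method signatures, the `StubProvider` type, the `MAX_AUDIO_BYTES` limit and the synchronous style are assumptions made for illustration (the real implementation is presumably async and calls each provider's HTTP API).

```rust
use std::collections::HashMap;

// Hypothetical size cap; the PR does not state the real limit.
const MAX_AUDIO_BYTES: usize = 25 * 1024 * 1024;

/// Assumed shape of the trait: one method per provider that turns audio into text.
pub trait TranscriptionProvider {
    fn name(&self) -> &str;
    fn transcribe(&self, audio: &[u8], file_name: &str) -> Result<String, String>;
}

/// Shared check that runs before any provider (and thus any network call) is reached.
pub fn validate_audio(audio: &[u8], file_name: &str) -> Result<(), String> {
    if audio.is_empty() {
        return Err("audio is empty".to_string());
    }
    if audio.len() > MAX_AUDIO_BYTES {
        return Err(format!("audio exceeds {} bytes", MAX_AUDIO_BYTES));
    }
    if !file_name.contains('.') {
        return Err(format!("no file extension in {}", file_name));
    }
    Ok(())
}

/// Stand-in provider that returns a canned transcript instead of calling an API.
pub struct StubProvider {
    pub name: String,
}

impl TranscriptionProvider for StubProvider {
    fn name(&self) -> &str {
        &self.name
    }

    fn transcribe(&self, audio: &[u8], _file_name: &str) -> Result<String, String> {
        Ok(format!("[{}] {} bytes", self.name, audio.len()))
    }
}

/// Routes a request to the default provider or to one named explicitly.
pub struct TranscriptionManager {
    providers: HashMap<String, Box<dyn TranscriptionProvider>>,
    default_provider: String,
}

impl TranscriptionManager {
    pub fn new(default_provider: &str) -> Self {
        Self {
            providers: HashMap::new(),
            default_provider: default_provider.to_string(),
        }
    }

    pub fn register(&mut self, provider: Box<dyn TranscriptionProvider>) {
        self.providers.insert(provider.name().to_string(), provider);
    }

    /// Uses the configured default provider.
    pub fn transcribe(&self, audio: &[u8], file_name: &str) -> Result<String, String> {
        self.transcribe_with_provider(&self.default_provider, audio, file_name)
    }

    /// Explicit provider selection; validation happens first.
    pub fn transcribe_with_provider(
        &self,
        provider: &str,
        audio: &[u8],
        file_name: &str,
    ) -> Result<String, String> {
        validate_audio(audio, file_name)?;
        let selected = self
            .providers
            .get(provider)
            .ok_or_else(|| format!("provider not configured: {}", provider))?;
        selected.transcribe(audio, file_name)
    }
}
```

In this sketch a caller that never names a provider keeps getting the default one, which mirrors how the PR keeps Groq as the default; asking for a provider that was never registered returns an error rather than silently falling back.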
8186038 to b5a4b42
lantrinh1999 pushed a commit to lantrinh1999/zeroclaw-1 that referenced this pull request on Mar 18, 2026
…law-labs#3614)
* feat(stt): add multi-provider STT with TranscriptionProvider trait
  Refactors single-endpoint transcription to support multiple providers: Groq (existing), OpenAI Whisper, Deepgram, AssemblyAI, and Google Cloud Speech-to-Text. Adds TranscriptionManager for provider routing with backward-compatible config fields.
* style: fix cargo fmt + clippy violations
* fix: Box::pin large futures and resolve merge conflicts with master
---------
Co-authored-by: argenis de la rosa <theonlyhennygod@gmail.com>
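The "Box::pin large futures" fix in the commit message above refers to a common pattern: when an async function holds a large value across an `.await`, its future becomes large, and awaiting it inline makes the caller's future large too. Boxing the inner future moves it to the heap. The sketch below illustrates that pattern only; the function names, the 16 KiB buffer and the toy `block_on` executor are hypothetical and not taken from the PR.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for futures that never actually suspend.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor for the example: polls in a loop until the future completes.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// Holds a 16 KiB buffer across an await point, so the future itself is at least that big.
pub async fn large_future(seed: u8) -> usize {
    let buf = [seed; 16 * 1024];
    std::future::ready(()).await;
    buf.iter().filter(|&&b| b == seed).count()
}

/// Awaits the large future through a heap allocation, so this future only stores a pointer.
pub async fn caller() -> usize {
    Box::pin(large_future(1)).await
}
```

Comparing `std::mem::size_of_val(&caller())` with `std::mem::size_of_val(&large_future(1))` shows the difference: the boxed caller stays small while the inner future carries the whole buffer.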
Summary
Base branch: master
Adds the TranscriptionProvider trait. Implemented five STT providers: Groq (default, existing), OpenAI Whisper, Deepgram, AssemblyAI, and Google Cloud Speech-to-Text. Added TranscriptionManager for provider routing and the transcribe_with_provider() method for explicit provider selection. Maintains full backward compatibility.
Unchanged: the transcribe_audio() function signature. Existing config fields (api_url, model, api_key) and credential resolution (GROQ_API_KEY env fallback) are preserved. Callers in telegram.rs, discord.rs, and whatsapp_web.rs require no changes.
Files changed
src/channels/transcription.rs: Add TranscriptionProvider trait, five provider implementations, TranscriptionManager, shared validate_audio() helper, and parse_whisper_response() utility
src/config/schema.rs: Extend TranscriptionConfig with default_provider and optional sub-configs (OpenAiSttConfig, DeepgramSttConfig, AssemblyAiSttConfig, GoogleSttConfig); fix pre-existing sync_directory async/sync mismatch on non-unix platforms
src/config/mod.rs: Export new STT config types
Label Snapshot (required)
risk: medium
size: L
channel, config
channel: transcription
Change Metadata
feature
channel
Linked Issue
Supersede Attribution (required when Supersedes # is used)
N/A
Validation Evidence (required)
Commands and result summary:
Security Impact (required)
Yes, describe risk and mitigation: Each provider's API key is optional and config-gated. Provider sub-configs default to None. Audio validation occurs before any network call. Existing Groq credential resolution unchanged.
Privacy and Data Hygiene (required)
pass
Compatibility / Migration
default_provider field and provider sub-configs in [transcription] section (all default to None/Groq)
i18n Follow-Through (required when docs or user-facing wording changes)
Human Verification (required)
Side Effects / Blast Radius (required)
transcribe_audio() function
Agent Collaboration Notes (recommended)
Rollback Plan (required)
git revert <commit>
default_provider defaults to Groq; reverting preserves existing behavior
Risks and Mitigations
serde(default): existing configs parse without changes
Summary by CodeRabbit
Release Notes