feat(stt): multi-provider STT with TranscriptionProvider trait by rareba · Pull Request #3614 · zeroclaw-labs/zeroclaw

rareba · 2026-03-15T17:40:06Z

Supersedes #2995 (branch prefix correction: feature/ -> feat/)

Summary

Base branch target: master
Problem: Transcription was hardcoded to a single Groq endpoint — no way to use alternative STT providers
Why it matters: Users need flexibility to choose STT providers based on accuracy, cost, or compliance requirements
What changed: Refactored single-endpoint Groq transcription into a multi-provider architecture with TranscriptionProvider trait. Implemented five STT providers: Groq (default, existing), OpenAI Whisper, Deepgram, AssemblyAI, and Google Cloud Speech-to-Text. Added TranscriptionManager for provider routing and the transcribe_with_provider() method for explicit provider selection. Maintains full backward compatibility.
What did not change (scope boundary): Existing transcribe_audio() function signature unchanged. Existing config fields (api_url, model, api_key) and credential resolution (GROQ_API_KEY env fallback) preserved. Callers in telegram.rs, discord.rs, whatsapp_web.rs require no changes.
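A minimal sketch of the trait-plus-manager shape described above. All signatures here are assumptions: the trait is written synchronously to stay std-only (the real one is presumably async, given the later "Box::pin large futures" fix), and the `name()`/`transcribe()` methods, the `register()` helper, and the error-on-unknown-provider behavior are hypothetical. Only the names `TranscriptionProvider`, `TranscriptionManager`, and `transcribe_with_provider` come from the PR.

```rust
use std::collections::HashMap;

/// Hypothetical, synchronous shape of the provider trait (the real one is likely async).
pub trait TranscriptionProvider {
    /// Short identifier used for routing, e.g. "groq" or "deepgram".
    fn name(&self) -> &str;
    /// Transcribe raw audio bytes; `file_name` carries the format hint.
    fn transcribe(&self, audio: &[u8], file_name: &str) -> Result<String, String>;
}

/// Routes transcription requests to registered providers by name.
pub struct TranscriptionManager {
    providers: HashMap<String, Box<dyn TranscriptionProvider>>,
    default_provider: String,
}

impl TranscriptionManager {
    pub fn new(default_provider: &str) -> Self {
        Self {
            providers: HashMap::new(),
            default_provider: default_provider.to_string(),
        }
    }

    /// Hypothetical registration helper; providers are keyed by `name()`.
    pub fn register(&mut self, provider: Box<dyn TranscriptionProvider>) {
        self.providers.insert(provider.name().to_string(), provider);
    }

    /// Uses the configured default provider (Groq unless overridden).
    pub fn transcribe(&self, audio: &[u8], file_name: &str) -> Result<String, String> {
        self.transcribe_with_provider(&self.default_provider, audio, file_name)
    }

    /// Explicit provider selection; in this sketch an unknown name is an error, not a fallback.
    pub fn transcribe_with_provider(
        &self,
        name: &str,
        audio: &[u8],
        file_name: &str,
    ) -> Result<String, String> {
        let provider = self
            .providers
            .get(name)
            .ok_or_else(|| format!("transcription provider '{name}' is not configured"))?;
        provider.transcribe(audio, file_name)
    }
}
```

Failing loudly on an unconfigured name (rather than silently falling back to Groq) keeps explicit selection explicit; whether the PR does the same is not stated.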

Files changed

src/channels/transcription.rs: Add TranscriptionProvider trait, five provider implementations, TranscriptionManager, shared validate_audio() helper, and parse_whisper_response() utility
src/config/schema.rs: Extend TranscriptionConfig with default_provider and optional sub-configs (OpenAiSttConfig, DeepgramSttConfig, AssemblyAiSttConfig, GoogleSttConfig); fix pre-existing sync_directory async/sync mismatch on non-unix platforms
src/config/mod.rs: Export new STT config types
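The shared `validate_audio()` helper could look roughly like the sketch below. The 25 MB cap (the limit OpenAI documents for its transcription endpoint), the accepted extension list (OpenAI's documented formats), and the normalization of `.oga`/`.opus` to `ogg` (for example, Telegram voice notes) are assumptions for illustration; the PR's actual limits and rules are not shown.

```rust
/// Assumed upload cap for this sketch; the PR's actual constant is not shown.
pub const MAX_AUDIO_BYTES: usize = 25 * 1024 * 1024;

/// Extensions accepted by Whisper-compatible endpoints (OpenAI's documented list).
const SUPPORTED: &[&str] = &["flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"];

/// Validates size and extension, returning the normalized extension on success.
/// Assumption: ".oga" and ".opus" files are Ogg containers and are normalized to "ogg".
pub fn validate_audio(audio: &[u8], file_name: &str) -> Result<String, String> {
    if audio.is_empty() {
        return Err("audio is empty".into());
    }
    if audio.len() > MAX_AUDIO_BYTES {
        return Err(format!(
            "audio is {} bytes; limit is {}",
            audio.len(),
            MAX_AUDIO_BYTES
        ));
    }
    // Take everything after the last '.', case-insensitively.
    let ext = file_name
        .rsplit_once('.')
        .map(|(_, ext)| ext.to_ascii_lowercase())
        .ok_or_else(|| format!("cannot infer audio format from '{file_name}'"))?;
    let normalized = match ext.as_str() {
        "oga" | "opus" => "ogg".to_string(),
        other => other.to_string(),
    };
    if SUPPORTED.contains(&normalized.as_str()) {
        Ok(normalized)
    } else {
        Err(format!("unsupported audio format '{ext}'"))
    }
}
```

Because this runs locally and touches no credentials, every provider can call it first, before any key lookup or network request.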

Label Snapshot (required)

Risk label: risk: medium
Size label: size: L
Scope labels: channel, config
Module labels: channel: transcription
Contributor tier label: (auto-managed)
If any auto-label is incorrect, note requested correction: N/A

Change Metadata

Change type: feature
Primary scope: channel

Linked Issue

Closes #2989 (feat(stt): multi-provider STT with TranscriptionProvider trait)

Supersede Attribution (required when `Supersedes #` is used)

N/A (same change as #2995, resubmitted only to correct the branch prefix)

Validation Evidence (required)

Commands and result summary:

cargo fmt --all -- --check   # clean
cargo check   # passes (only pre-existing warnings in unrelated files remain)
cargo test   # all 20 transcription unit tests pass (existing + new); config default, roundtrip, and without-transcription tests pass

Evidence provided: unit test results, config roundtrip tests
If any command is intentionally skipped, explain why: CI pipeline validation pending

Security Impact (required)

New permissions/capabilities? No
New external network calls? Yes — four new STT provider endpoints (OpenAI, Deepgram, AssemblyAI, Google)
Secrets/tokens handling changed? Yes — new API key fields for each provider sub-config
File system access scope changed? No
If any Yes, describe risk and mitigation: Each provider's API key is optional and config-gated. Provider sub-configs default to None. Audio validation occurs before any network call. Existing Groq credential resolution unchanged.
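To make the new network surface concrete, the sketch below maps each provider to its publicly documented REST endpoint and key-passing scheme: `Authorization: Bearer` for Groq and OpenAI, `Authorization: Token` for Deepgram, a raw key in `Authorization` for AssemblyAI (whose flow uploads audio first, then creates a transcript job), and a `key` query parameter for Google (assumed here; service-account OAuth is the other option). The enum, struct, and function names are hypothetical, and the PR's Groq path uses the configurable `api_url`, so the Groq URL is only the default.

```rust
/// Hypothetical provider identifiers for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SttProvider {
    Groq,
    OpenAi,
    Deepgram,
    AssemblyAi,
    Google,
}

/// Where a request goes and how the API key is attached.
#[derive(Debug, PartialEq)]
pub struct RequestTarget {
    pub url: String,
    /// (header name, header value); None when the key travels in the URL instead.
    pub auth_header: Option<(&'static str, String)>,
}

pub fn request_target(provider: SttProvider, api_key: &str) -> RequestTarget {
    match provider {
        SttProvider::Groq => bearer("https://api.groq.com/openai/v1/audio/transcriptions", api_key),
        SttProvider::OpenAi => bearer("https://api.openai.com/v1/audio/transcriptions", api_key),
        SttProvider::Deepgram => RequestTarget {
            url: "https://api.deepgram.com/v1/listen".to_string(),
            auth_header: Some(("Authorization", format!("Token {api_key}"))),
        },
        // Transcript-job endpoint; audio is uploaded to /v2/upload beforehand.
        SttProvider::AssemblyAi => RequestTarget {
            url: "https://api.assemblyai.com/v2/transcript".to_string(),
            auth_header: Some(("Authorization", api_key.to_string())),
        },
        // Assumes the key needs no URL-encoding (typical for API keys).
        SttProvider::Google => RequestTarget {
            url: format!("https://speech.googleapis.com/v1/speech:recognize?key={api_key}"),
            auth_header: None,
        },
    }
}

/// Shared helper for the two OpenAI-compatible endpoints.
fn bearer(url: &str, api_key: &str) -> RequestTarget {
    RequestTarget {
        url: url.to_string(),
        auth_header: Some(("Authorization", format!("Bearer {api_key}"))),
    }
}
```

One consequence worth noting for redaction: a key carried in the query string can end up in proxy or request logs, whereas header-borne keys usually do not.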

Privacy and Data Hygiene (required)

Data-hygiene status: pass
Redaction/anonymization notes: Audio data is sent to the selected external STT API for transcription; this is the same privacy model as the existing Groq path
Neutral wording confirmation: Confirmed

Compatibility / Migration

Backward compatible? Yes
Config/env changes? Yes — new optional default_provider field and provider sub-configs in [transcription] section (all default to None/Groq)
Migration needed? No — existing configs without new fields parse correctly and default to Groq
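A std-only sketch of why existing configs keep working: every new field is optional, and an absent `default_provider` resolves to Groq. The sub-config type names come from the PR, but their `api_key`-only contents, the field names on `TranscriptionConfig`, and both helper methods are hypothetical. The real structs use `serde(default)` (per the PR), which plays the role that `Default` plays here.

```rust
#[derive(Debug, Default, Clone, PartialEq)]
pub struct OpenAiSttConfig {
    pub api_key: Option<String>,
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct DeepgramSttConfig {
    pub api_key: Option<String>,
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct AssemblyAiSttConfig {
    pub api_key: Option<String>,
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct GoogleSttConfig {
    pub api_key: Option<String>,
}

/// Hypothetical mirror of the extended [transcription] section.
/// Pre-existing fields (api_url, model, api_key) are omitted for brevity.
#[derive(Debug, Default, Clone)]
pub struct TranscriptionConfig {
    pub default_provider: Option<String>,
    pub openai: Option<OpenAiSttConfig>,
    pub deepgram: Option<DeepgramSttConfig>,
    pub assemblyai: Option<AssemblyAiSttConfig>,
    pub google: Option<GoogleSttConfig>,
}

impl TranscriptionConfig {
    /// An absent default_provider resolves to Groq, preserving pre-existing behavior.
    pub fn effective_provider(&self) -> &str {
        self.default_provider.as_deref().unwrap_or("groq")
    }

    /// Non-Groq providers are config-gated: usable only when their sub-config is present.
    /// Groq is treated as always available because it uses the pre-existing fields.
    pub fn is_configured(&self, provider: &str) -> bool {
        match provider {
            "groq" => true,
            "openai" => self.openai.is_some(),
            "deepgram" => self.deepgram.is_some(),
            "assemblyai" => self.assemblyai.is_some(),
            "google" => self.google.is_some(),
            _ => false,
        }
    }
}
```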

i18n Follow-Through (required when docs or user-facing wording changes)

i18n follow-through triggered? No — code changes only

Human Verification (required)

Verified scenarios: Provider trait implementation for all five providers, manager routing, backward-compatible function preservation, config roundtrip
Edge cases checked: Audio validation ordering (size/format errors before missing-key errors), missing config defaults to Groq, invalid audio formats
What was not verified: Live API calls to non-Groq providers (requires credentials)
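The validation-ordering edge case can be illustrated with a small preflight sketch: size and format are checked before credentials, so a caller with bad audio gets a size or format error even when no key is set. The `SttError` enum, the `preflight` and `resolve_groq_key` functions, the 25 MB limit, and the extension list are hypothetical; only the ordering and the `GROQ_API_KEY` env fallback come from the PR. The environment lookup is injected as a closure so the fallback can be tested without mutating the process environment.

```rust
#[derive(Debug, PartialEq)]
pub enum SttError {
    TooLarge(usize),
    UnsupportedFormat(String),
    MissingApiKey,
}

/// Assumed limit for this sketch.
const MAX_BYTES: usize = 25 * 1024 * 1024;

/// Resolves the Groq key: configured value first, then the GROQ_API_KEY env var.
/// Assumption: an empty configured key is treated as unset.
pub fn resolve_groq_key(
    configured: Option<&str>,
    env: impl Fn(&str) -> Option<String>,
) -> Option<String> {
    configured
        .filter(|k| !k.is_empty())
        .map(str::to_string)
        .or_else(|| env("GROQ_API_KEY"))
}

/// Fixed check order, all before any network call: size, then format, then credentials.
pub fn preflight(audio: &[u8], ext: &str, api_key: Option<String>) -> Result<String, SttError> {
    if audio.len() > MAX_BYTES {
        return Err(SttError::TooLarge(audio.len()));
    }
    if !["flac", "mp3", "mp4", "m4a", "ogg", "wav", "webm"].contains(&ext) {
        return Err(SttError::UnsupportedFormat(ext.to_string()));
    }
    api_key.ok_or(SttError::MissingApiKey)
}
```

In production the closure would simply wrap `std::env::var(name).ok()`.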

Side Effects / Blast Radius (required)

Affected subsystems/workflows: Transcription subsystem, config schema
Potential unintended effects: None expected; existing callers use the unchanged transcribe_audio() function
Guardrails/monitoring for early detection: Audio validation runs before network calls; provider selection is explicit

Agent Collaboration Notes (recommended)

Agent tools used: Claude Code
Workflow/plan summary: Extracted trait from existing Groq implementation, replicated pattern for four additional providers
Verification focus: Backward compatibility, config serde stability, test coverage
Confirmation: naming + architecture boundaries followed

Rollback Plan (required)

Fast rollback command/path: git revert <commit>
Feature flags or config toggles: default_provider defaults to Groq; reverting preserves existing behavior
Observable failure symptoms: Non-Groq STT providers unavailable (Groq continues working)

Risks and Mitigations

Risk: New provider implementations untested against live APIs
- Mitigation: Unit tests validate request construction and response parsing; live testing deferred to integration phase
Risk: Config schema expansion could break existing config files
- Mitigation: All new fields have serde(default) — existing configs parse without changes

Summary by CodeRabbit

Release Notes

New Features
- Transcription now supports multiple providers: OpenAI Whisper, Deepgram, AssemblyAI, Google STT, and Groq
- Configure and select from multiple transcription providers based on your needs
- Improved audio validation with format normalization support
- Existing transcription configurations remain fully backward compatible

Refactors single-endpoint transcription to support multiple providers: Groq (existing), OpenAI Whisper, Deepgram, AssemblyAI, and Google Cloud Speech-to-Text. Adds TranscriptionManager for provider routing with backward-compatible config fields.

…law-labs#3614)
* feat(stt): add multi-provider STT with TranscriptionProvider trait
  Refactors single-endpoint transcription to support multiple providers: Groq (existing), OpenAI Whisper, Deepgram, AssemblyAI, and Google Cloud Speech-to-Text. Adds TranscriptionManager for provider routing with backward-compatible config fields.
* style: fix cargo fmt + clippy violations
* fix: Box::pin large futures and resolve merge conflicts with master
---------
Co-authored-by: argenis de la rosa <theonlyhennygod@gmail.com>

rareba requested review from JordanTheJet, SimianAstronaut7 and theonlyhennygod as code owners March 15, 2026 17:40

theonlyhennygod self-assigned this Mar 17, 2026

rareba and others added 3 commits March 17, 2026 00:27

style: fix cargo fmt + clippy violations

9804ee1

fix: Box::pin large futures and resolve merge conflicts with master

b5a4b42

theonlyhennygod force-pushed the feat/stt-multi-provider branch from 8186038 to b5a4b42 on March 17, 2026 04:33

theonlyhennygod merged commit b099728 into zeroclaw-labs:master Mar 17, 2026
11 checks passed

theonlyhennygod merged 3 commits into zeroclaw-labs:master from rareba:feat/stt-multi-provider