feat(stt): multi-provider STT with TranscriptionProvider trait by rareba · Pull Request #2995 · zeroclaw-labs/zeroclaw

rareba · 2026-03-08T09:35:38Z

Summary

Base branch target: master
Problem: Transcription was hardcoded to a single Groq endpoint — no way to use alternative STT providers
Why it matters: Users need flexibility to choose STT providers based on accuracy, cost, or compliance requirements
What changed: Refactored single-endpoint Groq transcription into a multi-provider architecture with TranscriptionProvider trait. Implemented five STT providers: Groq (default, existing), OpenAI Whisper, Deepgram, AssemblyAI, and Google Cloud Speech-to-Text. Added TranscriptionManager for provider routing and the transcribe_with_provider() method for explicit provider selection. Maintains full backward compatibility.
What did not change (scope boundary): Existing transcribe_audio() function signature unchanged. Existing config fields (api_url, model, api_key) and credential resolution (GROQ_API_KEY env fallback) preserved. Callers in telegram.rs, discord.rs, whatsapp_web.rs require no changes.
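
A minimal sketch of the trait-plus-manager shape described above, kept std-only for illustration. It is not the PR's code: the real trait is async, and `EchoProvider`, `demo_manager`, and the `String` error type are hypothetical stand-ins.

```rust
use std::collections::HashMap;

/// Simplified, synchronous stand-in for the PR's `TranscriptionProvider` trait.
/// The real trait is async; that is dropped here to avoid external crates.
trait TranscriptionProvider {
    fn name(&self) -> &str;
    fn transcribe(&self, audio: &[u8], file_name: &str) -> Result<String, String>;
}

/// Hypothetical provider that echoes its inputs instead of calling an STT API.
struct EchoProvider {
    label: &'static str,
}

impl TranscriptionProvider for EchoProvider {
    fn name(&self) -> &str {
        self.label
    }

    fn transcribe(&self, audio: &[u8], file_name: &str) -> Result<String, String> {
        Ok(format!("[{}] {} ({} bytes)", self.label, file_name, audio.len()))
    }
}

/// Routes requests to a named provider, or to the configured default.
struct TranscriptionManager {
    providers: HashMap<String, Box<dyn TranscriptionProvider>>,
    default_provider: String,
}

impl TranscriptionManager {
    fn new(default_provider: &str) -> Self {
        Self {
            providers: HashMap::new(),
            default_provider: default_provider.to_string(),
        }
    }

    /// Registers a provider under the name it reports.
    fn register(&mut self, provider: Box<dyn TranscriptionProvider>) {
        let key = provider.name().to_string();
        self.providers.insert(key, provider);
    }

    /// Transcribes with the default provider.
    fn transcribe(&self, audio: &[u8], file_name: &str) -> Result<String, String> {
        self.transcribe_with_provider(audio, file_name, &self.default_provider)
    }

    /// Transcribes with an explicitly named provider, failing if it is not registered.
    fn transcribe_with_provider(
        &self,
        audio: &[u8],
        file_name: &str,
        provider: &str,
    ) -> Result<String, String> {
        let selected = self
            .providers
            .get(provider)
            .ok_or_else(|| format!("transcription provider '{provider}' is not configured"))?;
        selected.transcribe(audio, file_name)
    }
}

/// Builds a manager with two stub providers and "groq" as the default.
fn demo_manager() -> TranscriptionManager {
    let mut manager = TranscriptionManager::new("groq");
    manager.register(Box::new(EchoProvider { label: "groq" }));
    manager.register(Box::new(EchoProvider { label: "deepgram" }));
    manager
}
```

With this shape, existing callers keep using the default path, while `transcribe_with_provider` is only needed when a caller wants a specific backend.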

Files changed

src/channels/transcription.rs: Add TranscriptionProvider trait, five provider implementations, TranscriptionManager, shared validate_audio() helper, and parse_whisper_response() utility
src/config/schema.rs: Extend TranscriptionConfig with default_provider and optional sub-configs (OpenAiSttConfig, DeepgramSttConfig, AssemblyAiSttConfig, GoogleSttConfig); fix pre-existing sync_directory async/sync mismatch on non-unix platforms
src/config/mod.rs: Export new STT config types

Label Snapshot (required)

Risk label: risk: medium
Size label: size: L
Scope labels: channel, config
Module labels: channel: transcription
Contributor tier label: (auto-managed)
If any auto-label is incorrect, note requested correction: N/A

Change Metadata

Change type: feature
Primary scope: channel

Linked Issue

Closes feat(stt): multi-provider STT with TranscriptionProvider trait #2989

Supersede Attribution (required when `Supersedes #` is used)

N/A

Validation Evidence (required)

Commands and result summary:

cargo fmt --all -- --check   # clean
cargo check   # passes (only pre-existing clippy warnings in unrelated files remain)
cargo test   # all 20 transcription unit tests pass (existing + new); config default, roundtrip, and without-transcription tests pass

Evidence provided: unit test results, config roundtrip tests
If any command is intentionally skipped, explain why: CI pipeline validation pending

Security Impact (required)

New permissions/capabilities? No
New external network calls? Yes — four new STT provider endpoints (OpenAI, Deepgram, AssemblyAI, Google)
Secrets/tokens handling changed? Yes — new API key fields for each provider sub-config
File system access scope changed? No
If any Yes, describe risk and mitigation: Each provider's API key is optional and config-gated. Provider sub-configs default to None. Audio validation occurs before any network call. Existing Groq credential resolution unchanged.

Privacy and Data Hygiene (required)

Data-hygiene status: pass
Redaction/anonymization notes: Audio data sent to external STT APIs for transcription — same privacy model as existing Groq path
Neutral wording confirmation: Confirmed

Compatibility / Migration

Backward compatible? Yes
Config/env changes? Yes — new optional default_provider field and provider sub-configs in [transcription] section (all default to None/Groq)
Migration needed? No — existing configs without new fields parse correctly and default to Groq

i18n Follow-Through (required when docs or user-facing wording changes)

i18n follow-through triggered? No — code changes only

Human Verification (required)

Verified scenarios: Provider trait implementation for all five providers, manager routing, backward-compatible function preservation, config roundtrip
Edge cases checked: Audio validation ordering (size/format errors before missing-key errors), missing config defaults to Groq, invalid audio formats
What was not verified: Live API calls to non-Groq providers (requires credentials)
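
The validation ordering checked above (size and format errors before missing-key errors) can be sketched as follows. This is an illustration only: the size limit, the extension list, the `.oga` normalization rule, and `transcribe_stub` are assumptions, and the PR's `validate_audio()` returns a tuple rather than a single `String`.

```rust
/// Assumed upper bound on upload size; the PR's real limit is not shown.
const MAX_AUDIO_BYTES: usize = 25 * 1024 * 1024;

/// Assumed extension allow-list, for illustration only.
const SUPPORTED_EXTENSIONS: [&str; 6] = ["flac", "m4a", "mp3", "ogg", "wav", "webm"];

/// Checks size and extension, returning the file name with a normalized extension.
fn validate_audio(audio: &[u8], file_name: &str) -> Result<String, String> {
    if audio.is_empty() {
        return Err("audio payload is empty".to_string());
    }
    if audio.len() > MAX_AUDIO_BYTES {
        return Err(format!("audio payload exceeds {MAX_AUDIO_BYTES} bytes"));
    }
    let (stem, ext) = file_name
        .rsplit_once('.')
        .ok_or_else(|| format!("file name '{file_name}' has no extension"))?;
    let ext = ext.to_ascii_lowercase();
    // Assumed normalization rule: treat `.oga` as `.ogg`.
    let ext = if ext == "oga" { "ogg".to_string() } else { ext };
    if !SUPPORTED_EXTENSIONS.iter().any(|e| *e == ext.as_str()) {
        return Err(format!("unsupported audio format: .{ext}"));
    }
    Ok(format!("{stem}.{ext}"))
}

/// Hypothetical provider entry point: validates first, then checks the API key,
/// so bad input is reported even when no key is configured.
fn transcribe_stub(audio: &[u8], file_name: &str, api_key: Option<&str>) -> Result<String, String> {
    let normalized = validate_audio(audio, file_name)?;
    let _key = api_key.ok_or_else(|| "missing API key".to_string())?;
    Ok(format!("would upload {normalized}"))
}
```

Running validation first means a user with a misnamed file sees the format error rather than a credentials error that would not help them fix the input.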

Side Effects / Blast Radius (required)

Affected subsystems/workflows: Transcription subsystem, config schema
Potential unintended effects: None — existing callers use unchanged transcribe_audio() function
Guardrails/monitoring for early detection: Audio validation runs before network calls; provider selection explicit

Agent Collaboration Notes (recommended)

Agent tools used: Claude Code
Workflow/plan summary: Extracted trait from existing Groq implementation, replicated pattern for four additional providers
Verification focus: Backward compatibility, config serde stability, test coverage
Confirmation: naming + architecture boundaries followed

Rollback Plan (required)

Fast rollback command/path: git revert <commit>
Feature flags or config toggles: default_provider defaults to Groq; reverting preserves existing behavior
Observable failure symptoms: Non-Groq STT providers unavailable (Groq continues working)

Risks and Mitigations

Risk: New provider implementations untested against live APIs
- Mitigation: Unit tests validate request construction and response parsing; live testing deferred to integration phase
Risk: Config schema expansion could break existing config files
- Mitigation: All new fields have serde(default) — existing configs parse without changes

Summary by CodeRabbit

Release Notes

New Features
- Transcription now supports multiple providers: OpenAI Whisper, Deepgram, AssemblyAI, Google STT, and Groq
- Configure and select from multiple transcription providers based on your needs
- Improved audio validation with format normalization support
- Existing transcription configurations remain fully backward compatible

coderabbitai · 2026-03-08T09:35:54Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (1)

master

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 53ce5155-d1ac-4ab2-a383-9d025314c338


Note

`.coderabbit.yaml` has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)

Validation error: Unrecognized key(s) in object: 'tools', 'path_filters', 'review_instructions'


📝 Walkthrough

This PR implements a multi-provider speech-to-text architecture, replacing a single Groq endpoint with a pluggable TranscriptionProvider trait. Five provider implementations (Groq, OpenAI Whisper, Deepgram, AssemblyAI, Google) are added, along with a TranscriptionManager for registration and routing. The TranscriptionConfig schema expanded to include provider-specific configuration blocks. A backward-compatible transcribe_audio() function defaults to Groq.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Multi-Provider Transcription System `src/channels/transcription.rs` | Added `TranscriptionProvider` trait and five provider implementations (Groq, OpenAI Whisper, Deepgram, AssemblyAI, Google), each with `from_config()` and `transcribe()` methods. Introduced `TranscriptionManager` to register, query, and route transcription requests to providers. Added `validate_audio()` for file validation and normalization. Public `transcribe_audio()` function provides backward compatibility with Groq as default. |
| STT Configuration Schemas `src/config/schema.rs` | Extended `TranscriptionConfig` with `default_provider` field and optional provider-specific configs (`openai`, `deepgram`, `assemblyai`, `google`). Added new public structs: `OpenAiSttConfig`, `DeepgramSttConfig`, `AssemblyAiSttConfig`, `GoogleSttConfig` with provider-specific settings. Added default functions for provider and model defaults. |
| Config Module Exports `src/config/mod.rs` | Updated public re-exports to include new STT provider configuration types (`OpenAiSttConfig`, `DeepgramSttConfig`, `AssemblyAiSttConfig`, `GoogleSttConfig`). |

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant TranscriptionManager
    participant Provider A as OpenAI Whisper
    participant Provider B as Deepgram
    participant External API

    Client->>TranscriptionManager: transcribe(audio_data, filename)
    TranscriptionManager->>TranscriptionManager: Validate audio format/size
    alt Use Default Provider
        TranscriptionManager->>Provider A: transcribe(audio_data, filename)
    else Use Specific Provider
        TranscriptionManager->>Provider B: transcribe_with_provider(audio_data, filename, "deepgram")
    end
    Provider A->>External API: POST audio + metadata
    External API-->>Provider A: JSON response
    Provider A->>TranscriptionManager: Return transcribed text
    TranscriptionManager-->>Client: Transcription result

sequenceDiagram
    participant App
    participant GroqProvider
    participant GroqAPI as Groq API
    participant AudioFile as File System

    App->>GroqProvider: from_config(config)
    Note over GroqProvider: Resolve API key from config or env
    App->>GroqProvider: transcribe(audio_data, filename)
    GroqProvider->>AudioFile: Validate file extension & size
    GroqProvider->>GroqAPI: Multipart request (audio + metadata)
    GroqAPI-->>GroqProvider: Whisper JSON response
    GroqProvider->>GroqProvider: parse_whisper_response()
    GroqProvider-->>App: Transcribed text string

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

feat: support config-level api_key for transcription #2112 — Updates transcription credential resolution flow and adds config-level API key handling with environment variable fallbacks.
fix(discord): transcribe inbound audio attachments #2700 — Integrates Discord channel code with the transcribe_audio public API and TranscriptionConfig.

Suggested labels

size: XL, risk: high, provider, config: core, channel: transcription, tests, core

Suggested reviewers

theonlyhennygod
chumyin

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change: introducing a multi-provider STT architecture with TranscriptionProvider trait. |
| Description check | ✅ Passed | The PR description provides a clear summary, lists changed files with details, confirms backward compatibility, includes test results, and documents risk/rollback. Most required template sections are addressed substantively. |
| Linked Issues check | ✅ Passed | The PR meets core coding requirements from `#2989`: implements TranscriptionProvider trait with async transcribe and name(), provides five provider implementations (Groq, OpenAI, Deepgram, AssemblyAI, Google), adds TranscriptionManager for routing, and extends config with provider selection. |
| Out of Scope Changes check | ✅ Passed | All code changes are directly aligned with the issue objectives: provider trait, five implementations, manager, config extension, and helper functions. The sync_directory fix is a pre-existing bug fix within scope of the config refactoring. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

src/channels/transcription.rs (1)

90-539: Consider moving the concrete STT providers into src/providers/.

These sections put five provider-specific HTTP/JSON integrations under src/channels/. Keeping only the routing/entrypoint here and moving concrete backends into src/providers/ would keep the channel boundary smaller as more providers land.

As per coding guidelines, "Keep module responsibilities single-purpose: orchestration in agent/, transport in channels/, model I/O in providers/."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/channels/transcription.rs` around lines 90 - 539, The channel module
currently contains five concrete STT providers (structs GroqProvider,
OpenAiWhisperProvider, DeepgramProvider, AssemblyAiProvider, GoogleSttProvider
and their impls of TranscriptionProvider including methods from_config,
transcribe, name, supported_formats) which should be moved into a providers
module to keep channels focused on transport/orchestration. Create a new
providers module for these concrete implementations and move the provider
structs and their TranscriptionProvider impls there; keep the
TranscriptionProvider trait, validate_audio, parse_whisper_response, and any
shared helpers in the channels module (or make them public) so the moved
providers can call validate_audio, parse_whisper_response, and
crate::config::build_runtime_proxy_client unchanged; update imports/uses in the
channel to re-export or construct providers via their from_config constructors
and ensure method names (from_config, transcribe, name, supported_formats) and
referenced symbols (e.g., TRANSCRIPTION_TIMEOUT_SECS, parse_whisper_response)
remain reachable after the move.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/channels/transcription.rs`:
- Around line 468-492: The transcribe() function currently ignores the
normalized filename returned by validate_audio(); change the let (_, _) =
validate_audio(...) call to capture the normalized filename (e.g., let (_,
normalized_name) = validate_audio(...)?), use normalized_name when extracting
the extension to set the encoding variable, and ensure you validate that the
resulting extension is one of supported_formats() (return an error if not) so
formats like .oga, .mp4, .m4a, .mpga don't fall through to
"ENCODING_UNSPECIFIED".
- Around line 581-617: The constructor pub fn new(config: &TranscriptionConfig)
currently swallows provider init errors (calls like GroqProvider::from_config,
OpenAiWhisperProvider::from_config, DeepgramProvider::from_config,
AssemblyAiProvider::from_config, GoogleSttProvider::from_config) and still
returns Ok with config.default_provider set, which hides root causes when the
chosen default failed to initialize; change the logic to surface errors: capture
and propagate the concrete error when a provider factory fails (or at minimum
record failures and, after attempting all providers, if
config.default_provider.is_some() but providers does not contain that key,
return an Err/bail! with a clear message that includes the original provider
error(s)); ensure the new() function returns an error instead of silently
succeeding whenever the configured default provider failed to initialize.
- Around line 392-433: The poll loop using poll_interval, max_polls and a
per-request timeout can exceed the intended ~3 minute cap because each .get()
can await up to timeout; fix by enforcing a total deadline (e.g., record start =
Instant::now() and check elapsed against Duration::from_secs(180) or wrap the
whole polling loop in tokio::time::timeout) and break with a timeout error when
exceeded, and also stop treating non-2xx responses as silent "unknown" — inspect
poll_resp.status(), propagate non-success statuses with the response body or
status code instead of using poll_body["status"].as_str().unwrap_or("unknown"),
and ensure error branches (the "error" match and non-2xx cases) return
informative messages so polling does not continue indefinitely.

In `@src/config/schema.rs`:
- Around line 696-727: Add validation for the transcription section inside
Config::validate(): inspect self.transcription.default_provider (trimmed) and
match it against the allowed values
("groq","openai","deepgram","assemblyai","google"); for each provider require
the corresponding fields are present and non-empty (e.g., for "groq" ensure
self.transcription.model is non-empty, for "openai" ensure
self.transcription.openai is Some and openai.model is non-empty, for "deepgram"
require self.transcription.deepgram Some and necessary fields non-empty, for
"google" require self.transcription.google Some and google.language_code
non-empty, assemblyai can just be allowed) and call anyhow::bail! with clear
messages on missing/empty values or on an unsupported default_provider value so
config loading fails fast.
- Around line 748-787: The STT provider api_key fields (OpenAiSttConfig.api_key,
DeepgramSttConfig.api_key, AssemblyAiSttConfig.api_key, GoogleSttConfig.api_key)
are not wired into the secret-store flow; update the config load/save wiring to
call decrypt_optional_secret(&store, &mut <provider>.api_key,
"config.transcription.<provider>.api_key") during load and
encrypt_optional_secret(&store, &<provider>.api_key,
"config.transcription.<provider>.api_key") during save for each optional
provider present (config.transcription.openai, .deepgram, .assemblyai, .google)
using the same pattern as the existing transcription.api_key handling; add a
regression test that saves a config with nested STT keys and asserts they are
encrypted on disk and correctly decrypted on reload.
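
The polling finding above (lines 392-433) asks for a total deadline and for non-2xx responses to be surfaced. A minimal std-only sketch of that loop shape, assuming a hypothetical `PollResponse` type and a caller-supplied `poll_once` closure in place of the real HTTP call:

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical result of one poll request: an HTTP status code plus the job state.
struct PollResponse {
    http_status: u16,
    job_status: String,
    text: Option<String>,
}

/// Polls until the job completes, fails, or the total deadline passes.
/// The deadline is checked between requests; it does not interrupt a request in flight.
fn poll_until_done<F>(
    mut poll_once: F,
    interval: Duration,
    deadline: Duration,
) -> Result<String, String>
where
    F: FnMut() -> Result<PollResponse, String>,
{
    let start = Instant::now();
    loop {
        if start.elapsed() >= deadline {
            return Err(format!("transcription polling exceeded {deadline:?}"));
        }
        let resp = poll_once()?;
        // Surface HTTP failures instead of treating them as an unknown job state.
        if !(200..300).contains(&resp.http_status) {
            return Err(format!("poll request failed with HTTP {}", resp.http_status));
        }
        match resp.job_status.as_str() {
            "completed" => {
                return resp
                    .text
                    .ok_or_else(|| "completed job returned no text".to_string());
            }
            "error" => return Err("provider reported a transcription error".to_string()),
            // Any other state is treated as still in progress.
            _ => sleep(interval),
        }
    }
}
```

In the async code under review, wrapping the whole loop in `tokio::time::timeout`, as the finding suggests, would additionally bound a single slow request, which this synchronous sketch cannot do.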

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5e12dca1-c69d-4e51-b77b-7918656c3fdf

📥 Commits

Reviewing files that changed from the base of the PR and between 326b60d and 6355c49.

📒 Files selected for processing (3)

src/channels/transcription.rs
src/config/mod.rs
src/config/schema.rs

src/channels/transcription.rs

coderabbitai · 2026-03-08T09:46:34Z

src/channels/transcription.rs

+    pub fn new(config: &TranscriptionConfig) -> Result<Self> {
+        let mut providers: HashMap<String, Box<dyn TranscriptionProvider>> = HashMap::new();
+
+        if let Ok(groq) = GroqProvider::from_config(config) {
+            providers.insert("groq".to_string(), Box::new(groq));
+        }
+
+        if let Some(ref openai_cfg) = config.openai {
+            if let Ok(p) = OpenAiWhisperProvider::from_config(openai_cfg) {
+                providers.insert("openai".to_string(), Box::new(p));
+            }
+        }
+
+        if let Some(ref deepgram_cfg) = config.deepgram {
+            if let Ok(p) = DeepgramProvider::from_config(deepgram_cfg) {
+                providers.insert("deepgram".to_string(), Box::new(p));
+            }
+        }
+
+        if let Some(ref assemblyai_cfg) = config.assemblyai {
+            if let Ok(p) = AssemblyAiProvider::from_config(assemblyai_cfg) {
+                providers.insert("assemblyai".to_string(), Box::new(p));
+            }
+        }
+
+        if let Some(ref google_cfg) = config.google {
+            if let Ok(p) = GoogleSttProvider::from_config(google_cfg) {
+                providers.insert("google".to_string(), Box::new(p));
+            }
+        }
+
+        let default_provider = config.default_provider.clone();
+
+        Ok(Self {
+            providers,
+            default_provider,
+        })


⚠️ Potential issue | 🟠 Major

Don’t drop the root-cause error for the selected default provider.

Lines 584-609 silently discard provider init failures, and Line 612 still stores config.default_provider. With transcription enabled and a missing key for the chosen default, TranscriptionManager::new() succeeds and the first transcribe() call only says “not configured”.

🔧 Suggested fix

         let default_provider = config.default_provider.clone();
+
+        if config.enabled && !providers.contains_key(&default_provider) {
+            let available: Vec<&str> = providers.keys().map(|k| k.as_str()).collect();
+            bail!(
+                "Default transcription provider '{}' is not configured. Available: {available:?}",
+                default_provider
+            );
+        }
 
         Ok(Self {
             providers,
             default_provider,
         })

As per coding guidelines, "Prefer explicit bail!/errors for unsupported or unsafe states; never silently broaden permissions/capabilities; document fallback behavior when fallback is intentional and safe."
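
The suggested fix above reports which providers are available but not why the default failed. One way to also keep the root cause, sketched with std-only types; `Provider`, `init_provider`, `Manager`, and `build_manager` are hypothetical stand-ins for the PR's provider constructors and `TranscriptionManager::new()`:

```rust
use std::collections::HashMap;

/// Hypothetical provider handle; the PR stores `Box<dyn TranscriptionProvider>`.
#[derive(Debug)]
struct Provider {
    name: String,
}

/// Stand-in for the per-provider `from_config` constructors:
/// succeeds only when a non-empty API key is present.
fn init_provider(name: &str, api_key: Option<&str>) -> Result<Provider, String> {
    match api_key {
        Some(key) if !key.trim().is_empty() => Ok(Provider { name: name.to_string() }),
        _ => Err(format!("{name}: API key is missing")),
    }
}

#[derive(Debug)]
struct Manager {
    providers: HashMap<String, Provider>,
    default_provider: String,
}

/// Initializes every configured provider, remembering failures, and fails fast
/// when transcription is enabled but the default provider did not initialize.
fn build_manager(
    enabled: bool,
    default_provider: &str,
    configured: &[(&str, Option<&str>)],
) -> Result<Manager, String> {
    let mut providers = HashMap::new();
    let mut failures: HashMap<String, String> = HashMap::new();

    for &(name, key) in configured {
        match init_provider(name, key) {
            Ok(provider) => {
                providers.insert(name.to_string(), provider);
            }
            Err(err) => {
                failures.insert(name.to_string(), err);
            }
        }
    }

    if enabled && !providers.contains_key(default_provider) {
        // Prefer the recorded init error; fall back to a generic cause.
        let cause = failures
            .get(default_provider)
            .cloned()
            .unwrap_or_else(|| "no configuration block found".to_string());
        return Err(format!(
            "default transcription provider '{default_provider}' failed to initialize: {cause}"
        ));
    }

    Ok(Manager {
        providers,
        default_provider: default_provider.to_string(),
    })
}
```

Non-default providers that fail to initialize are still skipped here, so an optional backend with a missing key does not block startup.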

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/channels/transcription.rs` around lines 581 - 617, The constructor pub fn new(config: &TranscriptionConfig) currently swallows provider init errors (calls like GroqProvider::from_config, OpenAiWhisperProvider::from_config, DeepgramProvider::from_config, AssemblyAiProvider::from_config, GoogleSttProvider::from_config) and still returns Ok with config.default_provider set, which hides root causes when the chosen default failed to initialize; change the logic to surface errors: capture and propagate the concrete error when a provider factory fails (or at minimum record failures and, after attempting all providers, if config.default_provider.is_some() but providers does not contain that key, return an Err/bail! with a clear message that includes the original provider error(s)); ensure the new() function returns an error instead of silently succeeding whenever the configured default provider failed to initialize.

coderabbitai · 2026-03-08T09:46:34Z

src/config/schema.rs

+    /// Default STT provider: "groq", "openai", "deepgram", "assemblyai", "google".
+    #[serde(default = "default_transcription_provider")]
+    pub default_provider: String,
+    /// API key used for transcription requests (Groq provider).
    ///
    /// If unset, runtime falls back to `GROQ_API_KEY` for backward compatibility.
    #[serde(default)]
    pub api_key: Option<String>,
-    /// Whisper API endpoint URL.
+    /// Whisper API endpoint URL (Groq provider).
    #[serde(default = "default_transcription_api_url")]
    pub api_url: String,
-    /// Whisper model name.
+    /// Whisper model name (Groq provider).
    #[serde(default = "default_transcription_model")]
    pub model: String,
-    /// Optional language hint (ISO-639-1, e.g. "en", "ru").
+    /// Optional language hint (ISO-639-1, e.g. "en", "ru") for Groq provider.
    #[serde(default)]
    pub language: Option<String>,
    /// Maximum voice duration in seconds (messages longer than this are skipped).
    #[serde(default = "default_transcription_max_duration_secs")]
    pub max_duration_secs: u64,
+    /// OpenAI Whisper STT provider configuration.
+    #[serde(default)]
+    pub openai: Option<OpenAiSttConfig>,
+    /// Deepgram STT provider configuration.
+    #[serde(default)]
+    pub deepgram: Option<DeepgramSttConfig>,
+    /// AssemblyAI STT provider configuration.
+    #[serde(default)]
+    pub assemblyai: Option<AssemblyAiSttConfig>,
+    /// Google Cloud Speech-to-Text provider configuration.
+    #[serde(default)]
+    pub google: Option<GoogleSttConfig>,


⚠️ Potential issue | 🟡 Minor

Validate transcription.default_provider at config load time.

Config::validate() never inspects the transcription section, so typos like "deppgram" or an empty model in the selected provider block are accepted until the first audio message hits the STT path. This should fail fast during config validation instead of surfacing as a delayed runtime error.

🧪 Suggested validation shape

match self.transcription.default_provider.trim() {
    "groq" => {
        if self.transcription.model.trim().is_empty() {
            anyhow::bail!("transcription.model must not be empty when default_provider=groq");
        }
    }
    "openai" => {
        let cfg = self
            .transcription
            .openai
            .as_ref()
            .ok_or_else(|| anyhow::anyhow!("transcription.openai is required when default_provider=openai"))?;
        if cfg.model.trim().is_empty() {
            anyhow::bail!("transcription.openai.model must not be empty");
        }
    }
    "deepgram" => { /* same pattern */ }
    "assemblyai" => {}
    "google" => {
        let cfg = self
            .transcription
            .google
            .as_ref()
            .ok_or_else(|| anyhow::anyhow!("transcription.google is required when default_provider=google"))?;
        if cfg.language_code.trim().is_empty() {
            anyhow::bail!("transcription.google.language_code must not be empty");
        }
    }
    other => anyhow::bail!(
        "transcription.default_provider must be one of: groq, openai, deepgram, assemblyai, google (got {other})"
    ),
}

As per coding guidelines, "Prefer explicit bail!/errors for unsupported or unsafe states; keep error paths obvious and localized."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/config/schema.rs` around lines 696 - 727, Add validation for the transcription section inside Config::validate(): inspect self.transcription.default_provider (trimmed) and match it against the allowed values ("groq","openai","deepgram","assemblyai","google"); for each provider require the corresponding fields are present and non-empty (e.g., for "groq" ensure self.transcription.model is non-empty, for "openai" ensure self.transcription.openai is Some and openai.model is non-empty, for "deepgram" require self.transcription.deepgram Some and necessary fields non-empty, for "google" require self.transcription.google Some and google.language_code non-empty, assemblyai can just be allowed) and call anyhow::bail! with clear messages on missing/empty values or on an unsupported default_provider value so config loading fails fast.

src/config/schema.rs

rikitrader · 2026-03-09T03:11:40Z

Thank you for the contribution! We appreciate the effort.

We are closing this PR because it introduces a standalone module with no integration into the existing codebase — the new types/functions are not called by any existing code path. ZeroClaw follows a trait-driven architecture where new features must be wired through factory registration and have at least one active caller.

To reopen, please:

Wire the feature into an existing subsystem (channel, provider, tool factory, or agent loop)
Add integration tests that exercise the end-to-end flow
Keep PR scope focused — one feature per PR, ideally under 500 lines

We welcome smaller, integrated contributions. See CLAUDE.md §7 for playbooks on adding providers, channels, tools, and peripherals.

rikitrader · 2026-03-09T03:12:15Z

Review: This PR introduces a standalone module that is not wired into any existing code path. The new types/functions have no callers in the codebase.

To make this mergeable:

Wire into an existing subsystem (channel, provider, tool factory, or agent loop)
Add integration tests exercising the end-to-end flow
Keep scope under 500 lines per PR

See CLAUDE.md §7 for playbooks. Recommend closing and resubmitting as smaller, integrated PRs.

rareba · 2026-03-09T07:58:42Z

Rebased onto current master. The PR diff is now 3 files / 936 lines (was 937 files / 198K lines due to divergent base).

What changed:

Clean branch from master with only the multi-provider STT feature
TranscriptionProvider trait with implementations for Groq, OpenAI Whisper, Deepgram, AssemblyAI, Google
Factory function create_transcription_provider() with config-gated provider selection
Config structs for each provider with per-provider credentials
Already wired into Telegram and Discord voice message handlers via .with_transcription()
cargo check clean

Refactors single-endpoint transcription to support multiple providers: Groq (existing), OpenAI Whisper, Deepgram, AssemblyAI, and Google Cloud Speech-to-Text. Adds TranscriptionManager for provider routing with backward-compatible config fields.

rareba · 2026-03-15T17:40:04Z

Superseded: reopening from feat/stt-multi-provider branch (corrected prefix per CONTRIBUTING.md).

rareba requested review from JordanTheJet and theonlyhennygod as code owners March 8, 2026 09:35

coderabbitai bot reviewed Mar 8, 2026

rareba force-pushed the feature/stt-multi-provider branch from 94993a5 to 742c95f Compare March 9, 2026 07:58

rareba changed the base branch from main to master March 9, 2026 20:52

This was referenced Mar 10, 2026

🦞 OpenClaw Ecosystem Daily Report 2026-03-10 duanyytop/agents-radar#119

Open

🦞 OpenClaw Ecosystem Digest 2026-03-10 duanyytop/agents-radar#120

Open

🦞 OpenClaw Ecosystem Daily Bulletin 2026-03-10 compasify/agents-radar#21

Open

rareba force-pushed the feature/stt-multi-provider branch from 4a04704 to 7877a22 Compare March 12, 2026 08:11

rareba requested a review from SimianAstronaut7 as a code owner March 12, 2026 08:11

rareba added 2 commits March 15, 2026 15:52

style: fix cargo fmt + clippy violations

184df98

rareba force-pushed the feature/stt-multi-provider branch from 5bf6523 to 184df98 Compare March 15, 2026 14:55

fix(cron): Box::pin large futures in execute_job_with_retry calls

8186038

rareba closed this Mar 15, 2026

rareba mentioned this pull request Mar 15, 2026

feat(stt): multi-provider STT with TranscriptionProvider trait #3614

Merged
