Skip to content

feat(stt): multi-provider STT with TranscriptionProvider trait#2995

Closed
rareba wants to merge 3 commits intozeroclaw-labs:masterfrom
rareba:feature/stt-multi-provider
Closed

feat(stt): multi-provider STT with TranscriptionProvider trait#2995
rareba wants to merge 3 commits intozeroclaw-labs:masterfrom
rareba:feature/stt-multi-provider

Conversation

@rareba
Copy link
Contributor

@rareba rareba commented Mar 8, 2026

Summary

  • Base branch target: master
  • Problem: Transcription was hardcoded to a single Groq endpoint — no way to use alternative STT providers
  • Why it matters: Users need flexibility to choose STT providers based on accuracy, cost, or compliance requirements
  • What changed: Refactored single-endpoint Groq transcription into a multi-provider architecture with TranscriptionProvider trait. Implemented five STT providers: Groq (default, existing), OpenAI Whisper, Deepgram, AssemblyAI, and Google Cloud Speech-to-Text. Added TranscriptionManager for provider routing and the transcribe_with_provider() method for explicit provider selection. Maintains full backward compatibility.
  • What did not change (scope boundary): Existing transcribe_audio() function signature unchanged. Existing config fields (api_url, model, api_key) and credential resolution (GROQ_API_KEY env fallback) preserved. Callers in telegram.rs, discord.rs, whatsapp_web.rs require no changes.

Files changed

  • src/channels/transcription.rs: Add TranscriptionProvider trait, five provider implementations, TranscriptionManager, shared validate_audio() helper, and parse_whisper_response() utility
  • src/config/schema.rs: Extend TranscriptionConfig with default_provider and optional sub-configs (OpenAiSttConfig, DeepgramSttConfig, AssemblyAiSttConfig, GoogleSttConfig); fix pre-existing sync_directory async/sync mismatch on non-unix platforms
  • src/config/mod.rs: Export new STT config types

Label Snapshot (required)

  • Risk label: risk: medium
  • Size label: size: L
  • Scope labels: channel, config
  • Module labels: channel: transcription
  • Contributor tier label: (auto-managed)
  • If any auto-label is incorrect, note requested correction: N/A

Change Metadata

  • Change type: feature
  • Primary scope: channel

Linked Issue

Supersede Attribution (required when Supersedes # is used)

N/A

Validation Evidence (required)

Commands and result summary:

cargo fmt --all -- --check   # clean
cargo check   # passes (only pre-existing clippy warnings in unrelated files remain)
cargo test   # all 20 transcription unit tests pass (existing + new); config default, roundtrip, and without-transcription tests pass
  • Evidence provided: unit test results, config roundtrip tests
  • If any command is intentionally skipped, explain why: CI pipeline validation pending

Security Impact (required)

  • New permissions/capabilities? No
  • New external network calls? Yes — four new STT provider endpoints (OpenAI, Deepgram, AssemblyAI, Google)
  • Secrets/tokens handling changed? Yes — new API key fields for each provider sub-config
  • File system access scope changed? No
  • If any Yes, describe risk and mitigation: Each provider's API key is optional and config-gated. Provider sub-configs default to None. Audio validation occurs before any network call. Existing Groq credential resolution unchanged.

Privacy and Data Hygiene (required)

  • Data-hygiene status: pass
  • Redaction/anonymization notes: Audio data sent to external STT APIs for transcription — same privacy model as existing Groq path
  • Neutral wording confirmation: Confirmed

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? Yes — new optional default_provider field and provider sub-configs in [transcription] section (all default to None/Groq)
  • Migration needed? No — existing configs without new fields parse correctly and default to Groq

i18n Follow-Through (required when docs or user-facing wording changes)

  • i18n follow-through triggered? No — code changes only

Human Verification (required)

  • Verified scenarios: Provider trait implementation for all five providers, manager routing, backward-compatible function preservation, config roundtrip
  • Edge cases checked: Audio validation ordering (size/format errors before missing-key errors), missing config defaults to Groq, invalid audio formats
  • What was not verified: Live API calls to non-Groq providers (requires credentials)

Side Effects / Blast Radius (required)

  • Affected subsystems/workflows: Transcription subsystem, config schema
  • Potential unintended effects: None — existing callers use unchanged transcribe_audio() function
  • Guardrails/monitoring for early detection: Audio validation runs before network calls; provider selection explicit

Agent Collaboration Notes (recommended)

  • Agent tools used: Claude Code
  • Workflow/plan summary: Extracted trait from existing Groq implementation, replicated pattern for four additional providers
  • Verification focus: Backward compatibility, config serde stability, test coverage
  • Confirmation: naming + architecture boundaries followed

Rollback Plan (required)

  • Fast rollback command/path: git revert <commit>
  • Feature flags or config toggles: default_provider defaults to Groq; reverting preserves existing behavior
  • Observable failure symptoms: Non-Groq STT providers unavailable (Groq continues working)

Risks and Mitigations

  • Risk: New provider implementations untested against live APIs
    • Mitigation: Unit tests validate request construction and response parsing; live testing deferred to integration phase
  • Risk: Config schema expansion could break existing config files
    • Mitigation: All new fields have serde(default) — existing configs parse without changes

Summary by CodeRabbit

Release Notes

  • New Features
    • Transcription now supports multiple providers: OpenAI Whisper, Deepgram, AssemblyAI, Google STT, and Groq
    • Configure and select from multiple transcription providers based on your needs
    • Improved audio validation with format normalization support
    • Existing transcription configurations remain fully backward compatible

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 8, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (1)
  • master

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 53ce5155-d1ac-4ab2-a383-9d025314c338

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key(s) in object: 'tools', 'path_filters', 'review_instructions'
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
📝 Walkthrough

Walkthrough

This PR implements a multi-provider speech-to-text architecture, replacing a single Groq endpoint with a pluggable TranscriptionProvider trait. Five provider implementations (Groq, OpenAI Whisper, Deepgram, AssemblyAI, Google) are added, along with a TranscriptionManager for registration and routing. The TranscriptionConfig schema expanded to include provider-specific configuration blocks. A backward-compatible transcribe_audio() function defaults to Groq.

Changes

Cohort / File(s) Summary
Multi-Provider Transcription System
src/channels/transcription.rs
Added TranscriptionProvider trait and five provider implementations (Groq, OpenAI Whisper, Deepgram, AssemblyAI, Google), each with from_config() and transcribe() methods. Introduced TranscriptionManager to register, query, and route transcription requests to providers. Added validate_audio() for file validation and normalization. Public transcribe_audio() function provides backward compatibility with Groq as default.
STT Configuration Schemas
src/config/schema.rs
Extended TranscriptionConfig with default_provider field and optional provider-specific configs (openai, deepgram, assemblyai, google). Added new public structs: OpenAiSttConfig, DeepgramSttConfig, AssemblyAiSttConfig, GoogleSttConfig with provider-specific settings. Added default functions for provider and model defaults.
Config Module Exports
src/config/mod.rs
Updated public re-exports to include new STT provider configuration types (OpenAiSttConfig, DeepgramSttConfig, AssemblyAiSttConfig, GoogleSttConfig).

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant TranscriptionManager
    participant Provider A as OpenAI Whisper
    participant Provider B as Deepgram
    participant External API

    Client->>TranscriptionManager: transcribe(audio_data, filename)
    TranscriptionManager->>TranscriptionManager: Validate audio format/size
    alt Use Default Provider
        TranscriptionManager->>Provider A: transcribe(audio_data, filename)
    else Use Specific Provider
        TranscriptionManager->>Provider B: transcribe_with_provider(audio_data, filename, "deepgram")
    end
    Provider A->>External API: POST audio + metadata
    External API-->>Provider A: JSON response
    Provider A->>TranscriptionManager: Return transcribed text
    TranscriptionManager-->>Client: Transcription result
Loading
sequenceDiagram
    participant App
    participant GroqProvider
    participant GroqAPI as Groq API
    participant AudioFile as File System

    App->>GroqProvider: from_config(config)
    Note over GroqProvider: Resolve API key from config or env
    App->>GroqProvider: transcribe(audio_data, filename)
    GroqProvider->>AudioFile: Validate file extension & size
    GroqProvider->>GroqAPI: Multipart request (audio + metadata)
    GroqAPI-->>GroqProvider: Whisper JSON response
    GroqProvider->>GroqProvider: parse_whisper_response()
    GroqProvider-->>App: Transcribed text string
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

size: XL, risk: high, provider, config: core, channel: transcription, tests, core

Suggested reviewers

  • theonlyhennygod
  • chumyin
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and accurately summarizes the main change: introducing a multi-provider STT architecture with TranscriptionProvider trait.
Description check ✅ Passed The PR description provides a clear summary, lists changed files with details, confirms backward compatibility, includes test results, and documents risk/rollback. Most required template sections are addressed substantively.
Linked Issues check ✅ Passed The PR meets core coding requirements from #2989: implements TranscriptionProvider trait with async transcribe and name(), provides five provider implementations (Groq, OpenAI, Deepgram, AssemblyAI, Google), adds TranscriptionManager for routing, and extends config with provider selection.
Out of Scope Changes check ✅ Passed All code changes are directly aligned with the issue objectives: provider trait, five implementations, manager, config extension, and helper functions. The sync_directory fix is a pre-existing bug fix within scope of the config refactoring.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
src/channels/transcription.rs (1)

90-539: Consider moving the concrete STT providers into src/providers/.

These sections put five provider-specific HTTP/JSON integrations under src/channels/. Keeping only the routing/entrypoint here and moving concrete backends into src/providers/ would keep the channel boundary smaller as more providers land.

As per coding guidelines, "Keep module responsibilities single-purpose: orchestration in agent/, transport in channels/, model I/O in providers/."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/transcription.rs` around lines 90 - 539, The channel module
currently contains five concrete STT providers (structs GroqProvider,
OpenAiWhisperProvider, DeepgramProvider, AssemblyAiProvider, GoogleSttProvider
and their impls of TranscriptionProvider including methods from_config,
transcribe, name, supported_formats) which should be moved into a providers
module to keep channels focused on transport/orchestration. Create a new
providers module for these concrete implementations and move the provider
structs and their TranscriptionProvider impls there; keep the
TranscriptionProvider trait, validate_audio, parse_whisper_response, and any
shared helpers in the channels module (or make them public) so the moved
providers can call validate_audio, parse_whisper_response, and
crate::config::build_runtime_proxy_client unchanged; update imports/uses in the
channel to re-export or construct providers via their from_config constructors
and ensure method names (from_config, transcribe, name, supported_formats) and
referenced symbols (e.g., TRANSCRIPTION_TIMEOUT_SECS, parse_whisper_response)
remain reachable after the move.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/channels/transcription.rs`:
- Around line 468-492: The transcribe() function currently ignores the
normalized filename returned by validate_audio(); change the let (_, _) =
validate_audio(...) call to capture the normalized filename (e.g., let (_,
normalized_name) = validate_audio(...)?), use normalized_name when extracting
the extension to set the encoding variable, and ensure you validate that the
resulting extension is one of supported_formats() (return an error if not) so
formats like .oga, .mp4, .m4a, .mpga don't fall through to
"ENCODING_UNSPECIFIED".
- Around line 581-617: The constructor pub fn new(config: &TranscriptionConfig)
currently swallows provider init errors (calls like GroqProvider::from_config,
OpenAiWhisperProvider::from_config, DeepgramProvider::from_config,
AssemblyAiProvider::from_config, GoogleSttProvider::from_config) and still
returns Ok with config.default_provider set, which hides root causes when the
chosen default failed to initialize; change the logic to surface errors: capture
and propagate the concrete error when a provider factory fails (or at minimum
record failures and, after attempting all providers, if
config.default_provider.is_some() but providers does not contain that key,
return an Err/bail! with a clear message that includes the original provider
error(s)); ensure the new() function returns an error instead of silently
succeeding whenever the configured default provider failed to initialize.
- Around line 392-433: The poll loop using poll_interval, max_polls and a
per-request timeout can exceed the intended ~3 minute cap because each .get()
can await up to timeout; fix by enforcing a total deadline (e.g., record start =
Instant::now() and check elapsed against Duration::from_secs(180) or wrap the
whole polling loop in tokio::time::timeout) and break with a timeout error when
exceeded, and also stop treating non-2xx responses as silent "unknown" — inspect
poll_resp.status(), propagate non-success statuses with the response body or
status code instead of using poll_body["status"].as_str().unwrap_or("unknown"),
and ensure error branches (the "error" match and non-2xx cases) return
informative messages so polling does not continue indefinitely.

In `@src/config/schema.rs`:
- Around line 696-727: Add validation for the transcription section inside
Config::validate(): inspect self.transcription.default_provider (trimmed) and
match it against the allowed values
("groq","openai","deepgram","assemblyai","google"); for each provider require
the corresponding fields are present and non-empty (e.g., for "groq" ensure
self.transcription.model is non-empty, for "openai" ensure
self.transcription.openai is Some and openai.model is non-empty, for "deepgram"
require self.transcription.deepgram Some and necessary fields non-empty, for
"google" require self.transcription.google Some and google.language_code
non-empty, assemblyai can just be allowed) and call anyhow::bail! with clear
messages on missing/empty values or on an unsupported default_provider value so
config loading fails fast.
- Around line 748-787: The STT provider api_key fields (OpenAiSttConfig.api_key,
DeepgramSttConfig.api_key, AssemblyAiSttConfig.api_key, GoogleSttConfig.api_key)
are not wired into the secret-store flow; update the config load/save wiring to
call decrypt_optional_secret(&store, &mut <provider>.api_key,
"config.transcription.<provider>.api_key") during load and
encrypt_optional_secret(&store, &<provider>.api_key,
"config.transcription.<provider>.api_key") during save for each optional
provider present (config.transcription.openai, .deepgram, .assemblyai, .google)
using the same pattern as the existing transcription.api_key handling; add a
regression test that saves a config with nested STT keys and asserts they are
encrypted on disk and correctly decrypted on reload.

---

Nitpick comments:
In `@src/channels/transcription.rs`:
- Around line 90-539: The channel module currently contains five concrete STT
providers (structs GroqProvider, OpenAiWhisperProvider, DeepgramProvider,
AssemblyAiProvider, GoogleSttProvider and their impls of TranscriptionProvider
including methods from_config, transcribe, name, supported_formats) which should
be moved into a providers module to keep channels focused on
transport/orchestration. Create a new providers module for these concrete
implementations and move the provider structs and their TranscriptionProvider
impls there; keep the TranscriptionProvider trait, validate_audio,
parse_whisper_response, and any shared helpers in the channels module (or make
them public) so the moved providers can call validate_audio,
parse_whisper_response, and crate::config::build_runtime_proxy_client unchanged;
update imports/uses in the channel to re-export or construct providers via their
from_config constructors and ensure method names (from_config, transcribe, name,
supported_formats) and referenced symbols (e.g., TRANSCRIPTION_TIMEOUT_SECS,
parse_whisper_response) remain reachable after the move.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5e12dca1-c69d-4e51-b77b-7918656c3fdf

📥 Commits

Reviewing files that changed from the base of the PR and between 326b60d and 6355c49.

📒 Files selected for processing (3)
  • src/channels/transcription.rs
  • src/config/mod.rs
  • src/config/schema.rs

Comment on lines +581 to +617
pub fn new(config: &TranscriptionConfig) -> Result<Self> {
let mut providers: HashMap<String, Box<dyn TranscriptionProvider>> = HashMap::new();

if let Ok(groq) = GroqProvider::from_config(config) {
providers.insert("groq".to_string(), Box::new(groq));
}

if let Some(ref openai_cfg) = config.openai {
if let Ok(p) = OpenAiWhisperProvider::from_config(openai_cfg) {
providers.insert("openai".to_string(), Box::new(p));
}
}

if let Some(ref deepgram_cfg) = config.deepgram {
if let Ok(p) = DeepgramProvider::from_config(deepgram_cfg) {
providers.insert("deepgram".to_string(), Box::new(p));
}
}

if let Some(ref assemblyai_cfg) = config.assemblyai {
if let Ok(p) = AssemblyAiProvider::from_config(assemblyai_cfg) {
providers.insert("assemblyai".to_string(), Box::new(p));
}
}

if let Some(ref google_cfg) = config.google {
if let Ok(p) = GoogleSttProvider::from_config(google_cfg) {
providers.insert("google".to_string(), Box::new(p));
}
}

let default_provider = config.default_provider.clone();

Ok(Self {
providers,
default_provider,
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t drop the root-cause error for the selected default provider.

Lines 584-609 silently discard provider init failures, and Line 612 still stores config.default_provider. With transcription enabled and a missing key for the chosen default, TranscriptionManager::new() succeeds and the first transcribe() call only says “not configured”.

🔧 Suggested fix
         let default_provider = config.default_provider.clone();
+
+        if config.enabled && !providers.contains_key(&default_provider) {
+            let available: Vec<&str> = providers.keys().map(|k| k.as_str()).collect();
+            bail!(
+                "Default transcription provider '{}' is not configured. Available: {available:?}",
+                default_provider
+            );
+        }
 
         Ok(Self {
             providers,
             default_provider,
         })

As per coding guidelines, "Prefer explicit bail!/errors for unsupported or unsafe states; never silently broaden permissions/capabilities; document fallback behavior when fallback is intentional and safe."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/transcription.rs` around lines 581 - 617, The constructor pub fn
new(config: &TranscriptionConfig) currently swallows provider init errors (calls
like GroqProvider::from_config, OpenAiWhisperProvider::from_config,
DeepgramProvider::from_config, AssemblyAiProvider::from_config,
GoogleSttProvider::from_config) and still returns Ok with
config.default_provider set, which hides root causes when the chosen default
failed to initialize; change the logic to surface errors: capture and propagate
the concrete error when a provider factory fails (or at minimum record failures
and, after attempting all providers, if config.default_provider.is_some() but
providers does not contain that key, return an Err/bail! with a clear message
that includes the original provider error(s)); ensure the new() function returns
an error instead of silently succeeding whenever the configured default provider
failed to initialize.

Comment on lines +696 to +727
/// Default STT provider: "groq", "openai", "deepgram", "assemblyai", "google".
#[serde(default = "default_transcription_provider")]
pub default_provider: String,
/// API key used for transcription requests (Groq provider).
///
/// If unset, runtime falls back to `GROQ_API_KEY` for backward compatibility.
#[serde(default)]
pub api_key: Option<String>,
/// Whisper API endpoint URL.
/// Whisper API endpoint URL (Groq provider).
#[serde(default = "default_transcription_api_url")]
pub api_url: String,
/// Whisper model name.
/// Whisper model name (Groq provider).
#[serde(default = "default_transcription_model")]
pub model: String,
/// Optional language hint (ISO-639-1, e.g. "en", "ru").
/// Optional language hint (ISO-639-1, e.g. "en", "ru") for Groq provider.
#[serde(default)]
pub language: Option<String>,
/// Maximum voice duration in seconds (messages longer than this are skipped).
#[serde(default = "default_transcription_max_duration_secs")]
pub max_duration_secs: u64,
/// OpenAI Whisper STT provider configuration.
#[serde(default)]
pub openai: Option<OpenAiSttConfig>,
/// Deepgram STT provider configuration.
#[serde(default)]
pub deepgram: Option<DeepgramSttConfig>,
/// AssemblyAI STT provider configuration.
#[serde(default)]
pub assemblyai: Option<AssemblyAiSttConfig>,
/// Google Cloud Speech-to-Text provider configuration.
#[serde(default)]
pub google: Option<GoogleSttConfig>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Validate transcription.default_provider at config load time.

Config::validate() never inspects the transcription section, so typos like "deppgram" or an empty model in the selected provider block are accepted until the first audio message hits the STT path. This should fail fast during config validation instead of surfacing as a delayed runtime error.

🧪 Suggested validation shape
match self.transcription.default_provider.trim() {
    "groq" => {
        if self.transcription.model.trim().is_empty() {
            anyhow::bail!("transcription.model must not be empty when default_provider=groq");
        }
    }
    "openai" => {
        let cfg = self
            .transcription
            .openai
            .as_ref()
            .ok_or_else(|| anyhow::anyhow!("transcription.openai is required when default_provider=openai"))?;
        if cfg.model.trim().is_empty() {
            anyhow::bail!("transcription.openai.model must not be empty");
        }
    }
    "deepgram" => { /* same pattern */ }
    "assemblyai" => {}
    "google" => {
        let cfg = self
            .transcription
            .google
            .as_ref()
            .ok_or_else(|| anyhow::anyhow!("transcription.google is required when default_provider=google"))?;
        if cfg.language_code.trim().is_empty() {
            anyhow::bail!("transcription.google.language_code must not be empty");
        }
    }
    other => anyhow::bail!(
        "transcription.default_provider must be one of: groq, openai, deepgram, assemblyai, google (got {other})"
    ),
}

As per coding guidelines, "Prefer explicit bail!/errors for unsupported or unsafe states; keep error paths obvious and localized."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/schema.rs` around lines 696 - 727, Add validation for the
transcription section inside Config::validate(): inspect
self.transcription.default_provider (trimmed) and match it against the allowed
values ("groq","openai","deepgram","assemblyai","google"); for each provider
require the corresponding fields are present and non-empty (e.g., for "groq"
ensure self.transcription.model is non-empty, for "openai" ensure
self.transcription.openai is Some and openai.model is non-empty, for "deepgram"
require self.transcription.deepgram Some and necessary fields non-empty, for
"google" require self.transcription.google Some and google.language_code
non-empty, assemblyai can just be allowed) and call anyhow::bail! with clear
messages on missing/empty values or on an unsupported default_provider value so
config loading fails fast.

@rikitrader
Copy link

Thank you for the contribution! We appreciate the effort.

We are closing this PR because it introduces a standalone module with no integration into the existing codebase — the new types/functions are not called by any existing code path. ZeroClaw follows a trait-driven architecture where new features must be wired through factory registration and have at least one active caller.

To reopen, please:

  1. Wire the feature into an existing subsystem (channel, provider, tool factory, or agent loop)
  2. Add integration tests that exercise the end-to-end flow
  3. Keep PR scope focused — one feature per PR, ideally under 500 lines

We welcome smaller, integrated contributions. See CLAUDE.md §7 for playbooks on adding providers, channels, tools, and peripherals.

@rikitrader
Copy link

Review: This PR introduces a standalone module that is not wired into any existing code path. The new types/functions have no callers in the codebase.

To make this mergeable:

  1. Wire into an existing subsystem (channel, provider, tool factory, or agent loop)
  2. Add integration tests exercising the end-to-end flow
  3. Keep scope under 500 lines per PR

See CLAUDE.md §7 for playbooks. Recommend closing and resubmitting as smaller, integrated PRs.

@rareba rareba force-pushed the feature/stt-multi-provider branch from 94993a5 to 742c95f Compare March 9, 2026 07:58
@rareba
Copy link
Contributor Author

rareba commented Mar 9, 2026

Rebased onto current master. The PR diff is now 3 files / 936 lines (was 937 files / 198K lines due to divergent base).

What changed:

  • Clean branch from master with only the multi-provider STT feature
  • TranscriptionProvider trait with implementations for Groq, OpenAI Whisper, Deepgram, AssemblyAI, Google
  • Factory function create_transcription_provider() with config-gated provider selection
  • Config structs for each provider with per-provider credentials
  • Already wired into Telegram and Discord voice message handlers via .with_transcription()
  • cargo check clean

rareba added 2 commits March 15, 2026 15:52
Refactors single-endpoint transcription to support multiple providers:
Groq (existing), OpenAI Whisper, Deepgram, AssemblyAI, and Google Cloud
Speech-to-Text. Adds TranscriptionManager for provider routing with
backward-compatible config fields.
@rareba rareba force-pushed the feature/stt-multi-provider branch from 5bf6523 to 184df98 Compare March 15, 2026 14:55
@rareba
Copy link
Contributor Author

rareba commented Mar 15, 2026

Superseded: reopening from feat/stt-multi-provider branch (corrected prefix per CONTRIBUTING.md).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(stt): multi-provider STT with TranscriptionProvider trait

2 participants