
feat(transcription): add Mistral Voxtral support for voice transcription#2968

Open
WAlexandreW wants to merge 1200 commits into zeroclaw-labs:master from WAlexandreW:feat/mistral-transcription-support

Conversation


@WAlexandreW WAlexandreW commented Mar 7, 2026

Summary

  • Base branch target (main by default; use dev only when maintainers explicitly request integration batching): main
  • Problem: The transcription subsystem was hardcoded to Groq's Whisper API, with no way to use other STT providers without code changes.
  • Why it matters: Mistral's Voxtral (voxtral-mini-latest) offers a competitive alternative transcription API that accepts the same multipart/form-data request format, so only the endpoint and key need to change.
  • What changed: src/channels/transcription.rs now infers the proxy key (transcription.mistral vs transcription.groq) and env-var fallback (MISTRAL_API_KEY vs GROQ_API_KEY) dynamically from the configured api_url using proper URL host parsing. src/config/schema.rs adds documentation, a custom Debug impl that redacts api_key, and secret encryption/decryption for the transcription key.
  • What did not change: Default endpoint and model remain Groq + whisper-large-v3-turbo. No new dependencies. No breaking changes to existing config.
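A minimal sketch of the inference described above, assuming only what the summary states: the host of `api_url` selects both the proxy service key and the env-var fallback. To stay dependency-free, `host_of` is a hypothetical std-only stand-in for `reqwest::Url` host parsing that handles only simple `scheme://host[:port]/path` URLs; it is not the PR's code.

```rust
/// Hypothetical std-only stand-in for `Url::parse(..).host_str()`. Handles only
/// `scheme://[userinfo@]host[:port][/path][?query][#fragment]`; no IPv6 literals.
fn host_of(api_url: &str) -> Option<String> {
    let (scheme, rest) = api_url.split_once("://")?;
    if scheme.is_empty() {
        return None;
    }
    // The authority ends at the first '/', '?', or '#'.
    let authority = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    // Drop any `user:pass@` prefix, then any `:port` suffix.
    let host_port = authority.rsplit_once('@').map_or(authority, |(_, h)| h);
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}

/// Same host rule as the PR: `mistral.ai` itself or any subdomain of it.
fn is_mistral_host(api_url: &str) -> bool {
    host_of(api_url).is_some_and(|h| h == "mistral.ai" || h.ends_with(".mistral.ai"))
}

/// Returns the (proxy service key, env-var fallback) pair for `api_url`.
fn provider_keys(api_url: &str) -> (&'static str, &'static str) {
    if is_mistral_host(api_url) {
        ("transcription.mistral", "MISTRAL_API_KEY")
    } else {
        ("transcription.groq", "GROQ_API_KEY")
    }
}

fn main() {
    // Illustrative URLs only.
    for url in [
        "https://api.mistral.ai/v1/audio/transcriptions",
        "https://api.groq.com/openai/v1/audio/transcriptions",
        // Spoofed path: "mistral.ai" appears only after the host.
        "https://stt.example.net/api.mistral.ai/v1",
    ] {
        println!("{url} -> {:?}", provider_keys(url));
    }
}
```

Note that in this shape every non-Mistral URL, including one that fails to parse, maps to the Groq pair, which matches the documented default.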

Label Snapshot (required)

  • Risk label: risk: low
  • Size label: size: S
  • Scope labels: channel, config, provider
  • Module labels: channel: telegram, provider: mistral
  • Contributor tier label: N/A
  • If any auto-label is incorrect, note requested correction: N/A

Change Metadata

  • Change type: feature
  • Primary scope: channel

Linked Issue

Supersede Attribution (required when Supersedes # is used)

Validation Evidence (required)

```sh
cargo fmt --check   # pass
cargo clippy         # pass (6 pre-existing warnings, 0 from this PR)
cargo test --lib channels::transcription  # 18 passed, 0 failed
```

Security Impact (required)

  • New permissions/capabilities? No
  • New external network calls? No (same multipart/form-data flow, different endpoint configured by user)
  • Secrets/tokens handling changed? Yes — MISTRAL_API_KEY env var now checked as fallback; api_key is redacted in Debug output and encrypted/decrypted in config persistence
  • File system access scope changed? No
  • Risk/mitigation: API key lookup is URL-scoped. Key is redacted in debug output and encrypted at rest.

Privacy and Data Hygiene (required)

  • Data-hygiene status: pass
  • Redaction/anonymization notes: TranscriptionConfig::Debug redacts api_key
  • Neutral wording confirmation: All test names and fixtures use neutral language.
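A sketch of the redacting `Debug` pattern, using a simplified stand-in struct: the field set and types below are assumptions for illustration, not the real `TranscriptionConfig` schema. Only the presence of a key is printed, never its value.

```rust
use std::fmt;

/// Simplified stand-in for `TranscriptionConfig` (fields are assumptions).
struct TranscriptionConfig {
    api_url: String,
    model: String,
    api_key: Option<String>,
}

impl fmt::Debug for TranscriptionConfig {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("TranscriptionConfig")
            .field("api_url", &self.api_url)
            .field("model", &self.model)
            // Reveal only whether a key is set: Some("<redacted>") or None.
            .field("api_key", &self.api_key.as_ref().map(|_| "<redacted>"))
            .finish()
    }
}

fn main() {
    let cfg = TranscriptionConfig {
        api_url: "https://api.groq.com/openai/v1/audio/transcriptions".into(),
        model: "whisper-large-v3-turbo".into(),
        api_key: Some("not-a-real-key".into()),
    };
    println!("{cfg:?}");
}
```

The same `.map(|_| ...)` approach applies to any other field that might carry credentials.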

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? Yes — MISTRAL_API_KEY env var now recognized as fallback (additive only)
  • Migration needed? No

Rollback Plan (required)

  • Fast rollback: Revert the 5 commits — no DB or config migrations needed.
  • Feature flags: Users who never configure a Mistral api_url are completely unaffected.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added support for Mistral as a transcription provider and automatic provider detection from the API URL.
  • Security

    • Transcription API key is now encrypted when saved and decrypted on load.
  • Improvements

    • Refined API-key resolution order and clearer missing-key messaging; proxy selection now follows detected provider.
  • Tests & Docs

    • Added tests for provider detection, key resolution and persistence; updated docs to reflect multi-provider behavior.

theonlyhennygod and others added 30 commits March 1, 2026 23:40
…ark-image-download

fix(lark): fetch image messages via resource endpoint
…odex-oauth-docs-and-model

docs(codex): add oauth quickstart and gpt-5.3 model
…indows-link-count

fix(security): avoid unstable windows link-count API
…nuation-policy

feat(agent): add provider-agnostic max-token continuation policy
…-profile

ci: use release profile in reproducible build check
fix(gateway): require WATI webhook auth (RMN-323)
theonlyhennygod and others added 21 commits March 5, 2026 13:41
Refactor code for better readability and formatting.
Refactor print statements for better readability and clean formatting.
…uency optimization

Consolidate redundant Rust compilation jobs to cut PR cycle time from 2+ hours
to ~30 minutes by reducing parallel cold compilations and upgrading runners.

CI Run (ci-run.yml):
- Merge lint + workspace-check + package-check → quality-gate (25min, 8vcpu)
- Merge test + build → test-and-build (30min, 8vcpu)
- Unify cache keys: prefix-key=zeroclaw-ci-v1, shared-key=runner.os-rust
- Update ci-required gate, lint-feedback deps to reference new job names

Security Audit (sec-audit.yml):
- Merge audit + deny + security-regressions → rust-security (25min, 8vcpu)
- Merge sbom + unsafe-debt → compliance (lightweight runner)
- Add fast-path: non-Rust PRs skip Rust compilation entirely

Frequency optimization (off PR path):
- sec-codeql.yml: push-to-main + weekly only (was PR + push)
- ci-reproducible-build.yml: push-to-main + weekly only (was PR + push)
- ci-change-audit.yml: push-to-main only (was PR + push)

Runner upgrades:
- All Rust compilation jobs: 2vcpu → blacksmith-8vcpu-ubuntu-2404
- ci-supply-chain-provenance, test-fuzz: upgraded to 8vcpu
- test-e2e: upgraded to 8vcpu, fixed env indentation bug

Feature matrix (feature-matrix.yml):
- Non-default lanes (whatsapp-web, browser-native, nightly-all-features)
  skip on compile profile, run on nightly only
- resolve-profile + summary jobs use ubuntu-latest (no Rust compilation)

Docs/scripts:
- lint_feedback.js: update job name references for quality-gate
- required-check-mapping.md: document new consolidated job names
- ci-map.md: update trigger map, triage guide, maintenance rules
- self-hosted-runner-remediation.md: update job name reference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… rust-docs on 1.92.0

Added environment variable to skip rust-docs in E2E tests.
- Replace fragile contains("mistral.ai") with proper URL host parsing
  via is_mistral_host() using reqwest::Url
- Add api_key field to TranscriptionConfig for explicit key configuration
- Enrich TranscriptionConfig docs with defaults, compatibility, migration
- Add 8 new unit tests: Mistral/Groq key resolution, whitespace
  filtering, URL host detection, and spoofed-path rejection
Add decrypt_optional_secret and encrypt_optional_secret calls for
config.transcription.api_key in Config::load_or_init and Config::save,
matching the pattern used by other sensitive credential fields.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai bot commented Mar 7, 2026

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key(s) in object: 'tools', 'path_filters', 'review_instructions'
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
📝 Walkthrough

Walkthrough

The PR adds multi-provider transcription support (Mistral vs Groq) by detecting provider from API URL, updates transcribe_audio to accept &TranscriptionConfig, implements provider-aware API key resolution and proxy selection, and encrypts/decrypts transcription.api_key during config load/save. Tests and docs updated accordingly.

Changes

  • Transcription Service (src/channels/transcription.rs): Added fn is_mistral_host(api_url: &str) -> bool; changed pub async fn transcribe_audio(..., config: &TranscriptionConfig) signature; implemented API key resolution priority (config.api_key → MISTRAL_API_KEY if Mistral host else GROQ_API_KEY → error); provider-aware proxy selection (transcription.mistral vs transcription.groq); updated MIME/filename handling and added tests for host detection, key resolution, and whitespace handling.
  • Configuration Schema & Persistence (src/config/schema.rs): Removed auto-derived Debug for TranscriptionConfig and added custom impl Debug that redacts api_key; added "transcription.mistral" to SUPPORTED_PROXY_SERVICE_KEYS; implemented encryption on save and decryption on load for transcription.api_key; updated docs/comments and tests to validate encrypted storage and roundtrip behavior.
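The resolution priority listed above (explicit config.api_key, then the provider's env var, then an error) can be sketched as a pure function. `resolve_api_key` and `non_blank` are hypothetical names; the env lookup is passed in as a closure so the logic can be tested without touching process state, and treating whitespace-only values as unset is an assumption based on the "whitespace filtering" tests mentioned in the commit log.

```rust
/// Treats blank or whitespace-only values as unset (assumed behavior).
fn non_blank(s: &str) -> Option<String> {
    let t = s.trim();
    if t.is_empty() {
        None
    } else {
        Some(t.to_string())
    }
}

/// Hypothetical pure resolver: explicit config key, then the provider's env
/// var, then an error. `env` abstracts `std::env::var` for testability.
fn resolve_api_key<F>(config_key: Option<&str>, is_mistral: bool, env: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    if let Some(key) = config_key.and_then(non_blank) {
        return Ok(key);
    }
    let var = if is_mistral { "MISTRAL_API_KEY" } else { "GROQ_API_KEY" };
    env(var)
        .as_deref()
        .and_then(non_blank)
        .ok_or_else(|| format!("missing transcription API key: set transcription.api_key or {var}"))
}

fn main() {
    // Production wiring would pass a closure over `std::env::var`.
    let real_env = |name: &str| std::env::var(name).ok();
    println!("{:?}", resolve_api_key(Some("key-from-config"), false, real_env));
}
```

Keeping the env lookup injectable is what lets tests assert the priority order without mutating process-wide variables.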

Sequence Diagram(s)

```mermaid
sequenceDiagram
participant Caller
participant Transcription as Transcription::transcribe_audio
participant Config as TranscriptionConfig
participant Proxy as ProxyClient
participant External as ExternalAPI(Mistral/Groq)

Caller->>Transcription: call transcribe_audio(audio, filename, &config)
Transcription->>Config: read api_key & api_url
alt config.api_key present
    Transcription->>Transcription: use config.api_key
else
    Transcription->>Transcription: is_mistral_host(api_url)?
    alt Mistral host
        Transcription->>Proxy: select "transcription.mistral"
        Transcription->>Config: prefer MISTRAL_API_KEY from env
    else Groq host
        Transcription->>Proxy: select "transcription.groq"
        Transcription->>Config: prefer GROQ_API_KEY from env
    end
end
Transcription->>Proxy: send multipart request with resolved key
Proxy->>External: forward request to endpoint
External-->>Proxy: transcription result
Proxy-->>Transcription: response
Transcription-->>Caller: return transcript
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

size: M, risk: medium, config: core, channel: transcription

Suggested reviewers

  • theonlyhennygod
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title accurately describes the main feature: adding Mistral Voxtral support for voice transcription, which is the primary change reflected in the code modifications.
  • Description check: ✅ Passed. The PR description comprehensively covers all required template sections including summary, labels, change metadata, linked issues, validation evidence, security impact, privacy/data hygiene, compatibility, and rollback plan.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/channels/transcription.rs (1)

173-175: ⚠️ Potential issue | 🟠 Major

Lock environment mutations in tests and stub network calls.

These tests remove GROQ_API_KEY and MISTRAL_API_KEY without restoration or synchronization, then drive real HTTP POST requests to api.groq.com and api.mistral.ai. The uses_config_api_key_without_groq_env test explicitly expects the HTTP request to fail (line 196–197), making assertions depend on test ordering, ambient environment state, and outbound network availability rather than purely testing key-resolution logic.

Extract the API key resolution logic into a pure function, or at minimum wrap env mutations with a process-level lock (see src/tools/pushover.rs / src/providers/openai_codex.rs for established LazyLock/OnceLock + EnvGuard patterns) and stub the HTTP layer for deterministic assertions. Per coding guidelines: "Prefer reproducible commands and locked dependency behavior in CI-sensitive paths; keep tests deterministic (no flaky timing/network dependence without guardrails)."

Also applies to: 190–200, 209–210, 228–229
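For the env-mutation half of this finding, a guard in the spirit of the suggested LazyLock/OnceLock + EnvGuard pattern could look like the following. This is a hypothetical sketch, not the helper in src/tools/pushover.rs: a `static Mutex` (const-initializable since Rust 1.63) serializes the tests that opt in, the listed variables are snapshotted and removed, and they are restored on drop while the lock is still held. `std::env::set_var`/`remove_var` are `unsafe` in the 2024 edition, hence the `unsafe` blocks (earlier editions only emit an `unused_unsafe` warning).

```rust
use std::sync::{Mutex, MutexGuard};

/// Process-level lock shared by every test that mutates the environment.
static ENV_LOCK: Mutex<()> = Mutex::new(());

/// Hypothetical guard: removes the given vars, restores them on drop.
struct EnvGuard {
    saved: Vec<(String, Option<String>)>,
    // Declared last so it is released only after `drop` has restored the vars.
    _lock: MutexGuard<'static, ()>,
}

impl EnvGuard {
    fn remove(keys: &[&str]) -> Self {
        // A panicking test poisons the lock; keep going with the inner guard.
        let lock = ENV_LOCK.lock().unwrap_or_else(|p| p.into_inner());
        let saved = keys
            .iter()
            .map(|k| (k.to_string(), std::env::var(k).ok()))
            .collect();
        for k in keys {
            // SAFETY: mutation is serialized with other ENV_LOCK holders.
            unsafe { std::env::remove_var(k) };
        }
        EnvGuard { saved, _lock: lock }
    }
}

impl Drop for EnvGuard {
    fn drop(&mut self) {
        for (k, v) in &self.saved {
            match v {
                Some(val) => unsafe { std::env::set_var(k, val) },
                None => unsafe { std::env::remove_var(k) },
            }
        }
    }
}

fn main() {
    unsafe { std::env::set_var("ZEROCLAW_DEMO_KEY", "before") };
    {
        let _guard = EnvGuard::remove(&["ZEROCLAW_DEMO_KEY"]);
        println!("inside: {:?}", std::env::var("ZEROCLAW_DEMO_KEY").ok());
    }
    println!("after: {:?}", std::env::var("ZEROCLAW_DEMO_KEY").ok());
}
```

The lock only protects tests that take it, and it does nothing about the outbound HTTP calls; those still need a stub or a pure key-resolution function to be deterministic.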

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/transcription.rs` around lines 173 - 175, The tests in
src/channels/transcription.rs mutate process env (removing GROQ_API_KEY /
MISTRAL_API_KEY) and perform real HTTP POSTs, causing order- and
network-dependent failures; refactor by extracting the API key resolution into a
pure function (e.g., new function resolve_api_key or similar used by
uses_config_api_key_without_groq_env) and change tests to call that pure
function directly, and/or protect env mutations with a process-level
lock/EnvGuard pattern (see LazyLock/OnceLock + EnvGuard used in pushover.rs and
openai_codex.rs) so env vars are restored and tests run serially; additionally,
replace real HTTP calls in these tests with a stubbed HTTP layer or mock client
(injectable via the same call sites that use the network) so the tests assert
key-resolution deterministically without outbound network access.
🧹 Nitpick comments (1)
src/config/schema.rs (1)

7818-7822: Drop the duplicated transcription secret pass.

config.transcription.api_key is already decrypted/encrypted earlier in each method, so these second calls are dead work and make the secret wiring easier to drift later.

♻️ Proposed cleanup
-            decrypt_optional_secret(
-                &store,
-                &mut config.transcription.api_key,
-                "config.transcription.api_key",
-            )?;
-        encrypt_optional_secret(
-            &store,
-            &mut config_to_save.transcription.api_key,
-            "config.transcription.api_key",
-        )?;

Also applies to: 9688-9692

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/schema.rs` around lines 7818 - 7822, The call to
decrypt_optional_secret on config.transcription.api_key is duplicated (see
decrypt_optional_secret(&store, &mut config.transcription.api_key,
"config.transcription.api_key")), causing unnecessary work and risk of drift;
remove the second call(s) so that config.transcription.api_key is only
decrypted/encrypted once per method (delete the redundant
decrypt_optional_secret invocations around config.transcription.api_key and keep
the original single call where the secret is first handled).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/channels/transcription.rs`:
- Around line 35-46: The is_mistral_host function should not silently treat a
parse error as false; change it to parse api_url once up front and return a
Result<bool, E> (or use bail!/anyhow error) on Url::parse(api_url) failures so
callers can fail-fast on malformed URLs; keep the same host check logic (host ==
"mistral.ai" || host.ends_with(".mistral.ai")) after parsing, and update callers
of is_mistral_host to handle the Result (propagate the error or convert to a
clear invalid-URL error) rather than relying on a fallback false.

In `@src/config/schema.rs`:
- Around line 733-751: The Debug implementation for TranscriptionConfig
currently prints api_url (in impl std::fmt::Debug for TranscriptionConfig),
which can leak credentials; update the Debug::fmt to redact api_url the same way
api_key is handled (e.g., show Some("<redacted>") when api_url.is_some() and
None otherwise) so that the "api_url" field no longer exposes raw URLs or
embedded credentials in logs.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 63864030-a049-4af7-a033-90acd542b449

📥 Commits

Reviewing files that changed from the base of the PR and between 326b60d and 7af7eb0.

📒 Files selected for processing (2)
  • src/channels/transcription.rs
  • src/config/schema.rs

Comment on lines +35 to +46
```rust
/// Returns `true` when `api_url` points to a Mistral endpoint.
///
/// Parses the URL and inspects the host (case-insensitive). Falls back to
/// `false` on parse errors so the Groq default path is used.
fn is_mistral_host(api_url: &str) -> bool {
    Url::parse(api_url)
        .ok()
        .and_then(|u| u.host_str().map(|h| h.to_ascii_lowercase()))
        .map_or(false, |host| {
            host == "mistral.ai" || host.ends_with(".mistral.ai")
        })
}
```

⚠️ Potential issue | 🟡 Minor

Fail fast on malformed api_url.

Returning false on parse errors silently classifies a bad Mistral URL as Groq, which then selects the wrong env var and proxy key and can surface a misleading missing-key error. Parse once up front and return an explicit invalid-URL error instead.

As per coding guidelines, "Prefer explicit bail!/errors for unsupported or unsafe states; never silently broaden permissions/capabilities; document fallback behavior when fallback is intentional and safe".
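A fail-fast variant that returns a `Result` instead of defaulting to `false` might look like this. `host_of` is a hypothetical std-only stand-in for `Url::parse(..).host_str()` that covers only simple absolute URLs, and the error type is a plain `String` for brevity; a real fix would keep the URL crate and propagate its parse error (for example via `bail!`).

```rust
/// Hypothetical std-only stand-in for `Url::parse(..).host_str()`; handles only
/// absolute `scheme://[userinfo@]host[:port]/...` URLs and no IPv6 literals.
fn host_of(api_url: &str) -> Option<String> {
    let (scheme, rest) = api_url.split_once("://")?;
    if scheme.is_empty() {
        return None;
    }
    let authority = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    let host_port = authority.rsplit_once('@').map_or(authority, |(_, h)| h);
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}

/// Fail-fast variant: a malformed `api_url` is an error, not "not Mistral".
fn is_mistral_host(api_url: &str) -> Result<bool, String> {
    // Deliberately do not echo `api_url` in the error: it may embed credentials.
    let host = host_of(api_url)
        .ok_or_else(|| "transcription.api_url is not a valid absolute URL".to_string())?;
    Ok(host == "mistral.ai" || host.ends_with(".mistral.ai"))
}

fn main() {
    for url in [
        "https://api.mistral.ai/v1/audio/transcriptions",
        // Missing scheme: rejected instead of silently treated as Groq.
        "api.mistral.ai/v1/audio/transcriptions",
    ] {
        match is_mistral_host(url) {
            Ok(m) => println!("{url}: mistral = {m}"),
            Err(e) => println!("{url}: error: {e}"),
        }
    }
}
```

Callers then propagate the error with `?`, so a typo in the configured URL surfaces as an invalid-URL error rather than a misleading missing-key message.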


This addresses PR feedback: deny-by-default for access and exposure boundaries; never log secrets, raw tokens, or sensitive payloads. It prevents credentials passed as query parameters in custom URLs from leaking.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/config/schema.rs (1)

7818-7822: Remove the second transcription.api_key pass in load/save.

transcription.api_key is already decrypted at lines 7760-7764 and encrypted at lines 9630-9634. These extra calls are no-ops today, but they make the secret-handling path harder to audit and easier to drift later.

✂️ Proposed cleanup
```diff
-            decrypt_optional_secret(
-                &store,
-                &mut config.transcription.api_key,
-                "config.transcription.api_key",
-            )?;
@@
-        encrypt_optional_secret(
-            &store,
-            &mut config_to_save.transcription.api_key,
-            "config.transcription.api_key",
-        )?;
```

As per coding guidelines, "Duplicate small, local logic when it preserves clarity; extract shared utilities only after repeated, stable patterns (rule-of-three)."

Also applies to: 9688-9692

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/schema.rs` around lines 7818 - 7822, Remove the duplicate
secret-handling calls for config.transcription.api_key: delete the extra
decrypt_optional_secret(&store, &mut config.transcription.api_key,
"config.transcription.api_key") in the load path and the duplicate
encrypt_optional_secret/related call in the save path so the key is only
decrypted/encrypted at the original single locations (the existing calls near
where transcription.api_key is first handled). Ensure you leave the original
decrypt_optional_secret and encrypt_optional_secret calls intact and run tests
to confirm no behavior change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/config/schema.rs`:
- Around line 699-700: The code currently treats any non-Mistral host (including
parse errors and unrecognized endpoints) as eligible to use the GROQ fallback
API key via is_mistral_host()-based logic; change the selection so that you only
use the GROQ fallback when the parsed URL explicitly matches a known Groq host
and only use the Mistral fallback when is_mistral_host() returns true for a
successfully parsed URL. Concretely: validate the api_url parse result before
checking host, treat parse failures as "unknown host" and require an explicit
api_key (return an error/bail), replace the permissive else-branch that hands
Groq credentials with a strict host whitelist check (explicit Groq host
matcher), and ensure the credential resolution path (the function that chooses
api_key for transcription) errors when the host is neither a recognized Mistral
nor Groq host instead of silently falling back.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0d572fb0-3a82-4cb4-9ec6-52c9fbefb097

📥 Commits

Reviewing files that changed from the base of the PR and between 7af7eb0 and c628b59.

📒 Files selected for processing (1)
  • src/config/schema.rs

Comment on lines +699 to +700
/// If unset, runtime falls back to `MISTRAL_API_KEY` (for Mistral
/// endpoints) or `GROQ_API_KEY` (all others).

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail

rg -n -C4 'MISTRAL_API_KEY|GROQ_API_KEY|transcription\.mistral|transcription\.groq|is_mistral_host' \
  src/channels/transcription.rs src/config/schema.rs
```

Repository: zeroclaw-labs/zeroclaw

Length of output: 11327


Don't fall back to GROQ_API_KEY for arbitrary non-Mistral URLs.

The runtime implementation (lines 39–104 in src/channels/transcription.rs) falls back to GROQ_API_KEY for any URL that is not a Mistral host, including unparseable URLs and custom endpoints. For example, if api_url omits the scheme (e.g., api.mistral.ai/v1/audio/transcriptions), parsing fails, is_mistral_host() returns false, and the Groq credential is used anyway. This silently broadens capabilities and can send Groq credentials to untrusted or misconfigured endpoints.

Restrict the fallback to recognized Mistral and Groq hosts only; for unknown hosts, require an explicit api_key in the config or fail fast. This aligns with the principle "prefer explicit bail!/errors for unsupported or unsafe states; never silently broaden permissions/capabilities."
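One way to express the suggested whitelist is to classify the host into a closed set of providers and consult a provider env var only for recognized hosts. This is a sketch under stated assumptions: `Provider`, `classify`, `host_matches`, and `resolve_key` are hypothetical names, treating `groq.com` as the Groq domain is an assumption, and `host_of` is a simplified std-only stand-in for a real URL parser. Per the suggestion, parse failures are treated as an unknown host, so they require an explicit key.

```rust
/// Providers whose env-var fallback may be used; anything else is `Unknown`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    Mistral,
    Groq,
    Unknown,
}

/// Hypothetical std-only stand-in for `Url::parse(..).host_str()`.
fn host_of(api_url: &str) -> Option<String> {
    let (scheme, rest) = api_url.split_once("://")?;
    if scheme.is_empty() {
        return None;
    }
    let authority = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    let host_port = authority.rsplit_once('@').map_or(authority, |(_, h)| h);
    let host = host_port.split(':').next()?;
    if host.is_empty() { None } else { Some(host.to_ascii_lowercase()) }
}

/// Matches `domain` itself or any subdomain of it.
fn host_matches(host: &str, domain: &str) -> bool {
    host == domain || host.ends_with(&format!(".{domain}"))
}

fn classify(api_url: &str) -> Provider {
    match host_of(api_url) {
        Some(h) if host_matches(&h, "mistral.ai") => Provider::Mistral,
        Some(h) if host_matches(&h, "groq.com") => Provider::Groq,
        // Unparseable URLs and unrecognized hosts are handled the same way.
        _ => Provider::Unknown,
    }
}

fn resolve_key<F>(api_url: &str, config_key: Option<&str>, env: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    // An explicit, non-blank key always wins, whatever the host.
    if let Some(key) = config_key.map(str::trim).filter(|k| !k.is_empty()) {
        return Ok(key.to_string());
    }
    let var = match classify(api_url) {
        Provider::Mistral => "MISTRAL_API_KEY",
        Provider::Groq => "GROQ_API_KEY",
        // Never hand a provider credential to an unrecognized endpoint.
        Provider::Unknown => {
            return Err("transcription.api_key is required for this api_url".to_string())
        }
    };
    env(var)
        .filter(|v| !v.trim().is_empty())
        .ok_or_else(|| format!("set transcription.api_key or {var}"))
}

fn main() {
    let env = |name: &str| match name {
        "GROQ_API_KEY" => Some("gsk-from-env".to_string()),
        _ => None,
    };
    println!("{:?}", resolve_key("https://api.groq.com/openai/v1/audio/transcriptions", None, env));
    println!("{:?}", resolve_key("https://stt.example.net/v1", None, env));
}
```

The closed enum makes the "neither Mistral nor Groq" case an explicit match arm, so adding a provider later forces a deliberate decision about its credential source.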


@theonlyhennygod theonlyhennygod changed the base branch from main to master March 12, 2026 00:04

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants