feat(transcription): add Mistral Voxtral support for voice transcription #2968
WAlexandreW wants to merge 1200 commits into zeroclaw-labs:master from
Conversation
…ark-image-download fix(lark): fetch image messages via resource endpoint
…odex-oauth-docs-and-model docs(codex): add oauth quickstart and gpt-5.3 model
…indows-link-count fix(security): avoid unstable windows link-count API
…nuation-policy feat(agent): add provider-agnostic max-token continuation policy
…-profile ci: use release profile in reproducible build check
fix(gateway): require WATI webhook auth (RMN-323)
…n memory_comparison.rsnsistency
Refactor code for better readability and formatting.
Refactor print statements for better readability and clean formatting.
…uency optimization
Consolidate redundant Rust compilation jobs to cut PR cycle time from 2+ hours to ~30 minutes by reducing parallel cold compilations and upgrading runners.
CI Run (ci-run.yml):
- Merge lint + workspace-check + package-check → quality-gate (25min, 8vcpu)
- Merge test + build → test-and-build (30min, 8vcpu)
- Unify cache keys: prefix-key=zeroclaw-ci-v1, shared-key=runner.os-rust
- Update ci-required gate, lint-feedback deps to reference new job names
Security Audit (sec-audit.yml):
- Merge audit + deny + security-regressions → rust-security (25min, 8vcpu)
- Merge sbom + unsafe-debt → compliance (lightweight runner)
- Add fast-path: non-Rust PRs skip Rust compilation entirely
Frequency optimization (off PR path):
- sec-codeql.yml: push-to-main + weekly only (was PR + push)
- ci-reproducible-build.yml: push-to-main + weekly only (was PR + push)
- ci-change-audit.yml: push-to-main only (was PR + push)
Runner upgrades:
- All Rust compilation jobs: 2vcpu → blacksmith-8vcpu-ubuntu-2404
- ci-supply-chain-provenance, test-fuzz: upgraded to 8vcpu
- test-e2e: upgraded to 8vcpu, fixed env indentation bug
Feature matrix (feature-matrix.yml):
- Non-default lanes (whatsapp-web, browser-native, nightly-all-features) skip on compile profile, run on nightly only
- resolve-profile + summary jobs use ubuntu-latest (no Rust compilation)
Docs/scripts:
- lint_feedback.js: update job name references for quality-gate
- required-check-mapping.md: document new consolidated job names
- ci-map.md: update trigger map, triage guide, maintenance rules
- self-hosted-runner-remediation.md: update job name reference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… rust-docs on 1.92.0 Added environment variable to skip rust-docs in E2E tests.
…install on 1.92.0
- Replace fragile contains("mistral.ai") with proper URL host parsing
via is_mistral_host() using reqwest::Url
- Add api_key field to TranscriptionConfig for explicit key configuration
- Enrich TranscriptionConfig docs with defaults, compatibility, migration
- Add 8 new unit tests: Mistral/Groq key resolution, whitespace
filtering, URL host detection, and spoofed-path rejection
Add decrypt_optional_secret and encrypt_optional_secret calls for config.transcription.api_key in Config::load_or_init and Config::save, matching the pattern used by other sensitive credential fields.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| Cohort / File(s) | Summary |
|---|---|
| Transcription Service: `src/channels/transcription.rs` | Added `fn is_mistral_host(api_url: &str) -> bool`; changed `pub async fn transcribe_audio(..., config: &TranscriptionConfig)` signature; implemented API key resolution priority (`config.api_key` → `MISTRAL_API_KEY` if Mistral host else `GROQ_API_KEY` → error); provider-aware proxy selection (`transcription.mistral` vs `transcription.groq`); updated MIME/filename handling and added tests for host detection, key resolution, and whitespace handling. |
| Configuration Schema & Persistence: `src/config/schema.rs` | Removed auto-derived `Debug` for `TranscriptionConfig` and added custom `impl Debug` that redacts `api_key`; added `"transcription.mistral"` to `SUPPORTED_PROXY_SERVICE_KEYS`; implemented encryption on save and decryption on load for `transcription.api_key`; updated docs/comments and tests to validate encrypted storage and roundtrip behavior. |
Sequence Diagram(s)
mermaid
sequenceDiagram
participant Caller
participant Transcription as Transcription::transcribe_audio
participant Config as TranscriptionConfig
participant Proxy as ProxyClient
participant External as ExternalAPI(Mistral/Groq)
Caller->>Transcription: call transcribe_audio(audio, filename, &config)
Transcription->>Config: read api_key & api_url
alt config.api_key present
Transcription->>Transcription: use config.api_key
else
Transcription->>Transcription: is_mistral_host(api_url)?
alt Mistral host
Transcription->>Proxy: select "transcription.mistral"
Transcription->>Config: prefer MISTRAL_API_KEY from env
else Groq host
Transcription->>Proxy: select "transcription.groq"
Transcription->>Config: prefer GROQ_API_KEY from env
end
end
Transcription->>Proxy: send multipart request with resolved key
Proxy->>External: forward request to endpoint
External-->>Proxy: transcription result
Proxy-->>Transcription: response
Transcription-->>Caller: return transcript
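The resolution order in the diagram can be sketched as a pure function. This is a minimal illustration, not the PR's actual code: `resolve_api_key`, its `env_lookup` parameter (a stand-in for `std::env::var(..).ok()`), the `String` error type, and the choice to return the trimmed key are all assumptions made for the example.

```rust
/// Hypothetical pure helper: resolves the transcription API key without
/// touching the process environment or the network.
///
/// `env_lookup` stands in for `std::env::var(..).ok()` so callers and tests
/// can inject values.
fn resolve_api_key(
    config_key: Option<&str>,
    is_mistral: bool,
    env_lookup: impl Fn(&str) -> Option<String>,
) -> Result<String, String> {
    // Treat empty or whitespace-only values as absent (assumed behavior).
    let non_blank = |value: &str| -> Option<String> {
        let trimmed = value.trim();
        (!trimmed.is_empty()).then(|| trimmed.to_string())
    };

    // 1. An explicit key in the config always wins.
    if let Some(key) = config_key.and_then(non_blank) {
        return Ok(key);
    }

    // 2. Otherwise consult the env var that matches the detected provider.
    let env_name = if is_mistral { "MISTRAL_API_KEY" } else { "GROQ_API_KEY" };
    env_lookup(env_name)
        .as_deref()
        .and_then(non_blank)
        // 3. Nothing usable anywhere: report which settings would fix it.
        .ok_or_else(|| format!("no transcription API key: set `api_key` or {env_name}"))
}
```

Because the environment is passed in as a closure, a test can exercise every branch without mutating process state or making a request.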
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
- feat: support config-level api_key for transcription #2112: Modifies transcription config and transcribe_audio usage; overlaps in adding config-level API key handling and persistence.
- feat(whatsapp-web): supersede #1992 transcription flow [RMN-205] #2192: Touches transcription integration points; related changes to transcribe_audio signature affect callers in that PR.
- fix(discord): transcribe inbound audio attachments #2700: Updates transcription API usage in integrations (Discord); closely tied to TranscriptionConfig and transcribe_audio changes.
Suggested labels
size: M, risk: medium, config: core, channel: transcription
Suggested reviewers
- theonlyhennygod
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main feature: adding Mistral Voxtral support for voice transcription, which is the primary change reflected in the code modifications. |
| Description check | ✅ Passed | The PR description comprehensively covers all required template sections including summary, labels, change metadata, linked issues, validation evidence, security impact, privacy/data hygiene, compatibility, and rollback plan. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/channels/transcription.rs (1)
173-175: ⚠️ Potential issue | 🟠 Major
Lock environment mutations in tests and stub network calls.
These tests remove `GROQ_API_KEY` and `MISTRAL_API_KEY` without restoration or synchronization, then drive real HTTP POST requests to `api.groq.com` and `api.mistral.ai`. The `uses_config_api_key_without_groq_env` test explicitly expects the HTTP request to fail (lines 196–197), making assertions depend on test ordering, ambient environment state, and outbound network availability rather than purely testing key-resolution logic.
Extract the API key resolution logic into a pure function, or at minimum wrap env mutations with a process-level lock (see `src/tools/pushover.rs` / `src/providers/openai_codex.rs` for established `LazyLock`/`OnceLock` + `EnvGuard` patterns) and stub the HTTP layer for deterministic assertions. Per coding guidelines: "Prefer reproducible commands and locked dependency behavior in CI-sensitive paths; keep tests deterministic (no flaky timing/network dependence without guardrails)."
Also applies to: 190–200, 209–210, 228–229
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/channels/transcription.rs` around lines 173 - 175, The tests in src/channels/transcription.rs mutate process env (removing GROQ_API_KEY / MISTRAL_API_KEY) and perform real HTTP POSTs, causing order- and network-dependent failures; refactor by extracting the API key resolution into a pure function (e.g., new function resolve_api_key or similar used by uses_config_api_key_without_groq_env) and change tests to call that pure function directly, and/or protect env mutations with a process-level lock/EnvGuard pattern (see LazyLock/OnceLock + EnvGuard used in pushover.rs and openai_codex.rs) so env vars are restored and tests run serially; additionally, replace real HTTP calls in these tests with a stubbed HTTP layer or mock client (injectable via the same call sites that use the network) so the tests assert key-resolution deterministically without outbound network access.
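The lock-and-restore idea from this comment might look roughly like the sketch below. It is not the `EnvGuard` from `pushover.rs` or `openai_codex.rs` (those are not shown here); the struct layout, the `unset` constructor, and the `ENV_LOCK` static are hypothetical and use only the standard library.

```rust
use std::env;
use std::ffi::OsString;
use std::sync::{Mutex, MutexGuard};

/// Process-wide lock so env-mutating tests never interleave.
static ENV_LOCK: Mutex<()> = Mutex::new(());

/// Hypothetical RAII guard: holds the lock and restores the variable on drop.
struct EnvGuard {
    key: &'static str,
    previous: Option<OsString>,
    _lock: MutexGuard<'static, ()>,
}

impl EnvGuard {
    /// Takes the lock, remembers the current value, then removes `key`.
    #[allow(unused_unsafe)] // env mutation is `unsafe` only from edition 2024 on
    fn unset(key: &'static str) -> Self {
        // A poisoned lock only means another test panicked while holding it.
        let lock = ENV_LOCK.lock().unwrap_or_else(|e| e.into_inner());
        let previous = env::var_os(key);
        // Sound only if every env-mutating test goes through this guard.
        unsafe { env::remove_var(key) };
        Self { key, previous, _lock: lock }
    }
}

impl Drop for EnvGuard {
    #[allow(unused_unsafe)]
    fn drop(&mut self) {
        // Runs before `_lock` is released, so the restore is still serialized.
        match self.previous.take() {
            Some(value) => unsafe { env::set_var(self.key, value) },
            None => unsafe { env::remove_var(self.key) },
        }
    }
}
```

A test would create the guard at the top of its body, for example `let _guard = EnvGuard::unset("GROQ_API_KEY");`, and the previous value comes back when the guard goes out of scope, even if an assertion panics.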
🧹 Nitpick comments (1)
src/config/schema.rs (1)
7818-7822: Drop the duplicated transcription secret pass.
`config.transcription.api_key` is already decrypted/encrypted earlier in each method, so these second calls are dead work and make the secret wiring easier to drift later.
♻️ Proposed cleanup
    -        decrypt_optional_secret(
    -            &store,
    -            &mut config.transcription.api_key,
    -            "config.transcription.api_key",
    -        )?;
    -        encrypt_optional_secret(
    -            &store,
    -            &mut config_to_save.transcription.api_key,
    -            "config.transcription.api_key",
    -        )?;
Also applies to: 9688-9692
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config/schema.rs` around lines 7818 - 7822, The call to decrypt_optional_secret on config.transcription.api_key is duplicated (see decrypt_optional_secret(&store, &mut config.transcription.api_key, "config.transcription.api_key")), causing unnecessary work and risk of drift; remove the second call(s) so that config.transcription.api_key is only decrypted/encrypted once per method (delete the redundant decrypt_optional_secret invocations around config.transcription.api_key and keep the original single call where the secret is first handled).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/channels/transcription.rs`:
- Around line 35-46: The is_mistral_host function should not silently treat a
parse error as false; change it to parse api_url once up front and return a
Result<bool, E> (or use bail!/anyhow error) on Url::parse(api_url) failures so
callers can fail-fast on malformed URLs; keep the same host check logic (host ==
"mistral.ai" || host.ends_with(".mistral.ai")) after parsing, and update callers
of is_mistral_host to handle the Result (propagate the error or convert to a
clear invalid-URL error) rather than relying on a fallback false.
In `@src/config/schema.rs`:
- Around line 733-751: The Debug implementation for TranscriptionConfig
currently prints api_url (in impl std::fmt::Debug for TranscriptionConfig),
which can leak credentials; update the Debug::fmt to redact api_url the same way
api_key is handled (e.g., show Some("<redacted>") when api_url.is_some() and
None otherwise) so that the "api_url" field no longer exposes raw URLs or
embedded credentials in logs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 63864030-a049-4af7-a033-90acd542b449
📒 Files selected for processing (2)
- src/channels/transcription.rs
- src/config/schema.rs
    /// Returns `true` when `api_url` points to a Mistral endpoint.
    ///
    /// Parses the URL and inspects the host (case-insensitive). Falls back to
    /// `false` on parse errors so the Groq default path is used.
    fn is_mistral_host(api_url: &str) -> bool {
        Url::parse(api_url)
            .ok()
            .and_then(|u| u.host_str().map(|h| h.to_ascii_lowercase()))
            .map_or(false, |host| {
                host == "mistral.ai" || host.ends_with(".mistral.ai")
            })
    }
Fail fast on malformed api_url.
Returning false on parse errors silently classifies a bad Mistral URL as Groq, which then selects the wrong env var and proxy key and can surface a misleading missing-key error. Parse once up front and return an explicit invalid-URL error instead.
As per coding guidelines, "Prefer explicit bail!/errors for unsupported or unsafe states; never silently broaden permissions/capabilities; document fallback behavior when fallback is intentional and safe".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/channels/transcription.rs` around lines 35 - 46, The is_mistral_host
function should not silently treat a parse error as false; change it to parse
api_url once up front and return a Result<bool, E> (or use bail!/anyhow error)
on Url::parse(api_url) failures so callers can fail-fast on malformed URLs; keep
the same host check logic (host == "mistral.ai" ||
host.ends_with(".mistral.ai")) after parsing, and update callers of
is_mistral_host to handle the Result (propagate the error or convert to a clear
invalid-URL error) rather than relying on a fallback false.
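A fail-fast variant along the lines of this suggestion could look like the sketch below. To keep the example free of external crates, `parse_host` is a deliberately simplified, hypothetical stand-in for `Url::parse` plus `host_str()`; it is not a full URL parser, and the `String` error type is likewise an assumption. The error messages leave out the URL itself so a malformed value carrying credentials is not echoed.

```rust
/// Simplified, std-only stand-in for `Url::parse(..)` + `host_str()`.
/// It only understands `scheme://[user@]host[:port][/path]` and ignores
/// IPv6 literals; real code would keep using a proper URL parser.
fn parse_host(api_url: &str) -> Result<String, String> {
    let (scheme, rest) = api_url
        .trim()
        .split_once("://")
        .ok_or_else(|| "invalid transcription api_url: missing scheme".to_string())?;
    if scheme.is_empty() || !scheme.chars().all(|c| c.is_ascii_alphanumeric()) {
        return Err("invalid transcription api_url: bad scheme".to_string());
    }
    // The authority ends at the first path, query, or fragment delimiter.
    let authority = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()
        .unwrap_or("");
    // Drop optional userinfo, then an optional `:port` suffix.
    let host_port = authority.rsplit_once('@').map_or(authority, |(_, h)| h);
    let host = host_port.split_once(':').map_or(host_port, |(h, _)| h);
    if host.is_empty() {
        return Err("invalid transcription api_url: missing host".to_string());
    }
    Ok(host.to_ascii_lowercase())
}

/// Fail-fast variant: a malformed URL is an error, not "not Mistral".
fn is_mistral_host(api_url: &str) -> Result<bool, String> {
    let host = parse_host(api_url)?;
    Ok(host == "mistral.ai" || host.ends_with(".mistral.ai"))
}
```

Callers would then propagate the error with `?` instead of silently taking the Groq path when the URL cannot be parsed.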
This addresses PR feedback ("Deny-by-default for access and exposure boundaries; never log secrets, raw tokens, or sensitive payloads") and prevents leaking credentials via query parameters in custom URLs.
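As an illustration of the redaction described here, a hand-written `Debug` impl can hide both fields. `TranscriptionConfigSketch` is a hypothetical stand-in that models only `api_url` and `api_key`; the real `TranscriptionConfig` has other fields and may type these differently.

```rust
use std::fmt;

/// Hypothetical stand-in for `TranscriptionConfig`; only the two sensitive
/// fields discussed in the review are modelled.
struct TranscriptionConfigSketch {
    api_url: String,
    api_key: Option<String>,
}

impl fmt::Debug for TranscriptionConfigSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Report only whether a key is set, never its value.
        let api_key = self.api_key.as_ref().map(|_| "<redacted>");
        // Custom URLs may embed credentials in userinfo or query parameters,
        // so the URL is redacted as a whole as well.
        let api_url = if self.api_url.is_empty() { "<unset>" } else { "<redacted>" };
        f.debug_struct("TranscriptionConfig")
            .field("api_url", &api_url)
            .field("api_key", &api_key)
            .finish()
    }
}
```

Redacting the whole URL is the bluntest option. A variant that prints only the scheme and host would keep logs more useful, at the cost of parsing inside `fmt`.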
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/config/schema.rs (1)
7818-7822: Remove the second `transcription.api_key` pass in load/save.
`transcription.api_key` is already decrypted at Lines 7760-7764 and encrypted at Lines 9630-9634. These extra calls are no-ops today, but they make the secret-handling path harder to audit and easier to drift later.
✂️ Proposed cleanup
    -        decrypt_optional_secret(
    -            &store,
    -            &mut config.transcription.api_key,
    -            "config.transcription.api_key",
    -        )?;
    @@
    -        encrypt_optional_secret(
    -            &store,
    -            &mut config_to_save.transcription.api_key,
    -            "config.transcription.api_key",
    -        )?;
As per coding guidelines, "Duplicate small, local logic when it preserves clarity; extract shared utilities only after repeated, stable patterns (rule-of-three)."
Also applies to: 9688-9692
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config/schema.rs` around lines 7818 - 7822, Remove the duplicate secret-handling calls for config.transcription.api_key: delete the extra decrypt_optional_secret(&store, &mut config.transcription.api_key, "config.transcription.api_key") in the load path and the duplicate encrypt_optional_secret/related call in the save path so the key is only decrypted/encrypted at the original single locations (the existing calls near where transcription.api_key is first handled). Ensure you leave the original decrypt_optional_secret and encrypt_optional_secret calls intact and run tests to confirm no behavior change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/config/schema.rs`:
- Around line 699-700: The code currently treats any non-Mistral host (including
parse errors and unrecognized endpoints) as eligible to use the GROQ fallback
API key via is_mistral_host()-based logic; change the selection so that you only
use the GROQ fallback when the parsed URL explicitly matches a known Groq host
and only use the Mistral fallback when is_mistral_host() returns true for a
successfully parsed URL. Concretely: validate the api_url parse result before
checking host, treat parse failures as "unknown host" and require an explicit
api_key (return an error/bail), replace the permissive else-branch that hands
Groq credentials with a strict host whitelist check (explicit Groq host
matcher), and ensure the credential resolution path (the function that chooses
api_key for transcription) errors when the host is neither a recognized Mistral
nor Groq host instead of silently falling back.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0d572fb0-3a82-4cb4-9ec6-52c9fbefb097
📒 Files selected for processing (1)
src/config/schema.rs
    /// If unset, runtime falls back to `MISTRAL_API_KEY` (for Mistral
    /// endpoints) or `GROQ_API_KEY` (all others).
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n -C4 'MISTRAL_API_KEY|GROQ_API_KEY|transcription\.mistral|transcription\.groq|is_mistral_host' \
  src/channels/transcription.rs src/config/schema.rs
Repository: zeroclaw-labs/zeroclaw
Length of output: 11327
Don't fall back to GROQ_API_KEY for arbitrary non-Mistral URLs.
The runtime implementation (lines 39–104 in src/channels/transcription.rs) falls back to GROQ_API_KEY for any URL that is not a Mistral host, including parse errors and custom endpoints. For example, a typo in api_url (e.g., htp://...) parses as invalid → is_mistral_host() returns false → Groq credential is used regardless. This silently broadens capabilities and leaks Groq credentials to untrusted or misconfigured endpoints.
Restrict the fallback to recognized Mistral and Groq hosts only; for unknown hosts, require an explicit api_key in the config or fail fast. This aligns with the principle "prefer explicit bail!/errors for unsupported or unsafe states; never silently broaden permissions/capabilities."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/config/schema.rs` around lines 699 - 700, The code currently treats any
non-Mistral host (including parse errors and unrecognized endpoints) as eligible
to use the GROQ fallback API key via is_mistral_host()-based logic; change the
selection so that you only use the GROQ fallback when the parsed URL explicitly
matches a known Groq host and only use the Mistral fallback when
is_mistral_host() returns true for a successfully parsed URL. Concretely:
validate the api_url parse result before checking host, treat parse failures as
"unknown host" and require an explicit api_key (return an error/bail), replace
the permissive else-branch that hands Groq credentials with a strict host
whitelist check (explicit Groq host matcher), and ensure the credential
resolution path (the function that chooses api_key for transcription) errors
when the host is neither a recognized Mistral nor Groq host instead of silently
falling back.
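The strict whitelist requested above could be sketched as follows. The `Provider` enum, `host_matches`, `classify_host`, and `fallback_env_var` are hypothetical names for illustration, and treating `groq.com` and its subdomains as the recognized Groq host is an assumption based on the `api.groq.com` endpoint mentioned in the review. The functions take an already-parsed, lowercased host.

```rust
/// Providers whose env-var fallback is considered safe to use.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Provider {
    Mistral,
    Groq,
}

impl Provider {
    /// Env var consulted when the config carries no explicit `api_key`.
    fn env_var(self) -> &'static str {
        match self {
            Provider::Mistral => "MISTRAL_API_KEY",
            Provider::Groq => "GROQ_API_KEY",
        }
    }
}

/// True when `host` equals `domain` or is a subdomain of it.
fn host_matches(host: &str, domain: &str) -> bool {
    host == domain
        || host
            .strip_suffix(domain)
            .map_or(false, |prefix| prefix.ends_with('.'))
}

/// Maps a parsed, lowercased host to a known provider. Unknown hosts yield
/// `None`, so no ambient credential is ever selected for them.
fn classify_host(host: &str) -> Option<Provider> {
    if host_matches(host, "mistral.ai") {
        Some(Provider::Mistral)
    } else if host_matches(host, "groq.com") {
        Some(Provider::Groq)
    } else {
        None
    }
}

/// Picks the env var to fall back to, or fails when the host is unknown and
/// the config has no explicit key.
fn fallback_env_var(host: &str, has_config_key: bool) -> Result<Option<&'static str>, String> {
    if has_config_key {
        return Ok(None); // explicit key wins; no env fallback needed
    }
    classify_host(host)
        .map(|provider| Some(provider.env_var()))
        .ok_or_else(|| {
            "unrecognized transcription host: set `transcription.api_key` explicitly".to_string()
        })
}
```

With this shape, a custom endpoint still works when the config supplies its own key, while a typo or an unknown host produces an error rather than receiving the Groq credential.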
Summary
- … (`main` by default; use `dev` only when maintainers explicitly request integration batching): `main`
- … (`voxtral-mini-latest`) offers a competitive alternative transcription API with an endpoint-compatible multipart request format.
- `src/channels/transcription.rs` now infers the proxy key (`transcription.mistral` vs `transcription.groq`) and env-var fallback (`MISTRAL_API_KEY` vs `GROQ_API_KEY`) dynamically from the configured `api_url` using proper URL host parsing.
- `src/config/schema.rs` adds documentation, a custom `Debug` impl that redacts `api_key`, and secret encryption/decryption for the transcription key.
- … `whisper-large-v3-turbo`. No new dependencies. No breaking changes to existing config.
Label Snapshot (required)
- `risk: low`
- `size: S`
- channel, config, provider
- `channel: telegram`, `provider: mistral`
Change Metadata
- feature
- channel
Linked Issue
- … `dev`, now rebased on `main`)
Supersede Attribution (required when `Supersedes #` is used)
- `Co-authored-by` trailers added for materially incorporated contributors? N/A (same author)
Validation Evidence (required)
Security Impact (required)
- `MISTRAL_API_KEY` env var now checked as fallback; `api_key` is redacted in Debug output and encrypted/decrypted in config persistence
Privacy and Data Hygiene (required)
- pass
- `TranscriptionConfig::Debug` redacts `api_key`
Compatibility / Migration
- `MISTRAL_API_KEY` env var now recognized as fallback (additive only)
Rollback Plan (required)
- … `api_url` are completely unaffected.
🤖 Generated with Claude Code