Skip to content

feat(channel): Add Mistral Voxtral support for voice transcription#2778

Closed
WAlexandreW wants to merge 4 commits intozeroclaw-labs:devfrom
WAlexandreW:feat/mistral-transcription-support-v2
Closed

feat(channel): Add Mistral Voxtral support for voice transcription#2778
WAlexandreW wants to merge 4 commits intozeroclaw-labs:devfrom
WAlexandreW:feat/mistral-transcription-support-v2

Conversation

@WAlexandreW
Copy link

@WAlexandreW WAlexandreW commented Mar 4, 2026

Summary

  • Base branch target: dev
  • Problem: The transcription subsystem was hardcoded to Groq's Whisper API, with no way to use other STT providers without code changes.
  • Why it matters: Mistral's voxtral-mini-2507 (via voxtral-mini-latest) offers a competitive, zero-friction alternative transcription API with an endpoint-compatible multipart request format.
  • What changed: src/channels/transcription.rs now infers the proxy key (transcription.mistral vs transcription.groq) and env-var fallback (MISTRAL_API_KEY vs GROQ_API_KEY) dynamically from the configured api_url. src/config/schema.rs adds transcription.mistral to the supported proxy service keys and updates documentation comments.
  • What did not change: Default endpoint and model remain Groq + whisper-large-v3-turbo. No new dependencies. No breaking changes to existing config.

Label Snapshot (required)

  • Risk label: risk: low
  • Size label: size: XS
  • Scope labels: channel, config
  • Module labels: channel: telegram
  • Contributor tier label: N/A (new contributor)
  • If any auto-label is incorrect, note requested correction: N/A

Change Metadata

  • Change type: feature
  • Primary scope: channel

Linked Issue

  • Closes #
  • Related #
  • Linear issue key(s): N/A

Supersede Attribution (required when Supersedes # is used)

N/A

Validation Evidence (required)

cargo test --lib channels::transcription
# Result: 10 passed, 0 failed
  • Evidence provided: All 10 existing transcription unit tests passed without modification.
  • If any command is intentionally skipped: N/A

Security Impact (required)

  • New permissions/capabilities? No
  • New external network calls? No (same multipart/form-data transcription flow, different endpoint configured by user)
  • Secrets/tokens handling changed? Yes — MISTRAL_API_KEY env var is now checked as a fallback when the configured api_url points to mistral.ai
  • File system access scope changed? No
  • Risk/mitigation: The API key lookup is URL-scoped and remains optional. No key is ever logged or leaked.

Privacy and Data Hygiene (required)

  • Data-hygiene status: pass
  • Redaction/anonymization notes: No personal data, credentials, or identifiers introduced.
  • Neutral wording confirmation: All test names and fixtures use neutral language.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? Yes — MISTRAL_API_KEY env var now recognized as a fallback (additive only)
  • Migration needed? No
  • Users who want Mistral simply set:
    [transcription]
    api_url = "https://api.mistral.ai/v1/audio/transcriptions"
    model = "voxtral-mini-2507"
    api_key = "..."  # or set MISTRAL_API_KEY

i18n Follow-Through (required when docs or user-facing wording changes)

  • i18n follow-through triggered? No (no user-facing doc changes)

Human Verification (required)

  • Verified scenarios: Groq default config continues to work; Mistral config routes correctly via URL detection.
  • Edge cases checked: api_key explicitly set in config takes priority over all env vars regardless of endpoint.
  • What was not verified: Live end-to-end Mistral API call (requires a MISTRAL_API_KEY).

Side Effects / Blast Radius (required)

  • Affected subsystems/workflows: channels/transcription.rs and config/schema.rs only.
  • Potential unintended effects: None — Groq path unchanged.
  • Guardrails/monitoring for early detection: Existing transcription error logging will surface any misconfigured endpoint.

Agent Collaboration Notes (recommended)

  • Agent tools used: Context7 used to retrieve Mistral audio transcription API documentation.
  • Workflow/plan summary: Verified API compatibility, implemented URL-scoped logic, ran unit tests.
  • Verification focus: Backward compatibility and env-var scoping correctness.
  • Confirmation: naming + architecture boundaries followed per AGENTS.md + CONTRIBUTING.md.

Rollback Plan (required)

  • Fast rollback: Revert src/channels/transcription.rs and src/config/schema.rs — no DB or config migrations needed.
  • Feature flags: Users who never configure a Mistral api_url are completely unaffected.
  • Observable failure symptoms: Transcription requests return 401/403 if MISTRAL_API_KEY is unset with a Mistral endpoint configured (clear error message in logs).

Risks and Mitigations

  • Risk: User sets a Mistral endpoint but provides a Groq key by mistake.
    • Mitigation: The API will return a clear 401. The error message in logs now mentions both key options.

Summary by CodeRabbit

  • New Features
    • Per-configuration transcription API key support, automatic provider selection, and dynamic proxy routing.
  • Bug Fixes
    • More robust multi-source API key resolution with clearer error messaging and provider-aware client selection.
  • Documentation
    • Expanded transcription settings, defaults, migration notes; API key is documented and redacted in debug output; persistence now encrypts/decrypts the key.
  • Tests
    • Added coverage for key resolution, host detection, MIME/filename handling, persistence (encrypt/decrypt), and rejection scenarios.

@github-actions
Copy link

github-actions bot commented Mar 4, 2026

Thanks for contributing to ZeroClaw.

For faster review, please ensure:

  • PR template sections are fully completed
  • cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings, and cargo test are included
  • If automation/agents were used heavily, add brief workflow notes
  • Scope is focused (prefer one concern per PR)

See CONTRIBUTING.md and docs/pr-workflow.md for full collaboration rules.

@github-actions github-actions bot added channel Auto scope: src/channels/** changed. config Auto scope: src/config/** changed. labels Mar 4, 2026
@github-actions
Copy link

github-actions bot commented Mar 4, 2026

PR intake checks found warnings (non-blocking)

Fast safe checks found advisory issues. CI lint/test/build gates still enforce merge quality.

  • Incomplete required PR template fields: validation commands, security risk/mitigation, privacy status, rollback plan
  • Missing Linear issue key reference (RMN-<id>, CDV-<id>, or COM-<id>) in PR title/body (recommended for traceability, non-blocking).

Action items:

  1. Complete required PR template sections/fields.
  2. (Recommended) Link this PR to one active Linear issue key (RMN-xxx/CDV-xxx/COM-xxx) for traceability.
  3. Remove tabs, trailing whitespace, and merge conflict markers from added lines.
  4. Re-run local checks before pushing:
    • ./scripts/ci/rust_quality_gate.sh
    • ./scripts/ci/rust_strict_delta_gate.sh
    • ./scripts/ci/docs_quality_gate.sh

Detected Linear keys: none

Run logs: https://github.com/zeroclaw-labs/zeroclaw/actions/runs/22705515205

Detected blocking line issues (sample):

  • none

Detected advisory line issues (sample):

  • none

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key(s) in object: 'tools', 'path_filters', 'review_instructions'
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
📝 Walkthrough

Walkthrough

transcribe_audio now accepts a &TranscriptionConfig (with optional api_key), detects Mistral hosts, resolves the API key from config or provider-specific env vars, and sends requests via a provider-specific runtime proxy client instead of the hard-coded Groq client.

Changes

Cohort / File(s) Summary
Transcription logic
src/channels/transcription.rs
Adds fn is_mistral_host(api_url: &str) -> bool; changes transcribe_audio signature to accept &TranscriptionConfig and return Result<String>; implements provider-aware api_key resolution (config first, then MISTRAL_API_KEY or GROQ_API_KEY env fallback); selects proxy (transcription.mistral or transcription.groq) and uses build_runtime_proxy_client(proxy_name); updates error messaging and retains payload/response handling.
Configuration schema
src/config/schema.rs
Adds pub api_key: Option<String> to TranscriptionConfig; Default sets api_key: None; Debug impl redacts the api_key; adds "transcription.mistral" to SUPPORTED_PROXY_SERVICE_KEYS; encrypts/decrypts transcription.api_key during config save/load.
Tests
src/channels/...tests, tests/...
Adds tests for key-resolution (config key present, whitespace trimming, env fallback), is_mistral_host variants, and updates existing MIME/filename and rejection tests to follow new proxy/key resolution paths.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant TranscriptionModule
    participant ProxyBuilder
    participant ProxyService

    Caller->>TranscriptionModule: transcribe_audio(audio, filename, &config)
    TranscriptionModule->>TranscriptionModule: is_mistral_host(config.api_url)
    TranscriptionModule->>TranscriptionModule: resolve api_key (config.api_key or env by host)
    TranscriptionModule->>ProxyBuilder: build_runtime_proxy_client(proxy_name)
    ProxyBuilder->>ProxyService: create client for proxy_name
    TranscriptionModule->>ProxyService: send transcription request (api_key, payload)
    ProxyService-->>TranscriptionModule: return transcription response
    TranscriptionModule-->>Caller: return transcript / error
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

Suggested labels

channel: transcription

Suggested reviewers

  • theonlyhennygod
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main feature change: adding Mistral Voxtral support for voice transcription, which aligns with the primary code changes to enable multi-provider STT support.
Description check ✅ Passed The description comprehensively covers all required template sections including problem statement, what changed, validation evidence, security impact, compatibility, and rollback plan with proper detail and structure.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added size: XS Auto size: <=80 non-doc changed lines. risk: medium Auto risk: src/** or dependency/config changes. provider Auto scope: src/providers/** changed. config: core Auto module: config core files changed. config Auto scope: src/config/** changed. and removed config Auto scope: src/config/** changed. provider Auto scope: src/providers/** changed. labels Mar 4, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/config/schema.rs (1)

493-503: Strengthen schema-contract docs for transcription provider expansion.

The updated comments improve clarity, but this config-surface change should explicitly document default behavior, compatibility impact, and migration/rollback guidance in-place.

📝 Suggested doc update
-/// Voice transcription configuration (Whisper API via Groq, Mistral, etc.).
+/// Voice transcription configuration (Groq Whisper default; also supports Mistral Voxtral endpoints).
+/// Default endpoint/model remain Groq:
+/// - api_url: `https://api.groq.com/openai/v1/audio/transcriptions`
+/// - model: `whisper-large-v3-turbo`
+/// Compatibility: additive and backward-compatible for existing Groq configs.
+/// Migration: set `transcription.api_url` to the Mistral endpoint and provide `api_key` or `MISTRAL_API_KEY`.
+/// Rollback: revert `transcription.api_url`/model to Groq defaults.
 pub struct TranscriptionConfig {
@@
-    /// Whisper or Voxtral model name (e.g. `whisper-large-v3-turbo`, `voxtral-mini-latest`).
+    /// Whisper or Voxtral model name (e.g. `whisper-large-v3-turbo`, `voxtral-mini-latest`).

As per coding guidelines, "Treat config keys as public contract: document defaults, compatibility impact, and migration/rollback path for schema changes in src/config/schema.rs."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/schema.rs` around lines 493 - 503, Update the doc comments on the
TranscriptionConfig struct and its fields (TranscriptionConfig, enabled,
api_url, default_transcription_api_url, default_transcription_model) to
explicitly state the default behavior (what enabled=false means and the concrete
default values returned by
default_transcription_api_url/default_transcription_model), note compatibility
impact when adding new transcription providers (which consumers may need to
configure or may differ in supported models/encoding), and provide a short
migration/rollback guidance line (how to revert to previous behavior by
restoring prior config keys or setting enabled=false and specifying prior
api_url/model); keep the wording concise and place these notes adjacent to the
existing field docs so the schema file serves as the public contract.
src/channels/transcription.rs (1)

62-80: Add unit tests for the new provider-branch key resolution paths.

This block now has provider-dependent behavior and whitespace filtering, but current tests only cover the legacy GROQ-missing-key path. Please add focused tests for Mistral URL + MISTRAL_API_KEY, Mistral URL without key, non-Mistral URL + GROQ_API_KEY, and whitespace-only keys.

As per coding guidelines, "Run cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings, and cargo test before PR submission."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/transcription.rs` around lines 62 - 80, Add unit tests for the
provider-dependent API key resolution in transcription.rs that exercise the
api_key computation: create tests that set config.api_url to a Mistral URL and
verify behavior when MISTRAL_API_KEY is present (valid key) and when it is
absent (error path), create a test for a non-Mistral URL that uses GROQ_API_KEY
from the environment, and add a test that supplies whitespace-only keys (both
via config.api_key and env vars) to ensure trimming/filtering removes them; use
the same module/constructor that computes api_key (the logic referencing
config.api_key and config.api_url and environment vars
MISTRAL_API_KEY/GROQ_API_KEY) and assert success or the expected context error,
then run cargo fmt, cargo clippy, and cargo test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/channels/transcription.rs`:
- Around line 69-74: The provider detection using
config.api_url.contains("mistral.ai") is fragile; parse config.api_url once
(e.g., with Url::parse) and inspect the host component (case-insensitive) to
decide provider, then reuse that boolean for both env-var selection and proxy
routing logic (replace the contains checks in the block around config.api_url
and the similar checks at the second occurrence). Ensure you handle parse errors
(fallback to explicit default) and match either exact host or valid subdomains
(e.g., host.ends_with("mistral.ai")) so MISTRAL_API_KEY is chosen only for true
Mistral hosts and GROQ_API_KEY otherwise.

---

Nitpick comments:
In `@src/channels/transcription.rs`:
- Around line 62-80: Add unit tests for the provider-dependent API key
resolution in transcription.rs that exercise the api_key computation: create
tests that set config.api_url to a Mistral URL and verify behavior when
MISTRAL_API_KEY is present (valid key) and when it is absent (error path),
create a test for a non-Mistral URL that uses GROQ_API_KEY from the environment,
and add a test that supplies whitespace-only keys (both via config.api_key and
env vars) to ensure trimming/filtering removes them; use the same
module/constructor that computes api_key (the logic referencing config.api_key
and config.api_url and environment vars MISTRAL_API_KEY/GROQ_API_KEY) and assert
success or the expected context error, then run cargo fmt, cargo clippy, and
cargo test.

In `@src/config/schema.rs`:
- Around line 493-503: Update the doc comments on the TranscriptionConfig struct
and its fields (TranscriptionConfig, enabled, api_url,
default_transcription_api_url, default_transcription_model) to explicitly state
the default behavior (what enabled=false means and the concrete default values
returned by default_transcription_api_url/default_transcription_model), note
compatibility impact when adding new transcription providers (which consumers
may need to configure or may differ in supported models/encoding), and provide a
short migration/rollback guidance line (how to revert to previous behavior by
restoring prior config keys or setting enabled=false and specifying prior
api_url/model); keep the wording concise and place these notes adjacent to the
existing field docs so the schema file serves as the public contract.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c7f77d99-06d1-4b49-b209-56c0a672d4e5

📥 Commits

Reviewing files that changed from the base of the PR and between a00ae63 and 102d548.

📒 Files selected for processing (2)
  • src/channels/transcription.rs
  • src/config/schema.rs

@github-actions github-actions bot added config Auto scope: src/config/** changed. size: S Auto size: 81-250 non-doc changed lines. provider Auto scope: src/providers/** changed. and removed provider Auto scope: src/providers/** changed. size: XS Auto size: <=80 non-doc changed lines. config Auto scope: src/config/** changed. labels Mar 5, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/channels/transcription.rs (1)

172-241: ⚠️ Potential issue | 🟠 Major

Add test serialization or extract a pure key-resolution helper to eliminate environment variable leakage between tests.

Tests at lines 172–227 call std::env::remove_var() without restoration or serialization. When tests run in parallel, mutations to GROQ_API_KEY and MISTRAL_API_KEY can leak across test boundaries since transcribe_audio reads these variables at runtime (lines 97–99), not at config initialization. Either:

  • Use #[serial] or tokio::test(flavor = "multi_thread") with a global lock to serialize these four tests, or
  • Extract key resolution into a testable, dependency-injected function so tests can inject mock env lookups without mutating process state.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/transcription.rs` around lines 172 - 241, Tests mutate process
environment (GROQ_API_KEY/MISTRAL_API_KEY) causing cross-test leakage; fix by
either serializing the tests or extracting key-resolution into an injectable
helper. Option A: mark the tests that touch env vars as serialized (e.g., use a
global test lock or a serial attribute) so the four tests around
transcribe_audio do not run concurrently. Option B (preferred): extract the
environment lookup currently inside transcribe_audio (the logic that reads
GROQ_API_KEY / MISTRAL_API_KEY and whitespace-checks keys) into a pure function
like resolve_transcription_api_key(&TranscriptionConfig) -> Result<String,
Error> and update transcribe_audio to accept an injected key or resolver; update
tests to call the resolver directly or pass a mocked key so they no longer
mutate std::env.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/config/schema.rs`:
- Around line 518-523: Config now contains a sensitive field
config.transcription.api_key that is not being processed by the existing secrets
encryption/decryption paths; update Config::load_or_init to decrypt
transcription.api_key when secrets.encrypt is enabled and update Config::save to
encrypt transcription.api_key before persisting (and ensure the in-memory struct
stores the decrypted value after load). Locate the existing secret handling
logic in Config::load_or_init and Config::save and mirror that flow for
transcription.api_key, reusing the same helper functions or utilities used for
other secret fields so encryption and decryption are consistent with the
secrets.encrypt flag.

---

Outside diff comments:
In `@src/channels/transcription.rs`:
- Around line 172-241: Tests mutate process environment
(GROQ_API_KEY/MISTRAL_API_KEY) causing cross-test leakage; fix by either
serializing the tests or extracting key-resolution into an injectable helper.
Option A: mark the tests that touch env vars as serialized (e.g., use a global
test lock or a serial attribute) so the four tests around transcribe_audio do
not run concurrently. Option B (preferred): extract the environment lookup
currently inside transcribe_audio (the logic that reads GROQ_API_KEY /
MISTRAL_API_KEY and whitespace-checks keys) into a pure function like
resolve_transcription_api_key(&TranscriptionConfig) -> Result<String, Error> and
update transcribe_audio to accept an injected key or resolver; update tests to
call the resolver directly or pass a mocked key so they no longer mutate
std::env.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cf40fd03-c167-44ae-85e8-57bbc87fd3a8

📥 Commits

Reviewing files that changed from the base of the PR and between 102d548 and a6876e5.

📒 Files selected for processing (2)
  • src/channels/transcription.rs
  • src/config/schema.rs

@github-actions github-actions bot added config Auto scope: src/config/** changed. provider Auto scope: src/providers/** changed. and removed provider Auto scope: src/providers/** changed. config Auto scope: src/config/** changed. labels Mar 5, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/config/schema.rs (1)

5519-5523: Nice fix wiring transcription.api_key into secrets lifecycle; please add regression assertions.

The load/save secret handling is correct now. I’d still add assertions in the existing config secret roundtrip test to explicitly cover transcription.api_key, so this path doesn’t regress silently.

Also applies to: 6358-6362

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/schema.rs` around lines 5519 - 5523, Add explicit assertions to
the existing config secrets roundtrip test to cover transcription.api_key: after
the test saves the config secrets, assert that the secrets store contains an
entry for "config.transcription.api_key" (or that it is present/absent as
expected), and after loading/decrypting the config assert that
config.transcription.api_key equals the original value (or Option state) used in
the test; reference the decrypt_optional_secret call and the
config.transcription.api_key field to locate where to add these checks so this
secret path is exercised and cannot regress.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/config/schema.rs`:
- Around line 512-523: TranscriptionConfig currently derives Debug which will
print the sensitive api_key; remove Debug from the derive list and add a manual
impl fmt::Debug for TranscriptionConfig that prints enabled normally but redacts
or masks api_key (e.g., show "<redacted>" or "Some(****)" when
api_key.is_some()); keep Serialize, Deserialize, JsonSchema derives intact and
ensure the impl references the struct name TranscriptionConfig and its field
api_key so any debug logging no longer exposes the secret.

---

Nitpick comments:
In `@src/config/schema.rs`:
- Around line 5519-5523: Add explicit assertions to the existing config secrets
roundtrip test to cover transcription.api_key: after the test saves the config
secrets, assert that the secrets store contains an entry for
"config.transcription.api_key" (or that it is present/absent as expected), and
after loading/decrypting the config assert that config.transcription.api_key
equals the original value (or Option state) used in the test; reference the
decrypt_optional_secret call and the config.transcription.api_key field to
locate where to add these checks so this secret path is exercised and cannot
regress.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2fb448bd-a5d9-4722-b691-9121176e3de2

📥 Commits

Reviewing files that changed from the base of the PR and between a6876e5 and 08be53f.

📒 Files selected for processing (1)
  • src/config/schema.rs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/config/schema.rs`:
- Around line 554-572: The Debug implementation for TranscriptionConfig
currently prints api_url verbatim; update the impl of std::fmt::Debug for
TranscriptionConfig (the fmt method) to redact or sanitize the api_url field
before logging (similar to api_key). Replace the direct .field("api_url",
&self.api_url) with a masked representation (e.g., None if empty, or a string
that strips credentials and query params or shows only the scheme+host or
"<redacted_url>") so no embedded credentials or tokenized query params are
emitted; keep the rest of the fields unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 68417b91-09be-4ac4-b9ff-06eca09cd876

📥 Commits

Reviewing files that changed from the base of the PR and between 08be53f and 138240a.

📒 Files selected for processing (1)
  • src/config/schema.rs

Comment on lines +554 to +572
impl std::fmt::Debug for TranscriptionConfig {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("TranscriptionConfig")
.field("enabled", &self.enabled)
.field(
"api_key",
&if self.api_key.is_some() {
Some("<redacted>")
} else {
None::<&str>
},
)
.field("api_url", &self.api_url)
.field("model", &self.model)
.field("language", &self.language)
.field("max_duration_secs", &self.max_duration_secs)
.finish()
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Redact or sanitize api_url in Debug output.

api_key is redacted, but api_url is still logged verbatim. URLs can contain embedded credentials or tokenized query params, which risks secret leakage in logs.

🔒 Proposed fix
 impl std::fmt::Debug for TranscriptionConfig {
     fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
         f.debug_struct("TranscriptionConfig")
             .field("enabled", &self.enabled)
             .field(
                 "api_key",
                 &if self.api_key.is_some() {
                     Some("<redacted>")
                 } else {
                     None::<&str>
                 },
             )
-            .field("api_url", &self.api_url)
+            .field("api_url_configured", &!self.api_url.trim().is_empty())
             .field("model", &self.model)
             .field("language", &self.language)
             .field("max_duration_secs", &self.max_duration_secs)
             .finish()
     }
 }

Based on learnings: "Never log secrets, raw tokens, or sensitive payloads".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
impl std::fmt::Debug for TranscriptionConfig {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("TranscriptionConfig")
.field("enabled", &self.enabled)
.field(
"api_key",
&if self.api_key.is_some() {
Some("<redacted>")
} else {
None::<&str>
},
)
.field("api_url", &self.api_url)
.field("model", &self.model)
.field("language", &self.language)
.field("max_duration_secs", &self.max_duration_secs)
.finish()
}
}
impl std::fmt::Debug for TranscriptionConfig {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("TranscriptionConfig")
.field("enabled", &self.enabled)
.field(
"api_key",
&if self.api_key.is_some() {
Some("<redacted>")
} else {
None::<&str>
},
)
.field("api_url_configured", &!self.api_url.trim().is_empty())
.field("model", &self.model)
.field("language", &self.language)
.field("max_duration_secs", &self.max_duration_secs)
.finish()
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/schema.rs` around lines 554 - 572, The Debug implementation for
TranscriptionConfig currently prints api_url verbatim; update the impl of
std::fmt::Debug for TranscriptionConfig (the fmt method) to redact or sanitize
the api_url field before logging (similar to api_key). Replace the direct
.field("api_url", &self.api_url) with a masked representation (e.g., None if
empty, or a string that strips credentials and query params or shows only the
scheme+host or "<redacted_url>") so no embedded credentials or tokenized query
params are emitted; keep the rest of the fields unchanged.

- Replace fragile contains("mistral.ai") with proper URL host parsing
  via is_mistral_host() using reqwest::Url
- Add api_key field to TranscriptionConfig for explicit key configuration
- Enrich TranscriptionConfig docs with defaults, compatibility, migration
- Add 8 new unit tests: Mistral/Groq key resolution, whitespace
  filtering, URL host detection, and spoofed-path rejection
Add decrypt_optional_secret and encrypt_optional_secret calls for
config.transcription.api_key in Config::load_or_init and Config::save,
matching the pattern used by other sensitive credential fields.
@WAlexandreW WAlexandreW force-pushed the feat/mistral-transcription-support-v2 branch from 138240a to 5b05f2f Compare March 6, 2026 08:20
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/channels/transcription.rs (1)

172-241: ⚠️ Potential issue | 🟡 Minor

Add env synchronization guard to prevent test race condition.

Multiple tests call std::env::remove_var() without synchronization. Since cargo test runs tests in parallel by default, concurrent env mutations can interfere with each other's state, leading to flaky results.

The codebase already uses env_override_lock() in src/config/schema.rs for exactly this purpose (40+ tests rely on it). Wrap each env-dependent test with this existing pattern:

#[tokio::test]
async fn rejects_missing_api_key() {
    let _env_guard = env_override_lock().await;  // Add this
    std::env::remove_var("GROQ_API_KEY");
    // ... rest of test
}

First, export env_override_lock from src/config/schema.rs or define a local equivalent in the transcription test module. This ensures deterministic test behavior per coding guidelines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/transcription.rs` around lines 172 - 241, Tests mutate
environment variables concurrently causing races; wrap each env-dependent test
(e.g., rejects_missing_api_key, uses_config_api_key_without_groq_env,
mistral_url_falls_back_to_mistral_env_key, whitespace_only_api_key_is_rejected)
with the existing env_override_lock() guard to serialize env changes, and ensure
env_override_lock is imported/exported into the transcription test module (or
provide a local equivalent) so transcribe_audio and TranscriptionConfig-based
tests use let _env_guard = env_override_lock().await before calling
std::env::remove_var or modifying config.
♻️ Duplicate comments (1)
src/config/schema.rs (1)

603-621: ⚠️ Potential issue | 🟠 Major

Sanitize api_url in Debug output to avoid secret leakage.

At Line 615, api_url is logged verbatim. URLs can carry embedded credentials or tokenized query params.

🔒 Proposed fix
 impl std::fmt::Debug for TranscriptionConfig {
     fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
         f.debug_struct("TranscriptionConfig")
             .field("enabled", &self.enabled)
             .field(
                 "api_key",
                 &if self.api_key.is_some() {
                     Some("<redacted>")
                 } else {
                     None::<&str>
                 },
             )
-            .field("api_url", &self.api_url)
+            .field("api_url_configured", &!self.api_url.trim().is_empty())
             .field("model", &self.model)
             .field("language", &self.language)
             .field("max_duration_secs", &self.max_duration_secs)
             .finish()
     }
 }

Based on learnings: "Never log secrets, raw tokens, or sensitive payloads."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/schema.rs` around lines 603 - 621, The Debug impl for
TranscriptionConfig currently prints api_url verbatim in fmt (impl
std::fmt::Debug for TranscriptionConfig -> fn fmt), which can leak credentials
in userinfo or query params; change the api_url field rendering to a sanitized
representation: parse the stored api_url (e.g., with url::Url) and emit only
non-sensitive parts (scheme + host + optional path) or a fixed placeholder like
"<redacted>" when userinfo or query/fragment exist or parsing fails, similar to
how api_key is redacted, ensuring the field name "api_url" is present but its
value never contains credentials or query tokens.
🧹 Nitpick comments (1)
src/channels/transcription.rs (1)

102-104: Error message could be more precise about provider-specific env vars.

The message suggests setting either MISTRAL_API_KEY or GROQ_API_KEY, but the env-var fallback is provider-specific—only MISTRAL_API_KEY is checked for Mistral URLs, and only GROQ_API_KEY for others. A user with a Mistral URL who sets GROQ_API_KEY will still see this error.

Proposed fix for clearer error messaging
-        .context(
-            "Missing transcription API key: set [transcription].api_key, MISTRAL_API_KEY, or GROQ_API_KEY environment variable",
-        )?;
+        .with_context(|| {
+            let env_hint = if mistral { "MISTRAL_API_KEY" } else { "GROQ_API_KEY" };
+            format!(
+                "Missing transcription API key: set [transcription].api_key or {env_hint} environment variable"
+            )
+        })?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/transcription.rs` around lines 102 - 104, Update the error
context message to be provider-aware: instead of suggesting both MISTRAL_API_KEY
and GROQ_API_KEY unconditionally in the .context(...) call, detect the
transcription provider (e.g., inspect the transcription URL or the branch that
handles Mistral vs others) and produce a provider-specific message that tells
users which env var to set (for Mistral URLs recommend MISTRAL_API_KEY,
otherwise recommend GROQ_API_KEY). Locate the .context(...) invocation that
builds the "Missing transcription API key..." message and change it to provide
conditional text based on the provider detection logic used elsewhere in this
module (same code path that checks for Mistral URLs), ensuring the error
references the correct env var for the selected provider.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/channels/transcription.rs`:
- Around line 172-241: Tests mutate environment variables concurrently causing
races; wrap each env-dependent test (e.g., rejects_missing_api_key,
uses_config_api_key_without_groq_env, mistral_url_falls_back_to_mistral_env_key,
whitespace_only_api_key_is_rejected) with the existing env_override_lock() guard
to serialize env changes, and ensure env_override_lock is imported/exported into
the transcription test module (or provide a local equivalent) so
transcribe_audio and TranscriptionConfig-based tests use let _env_guard =
env_override_lock().await before calling std::env::remove_var or modifying
config.

---

Duplicate comments:
In `@src/config/schema.rs`:
- Around line 603-621: The Debug impl for TranscriptionConfig currently prints
api_url verbatim in fmt (impl std::fmt::Debug for TranscriptionConfig -> fn
fmt), which can leak credentials in userinfo or query params; change the api_url
field rendering to a sanitized representation: parse the stored api_url (e.g.,
with url::Url) and emit only non-sensitive parts (scheme + host + optional path)
or a fixed placeholder like "<redacted>" when userinfo or query/fragment exist
or parsing fails, similar to how api_key is redacted, ensuring the field name
"api_url" is present but its value never contains credentials or query tokens.

---

Nitpick comments:
In `@src/channels/transcription.rs`:
- Around line 102-104: Update the error context message to be provider-aware:
instead of suggesting both MISTRAL_API_KEY and GROQ_API_KEY unconditionally in
the .context(...) call, detect the transcription provider (e.g., inspect the
transcription URL or the branch that handles Mistral vs others) and produce a
provider-specific message that tells users which env var to set (for Mistral
URLs recommend MISTRAL_API_KEY, otherwise recommend GROQ_API_KEY). Locate the
.context(...) invocation that builds the "Missing transcription API key..."
message and change it to provide conditional text based on the provider
detection logic used elsewhere in this module (same code path that checks for
Mistral URLs), ensuring the error references the correct env var for the
selected provider.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c6f78bee-76c9-4846-a885-be05ca824fd2

📥 Commits

Reviewing files that changed from the base of the PR and between 138240a and 5b05f2f.

📒 Files selected for processing (2)
  • src/channels/transcription.rs
  • src/config/schema.rs

@willsarg willsarg changed the base branch from dev to master March 7, 2026 18:29
@WAlexandreW WAlexandreW changed the base branch from master to dev March 7, 2026 19:17
@WAlexandreW
Copy link
Author

@willsarg Hi ! why changing from dev to master please?

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel Auto scope: src/channels/** changed. config: core Auto module: config core files changed. provider Auto scope: src/providers/** changed. risk: medium Auto risk: src/** or dependency/config changes. size: S Auto size: 81-250 non-doc changed lines.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant