Skip to content

feat: global whisper transcription provider and telegram voice fix#3029

Open
antonvice wants to merge 4 commits intozeroclaw-labs:masterfrom
antonvice:feat/global-whisper-transcription
Open

feat: global whisper transcription provider and telegram voice fix#3029
antonvice wants to merge 4 commits intozeroclaw-labs:masterfrom
antonvice:feat/global-whisper-transcription

Conversation

@antonvice
Copy link
Contributor

@antonvice antonvice commented Mar 8, 2026

Summary

This PR integrates whisper-rs as a global transcription provider option and fixes a bug in Telegram voice message detection.

Changes

  • Global Provider: Added TranscriptionProvider (Groq/Local) to TranscriptionConfig.
  • Local Engine: Implemented whisper-rs + ffmpeg orchestration for private, on-device transcription.
  • Telegram Fix: Ensured the bot reacts to voice messages even before global transcription is explicitly configured.
  • Onboarding: Added a dedicated Transcription setup step (Step 4) to the interactive wizard.
  • Docs: Updated channels-reference.md and config-reference.md.

Verified with cargo check and manual testing on Telegram.

Summary by CodeRabbit

  • New Features

    • Added voice transcription for Telegram with selectable providers (cloud or local) and per-channel enablement
    • Onboarding now includes transcription setup and Telegram voice-message preferences
    • Telegram channel gains a maximum voice duration safeguard; Feishu gains an option to restrict group mentions
  • Documentation

    • Added Transcription configuration docs and updated Telegram channel docs with usage and setup guidance
  • Chores

    • Ignore VS Code workspace files (.vscode/) in source control

- Added `TranscriptionProvider` enum (Groq/Local) to global config
- Implemented `whisper-rs` + `ffmpeg` orchestration in `transcription.rs`
- Fixed Telegram voice detection bug when global provider is unset
- Added interactive Transcription setup (Step 4) to onboarding wizard
- Updated documentation for channels and config references
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 8, 2026

📝 Walkthrough

Walkthrough

Adds global multi-provider voice transcription (Groq Cloud and local Whisper‑RS), Telegram voice-message handling and builder options, onboarding wizard prompts for transcription, documentation and config schema updates, and an optional whisper-rs dependency plus .vscode/ in .gitignore.

Changes

Cohort / File(s) Summary
Repo misc & deps
/.gitignore, Cargo.toml
Add .vscode/ ignore; add optional whisper-rs = "0.15.1" and transcription-local feature.
Docs
docs/channels-reference.md, docs/config-reference.md
New Transcription sections and Telegram voice-transcription docs; added config examples and notes about ffmpeg/model download.
Config schema
src/config/schema.rs
Add TranscriptionProvider enum, provider and whisper_model_path on TranscriptionConfig; add voice_messages and whisper_model to TelegramConfig; add mention_only to FeishuConfig.
Transcription core
src/channels/transcription.rs
Implement provider routing: transcribe_audio_groq, transcribe_audio_whisper_rs, workspace/model management, temp file and ffmpeg decoding; update transcribe_audio signature to accept workspace path.
Telegram channel
src/channels/telegram.rs, src/channels/mod.rs
Add fields for whisper model and voice-message flag; add builder methods with_voice_messages and with_whisper_model; enforce max duration fallback and route transcription to local whisper when enabled.
Channel wiring
src/cron/scheduler.rs
Chain Telegram construction with .with_voice_messages(...), .with_whisper_model(...), .with_transcription(...), and .with_workspace_dir(...).
Onboarding
src/onboard/wizard.rs
Add Transcription setup step and prompts (provider choice, ffmpeg notice, optional model path); wire selected transcription config into final generated Config.
Lark/Feishu channel refactor
src/channels/lark.rs
Add platform handling, platform-aware constructors, and safer payload text extraction.
Other
src/security/landlock.rs, various test updates, check_output_vfix.txt
Minor import removal; tests updated for new fields; check_output includes compilation-failure diagnostics for missing struct fields in some edits.

Sequence Diagram(s)

sequenceDiagram
    participant TG as Telegram Channel
    participant Trans as Transcription Module
    participant Groq as Groq API
    participant Whisper as Local Whisper-RS

    TG->>TG: Receive voice message
    TG->>TG: Check voice_messages flag & duration
    alt provider == groq
        TG->>Trans: transcribe_audio(audio, file_name, config, workspace)
        Trans->>Groq: Upload multipart request (audio + model/lang)
        Groq-->>Trans: Transcript response
        Trans-->>TG: Transcript text
    else provider == local
        TG->>Trans: transcribe_audio(audio, file_name, config, workspace)
        Trans->>Whisper: write temp file, ensure model in workspace
        Trans->>Whisper: run ffmpeg decode -> 16kHz mono f32le
        Trans->>Whisper: transcribe_audio_whisper_rs(temp_path, model, workspace)
        Whisper-->>Trans: Transcript text
        Trans-->>TG: Transcript text
    end
    TG->>TG: Prepend prefix and embed transcript into message
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

Suggested labels

size: L, risk: medium, docs, channel, channel: transcription, provider: groq, config: core, onboard: wizard

Suggested reviewers

  • theonlyhennygod

"A rabbit hopped in with a tape and a grin,
I stitched tiny models where silence had been.
ffmpeg hummed, the transcript took flight,
Groq or local — the words came out right.
🐇✨"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description provides a concise summary of changes but is missing several required template sections: Label Snapshot, Change Metadata, Linked Issue, Validation Evidence, Security Impact, Privacy and Data Hygiene, Compatibility/Migration, i18n Follow-Through, Human Verification, Side Effects, and Rollback Plan. Complete the missing required sections from the description template: provide risk/size/scope labels, change type, linked issues, validation commands output, security/privacy assessments, compatibility details, i18n status, human verification steps, and rollback plan.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main changes: adding a global whisper transcription provider and fixing a Telegram voice message bug, both of which are prominent in the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 82.93% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/config/schema.rs (1)

5539-5556: ⚠️ Potential issue | 🟠 Major

This serde test asserts the opposite of its fixture.

The test constructs interrupt_on_new_message: false, then asserts true on Line 5555, so it will fail deterministically after round-trip serialization.

Suggested fix
-        assert!(parsed.interrupt_on_new_message);
+        assert!(!parsed.interrupt_on_new_message);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/schema.rs` around lines 5539 - 5556, The test for TelegramConfig
has a mismatch: the fixture sets interrupt_on_new_message: false but the
assertion checks true; update the assertion to match the fixture by asserting
parsed.interrupt_on_new_message is false (e.g., use
assert!(!parsed.interrupt_on_new_message)) or alternatively change the fixture
to true—locate the TelegramConfig test in schema.rs and fix the check for
interrupt_on_new_message to reflect the intended value after serde round-trip.
🧹 Nitpick comments (2)
src/config/schema.rs (1)

3637-3660: Reuse ChannelsConfig::default() here.

This duplicates the canonical default in two places, so every new channel field now has to be kept in sync manually. Delegating here avoids future drift.

Suggested refactor
-            channels_config: ChannelsConfig {
-                cli: true,
-                telegram: None,
-                discord: None,
-                slack: None,
-                mattermost: None,
-                webhook: None,
-                imessage: None,
-                matrix: None,
-                signal: None,
-                whatsapp: None,
-                linq: None,
-                wati: None,
-                nextcloud_talk: None,
-                email: None,
-                irc: None,
-                lark: None,
-                feishu: None,
-                dingtalk: None,
-                qq: None,
-                nostr: None,
-                clawdtalk: None,
-                message_timeout_secs: default_channel_message_timeout_secs(),
-            },
+            channels_config: ChannelsConfig::default(),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/schema.rs` around lines 3637 - 3660, The explicit ChannelsConfig
struct literal duplicates defaults; replace the manual construction at
channels_config with ChannelsConfig::default() (or Default::default()) so the
canonical defaults in ChannelsConfig::default() are reused; update the
channels_config assignment to just ChannelsConfig::default() and remove the long
list of per-channel fields (keeping any intentional overrides if present).
src/channels/mod.rs (1)

2699-2702: Add a regression test for voice_messages=true without global transcription config.

This builder chain is the integration point for the PR’s main Telegram fix. A focused startup/collection test with config.transcription unset would make future regressions here much easier to catch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/mod.rs` around lines 2699 - 2702, Add a regression test that
starts the same builder chain but with config.transcription unset (None) while
tg.voice_messages is true to ensure the integration path using
.with_transcription(...).with_voice_messages(...).with_whisper_model(...) does
not regress; specifically, create a test (e.g.,
test_voice_messages_without_global_transcription) that builds the config with
transcription = None, sets tg.voice_messages = true, runs the channel
startup/collection routine that invokes the builder chain, and assert startup
completes without panic and the resulting channel/client state reflects no
global transcription (or that whisper_model-based handling is used) — target the
builder methods .with_transcription, .with_voice_messages, and
.with_whisper_model when locating the code to exercise.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.gitignore:
- Line 33: Replace the incorrect ignore pattern ./vscode with the standard
.vscode/ entry in .gitignore: locate the current ./vscode line and update it to
.vscode/ so the hidden VS Code workspace folder is properly ignored, matching
other hidden patterns like .venv/ and .worktrees/.

In `@Cargo.toml`:
- Line 153: Gate the whisper-rs dependency behind an opt-in Cargo feature by
making the dependency optional and exposing a feature (e.g. "whisper") that
enables it: mark the existing whisper-rs dependency as optional and add a
corresponding entry in the [features] table that maps "whisper" to the
"whisper-rs" dependency; update any code that assumes whisper-rs is always
present to be conditionally compiled with cfg(feature = "whisper") (search for
uses of whisper-rs types/funcs to wrap with cfg attributes).

In `@docs/channels-reference.md`:
- Line 165: Remove the trailing space at the end of the "Voice Transcription"
sentence ("**Voice Transcription**: Telegram voice notes (`.ogg`) can be
transcribed using the global transcription service (configured in the
`[transcription]` section).") so the line ends immediately after the period;
locate the line containing the "**Voice Transcription**" heading and trim the
trailing whitespace.

In `@src/channels/transcription.rs`:
- Around line 154-163: The download branch can race when two tasks both see
ggml-tiny.bin missing; make the download atomic: download into a temp file in
the same directory (e.g., derive temp path from default_model like
default_model.with_extension("tmp" or ".part")), write the bytes with
tokio::fs::write to that temp path, then atomically rename it into place with
tokio::fs::rename(temp_path, default_model). Ensure the temp file is created in
models_dir (so rename is atomic), and if rename fails because the target already
exists, treat it as success (remove the temp file) so concurrent writers don't
clobber a finished file; keep using reqwest::get and the existing bytes handling
but replace direct write to default_model with temp-write+rename and an
existence re-check around default_model.exists().
- Around line 146-149: The helper transcribe_audio_whisper_rs currently accepts
only model_path and workspace_dir so any transcription.language override is
silently ignored for TranscriptionProvider::Local; update the function (and any
call sites) to accept and propagate the optional language parameter (e.g., add a
language: Option<&str> argument) and pass it into the local whisper_rs
invocation, or if you prefer not to support language for local transcription,
explicitly return an error (bail!) when TranscriptionProvider::Local is selected
and transcription.language is set; make the change in
transcribe_audio_whisper_rs and corresponding branching logic where
TranscriptionProvider::Local is handled (also apply the same fix for the related
block around lines 197-205) so behavior is consistent and no overrides are
silently ignored.
- Around line 56-65: The code currently uses PathBuf::from(".") as workspace_dir
passed into transcribe_audio_whisper_rs, which makes model caching depend on the
process CWD; instead thread the real workspace path into transcribe_audio by
accepting and passing a workspace_path (or use an existing workspace field on
TranscriptionConfig) into transcribe_audio_whisper_rs (and its callers), and if
no workspace is available return an explicit error/bail! rather than defaulting
to "."; update references to transcribe_audio_whisper_rs, workspace_dir, and
config.whisper_model_path accordingly so callers supply or propagate the
workspace path and fail fast when it’s missing or not writable.
- Around line 46-70: The Local branch currently writes and processes unbounded
audio_data which can cause huge memory spikes; add an explicit size check before
writing/decoding: define a constant like MAX_LOCAL_AUDIO_BYTES and in the
TranscriptionProvider::Local arm check audio_data.len() <= MAX_LOCAL_AUDIO_BYTES
and return an error (with context) if exceeded. Update the Local branch
(referencing variables/idents: TranscriptionProvider::Local, audio_data,
temp_path, transcribe_audio_whisper_rs) to perform this check early and avoid
writing/processing oversized uploads. Ensure the error returned is descriptive
so callers know the upload was too large.

In `@src/config/schema.rs`:
- Around line 2784-2789: Update the documentation comment for the
Telegram-specific config fields to explicitly explain how voice_messages and
whisper_model interact with the global transcription settings: state that
voice_messages enables Telegram-specific transcription, that whisper_model is
used only when transcription.provider == "local" (otherwise ignored), and that
whisper_model, when set, overrides the global transcription.whisper_model_path
for Telegram messages; also describe the default behavior (nil = fallback to
global path or auto-download), compatibility implications, and a brief migration
note for users who previously relied on global-only settings. Reference the
struct fields voice_messages and whisper_model and the global keys
transcription.provider and transcription.whisper_model_path when editing the
comment.

In `@src/cron/scheduler.rs`:
- Around line 319-321: The TelegramChannel built in the cron scheduler omits
transcription and workspace_dir causing missing transcriptions and failing
attachment handling; update the builder chain that currently calls
.with_voice_messages(tg.voice_messages) and
.with_whisper_model(tg.whisper_model.clone()) to also call
.with_transcription(config.transcription.clone()) and
.with_workspace_dir(config.workspace_dir.clone()) so it matches the runtime
initialization (look for the TelegramChannel builder in scheduler.rs and add the
two methods referencing config.transcription and config.workspace_dir), ensuring
try_parse_attachment_message and send_attachment behavior is preserved.

In `@src/onboard/wizard.rs`:
- Around line 5127-5138: Validate the entered GGML model path immediately in the
wizard: after reading the path string in the custom_model branch (the local
variable path) trim it and use std::path::Path::new(&path_trimmed).is_file() to
confirm it exists and is a regular file before assigning to
config.whisper_model_path; if the check fails, return an explicit error (bail!
or Err) with a clear message or re-prompt the user so typos are fixed
immediately rather than persisting an invalid path.

---

Outside diff comments:
In `@src/config/schema.rs`:
- Around line 5539-5556: The test for TelegramConfig has a mismatch: the fixture
sets interrupt_on_new_message: false but the assertion checks true; update the
assertion to match the fixture by asserting parsed.interrupt_on_new_message is
false (e.g., use assert!(!parsed.interrupt_on_new_message)) or alternatively
change the fixture to true—locate the TelegramConfig test in schema.rs and fix
the check for interrupt_on_new_message to reflect the intended value after serde
round-trip.

---

Nitpick comments:
In `@src/channels/mod.rs`:
- Around line 2699-2702: Add a regression test that starts the same builder
chain but with config.transcription unset (None) while tg.voice_messages is true
to ensure the integration path using
.with_transcription(...).with_voice_messages(...).with_whisper_model(...) does
not regress; specifically, create a test (e.g.,
test_voice_messages_without_global_transcription) that builds the config with
transcription = None, sets tg.voice_messages = true, runs the channel
startup/collection routine that invokes the builder chain, and assert startup
completes without panic and the resulting channel/client state reflects no
global transcription (or that whisper_model-based handling is used) — target the
builder methods .with_transcription, .with_voice_messages, and
.with_whisper_model when locating the code to exercise.

In `@src/config/schema.rs`:
- Around line 3637-3660: The explicit ChannelsConfig struct literal duplicates
defaults; replace the manual construction at channels_config with
ChannelsConfig::default() (or Default::default()) so the canonical defaults in
ChannelsConfig::default() are reused; update the channels_config assignment to
just ChannelsConfig::default() and remove the long list of per-channel fields
(keeping any intentional overrides if present).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1d774892-3a93-4458-a1b7-f580e7e1dfa2

📥 Commits

Reviewing files that changed from the base of the PR and between f7fefd4 and 220fb3f.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • .gitignore
  • Cargo.toml
  • docs/channels-reference.md
  • docs/config-reference.md
  • src/channels/mod.rs
  • src/channels/telegram.rs
  • src/channels/transcription.rs
  • src/config/schema.rs
  • src/cron/scheduler.rs
  • src/onboard/wizard.rs

Cargo.toml Outdated

# Serial port for peripheral communication (STM32, etc.)
tokio-serial = { version = "5", default-features = false, optional = true }
whisper-rs = "0.15.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== whisper-rs dependency declaration ==="
sed -n '145,210p' Cargo.toml

echo
echo "=== whisper-related call sites ==="
rg -n 'whisper-rs|whisper_rs|Whisper(Context|State)|transcribe.*whisper' Cargo.toml src

echo
echo "=== feature gates mentioning whisper/transcription ==="
rg -n 'cfg\(feature *= *"[^"]*"\)|\[features\]' Cargo.toml src | rg 'whisper|transcription|features'

Repository: zeroclaw-labs/zeroclaw

Length of output: 5523


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Alternative transcription backends ==="
rg -n 'transcribe.*groq|groq.*transcrib|TranscriptionBackend|TranscriptionProvider' src --type rs -A 2 -B 2

echo
echo "=== Transcription module structure ==="
head -100 src/channels/transcription.rs

echo
echo "=== Check if whisper-rs is conditionally compiled ==="
rg -n 'cfg\(feature|#\[cfg' src/channels/transcription.rs

echo
echo "=== Check if transcription module itself is optional ==="
rg -n 'mod transcription|use.*transcription' src --type rs

Repository: zeroclaw-labs/zeroclaw

Length of output: 152


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Alternative transcription backends ==="
rg -n 'transcribe.*groq|groq.*transcrib|TranscriptionBackend|TranscriptionProvider' src -A 2 -B 2

echo
echo "=== Transcription module structure ==="
head -100 src/channels/transcription.rs

echo
echo "=== Check if whisper-rs is conditionally compiled ==="
rg -n 'cfg\(feature|#\[cfg' src/channels/transcription.rs

echo
echo "=== Check if transcription module itself is optional ==="
rg -n 'mod transcription|use.*transcription' src

Repository: zeroclaw-labs/zeroclaw

Length of output: 7858


Gate whisper-rs behind an opt-in feature.

Users selecting Groq transcription don't need whisper-rs, yet it compiles unconditionally for all builds. This increases compile time and binary size. Other heavy integrations in this manifest (matrix-sdk, fantoccini, postgres, wa-rs, probe-rs) are already feature-gated; keeping whisper-rs unconditional is inconsistent.

Suggested manifest direction
-whisper-rs = "0.15.1"
+whisper-rs = { version = "0.15.1", optional = true }
 [features]
 default = []
+transcription-local = ["dep:whisper-rs"]
 hardware = ["nusb", "tokio-serial"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Cargo.toml` at line 153, Gate the whisper-rs dependency behind an opt-in
Cargo feature by making the dependency optional and exposing a feature (e.g.
"whisper") that enables it: mark the existing whisper-rs dependency as optional
and add a corresponding entry in the [features] table that maps "whisper" to the
"whisper-rs" dependency; update any code that assumes whisper-rs is always
present to be conditionally compiled with cfg(feature = "whisper") (search for
uses of whisper-rs types/funcs to wrap with cfg attributes).

@antonvice antonvice force-pushed the feat/global-whisper-transcription branch 2 times, most recently from 8627b93 to ddecf93 Compare March 8, 2026 22:49
@antonvice antonvice force-pushed the feat/global-whisper-transcription branch from ddecf93 to 6d036cb Compare March 8, 2026 22:49
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

♻️ Duplicate comments (1)
src/channels/transcription.rs (1)

67-72: ⚠️ Potential issue | 🟠 Major

Local transcription still drops part of the public config.

The helper can accept language, but this branch still passes None, and the fallback local model is hardcoded to ggml-tiny.bin instead of consulting config.model. That leaves transcription.language and transcription.model effectively Groq-only. Either wire those settings through for TranscriptionProvider::Local or fail fast when they are set.

Based on learnings, **/*.rs: Do not add new config keys, trait methods, feature flags, or workflow branches without a concrete accepted use case; keep unsupported paths explicit (error out) rather than adding partial fake support.

Also applies to: 153-157, 166-168

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/transcription.rs` around lines 67 - 72, The Local transcription
branch currently calls transcribe_audio_whisper_rs(&temp_path,
config.whisper_model_path.as_deref(), workspace_path, None) which drops language
and hardcodes/relies on a default model (ggml-tiny) instead of using
config.model, leaving transcription.language and transcription.model unset for
TranscriptionProvider::Local; either thread config.language and config.model
into this call (pass the actual language and model path through
transcribe_audio_whisper_rs) and update any code that records
transcription.model/language, or refuse/early-error when those public config
fields are set but unsupported in TranscriptionProvider::Local (e.g., return an
explicit error from the Local branch). Ensure you reference
transcribe_audio_whisper_rs, TranscriptionProvider::Local, config.model and
config.whisper_model_path when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@check_output_vfix.txt`:
- Around line 1-36: Remove the committed build log file (check_output_vfix.txt)
from the PR and amend the commit so it is not tracked; then fix the actual code
errors referenced by the log: add or remove the unused/missing fields in the
structs and call sites — either add a boolean field voice_messages_enabled to
the TelegramChannel struct (so occurrences in src/channels/telegram.rs such as
the initializer that sets voice_messages_enabled: false, the usage in the
expression using self.voice_messages_enabled, and the setter that assigns
self.voice_messages_enabled = enabled will compile) or remove/update those
references to match the current TelegramChannel API, and likewise remove or add
allow_group_mentions on LarkConfig (or update the initializer in
src/onboard/wizard.rs to match LarkConfig’s current fields) so the code and
struct definitions are consistent.

In `@docs/channels-reference.md`:
- Around line 165-168: The docs note about Telegram voice transcription is
missing the actual config keys; update the Telegram section in
channels-reference.md to list the exact settings users must enable—mention the
voice_messages boolean and the whisper_model (or provider) config keys and show
where the global [transcription] section ties in (e.g., which provider to set
for Groq vs Local/Whisper-RS); add a short bullet under the Voice Transcription
paragraph that names voice_messages and whisper_model (and any provider key
expected by [transcription]) so readers can enable it directly from the example.

In `@src/channels/lark.rs`:
- Around line 377-379: from_lark_config currently delegates to from_config which
reads the legacy use_feishu flag and flips to Feishu; change from_lark_config to
build the Lark-specific channel without honoring use_feishu — i.e., stop
delegating to from_config and instead copy the small config-to-Lark translation
logic here (or call a helper that accepts an explicit use_feishu override) so
that from_lark_config always treats the config as Lark (force
use_feishu=false/ignore that field) while reusing any shared parsing utilities
only if extracted safely; update the implementation of from_lark_config (and any
helper called) to reference crate::config::schema::LarkConfig fields directly
and construct the Lark channel with Lark endpoints.

In `@src/channels/telegram.rs`:
- Around line 315-316: The struct that currently ends with whisper_model:
Option<String> is missing the voice_messages_enabled field referenced elsewhere;
add a new field voice_messages_enabled: bool to that same struct (the one
containing whisper_model) and, if the struct is deserialized from external data,
annotate it with #[serde(default)] so missing values default to false to avoid
runtime/compile issues; update any struct instantiations if they require an
explicit value.

In `@src/channels/transcription.rs`:
- Around line 169-174: The download currently calls reqwest::get directly (when
!default_model.exists()), bypassing proxy configuration; change this to build
and use the runtime proxy client via build_runtime_proxy_client(...) and then
call
client.get("https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin").await?
(instead of reqwest::get), preserving the existing error_for_status() and
bytes().await? handling so the download honors [proxy] scope; update the
surrounding code that checks default_model.exists() to use the new client and
ensure types match the existing response handling.
- Around line 44-49: The transcribe_audio signature now requires a
workspace_path (&std::path::Path) as the fourth argument, so update every call
to transcribe_audio (e.g., the failing callsite that currently passes three
args) to pass the actual workspace path reference used in that context (for
example &workspace.path(), &workspace_dir, or workspace_path_var.as_path()),
ensuring the passed value is a &Path; check all usages of transcribe_audio and
adjust their argument lists accordingly and their variable scope/imports so the
workspace path is available where the function is called.

In `@src/config/schema.rs`:
- Around line 3251-3253: Several places create FeishuConfig via manual struct
literals but don't include the new field `mention_only`, causing build failures;
update every `FeishuConfig { ... }` literal (including tests and any test
helpers/builders) to add `mention_only: false` (or the appropriate default) so
the struct literal compiles, and ensure any custom `Default` impl or constructor
for `FeishuConfig` is consistent with that default.

In `@src/onboard/wizard.rs`:
- Around line 3624-3628: The current println block guarded by the voice_messages
check hardcodes "whisper-rs" and ffmpeg details; update the messages printed in
the voice_messages branch (the println calls in src/onboard/wizard.rs that
reference voice_messages) to be provider-neutral — e.g., mention that
transcription will be performed by the selected provider, that local
transcription may download a model and may require system tools like ffmpeg, and
move any provider-specific instructions (like "whisper-rs" model download or
explicit ffmpeg installation steps) out of this branch so they are displayed
only after the user selects Groq vs Local in Step 4.
- Around line 5133-5143: The prompt path handling in the custom_model branch
(the Input prompt and subsequent check in wizard.rs) must expand a leading tilde
before validating; modify the block that reads path (the code around
Input::new(), trimmed, model_path, and config.whisper_model_path) to expand ~
(e.g., via shellexpand::tilde or by resolving home_dir) into an absolute
path/string, then validate Path::new(&expanded).is_file() and set
config.whisper_model_path to the expanded path (not the raw trimmed input); keep
the existing error message (Whisper model file not found) but show the expanded
path for clarity.
- Around line 5099-5114: The wizard currently always includes "Local
(Whisper-RS)" in the providers list even when the transcription-local feature is
not enabled; update the code that builds providers (the providers Vec used by
Select::new) to conditionally push the Local option only when the
transcription-local feature flag is enabled (cfg!(feature =
"transcription-local")), leaving Groq always present, and ensure the selection
mapping to TranscriptionProvider::Local only occurs if that option was included;
reference the providers Vec, Select::new(), and TranscriptionProvider::Local
when making the change.

---

Duplicate comments:
In `@src/channels/transcription.rs`:
- Around line 67-72: The Local transcription branch currently calls
transcribe_audio_whisper_rs(&temp_path, config.whisper_model_path.as_deref(),
workspace_path, None) which drops language and hardcodes/relies on a default
model (ggml-tiny) instead of using config.model, leaving transcription.language
and transcription.model unset for TranscriptionProvider::Local; either thread
config.language and config.model into this call (pass the actual language and
model path through transcribe_audio_whisper_rs) and update any code that records
transcription.model/language, or refuse/early-error when those public config
fields are set but unsupported in TranscriptionProvider::Local (e.g., return an
explicit error from the Local branch). Ensure you reference
transcribe_audio_whisper_rs, TranscriptionProvider::Local, config.model and
config.whisper_model_path when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 80f82da9-b901-47e0-982b-4a030e4ed289

📥 Commits

Reviewing files that changed from the base of the PR and between 220fb3f and 6d036cb.

📒 Files selected for processing (11)
  • .gitignore
  • Cargo.toml
  • check_output_vfix.txt
  • docs/channels-reference.md
  • src/channels/lark.rs
  • src/channels/telegram.rs
  • src/channels/transcription.rs
  • src/config/schema.rs
  • src/cron/scheduler.rs
  • src/onboard/wizard.rs
  • src/security/landlock.rs
💤 Files with no reviewable changes (1)
  • src/security/landlock.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • .gitignore
  • Cargo.toml

Comment on lines +1 to +36
Checking zeroclaw v0.1.7 (/Users/antonvice/Documents/programming/zeroclaw)
error[E0609]: no field `voice_messages_enabled` on type `&TelegramChannel`
--> src/channels/telegram.rs:1132:28
|
1132 | let text = if self.voice_messages_enabled {
| ^^^^^^^^^^^^^^^^^^^^^^ unknown field
|
= note: available fields are: `bot_token`, `allowed_users`, `pairing`, `client`, `typing_handle` ... and 10 others

error[E0560]: struct `TelegramChannel` has no field named `voice_messages_enabled`
--> src/channels/telegram.rs:347:13
|
347 | voice_messages_enabled: false,
| ^^^^^^^^^^^^^^^^^^^^^^ `TelegramChannel` does not have this field
|
= note: all struct fields are already assigned

error[E0609]: no field `voice_messages_enabled` on type `TelegramChannel`
--> src/channels/telegram.rs:386:14
|
386 | self.voice_messages_enabled = enabled;
| ^^^^^^^^^^^^^^^^^^^^^^ unknown field
|
= note: available fields are: `bot_token`, `allowed_users`, `pairing`, `client`, `typing_handle` ... and 10 others

error[E0560]: struct `LarkConfig` has no field named `allow_group_mentions`
--> src/onboard/wizard.rs:4960:21
|
4960 | allow_group_mentions: true,
| ^^^^^^^^^^^^^^^^^^^^ `LarkConfig` does not have this field
|
= note: all struct fields are already assigned

Some errors have detailed explanations: E0560, E0609.
For more information about an error, try `rustc --explain E0560`.
error: could not compile `zeroclaw` (lib) due to 4 previous errors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove the committed build log.

This looks like a transient local cargo check capture rather than repository source. It will go stale immediately and also bakes a developer-specific filesystem path into the repo. Please drop the file from the PR and fix the referenced sources instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@check_output_vfix.txt` around lines 1 - 36, Remove the committed build log
file (check_output_vfix.txt) from the PR and amend the commit so it is not
tracked; then fix the actual code errors referenced by the log: add or remove
the unused/missing fields in the structs and call sites — either add a boolean
field voice_messages_enabled to the TelegramChannel struct (so occurrences in
src/channels/telegram.rs such as the initializer that sets
voice_messages_enabled: false, the usage in the expression using
self.voice_messages_enabled, and the setter that assigns
self.voice_messages_enabled = enabled will compile) or remove/update those
references to match the current TelegramChannel API, and likewise remove or add
allow_group_mentions on LarkConfig (or update the initializer in
src/onboard/wizard.rs to match LarkConfig’s current fields) so the code and
struct definitions are consistent.

Comment on lines +165 to +168
- **Voice Transcription**: Telegram voice notes (`.ogg`) can be transcribed using the global transcription service (configured in the `[transcription]` section).
- Supported providers: **Groq** (Cloud) and **Local** (Whisper-RS).
- Local transcription requires `ffmpeg` and automatically downloads a small model on first use.
- Transcribed text is prepended with `🎙️ [Voice transcription]:`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Document the actual Telegram config keys here.

The note explains the feature, but this section still never names the voice_messages / whisper_model settings users need to turn on. Please add those knobs to the Telegram example or bullets so the per-channel reference stays actionable.

As per coding guidelines "When runtime surfaces change (CLI, config, provider, channel behavior), update related references in commands-reference.md, providers-reference.md, channels-reference.md, config-reference.md, runbook, troubleshooting."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/channels-reference.md` around lines 165 - 168, The docs note about
Telegram voice transcription is missing the actual config keys; update the
Telegram section in channels-reference.md to list the exact settings users must
enable—mention the voice_messages boolean and the whisper_model (or provider)
config keys and show where the global [transcription] section ties in (e.g.,
which provider to set for Groq vs Local/Whisper-RS); add a short bullet under
the Voice Transcription paragraph that names voice_messages and whisper_model
(and any provider key expected by [transcription]) so readers can enable it
directly from the example.

Comment on lines +377 to +379
pub fn from_lark_config(config: &crate::config::schema::LarkConfig) -> Self {
Self::from_config(config)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

from_lark_config() still honors the legacy use_feishu flag.

Line 378 delegates to from_config(), so use_feishu = true will still flip this constructor over to Feishu endpoints. That breaks the dedicated Lark path and contradicts the lark_from_lark_config_ignores_legacy_feishu_flag test below.

🛠️ Suggested fix
 pub fn from_lark_config(config: &crate::config::schema::LarkConfig) -> Self {
-        Self::from_config(config)
+        let mut ch = Self::new_with_platform(
+            config.app_id.clone(),
+            config.app_secret.clone(),
+            config.verification_token.clone().unwrap_or_default(),
+            config.port,
+            config.allowed_users.clone(),
+            config.mention_only,
+            LarkPlatform::Lark,
+        );
+        ch.receive_mode = config.receive_mode.clone();
+        ch
     }
As per coding guidelines "Duplicate small, local logic when it preserves clarity; extract shared utilities only after repeated, stable patterns (rule-of-three)."
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub fn from_lark_config(config: &crate::config::schema::LarkConfig) -> Self {
Self::from_config(config)
}
pub fn from_lark_config(config: &crate::config::schema::LarkConfig) -> Self {
let mut ch = Self::new_with_platform(
config.app_id.clone(),
config.app_secret.clone(),
config.verification_token.clone().unwrap_or_default(),
config.port,
config.allowed_users.clone(),
config.mention_only,
LarkPlatform::Lark,
);
ch.receive_mode = config.receive_mode.clone();
ch
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/lark.rs` around lines 377 - 379, from_lark_config currently
delegates to from_config which reads the legacy use_feishu flag and flips to
Feishu; change from_lark_config to build the Lark-specific channel without
honoring use_feishu — i.e., stop delegating to from_config and instead copy the
small config-to-Lark translation logic here (or call a helper that accepts an
explicit use_feishu override) so that from_lark_config always treats the config
as Lark (force use_feishu=false/ignore that field) while reusing any shared
parsing utilities only if extracted safely; update the implementation of
from_lark_config (and any helper called) to reference
crate::config::schema::LarkConfig fields directly and construct the Lark channel
with Lark endpoints.

Comment on lines +315 to 316
whisper_model: Option<String>,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Missing voice_messages_enabled field in struct definition causes compilation failure.

The pipeline errors confirm that voice_messages_enabled is referenced at lines 347, 386, and 1132 but the field is not declared in the struct. Add the missing field.

🐛 Proposed fix
     workspace_dir: Option<std::path::PathBuf>,
+    voice_messages_enabled: bool,
     whisper_model: Option<String>,
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
whisper_model: Option<String>,
}
workspace_dir: Option<std::path::PathBuf>,
voice_messages_enabled: bool,
whisper_model: Option<String>,
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/telegram.rs` around lines 315 - 316, The struct that currently
ends with whisper_model: Option<String> is missing the voice_messages_enabled
field referenced elsewhere; add a new field voice_messages_enabled: bool to that
same struct (the one containing whisper_model) and, if the struct is
deserialized from external data, annotate it with #[serde(default)] so missing
values default to false to avoid runtime/compile issues; update any struct
instantiations if they require an explicit value.

Comment on lines 44 to +49
pub async fn transcribe_audio(
audio_data: Vec<u8>,
file_name: &str,
config: &TranscriptionConfig,
workspace_path: &std::path::Path,
) -> Result<String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Update the remaining transcribe_audio call site(s) before merge.

This signature change is fine, but CI already reports a missing fourth argument at Line 4213, so the branch does not build yet. Every caller needs to pass the real workspace path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/transcription.rs` around lines 44 - 49, The transcribe_audio
signature now requires a workspace_path (&std::path::Path) as the fourth
argument, so update every call to transcribe_audio (e.g., the failing callsite
that currently passes three args) to pass the actual workspace path reference
used in that context (for example &workspace.path(), &workspace_dir, or
workspace_path_var.as_path()), ensuring the passed value is a &Path; check all
usages of transcribe_audio and adjust their argument lists accordingly and their
variable scope/imports so the workspace path is available where the function is
called.

Comment on lines +169 to +174
if !default_model.exists() {
tracing::info!("Downloading default whisper model (ggml-tiny.bin) to {:?}", default_model);
let response = reqwest::get("https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin")
.await?
.error_for_status()?;
let bytes = response.bytes().await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Route the default-model download through the runtime proxy client.

This new outbound path uses reqwest::get directly, so [proxy] with scope = "zeroclaw" or scope = "services" is ignored on first-run local transcription. Environments that need a proxy to reach Hugging Face will only fail on the local path unless this uses build_runtime_proxy_client(...) as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/transcription.rs` around lines 169 - 174, The download currently
calls reqwest::get directly (when !default_model.exists()), bypassing proxy
configuration; change this to build and use the runtime proxy client via
build_runtime_proxy_client(...) and then call
client.get("https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin").await?
(instead of reqwest::get), preserving the existing error_for_status() and
bytes().await? handling so the download honors [proxy] scope; update the
surrounding code that checks default_model.exists() to use the new client and
ensure types match the existing response handling.

Comment on lines +3251 to +3253
/// When true, only respond to messages that @-mention the bot in groups.
#[serde(default)]
pub mention_only: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Update the in-tree FeishuConfig literals in the same change.

#[serde(default)] only covers deserialization. Manual FeishuConfig { ... } initializers still need the new field, and CI is already failing at Line 7421 and Line 7441 for that reason.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/schema.rs` around lines 3251 - 3253, Several places create
FeishuConfig via manual struct literals but don't include the new field
`mention_only`, causing build failures; update every `FeishuConfig { ... }`
literal (including tests and any test helpers/builders) to add `mention_only:
false` (or the appropriate default) so the struct literal compiles, and ensure
any custom `Default` impl or constructor for `FeishuConfig` is consistent with
that default.

Comment on lines +3624 to +3628
if voice_messages {
println!(" {} ZeroClaw uses `whisper-rs` and `ffmpeg` to parse voice notes.", style("ℹ").cyan());
println!(" It will download a small model (~75MB) automatically the first time.");
println!(" Please ensure `ffmpeg` is installed on this machine and available in PATH.");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Keep the Telegram voice prompt provider-neutral.

This copy hardcodes local whisper-rs behavior before Step 4 asks whether the user wants Groq or Local transcription, so Groq users get incorrect setup guidance here.

💡 Suggested copy change
                 if voice_messages {
-                    println!("  {} ZeroClaw uses `whisper-rs` and `ffmpeg` to parse voice notes.", style("ℹ").cyan());
-                    println!("     It will download a small model (~75MB) automatically the first time.");
-                    println!("     Please ensure `ffmpeg` is installed on this machine and available in PATH.");
+                    println!(
+                        "  {} Voice message handling enabled.",
+                        style("ℹ").cyan()
+                    );
+                    println!("     Configure the transcription provider in the next step.");
                 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if voice_messages {
println!(" {} ZeroClaw uses `whisper-rs` and `ffmpeg` to parse voice notes.", style("ℹ").cyan());
println!(" It will download a small model (~75MB) automatically the first time.");
println!(" Please ensure `ffmpeg` is installed on this machine and available in PATH.");
}
if voice_messages {
println!(
" {} Voice message handling enabled.",
style("ℹ").cyan()
);
println!(" Configure the transcription provider in the next step.");
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/onboard/wizard.rs` around lines 3624 - 3628, The current println block
guarded by the voice_messages check hardcodes "whisper-rs" and ffmpeg details;
update the messages printed in the voice_messages branch (the println calls in
src/onboard/wizard.rs that reference voice_messages) to be provider-neutral —
e.g., mention that transcription will be performed by the selected provider,
that local transcription may download a model and may require system tools like
ffmpeg, and move any provider-specific instructions (like "whisper-rs" model
download or explicit ffmpeg installation steps) out of this branch so they are
displayed only after the user selects Groq vs Local in Step 4.

Comment on lines +5099 to +5114
let providers = vec![
"Groq (Cloud) — fastest, requires GROQ_API_KEY",
"Local (Whisper-RS) — private, runs on your CPU, requires ffmpeg",
];

let provider_idx = Select::new()
.with_prompt(" Select transcription provider")
.items(&providers)
.default(0)
.interact()?;

let provider = if provider_idx == 0 {
TranscriptionProvider::Groq
} else {
TranscriptionProvider::Local
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Cargo feature wiring =="
rg -n 'transcription-local|whisper-rs|whisper_rs' Cargo.toml

echo
echo "== Wizard exposure =="
rg -n 'Local \(Whisper-RS\)|TranscriptionProvider::Local' src/onboard/wizard.rs

echo
echo "== Runtime / config gating =="
rg -nP '#\[cfg\(feature *= *"transcription-local"\)\]|cfg!\(feature *= *"transcription-local"\)|TranscriptionProvider::Local' src/channels src/config src/onboard

Repository: zeroclaw-labs/zeroclaw

Length of output: 831


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== src/channels/transcription.rs context around line 54 =="
sed -n '40,70p' src/channels/transcription.rs

echo
echo "== src/channels/transcription.rs context around line 159 (cfg guard) =="
sed -n '150,170p' src/channels/transcription.rs

echo
echo "== Full match statement or enum definition for TranscriptionProvider =="
rg -n 'enum TranscriptionProvider' src/ -A 5

Repository: zeroclaw-labs/zeroclaw

Length of output: 2942


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Full transcribe_audio_whisper_rs definition - check where cfg block ends =="
sed -n '145,210p' src/channels/transcription.rs

echo
echo "== Check if there's an else clause or fallback after the cfg block =="
sed -n '190,220p' src/channels/transcription.rs

Repository: zeroclaw-labs/zeroclaw

Length of output: 4095


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Find end of transcribe_audio_whisper_rs function =="
sed -n '145,300p' src/channels/transcription.rs | tail -100

Repository: zeroclaw-labs/zeroclaw

Length of output: 3283


Gate the Local transcription option behind the transcription-local feature flag.

The wizard unconditionally offers Local (Whisper-RS) even on builds compiled without transcription-local. While the runtime has a defensive check that will fail with an error message, this creates a confusing UX where a user selects an unavailable option only to encounter an error later. Per coding guidelines, unsupported features should be kept explicit (error out) rather than offering partial support.

Suggested fix
-    let providers = vec![
-        "Groq (Cloud) — fastest, requires GROQ_API_KEY",
-        "Local (Whisper-RS) — private, runs on your CPU, requires ffmpeg",
-    ];
+    let local_supported = cfg!(feature = "transcription-local");
+    let mut providers = vec!["Groq (Cloud) — fastest, requires GROQ_API_KEY"];
+    if local_supported {
+        providers.push("Local (Whisper-RS) — private, runs on your CPU, requires ffmpeg");
+    }

     let provider_idx = Select::new()
         .with_prompt("  Select transcription provider")
         .items(&providers)
         .default(0)
         .interact()?;

-    let provider = if provider_idx == 0 {
-        TranscriptionProvider::Groq
-    } else {
-        TranscriptionProvider::Local
-    };
+    let provider = match (provider_idx, local_supported) {
+        (0, _) => TranscriptionProvider::Groq,
+        (1, true) => TranscriptionProvider::Local,
+        _ => bail!(
+            "Local transcription is not compiled into this build. Rebuild with --features transcription-local."
+        ),
+    };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let providers = vec![
"Groq (Cloud) — fastest, requires GROQ_API_KEY",
"Local (Whisper-RS) — private, runs on your CPU, requires ffmpeg",
];
let provider_idx = Select::new()
.with_prompt(" Select transcription provider")
.items(&providers)
.default(0)
.interact()?;
let provider = if provider_idx == 0 {
TranscriptionProvider::Groq
} else {
TranscriptionProvider::Local
};
let local_supported = cfg!(feature = "transcription-local");
let mut providers = vec!["Groq (Cloud) — fastest, requires GROQ_API_KEY"];
if local_supported {
providers.push("Local (Whisper-RS) — private, runs on your CPU, requires ffmpeg");
}
let provider_idx = Select::new()
.with_prompt(" Select transcription provider")
.items(&providers)
.default(0)
.interact()?;
let provider = match (provider_idx, local_supported) {
(0, _) => TranscriptionProvider::Groq,
(1, true) => TranscriptionProvider::Local,
_ => bail!(
"Local transcription is not compiled into this build. Rebuild with --features transcription-local."
),
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/onboard/wizard.rs` around lines 5099 - 5114, The wizard currently always
includes "Local (Whisper-RS)" in the providers list even when the
transcription-local feature is not enabled; update the code that builds
providers (the providers Vec used by Select::new) to conditionally push the
Local option only when the transcription-local feature flag is enabled
(cfg!(feature = "transcription-local")), leaving Groq always present, and ensure
the selection mapping to TranscriptionProvider::Local only occurs if that option
was included; reference the providers Vec, Select::new(), and
TranscriptionProvider::Local when making the change.

Comment on lines +5133 to +5143
if custom_model {
let path: String = Input::new()
.with_prompt(" Path to GGML model file (.bin)")
.interact_text()?;
if !path.trim().is_empty() {
let trimmed = path.trim();
let model_path = std::path::Path::new(trimmed);
if !model_path.is_file() {
bail!("Whisper model file not found: {}", model_path.display());
}
config.whisper_model_path = Some(trimmed.to_string());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Expand ~ before validating custom Whisper model paths.

Path::is_file() on the raw prompt value rejects valid home-relative inputs like ~/.cache/whisper/tiny.bin, so the wizard can fail on a path that actually exists.

🏠 Suggested fix
         if custom_model {
             let path: String = Input::new()
                 .with_prompt("  Path to GGML model file (.bin)")
                 .interact_text()?;
             if !path.trim().is_empty() {
                 let trimmed = path.trim();
-                let model_path = std::path::Path::new(trimmed);
+                let expanded = shellexpand::tilde(trimmed).to_string();
+                let model_path = Path::new(&expanded);
                 if !model_path.is_file() {
                     bail!("Whisper model file not found: {}", model_path.display());
                 }
-                config.whisper_model_path = Some(trimmed.to_string());
+                config.whisper_model_path = Some(expanded);
             }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/onboard/wizard.rs` around lines 5133 - 5143, The prompt path handling in
the custom_model branch (the Input prompt and subsequent check in wizard.rs)
must expand a leading tilde before validating; modify the block that reads path
(the code around Input::new(), trimmed, model_path, and
config.whisper_model_path) to expand ~ (e.g., via shellexpand::tilde or by
resolving home_dir) into an absolute path/string, then validate
Path::new(&expanded).is_file() and set config.whisper_model_path to the expanded
path (not the raw trimmed input); keep the existing error message (Whisper model
file not found) but show the expanded path for clarity.

@rikitrader
Copy link

Good feature — Whisper transcription with proper feature gating (transcription-local) and clean Telegram voice isolation. One blocking issue:

Compile error: Wizard code references allow_group_mentions field (around line 949 of diff) that doesn't exist in TelegramConfig schema. This will fail to compile. Please remove or fix that reference.

Everything else looks solid — temp file cleanup, max file size guards, backward-compatible serde defaults. Happy to approve once the wizard field is fixed.

@rareba
Copy link
Contributor

rareba commented Mar 9, 2026

Fixed the compile errors flagged by @rikitrader:

  1. Removed allow_group_mentions: true from the TelegramConfig struct literal in src/onboard/wizard.rs — this field does not exist in the TelegramConfig schema.

  2. Added missing voice_messages_enabled: bool field to the TelegramChannel struct in src/channels/telegram.rs — the constructor (line 347), setter with_voice_messages (line 386), and voice transcription logic (line 1132) all referenced this field, but it was missing from the struct definition.

Verified with cargo check — compiles cleanly now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants