feat: global whisper transcription provider and telegram voice fix#3029
feat: global whisper transcription provider and telegram voice fix#3029antonvice wants to merge 4 commits intozeroclaw-labs:masterfrom
Conversation
- Added `TranscriptionProvider` enum (Groq/Local) to global config - Implemented `whisper-rs` + `ffmpeg` orchestration in `transcription.rs` - Fixed Telegram voice detection bug when global provider is unset - Added interactive Transcription setup (Step 4) to onboarding wizard - Updated documentation for channels and config references
📝 WalkthroughWalkthroughAdds global multi-provider voice transcription (Groq Cloud and local Whisper‑RS), Telegram voice-message handling and builder options, onboarding wizard prompts for transcription, documentation and config schema updates, and an optional Changes
Sequence Diagram(s)sequenceDiagram
participant TG as Telegram Channel
participant Trans as Transcription Module
participant Groq as Groq API
participant Whisper as Local Whisper-RS
TG->>TG: Receive voice message
TG->>TG: Check voice_messages flag & duration
alt provider == groq
TG->>Trans: transcribe_audio(audio, file_name, config, workspace)
Trans->>Groq: Upload multipart request (audio + model/lang)
Groq-->>Trans: Transcript response
Trans-->>TG: Transcript text
else provider == local
TG->>Trans: transcribe_audio(audio, file_name, config, workspace)
Trans->>Whisper: write temp file, ensure model in workspace
Trans->>Whisper: run ffmpeg decode -> 16kHz mono f32le
Trans->>Whisper: transcribe_audio_whisper_rs(temp_path, model, workspace)
Whisper-->>Trans: Transcript text
Trans-->>TG: Transcript text
end
TG->>TG: Prepend prefix and embed transcript into message
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/config/schema.rs (1)
5539-5556:⚠️ Potential issue | 🟠 MajorThis serde test asserts the opposite of its fixture.
The test constructs
interrupt_on_new_message: false, then assertstrueon Line 5555, so it will fail deterministically after round-trip serialization.Suggested fix
- assert!(parsed.interrupt_on_new_message); + assert!(!parsed.interrupt_on_new_message);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config/schema.rs` around lines 5539 - 5556, The test for TelegramConfig has a mismatch: the fixture sets interrupt_on_new_message: false but the assertion checks true; update the assertion to match the fixture by asserting parsed.interrupt_on_new_message is false (e.g., use assert!(!parsed.interrupt_on_new_message)) or alternatively change the fixture to true—locate the TelegramConfig test in schema.rs and fix the check for interrupt_on_new_message to reflect the intended value after serde round-trip.
🧹 Nitpick comments (2)
src/config/schema.rs (1)
3637-3660: ReuseChannelsConfig::default()here.This duplicates the canonical default in two places, so every new channel field now has to be kept in sync manually. Delegating here avoids future drift.
Suggested refactor
- channels_config: ChannelsConfig { - cli: true, - telegram: None, - discord: None, - slack: None, - mattermost: None, - webhook: None, - imessage: None, - matrix: None, - signal: None, - whatsapp: None, - linq: None, - wati: None, - nextcloud_talk: None, - email: None, - irc: None, - lark: None, - feishu: None, - dingtalk: None, - qq: None, - nostr: None, - clawdtalk: None, - message_timeout_secs: default_channel_message_timeout_secs(), - }, + channels_config: ChannelsConfig::default(),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config/schema.rs` around lines 3637 - 3660, The explicit ChannelsConfig struct literal duplicates defaults; replace the manual construction at channels_config with ChannelsConfig::default() (or Default::default()) so the canonical defaults in ChannelsConfig::default() are reused; update the channels_config assignment to just ChannelsConfig::default() and remove the long list of per-channel fields (keeping any intentional overrides if present).src/channels/mod.rs (1)
2699-2702: Add a regression test forvoice_messages=truewithout global transcription config.This builder chain is the integration point for the PR’s main Telegram fix. A focused startup/collection test with
config.transcriptionunset would make future regressions here much easier to catch.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/channels/mod.rs` around lines 2699 - 2702, Add a regression test that starts the same builder chain but with config.transcription unset (None) while tg.voice_messages is true to ensure the integration path using .with_transcription(...).with_voice_messages(...).with_whisper_model(...) does not regress; specifically, create a test (e.g., test_voice_messages_without_global_transcription) that builds the config with transcription = None, sets tg.voice_messages = true, runs the channel startup/collection routine that invokes the builder chain, and assert startup completes without panic and the resulting channel/client state reflects no global transcription (or that whisper_model-based handling is used) — target the builder methods .with_transcription, .with_voice_messages, and .with_whisper_model when locating the code to exercise.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.gitignore:
- Line 33: Replace the incorrect ignore pattern ./vscode with the standard
.vscode/ entry in .gitignore: locate the current ./vscode line and update it to
.vscode/ so the hidden VS Code workspace folder is properly ignored, matching
other hidden patterns like .venv/ and .worktrees/.
In `@Cargo.toml`:
- Line 153: Gate the whisper-rs dependency behind an opt-in Cargo feature by
making the dependency optional and exposing a feature (e.g. "whisper") that
enables it: mark the existing whisper-rs dependency as optional and add a
corresponding entry in the [features] table that maps "whisper" to the
"whisper-rs" dependency; update any code that assumes whisper-rs is always
present to be conditionally compiled with cfg(feature = "whisper") (search for
uses of whisper-rs types/funcs to wrap with cfg attributes).
In `@docs/channels-reference.md`:
- Line 165: Remove the trailing space at the end of the "Voice Transcription"
sentence ("**Voice Transcription**: Telegram voice notes (`.ogg`) can be
transcribed using the global transcription service (configured in the
`[transcription]` section).") so the line ends immediately after the period;
locate the line containing the "**Voice Transcription**" heading and trim the
trailing whitespace.
In `@src/channels/transcription.rs`:
- Around line 154-163: The download branch can race when two tasks both see
ggml-tiny.bin missing; make the download atomic: download into a temp file in
the same directory (e.g., derive temp path from default_model like
default_model.with_extension("tmp" or ".part")), write the bytes with
tokio::fs::write to that temp path, then atomically rename it into place with
tokio::fs::rename(temp_path, default_model). Ensure the temp file is created in
models_dir (so rename is atomic), and if rename fails because the target already
exists, treat it as success (remove the temp file) so concurrent writers don't
clobber a finished file; keep using reqwest::get and the existing bytes handling
but replace direct write to default_model with temp-write+rename and an
existence re-check around default_model.exists().
- Around line 146-149: The helper transcribe_audio_whisper_rs currently accepts
only model_path and workspace_dir so any transcription.language override is
silently ignored for TranscriptionProvider::Local; update the function (and any
call sites) to accept and propagate the optional language parameter (e.g., add a
language: Option<&str> argument) and pass it into the local whisper_rs
invocation, or if you prefer not to support language for local transcription,
explicitly return an error (bail!) when TranscriptionProvider::Local is selected
and transcription.language is set; make the change in
transcribe_audio_whisper_rs and corresponding branching logic where
TranscriptionProvider::Local is handled (also apply the same fix for the related
block around lines 197-205) so behavior is consistent and no overrides are
silently ignored.
- Around line 56-65: The code currently uses PathBuf::from(".") as workspace_dir
passed into transcribe_audio_whisper_rs, which makes model caching depend on the
process CWD; instead thread the real workspace path into transcribe_audio by
accepting and passing a workspace_path (or use an existing workspace field on
TranscriptionConfig) into transcribe_audio_whisper_rs (and its callers), and if
no workspace is available return an explicit error/bail! rather than defaulting
to "."; update references to transcribe_audio_whisper_rs, workspace_dir, and
config.whisper_model_path accordingly so callers supply or propagate the
workspace path and fail fast when it’s missing or not writable.
- Around line 46-70: The Local branch currently writes and processes unbounded
audio_data which can cause huge memory spikes; add an explicit size check before
writing/decoding: define a constant like MAX_LOCAL_AUDIO_BYTES and in the
TranscriptionProvider::Local arm check audio_data.len() <= MAX_LOCAL_AUDIO_BYTES
and return an error (with context) if exceeded. Update the Local branch
(referencing variables/idents: TranscriptionProvider::Local, audio_data,
temp_path, transcribe_audio_whisper_rs) to perform this check early and avoid
writing/processing oversized uploads. Ensure the error returned is descriptive
so callers know the upload was too large.
In `@src/config/schema.rs`:
- Around line 2784-2789: Update the documentation comment for the
Telegram-specific config fields to explicitly explain how voice_messages and
whisper_model interact with the global transcription settings: state that
voice_messages enables Telegram-specific transcription, that whisper_model is
used only when transcription.provider == "local" (otherwise ignored), and that
whisper_model, when set, overrides the global transcription.whisper_model_path
for Telegram messages; also describe the default behavior (nil = fallback to
global path or auto-download), compatibility implications, and a brief migration
note for users who previously relied on global-only settings. Reference the
struct fields voice_messages and whisper_model and the global keys
transcription.provider and transcription.whisper_model_path when editing the
comment.
In `@src/cron/scheduler.rs`:
- Around line 319-321: The TelegramChannel built in the cron scheduler omits
transcription and workspace_dir causing missing transcriptions and failing
attachment handling; update the builder chain that currently calls
.with_voice_messages(tg.voice_messages) and
.with_whisper_model(tg.whisper_model.clone()) to also call
.with_transcription(config.transcription.clone()) and
.with_workspace_dir(config.workspace_dir.clone()) so it matches the runtime
initialization (look for the TelegramChannel builder in scheduler.rs and add the
two methods referencing config.transcription and config.workspace_dir), ensuring
try_parse_attachment_message and send_attachment behavior is preserved.
In `@src/onboard/wizard.rs`:
- Around line 5127-5138: Validate the entered GGML model path immediately in the
wizard: after reading the path string in the custom_model branch (the local
variable path) trim it and use std::path::Path::new(&path_trimmed).is_file() to
confirm it exists and is a regular file before assigning to
config.whisper_model_path; if the check fails, return an explicit error (bail!
or Err) with a clear message or re-prompt the user so typos are fixed
immediately rather than persisting an invalid path.
---
Outside diff comments:
In `@src/config/schema.rs`:
- Around line 5539-5556: The test for TelegramConfig has a mismatch: the fixture
sets interrupt_on_new_message: false but the assertion checks true; update the
assertion to match the fixture by asserting parsed.interrupt_on_new_message is
false (e.g., use assert!(!parsed.interrupt_on_new_message)) or alternatively
change the fixture to true—locate the TelegramConfig test in schema.rs and fix
the check for interrupt_on_new_message to reflect the intended value after serde
round-trip.
---
Nitpick comments:
In `@src/channels/mod.rs`:
- Around line 2699-2702: Add a regression test that starts the same builder
chain but with config.transcription unset (None) while tg.voice_messages is true
to ensure the integration path using
.with_transcription(...).with_voice_messages(...).with_whisper_model(...) does
not regress; specifically, create a test (e.g.,
test_voice_messages_without_global_transcription) that builds the config with
transcription = None, sets tg.voice_messages = true, runs the channel
startup/collection routine that invokes the builder chain, and assert startup
completes without panic and the resulting channel/client state reflects no
global transcription (or that whisper_model-based handling is used) — target the
builder methods .with_transcription, .with_voice_messages, and
.with_whisper_model when locating the code to exercise.
In `@src/config/schema.rs`:
- Around line 3637-3660: The explicit ChannelsConfig struct literal duplicates
defaults; replace the manual construction at channels_config with
ChannelsConfig::default() (or Default::default()) so the canonical defaults in
ChannelsConfig::default() are reused; update the channels_config assignment to
just ChannelsConfig::default() and remove the long list of per-channel fields
(keeping any intentional overrides if present).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1d774892-3a93-4458-a1b7-f580e7e1dfa2
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
.gitignoreCargo.tomldocs/channels-reference.mddocs/config-reference.mdsrc/channels/mod.rssrc/channels/telegram.rssrc/channels/transcription.rssrc/config/schema.rssrc/cron/scheduler.rssrc/onboard/wizard.rs
Cargo.toml
Outdated
|
|
||
| # Serial port for peripheral communication (STM32, etc.) | ||
| tokio-serial = { version = "5", default-features = false, optional = true } | ||
| whisper-rs = "0.15.1" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== whisper-rs dependency declaration ==="
sed -n '145,210p' Cargo.toml
echo
echo "=== whisper-related call sites ==="
rg -n 'whisper-rs|whisper_rs|Whisper(Context|State)|transcribe.*whisper' Cargo.toml src
echo
echo "=== feature gates mentioning whisper/transcription ==="
rg -n 'cfg\(feature *= *"[^"]*"\)|\[features\]' Cargo.toml src | rg 'whisper|transcription|features'Repository: zeroclaw-labs/zeroclaw
Length of output: 5523
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Alternative transcription backends ==="
rg -n 'transcribe.*groq|groq.*transcrib|TranscriptionBackend|TranscriptionProvider' src --type rs -A 2 -B 2
echo
echo "=== Transcription module structure ==="
head -100 src/channels/transcription.rs
echo
echo "=== Check if whisper-rs is conditionally compiled ==="
rg -n 'cfg\(feature|#\[cfg' src/channels/transcription.rs
echo
echo "=== Check if transcription module itself is optional ==="
rg -n 'mod transcription|use.*transcription' src --type rsRepository: zeroclaw-labs/zeroclaw
Length of output: 152
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Alternative transcription backends ==="
rg -n 'transcribe.*groq|groq.*transcrib|TranscriptionBackend|TranscriptionProvider' src -A 2 -B 2
echo
echo "=== Transcription module structure ==="
head -100 src/channels/transcription.rs
echo
echo "=== Check if whisper-rs is conditionally compiled ==="
rg -n 'cfg\(feature|#\[cfg' src/channels/transcription.rs
echo
echo "=== Check if transcription module itself is optional ==="
rg -n 'mod transcription|use.*transcription' srcRepository: zeroclaw-labs/zeroclaw
Length of output: 7858
Gate whisper-rs behind an opt-in feature.
Users selecting Groq transcription don't need whisper-rs, yet it compiles unconditionally for all builds. This increases compile time and binary size. Other heavy integrations in this manifest (matrix-sdk, fantoccini, postgres, wa-rs, probe-rs) are already feature-gated; keeping whisper-rs unconditional is inconsistent.
Suggested manifest direction
-whisper-rs = "0.15.1"
+whisper-rs = { version = "0.15.1", optional = true } [features]
default = []
+transcription-local = ["dep:whisper-rs"]
hardware = ["nusb", "tokio-serial"]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Cargo.toml` at line 153, Gate the whisper-rs dependency behind an opt-in
Cargo feature by making the dependency optional and exposing a feature (e.g.
"whisper") that enables it: mark the existing whisper-rs dependency as optional
and add a corresponding entry in the [features] table that maps "whisper" to the
"whisper-rs" dependency; update any code that assumes whisper-rs is always
present to be conditionally compiled with cfg(feature = "whisper") (search for
uses of whisper-rs types/funcs to wrap with cfg attributes).
8627b93 to
ddecf93
Compare
ddecf93 to
6d036cb
Compare
There was a problem hiding this comment.
Actionable comments posted: 10
♻️ Duplicate comments (1)
src/channels/transcription.rs (1)
67-72:⚠️ Potential issue | 🟠 MajorLocal transcription still drops part of the public config.
The helper can accept
language, but this branch still passesNone, and the fallback local model is hardcoded toggml-tiny.bininstead of consultingconfig.model. That leavestranscription.languageandtranscription.modeleffectively Groq-only. Either wire those settings through forTranscriptionProvider::Localor fail fast when they are set.Based on learnings,
**/*.rs: Do not add new config keys, trait methods, feature flags, or workflow branches without a concrete accepted use case; keep unsupported paths explicit (error out) rather than adding partial fake support.Also applies to: 153-157, 166-168
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/channels/transcription.rs` around lines 67 - 72, The Local transcription branch currently calls transcribe_audio_whisper_rs(&temp_path, config.whisper_model_path.as_deref(), workspace_path, None) which drops language and hardcodes/relies on a default model (ggml-tiny) instead of using config.model, leaving transcription.language and transcription.model unset for TranscriptionProvider::Local; either thread config.language and config.model into this call (pass the actual language and model path through transcribe_audio_whisper_rs) and update any code that records transcription.model/language, or refuse/early-error when those public config fields are set but unsupported in TranscriptionProvider::Local (e.g., return an explicit error from the Local branch). Ensure you reference transcribe_audio_whisper_rs, TranscriptionProvider::Local, config.model and config.whisper_model_path when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@check_output_vfix.txt`:
- Around line 1-36: Remove the committed build log file (check_output_vfix.txt)
from the PR and amend the commit so it is not tracked; then fix the actual code
errors referenced by the log: add or remove the unused/missing fields in the
structs and call sites — either add a boolean field voice_messages_enabled to
the TelegramChannel struct (so occurrences in src/channels/telegram.rs such as
the initializer that sets voice_messages_enabled: false, the usage in the
expression using self.voice_messages_enabled, and the setter that assigns
self.voice_messages_enabled = enabled will compile) or remove/update those
references to match the current TelegramChannel API, and likewise remove or add
allow_group_mentions on LarkConfig (or update the initializer in
src/onboard/wizard.rs to match LarkConfig’s current fields) so the code and
struct definitions are consistent.
In `@docs/channels-reference.md`:
- Around line 165-168: The docs note about Telegram voice transcription is
missing the actual config keys; update the Telegram section in
channels-reference.md to list the exact settings users must enable—mention the
voice_messages boolean and the whisper_model (or provider) config keys and show
where the global [transcription] section ties in (e.g., which provider to set
for Groq vs Local/Whisper-RS); add a short bullet under the Voice Transcription
paragraph that names voice_messages and whisper_model (and any provider key
expected by [transcription]) so readers can enable it directly from the example.
In `@src/channels/lark.rs`:
- Around line 377-379: from_lark_config currently delegates to from_config which
reads the legacy use_feishu flag and flips to Feishu; change from_lark_config to
build the Lark-specific channel without honoring use_feishu — i.e., stop
delegating to from_config and instead copy the small config-to-Lark translation
logic here (or call a helper that accepts an explicit use_feishu override) so
that from_lark_config always treats the config as Lark (force
use_feishu=false/ignore that field) while reusing any shared parsing utilities
only if extracted safely; update the implementation of from_lark_config (and any
helper called) to reference crate::config::schema::LarkConfig fields directly
and construct the Lark channel with Lark endpoints.
In `@src/channels/telegram.rs`:
- Around line 315-316: The struct that currently ends with whisper_model:
Option<String> is missing the voice_messages_enabled field referenced elsewhere;
add a new field voice_messages_enabled: bool to that same struct (the one
containing whisper_model) and, if the struct is deserialized from external data,
annotate it with #[serde(default)] so missing values default to false to avoid
runtime/compile issues; update any struct instantiations if they require an
explicit value.
In `@src/channels/transcription.rs`:
- Around line 169-174: The download currently calls reqwest::get directly (when
!default_model.exists()), bypassing proxy configuration; change this to build
and use the runtime proxy client via build_runtime_proxy_client(...) and then
call
client.get("https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin").await?
(instead of reqwest::get), preserving the existing error_for_status() and
bytes().await? handling so the download honors [proxy] scope; update the
surrounding code that checks default_model.exists() to use the new client and
ensure types match the existing response handling.
- Around line 44-49: The transcribe_audio signature now requires a
workspace_path (&std::path::Path) as the fourth argument, so update every call
to transcribe_audio (e.g., the failing callsite that currently passes three
args) to pass the actual workspace path reference used in that context (for
example &workspace.path(), &workspace_dir, or workspace_path_var.as_path()),
ensuring the passed value is a &Path; check all usages of transcribe_audio and
adjust their argument lists accordingly and their variable scope/imports so the
workspace path is available where the function is called.
In `@src/config/schema.rs`:
- Around line 3251-3253: Several places create FeishuConfig via manual struct
literals but don't include the new field `mention_only`, causing build failures;
update every `FeishuConfig { ... }` literal (including tests and any test
helpers/builders) to add `mention_only: false` (or the appropriate default) so
the struct literal compiles, and ensure any custom `Default` impl or constructor
for `FeishuConfig` is consistent with that default.
In `@src/onboard/wizard.rs`:
- Around line 3624-3628: The current println block guarded by the voice_messages
check hardcodes "whisper-rs" and ffmpeg details; update the messages printed in
the voice_messages branch (the println calls in src/onboard/wizard.rs that
reference voice_messages) to be provider-neutral — e.g., mention that
transcription will be performed by the selected provider, that local
transcription may download a model and may require system tools like ffmpeg, and
move any provider-specific instructions (like "whisper-rs" model download or
explicit ffmpeg installation steps) out of this branch so they are displayed
only after the user selects Groq vs Local in Step 4.
- Around line 5133-5143: The prompt path handling in the custom_model branch
(the Input prompt and subsequent check in wizard.rs) must expand a leading tilde
before validating; modify the block that reads path (the code around
Input::new(), trimmed, model_path, and config.whisper_model_path) to expand ~
(e.g., via shellexpand::tilde or by resolving home_dir) into an absolute
path/string, then validate Path::new(&expanded).is_file() and set
config.whisper_model_path to the expanded path (not the raw trimmed input); keep
the existing error message (Whisper model file not found) but show the expanded
path for clarity.
- Around line 5099-5114: The wizard currently always includes "Local
(Whisper-RS)" in the providers list even when the transcription-local feature is
not enabled; update the code that builds providers (the providers Vec used by
Select::new) to conditionally push the Local option only when the
transcription-local feature flag is enabled (cfg!(feature =
"transcription-local")), leaving Groq always present, and ensure the selection
mapping to TranscriptionProvider::Local only occurs if that option was included;
reference the providers Vec, Select::new(), and TranscriptionProvider::Local
when making the change.
---
Duplicate comments:
In `@src/channels/transcription.rs`:
- Around line 67-72: The Local transcription branch currently calls
transcribe_audio_whisper_rs(&temp_path, config.whisper_model_path.as_deref(),
workspace_path, None) which drops language and hardcodes/relies on a default
model (ggml-tiny) instead of using config.model, leaving transcription.language
and transcription.model unset for TranscriptionProvider::Local; either thread
config.language and config.model into this call (pass the actual language and
model path through transcribe_audio_whisper_rs) and update any code that records
transcription.model/language, or refuse/early-error when those public config
fields are set but unsupported in TranscriptionProvider::Local (e.g., return an
explicit error from the Local branch). Ensure you reference
transcribe_audio_whisper_rs, TranscriptionProvider::Local, config.model and
config.whisper_model_path when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 80f82da9-b901-47e0-982b-4a030e4ed289
📒 Files selected for processing (11)
.gitignoreCargo.tomlcheck_output_vfix.txtdocs/channels-reference.mdsrc/channels/lark.rssrc/channels/telegram.rssrc/channels/transcription.rssrc/config/schema.rssrc/cron/scheduler.rssrc/onboard/wizard.rssrc/security/landlock.rs
💤 Files with no reviewable changes (1)
- src/security/landlock.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- .gitignore
- Cargo.toml
check_output_vfix.txt
Outdated
| Checking zeroclaw v0.1.7 (/Users/antonvice/Documents/programming/zeroclaw) | ||
| error[E0609]: no field `voice_messages_enabled` on type `&TelegramChannel` | ||
| --> src/channels/telegram.rs:1132:28 | ||
| | | ||
| 1132 | let text = if self.voice_messages_enabled { | ||
| | ^^^^^^^^^^^^^^^^^^^^^^ unknown field | ||
| | | ||
| = note: available fields are: `bot_token`, `allowed_users`, `pairing`, `client`, `typing_handle` ... and 10 others | ||
|
|
||
| error[E0560]: struct `TelegramChannel` has no field named `voice_messages_enabled` | ||
| --> src/channels/telegram.rs:347:13 | ||
| | | ||
| 347 | voice_messages_enabled: false, | ||
| | ^^^^^^^^^^^^^^^^^^^^^^ `TelegramChannel` does not have this field | ||
| | | ||
| = note: all struct fields are already assigned | ||
|
|
||
| error[E0609]: no field `voice_messages_enabled` on type `TelegramChannel` | ||
| --> src/channels/telegram.rs:386:14 | ||
| | | ||
| 386 | self.voice_messages_enabled = enabled; | ||
| | ^^^^^^^^^^^^^^^^^^^^^^ unknown field | ||
| | | ||
| = note: available fields are: `bot_token`, `allowed_users`, `pairing`, `client`, `typing_handle` ... and 10 others | ||
|
|
||
| error[E0560]: struct `LarkConfig` has no field named `allow_group_mentions` | ||
| --> src/onboard/wizard.rs:4960:21 | ||
| | | ||
| 4960 | allow_group_mentions: true, | ||
| | ^^^^^^^^^^^^^^^^^^^^ `LarkConfig` does not have this field | ||
| | | ||
| = note: all struct fields are already assigned | ||
|
|
||
| Some errors have detailed explanations: E0560, E0609. | ||
| For more information about an error, try `rustc --explain E0560`. | ||
| error: could not compile `zeroclaw` (lib) due to 4 previous errors |
There was a problem hiding this comment.
Remove the committed build log.
This looks like a transient local cargo check capture rather than repository source. It will go stale immediately and also bakes a developer-specific filesystem path into the repo. Please drop the file from the PR and fix the referenced sources instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@check_output_vfix.txt` around lines 1 - 36, Remove the committed build log
file (check_output_vfix.txt) from the PR and amend the commit so it is not
tracked; then fix the actual code errors referenced by the log: add or remove
the unused/missing fields in the structs and call sites — either add a boolean
field voice_messages_enabled to the TelegramChannel struct (so occurrences in
src/channels/telegram.rs such as the initializer that sets
voice_messages_enabled: false, the usage in the expression using
self.voice_messages_enabled, and the setter that assigns
self.voice_messages_enabled = enabled will compile) or remove/update those
references to match the current TelegramChannel API, and likewise remove or add
allow_group_mentions on LarkConfig (or update the initializer in
src/onboard/wizard.rs to match LarkConfig’s current fields) so the code and
struct definitions are consistent.
| - **Voice Transcription**: Telegram voice notes (`.ogg`) can be transcribed using the global transcription service (configured in the `[transcription]` section). | ||
| - Supported providers: **Groq** (Cloud) and **Local** (Whisper-RS). | ||
| - Local transcription requires `ffmpeg` and automatically downloads a small model on first use. | ||
| - Transcribed text is prepended with `🎙️ [Voice transcription]:`. |
There was a problem hiding this comment.
Document the actual Telegram config keys here.
The note explains the feature, but this section still never names the voice_messages / whisper_model settings users need to turn on. Please add those knobs to the Telegram example or bullets so the per-channel reference stays actionable.
As per coding guidelines "When runtime surfaces change (CLI, config, provider, channel behavior), update related references in commands-reference.md, providers-reference.md, channels-reference.md, config-reference.md, runbook, troubleshooting."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/channels-reference.md` around lines 165 - 168, The docs note about
Telegram voice transcription is missing the actual config keys; update the
Telegram section in channels-reference.md to list the exact settings users must
enable—mention the voice_messages boolean and the whisper_model (or provider)
config keys and show where the global [transcription] section ties in (e.g.,
which provider to set for Groq vs Local/Whisper-RS); add a short bullet under
the Voice Transcription paragraph that names voice_messages and whisper_model
(and any provider key expected by [transcription]) so readers can enable it
directly from the example.
| pub fn from_lark_config(config: &crate::config::schema::LarkConfig) -> Self { | ||
| Self::from_config(config) | ||
| } |
There was a problem hiding this comment.
from_lark_config() still honors the legacy use_feishu flag.
Line 378 delegates to from_config(), so use_feishu = true will still flip this constructor over to Feishu endpoints. That breaks the dedicated Lark path and contradicts the lark_from_lark_config_ignores_legacy_feishu_flag test below.
🛠️ Suggested fix
pub fn from_lark_config(config: &crate::config::schema::LarkConfig) -> Self {
- Self::from_config(config)
+ let mut ch = Self::new_with_platform(
+ config.app_id.clone(),
+ config.app_secret.clone(),
+ config.verification_token.clone().unwrap_or_default(),
+ config.port,
+ config.allowed_users.clone(),
+ config.mention_only,
+ LarkPlatform::Lark,
+ );
+ ch.receive_mode = config.receive_mode.clone();
+ ch
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| pub fn from_lark_config(config: &crate::config::schema::LarkConfig) -> Self { | |
| Self::from_config(config) | |
| } | |
| pub fn from_lark_config(config: &crate::config::schema::LarkConfig) -> Self { | |
| let mut ch = Self::new_with_platform( | |
| config.app_id.clone(), | |
| config.app_secret.clone(), | |
| config.verification_token.clone().unwrap_or_default(), | |
| config.port, | |
| config.allowed_users.clone(), | |
| config.mention_only, | |
| LarkPlatform::Lark, | |
| ); | |
| ch.receive_mode = config.receive_mode.clone(); | |
| ch | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/channels/lark.rs` around lines 377 - 379, from_lark_config currently
delegates to from_config which reads the legacy use_feishu flag and flips to
Feishu; change from_lark_config to build the Lark-specific channel without
honoring use_feishu — i.e., stop delegating to from_config and instead copy the
small config-to-Lark translation logic here (or call a helper that accepts an
explicit use_feishu override) so that from_lark_config always treats the config
as Lark (force use_feishu=false/ignore that field) while reusing any shared
parsing utilities only if extracted safely; update the implementation of
from_lark_config (and any helper called) to reference
crate::config::schema::LarkConfig fields directly and construct the Lark channel
with Lark endpoints.
| whisper_model: Option<String>, | ||
| } |
There was a problem hiding this comment.
Missing voice_messages_enabled field in struct definition causes compilation failure.
The pipeline errors confirm that voice_messages_enabled is referenced at lines 347, 386, and 1132 but the field is not declared in the struct. Add the missing field.
🐛 Proposed fix
workspace_dir: Option<std::path::PathBuf>,
+ voice_messages_enabled: bool,
whisper_model: Option<String>,
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| whisper_model: Option<String>, | |
| } | |
| workspace_dir: Option<std::path::PathBuf>, | |
| voice_messages_enabled: bool, | |
| whisper_model: Option<String>, | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/channels/telegram.rs` around lines 315 - 316, The struct that currently
ends with whisper_model: Option<String> is missing the voice_messages_enabled
field referenced elsewhere; add a new field voice_messages_enabled: bool to that
same struct (the one containing whisper_model) and, if the struct is
deserialized from external data, annotate it with #[serde(default)] so missing
values default to false to avoid runtime/compile issues; update any struct
instantiations if they require an explicit value.
| pub async fn transcribe_audio( | ||
| audio_data: Vec<u8>, | ||
| file_name: &str, | ||
| config: &TranscriptionConfig, | ||
| workspace_path: &std::path::Path, | ||
| ) -> Result<String> { |
There was a problem hiding this comment.
Update the remaining transcribe_audio call site(s) before merge.
This signature change is fine, but CI already reports a missing fourth argument at Line 4213, so the branch does not build yet. Every caller needs to pass the real workspace path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/channels/transcription.rs` around lines 44 - 49, The transcribe_audio
signature now requires a workspace_path (&std::path::Path) as the fourth
argument, so update every call to transcribe_audio (e.g., the failing callsite
that currently passes three args) to pass the actual workspace path reference
used in that context (for example &workspace.path(), &workspace_dir, or
workspace_path_var.as_path()), ensuring the passed value is a &Path; check all
usages of transcribe_audio and adjust their argument lists accordingly and their
variable scope/imports so the workspace path is available where the function is
called.
| if !default_model.exists() { | ||
| tracing::info!("Downloading default whisper model (ggml-tiny.bin) to {:?}", default_model); | ||
| let response = reqwest::get("https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin") | ||
| .await? | ||
| .error_for_status()?; | ||
| let bytes = response.bytes().await?; |
There was a problem hiding this comment.
Route the default-model download through the runtime proxy client.
This new outbound path uses reqwest::get directly, so [proxy] with scope = "zeroclaw" or scope = "services" is ignored on first-run local transcription. Environments that need a proxy to reach Hugging Face will only fail on the local path unless this uses build_runtime_proxy_client(...) as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/channels/transcription.rs` around lines 169 - 174, The download currently
calls reqwest::get directly (when !default_model.exists()), bypassing proxy
configuration; change this to build and use the runtime proxy client via
build_runtime_proxy_client(...) and then call
client.get("https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin").await?
(instead of reqwest::get), preserving the existing error_for_status() and
bytes().await? handling so the download honors [proxy] scope; update the
surrounding code that checks default_model.exists() to use the new client and
ensure types match the existing response handling.
| /// When true, only respond to messages that @-mention the bot in groups. | ||
| #[serde(default)] | ||
| pub mention_only: bool, |
There was a problem hiding this comment.
Update the in-tree FeishuConfig literals in the same change.
#[serde(default)] only covers deserialization. Manual FeishuConfig { ... } initializers still need the new field, and CI is already failing at Line 7421 and Line 7441 for that reason.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/config/schema.rs` around lines 3251 - 3253, Several places create
FeishuConfig via manual struct literals but don't include the new field
`mention_only`, causing build failures; update every `FeishuConfig { ... }`
literal (including tests and any test helpers/builders) to add `mention_only:
false` (or the appropriate default) so the struct literal compiles, and ensure
any custom `Default` impl or constructor for `FeishuConfig` is consistent with
that default.
| if voice_messages { | ||
| println!(" {} ZeroClaw uses `whisper-rs` and `ffmpeg` to parse voice notes.", style("ℹ").cyan()); | ||
| println!(" It will download a small model (~75MB) automatically the first time."); | ||
| println!(" Please ensure `ffmpeg` is installed on this machine and available in PATH."); | ||
| } |
There was a problem hiding this comment.
Keep the Telegram voice prompt provider-neutral.
This copy hardcodes local whisper-rs behavior before Step 4 asks whether the user wants Groq or Local transcription, so Groq users get incorrect setup guidance here.
💡 Suggested copy change
if voice_messages {
- println!(" {} ZeroClaw uses `whisper-rs` and `ffmpeg` to parse voice notes.", style("ℹ").cyan());
- println!(" It will download a small model (~75MB) automatically the first time.");
- println!(" Please ensure `ffmpeg` is installed on this machine and available in PATH.");
+ println!(
+ " {} Voice message handling enabled.",
+ style("ℹ").cyan()
+ );
+ println!(" Configure the transcription provider in the next step.");
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if voice_messages { | |
| println!(" {} ZeroClaw uses `whisper-rs` and `ffmpeg` to parse voice notes.", style("ℹ").cyan()); | |
| println!(" It will download a small model (~75MB) automatically the first time."); | |
| println!(" Please ensure `ffmpeg` is installed on this machine and available in PATH."); | |
| } | |
| if voice_messages { | |
| println!( | |
| " {} Voice message handling enabled.", | |
| style("ℹ").cyan() | |
| ); | |
| println!(" Configure the transcription provider in the next step."); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/onboard/wizard.rs` around lines 3624 - 3628, The current println block
guarded by the voice_messages check hardcodes "whisper-rs" and ffmpeg details;
update the messages printed in the voice_messages branch (the println calls in
src/onboard/wizard.rs that reference voice_messages) to be provider-neutral —
e.g., mention that transcription will be performed by the selected provider,
that local transcription may download a model and may require system tools like
ffmpeg, and move any provider-specific instructions (like "whisper-rs" model
download or explicit ffmpeg installation steps) out of this branch so they are
displayed only after the user selects Groq vs Local in Step 4.
| let providers = vec![ | ||
| "Groq (Cloud) — fastest, requires GROQ_API_KEY", | ||
| "Local (Whisper-RS) — private, runs on your CPU, requires ffmpeg", | ||
| ]; | ||
|
|
||
| let provider_idx = Select::new() | ||
| .with_prompt(" Select transcription provider") | ||
| .items(&providers) | ||
| .default(0) | ||
| .interact()?; | ||
|
|
||
| let provider = if provider_idx == 0 { | ||
| TranscriptionProvider::Groq | ||
| } else { | ||
| TranscriptionProvider::Local | ||
| }; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Cargo feature wiring =="
rg -n 'transcription-local|whisper-rs|whisper_rs' Cargo.toml
echo
echo "== Wizard exposure =="
rg -n 'Local \(Whisper-RS\)|TranscriptionProvider::Local' src/onboard/wizard.rs
echo
echo "== Runtime / config gating =="
rg -nP '#\[cfg\(feature *= *"transcription-local"\)\]|cfg!\(feature *= *"transcription-local"\)|TranscriptionProvider::Local' src/channels src/config src/onboardRepository: zeroclaw-labs/zeroclaw
Length of output: 831
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== src/channels/transcription.rs context around line 54 =="
sed -n '40,70p' src/channels/transcription.rs
echo
echo "== src/channels/transcription.rs context around line 159 (cfg guard) =="
sed -n '150,170p' src/channels/transcription.rs
echo
echo "== Full match statement or enum definition for TranscriptionProvider =="
rg -n 'enum TranscriptionProvider' src/ -A 5Repository: zeroclaw-labs/zeroclaw
Length of output: 2942
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Full transcribe_audio_whisper_rs definition - check where cfg block ends =="
sed -n '145,210p' src/channels/transcription.rs
echo
echo "== Check if there's an else clause or fallback after the cfg block =="
sed -n '190,220p' src/channels/transcription.rsRepository: zeroclaw-labs/zeroclaw
Length of output: 4095
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Find end of transcribe_audio_whisper_rs function =="
sed -n '145,300p' src/channels/transcription.rs | tail -100Repository: zeroclaw-labs/zeroclaw
Length of output: 3283
Gate the Local transcription option behind the transcription-local feature flag.
The wizard unconditionally offers Local (Whisper-RS) even on builds compiled without transcription-local. While the runtime has a defensive check that will fail with an error message, this creates a confusing UX where a user selects an unavailable option only to encounter an error later. Per coding guidelines, unsupported features should be kept explicit (error out) rather than offering partial support.
Suggested fix
- let providers = vec![
- "Groq (Cloud) — fastest, requires GROQ_API_KEY",
- "Local (Whisper-RS) — private, runs on your CPU, requires ffmpeg",
- ];
+ let local_supported = cfg!(feature = "transcription-local");
+ let mut providers = vec!["Groq (Cloud) — fastest, requires GROQ_API_KEY"];
+ if local_supported {
+ providers.push("Local (Whisper-RS) — private, runs on your CPU, requires ffmpeg");
+ }
let provider_idx = Select::new()
.with_prompt(" Select transcription provider")
.items(&providers)
.default(0)
.interact()?;
- let provider = if provider_idx == 0 {
- TranscriptionProvider::Groq
- } else {
- TranscriptionProvider::Local
- };
+ let provider = match (provider_idx, local_supported) {
+ (0, _) => TranscriptionProvider::Groq,
+ (1, true) => TranscriptionProvider::Local,
+ _ => bail!(
+ "Local transcription is not compiled into this build. Rebuild with --features transcription-local."
+ ),
+ };📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let providers = vec![ | |
| "Groq (Cloud) — fastest, requires GROQ_API_KEY", | |
| "Local (Whisper-RS) — private, runs on your CPU, requires ffmpeg", | |
| ]; | |
| let provider_idx = Select::new() | |
| .with_prompt(" Select transcription provider") | |
| .items(&providers) | |
| .default(0) | |
| .interact()?; | |
| let provider = if provider_idx == 0 { | |
| TranscriptionProvider::Groq | |
| } else { | |
| TranscriptionProvider::Local | |
| }; | |
| let local_supported = cfg!(feature = "transcription-local"); | |
| let mut providers = vec!["Groq (Cloud) — fastest, requires GROQ_API_KEY"]; | |
| if local_supported { | |
| providers.push("Local (Whisper-RS) — private, runs on your CPU, requires ffmpeg"); | |
| } | |
| let provider_idx = Select::new() | |
| .with_prompt(" Select transcription provider") | |
| .items(&providers) | |
| .default(0) | |
| .interact()?; | |
| let provider = match (provider_idx, local_supported) { | |
| (0, _) => TranscriptionProvider::Groq, | |
| (1, true) => TranscriptionProvider::Local, | |
| _ => bail!( | |
| "Local transcription is not compiled into this build. Rebuild with --features transcription-local." | |
| ), | |
| }; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/onboard/wizard.rs` around lines 5099 - 5114, The wizard currently always
includes "Local (Whisper-RS)" in the providers list even when the
transcription-local feature is not enabled; update the code that builds
providers (the providers Vec used by Select::new) to conditionally push the
Local option only when the transcription-local feature flag is enabled
(cfg!(feature = "transcription-local")), leaving Groq always present, and ensure
the selection mapping to TranscriptionProvider::Local only occurs if that option
was included; reference the providers Vec, Select::new(), and
TranscriptionProvider::Local when making the change.
| if custom_model { | ||
| let path: String = Input::new() | ||
| .with_prompt(" Path to GGML model file (.bin)") | ||
| .interact_text()?; | ||
| if !path.trim().is_empty() { | ||
| let trimmed = path.trim(); | ||
| let model_path = std::path::Path::new(trimmed); | ||
| if !model_path.is_file() { | ||
| bail!("Whisper model file not found: {}", model_path.display()); | ||
| } | ||
| config.whisper_model_path = Some(trimmed.to_string()); |
There was a problem hiding this comment.
Expand ~ before validating custom Whisper model paths.
Path::is_file() on the raw prompt value rejects valid home-relative inputs like ~/.cache/whisper/tiny.bin, so the wizard can fail on a path that actually exists.
🏠 Suggested fix
if custom_model {
let path: String = Input::new()
.with_prompt(" Path to GGML model file (.bin)")
.interact_text()?;
if !path.trim().is_empty() {
let trimmed = path.trim();
- let model_path = std::path::Path::new(trimmed);
+ let expanded = shellexpand::tilde(trimmed).to_string();
+ let model_path = Path::new(&expanded);
if !model_path.is_file() {
bail!("Whisper model file not found: {}", model_path.display());
}
- config.whisper_model_path = Some(trimmed.to_string());
+ config.whisper_model_path = Some(expanded);
}
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/onboard/wizard.rs` around lines 5133 - 5143, The prompt path handling in
the custom_model branch (the Input prompt and subsequent check in wizard.rs)
must expand a leading tilde before validating; modify the block that reads path
(the code around Input::new(), trimmed, model_path, and
config.whisper_model_path) to expand ~ (e.g., via shellexpand::tilde or by
resolving home_dir) into an absolute path/string, then validate
Path::new(&expanded).is_file() and set config.whisper_model_path to the expanded
path (not the raw trimmed input); keep the existing error message (Whisper model
file not found) but show the expanded path for clarity.
|
Good feature — Whisper transcription with proper feature gating ( Compile error: Wizard code references Everything else looks solid — temp file cleanup, max file size guards, backward-compatible serde defaults. Happy to approve once the wizard field is fixed. |
|
Fixed the compile errors flagged by @rikitrader:
Verified with |
Summary
This PR integrates
whisper-rsas a global transcription provider option and fixes a bug in Telegram voice message detection.Changes
TranscriptionProvider(Groq/Local) toTranscriptionConfig.whisper-rs+ffmpegorchestration for private, on-device transcription.channels-reference.mdandconfig-reference.md.Verified with
cargo checkand manual testing on Telegram.Summary by CodeRabbit
New Features
Documentation
Chores