feat(whatsapp-web): add voice message transcription support by rareba · Pull Request #2920 · zeroclaw-labs/zeroclaw

rareba · 2026-03-06T12:17:09Z

Summary

Base branch target: master
Problem: WhatsApp voice notes (audio messages with ptt=true) were silently dropped because text_content() returns empty for audio messages, hitting the trimmed.is_empty() guard
Why it matters: Users sending voice messages via WhatsApp got no response from the agent
What changed: Added audio message detection, download via Client::download(), and transcription via the existing Whisper API pipeline (shared with Telegram channel). Wired TranscriptionConfig into WhatsAppWebChannel via builder pattern (matching Telegram channel's approach)
What did not change (scope boundary): No changes to Telegram transcription, no changes to audio format handling, no changes to existing text message flow

Files changed

src/channels/whatsapp_web.rs: Added transcription field, with_transcription() builder, audio message handling in Event::Message with duration limit, download, and transcription
src/channels/mod.rs: Wired .with_transcription(config.transcription.clone()) in WhatsApp Web factory

Label Snapshot (required)

Risk label: risk: medium
Size label: size: S
Scope labels: channel
Module labels: channel: whatsapp-web
Contributor tier label: (auto-managed)
If any auto-label is incorrect, note requested correction: N/A

Change Metadata

Change type: feature
Primary scope: channel

Linked Issue

Closes WhatsApp Web: support audio/media messages (voice notes, images, documents) #2918

Supersede Attribution (required when `Supersedes #` is used)

N/A

Validation Evidence (required)

Commands and result summary:

cargo fmt --all -- --check   # passes
cargo clippy --features whatsapp-web   # no new warnings (3 pre-existing warnings unchanged)
cargo test   # all 14 whatsapp_web tests pass (12 existing + 2 new)

Evidence provided: unit test results
If any command is intentionally skipped, explain why: N/A

Security Impact (required)

New permissions/capabilities? No
New external network calls? No (reuses existing Whisper API pipeline)
Secrets/tokens handling changed? No
File system access scope changed? No

Privacy and Data Hygiene (required)

Data-hygiene status: pass
Redaction/anonymization notes: Audio data is transient, passed to existing transcription pipeline
Neutral wording confirmation: Confirmed

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No (reuses existing [transcription] config section)
Migration needed? No

i18n Follow-Through (required when docs or user-facing wording changes)

i18n follow-through triggered? No — code changes only

Human Verification (required)

Verified scenarios: Unit tests for transcription config wiring, audio message detection
Edge cases checked: Empty audio, duration limit enforcement
What was not verified: Manual end-to-end test with live WhatsApp voice note (marked as TODO in original test plan)

Side Effects / Blast Radius (required)

Affected subsystems/workflows: WhatsApp Web channel message handling
Potential unintended effects: None — audio handling is additive, existing text flow unchanged
Guardrails/monitoring for early detection: Duration limit prevents processing excessively long audio

Agent Collaboration Notes (recommended)

Agent tools used: Claude Code
Workflow/plan summary: Followed existing Telegram channel transcription pattern
Verification focus: Builder pattern consistency, test coverage
Confirmation: naming + architecture boundaries followed

Rollback Plan (required)

Fast rollback command/path: git revert <commit>
Feature flags or config toggles: Existing [transcription] enabled = true gates the feature
Observable failure symptoms: Voice notes silently dropped (returns to pre-feature behavior)

Risks and Mitigations

Risk: Audio download may fail for large files
- Mitigation: Duration limit enforced before download attempt

Summary by CodeRabbit

New Features
- Added voice transcription support for WhatsApp Web channel, enabling automatic transcription of audio messages to text.
- Added configurable transcription settings that can be applied when setting up the WhatsApp Web channel.
Tests
- Added unit tests to verify transcription configuration behavior.

coderabbitai · 2026-03-06T12:17:30Z

Note

`.coderabbit.yaml` has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)

Validation error: Unrecognized key(s) in object: 'tools', 'path_filters', 'review_instructions'

⚙️ Configuration instructions

Please see the configuration documentation for more information.
You can also validate your configuration using the online YAML validator.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

📝 Walkthrough

Walkthrough

WhatsApp Web channel gains voice transcription support. The transcription configuration is bound to the channel at creation time via a builder method, enabling downloaded audio messages to be transcribed and forwarded as text instead of being silently dropped.

Changes

Cohort / File(s)	Summary
Channel Initialization `src/channels/mod.rs`	WhatsAppWebChannel construction now chains `with_transcription(config.transcription.clone())` to bind transcription settings at channel creation.
WhatsApp Web Transcription Support `src/channels/whatsapp_web.rs`	Adds optional `transcription` field to WhatsAppWebChannel struct. Introduces feature-gated `with_transcription()` builder method and updates event handling to download, transcribe, and forward audio messages as text. Includes unit tests for transcription configuration behavior.

Sequence Diagram

sequenceDiagram
    actor User
    participant WhatsAppWeb as WhatsApp Web<br/>Channel
    participant Download as Audio<br/>Download
    participant Transcribe as Transcription<br/>Service
    participant Bot as Bot/Agent

    User->>WhatsAppWeb: Send voice message
    WhatsAppWeb->>WhatsAppWeb: Check for audio content
    WhatsAppWeb->>Download: Download audio file
    Download-->>WhatsAppWeb: Return audio bytes
    WhatsAppWeb->>Transcribe: transcribe_audio(bytes, config)
    Transcribe-->>WhatsAppWeb: Return transcribed text
    WhatsAppWeb->>Bot: Send ChannelMessage(text)
    Bot-->>User: Process transcribed text

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat(whatsapp-web): supersede #1992 transcription flow [RMN-205] #2192 — Both modify WhatsApp Web channel to add transcription config binding via with_transcription() in channel construction.
fix(discord): transcribe inbound audio attachments #2700 — Similar per-channel transcription support pattern added to Discord channel with identical builder method and configuration approach.

Suggested labels

size: M, risk: medium, channel

Suggested reviewers

theonlyhennygod

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main feature addition: adding voice message transcription support to the WhatsApp Web channel.
Description check	✅ Passed	The PR description covers the problem, solution, and changes made with a test plan, though it lacks several required template sections like risk labels, scope labels, and backward compatibility details.
Linked Issues check	✅ Passed	The PR fully addresses issue `#2918` requirements: voice notes are now detected, downloaded, and transcribed using the existing Whisper API, with transcription wired via builder pattern matching Telegram's approach.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to issue `#2918`: adding voice transcription support to WhatsApp Web. No unrelated or out-of-scope modifications are present.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

willsarg · 2026-03-07T19:30:28Z

Local validation complete on local-master-builder (starting from commit a6102f8, cumulative merge workflow). This PR merged cleanly in sequence and passed: cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings (no new warnings introduced), and cargo test. Marking this PR as validated and safe-to-merge from local integration testing.

rikitrader · 2026-03-09T03:12:52Z

This PR properly integrates transcription into WhatsApp Web via .with_transcription() builder pattern — good approach. However, it may conflict with #3029 which also modifies the transcription system. Please rebase onto main after #3029 is resolved and verify no merge conflicts in whatsapp_web.rs and channels/mod.rs.

rareba · 2026-03-09T20:59:20Z

Rebased onto current master. This PR now has a clean 2-file diff (src/channels/whatsapp_web.rs and src/channels/mod.rs).

Re: potential conflict with #3029 — both branches are now based on the same master commit. If #3029 merges first, we'll rebase to resolve any overlap in whatsapp_web.rs.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

src/channels/whatsapp_web.rs (1)
808-824: Please cover the handler path, not just the builder.

These tests only pin with_transcription(). The risky logic is in Event::Message—allowlist resolution, voice-note filtering, missing-duration handling, download/transcription failures, and successful forwarding—so this change still lacks direct coverage where regressions are most likely.

Based on learnings, "Applies to src/channels/**/*.rs : Implement Channel trait in src/channels/, keep send, listen, health_check, typing semantics consistent, cover auth/allowlist/health behavior with tests."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/whatsapp_web.rs` around lines 808 - 824, Add tests that exercise
the runtime handler path for message events instead of only the builder: write
unit tests that invoke the Channel implementation's event handling (the
Event::Message path) for the whatsapp_web channel created via make_channel()
with with_transcription(), covering allowlist resolution, voice-note filtering,
missing-duration handling, download/transcription failures, and the
successful-forwarding path; use the Channel trait methods
(send/listen/health_check/typing semantics) or the specific handler function
used by the whatsapp_web implementation to feed synthetic message events and
assert expected outcomes (transcription forwarded when enabled, ignored when
disabled or filtered, proper error handling/logging on download/transcription
failure, and health/auth behavior).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/channels/whatsapp_web.rs`:
- Around line 377-381: The warn/info logs in the WhatsApp Web handler currently
include PII and message content (e.g., logging sender_jid, sender_candidates and
transcription text) — replace those raw values with redacted or categorical data
before logging: update the tracing::warn!/info! calls in the block that
references sender_jid and sender_candidates (and the similar calls around the
section at lines handling transcription bodies) to log only non-sensitive
indicators (e.g., "unknown-sender" or hash/boolean flags like
sender_in_allowed_list and transcription_truncated=true) or counts, remove
transcript body content entirely, and ensure any helper functions (e.g., the
code that builds sender_candidates or the transcription logging path) perform
the redaction so no raw PII or full message bodies are emitted to logs.
- Around line 394-425: The branch handling msg.get_base_message().audio_message
should only process true voice notes and must fail closed when duration is
missing: replace the current
audio.seconds.unwrap_or(0)/audio.ptt.unwrap_or(false) behavior by first checking
audio.ptt == Some(true) and returning/ignoring when not a PTT, then require
audio.seconds to be Some(duration) and log/warn+return if missing (do not treat
as 0), then compare that duration against transcription_config.max_duration_secs
before calling _client.download; update the tracing messages accordingly and use
early returns on unsupported states so callers never download unknown-size
media.

---

Nitpick comments:
In `@src/channels/whatsapp_web.rs`:
- Around line 808-824: Add tests that exercise the runtime handler path for
message events instead of only the builder: write unit tests that invoke the
Channel implementation's event handling (the Event::Message path) for the
whatsapp_web channel created via make_channel() with with_transcription(),
covering allowlist resolution, voice-note filtering, missing-duration handling,
download/transcription failures, and the successful-forwarding path; use the
Channel trait methods (send/listen/health_check/typing semantics) or the
specific handler function used by the whatsapp_web implementation to feed
synthetic message events and assert expected outcomes (transcription forwarded
when enabled, ignored when disabled or filtered, proper error handling/logging
on download/transcription failure, and health/auth behavior).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d8483f8a-b790-466f-ba2b-eb31617395d5

📥 Commits

Reviewing files that changed from the base of the PR and between f7fefd4 and b2a880b.

📒 Files selected for processing (2)

src/channels/mod.rs
src/channels/whatsapp_web.rs

coderabbitai · 2026-03-09T21:04:45Z

src/channels/whatsapp_web.rs

+                                    tracing::warn!(
+                                        "WhatsApp Web: message from {} not in allowed list (candidates: {:?})",
+                                        sender_jid,
+                                        sender_candidates
+                                    );


⚠️ Potential issue | 🟠 Major

Remove raw phone candidates and transcript bodies from logs.

These new logs write sender identifiers and the full transcription text to normal warn/info logs. That is PII plus user message content, so one voice note now leaks directly into application logs.

As per coding guidelines, "Deny-by-default for access and exposure boundaries; never log secrets, raw tokens, or sensitive payloads; keep network/filesystem/shell scope as narrow as possible unless explicitly justified."

Also applies to: 443-447

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/channels/whatsapp_web.rs` around lines 377 - 381, The warn/info logs in the WhatsApp Web handler currently include PII and message content (e.g., logging sender_jid, sender_candidates and transcription text) — replace those raw values with redacted or categorical data before logging: update the tracing::warn!/info! calls in the block that references sender_jid and sender_candidates (and the similar calls around the section at lines handling transcription bodies) to log only non-sensitive indicators (e.g., "unknown-sender" or hash/boolean flags like sender_in_allowed_list and transcription_truncated=true) or counts, remove transcript body content entirely, and ensure any helper functions (e.g., the code that builds sender_candidates or the transcription logging path) perform the redaction so no raw PII or full message bodies are emitted to logs.

coderabbitai · 2026-03-09T21:04:45Z

src/channels/whatsapp_web.rs

+                            } else if let Some(ref audio) = msg.get_base_message().audio_message {
+                                // Voice note / audio message — try transcription
+                                let duration = audio.seconds.unwrap_or(0);
+                                tracing::info!(
+                                    "WhatsApp Web audio from {} in {} ({}s, ptt={})",
+                                    sender, chat, duration, audio.ptt.unwrap_or(false)
+                                );
+
+                                let config = match transcription_config.as_ref() {
+                                    Some(c) => c,
+                                    None => {
+                                        tracing::debug!(
+                                            "WhatsApp Web: transcription disabled, ignoring audio from {}",
+                                            normalized
+                                        );
+                                        return;
+                                    }
+                                };
+
+                                if u64::from(duration) > config.max_duration_secs {
+                                    tracing::info!(
+                                        "WhatsApp Web: skipping audio ({}s > {}s limit)",
+                                        duration, config.max_duration_secs
                                    );
                                    return;
                                }

-                                if let Err(e) = tx_inner
-                                    .send(ChannelMessage {
-                                        id: uuid::Uuid::new_v4().to_string(),
-                                        channel: "whatsapp".to_string(),
-                                        sender: normalized.clone(),
-                                        // Reply to the originating chat JID (DM or group).
-                                        reply_target: chat,
-                                        content: trimmed.to_string(),
-                                        timestamp: chrono::Utc::now().timestamp() as u64,
-                                        thread_ts: None,
-                                    })
-                                    .await
+                                let audio_data = match _client.download(audio.as_ref()).await {
+                                    Ok(d) => d,
+                                    Err(e) => {
+                                        tracing::warn!("WhatsApp Web: failed to download audio: {e}");
+                                        return;


⚠️ Potential issue | 🟠 Major

Gate this branch to actual voice notes and fail closed when duration is missing.

audio_message also covers regular audio attachments, so this currently sends any caption-less audio file through transcription. On top of that, Line 396 turns an unknown duration into 0, which bypasses the pre-download duration guard that src/channels/transcription.rs expects callers to enforce. That broadens the feature beyond the intended ptt=true voice-note case and can pull unknown-size media into memory before any size check runs.

🔧 Suggested fix

- } else if let Some(ref audio) = msg.get_base_message().audio_message { - // Voice note / audio message — try transcription - let duration = audio.seconds.unwrap_or(0); + } else if let Some(ref audio) = msg.get_base_message().audio_message { + if !audio.ptt.unwrap_or(false) { + tracing::debug!( + "WhatsApp Web: ignoring non-voice audio from {}", + normalized + ); + return; + } + + let Some(duration) = audio.seconds else { + tracing::warn!( + "WhatsApp Web: voice note duration missing; skipping download" + ); + return; + }; tracing::info!( "WhatsApp Web audio from {} in {} ({}s, ptt={})", sender, chat, duration, audio.ptt.unwrap_or(false) );

As per coding guidelines, "Prefer explicit `bail!`/errors for unsupported or unsafe states; never silently broaden permissions/capabilities; document fallback behavior when fallback is intentional and safe."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/channels/whatsapp_web.rs` around lines 394 - 425, The branch handling msg.get_base_message().audio_message should only process true voice notes and must fail closed when duration is missing: replace the current audio.seconds.unwrap_or(0)/audio.ptt.unwrap_or(false) behavior by first checking audio.ptt == Some(true) and returning/ignoring when not a PTT, then require audio.seconds to be Some(duration) and log/warn+return if missing (do not treat as 0), then compare that duration against transcription_config.max_duration_secs before calling _client.download; update the tracing messages accordingly and use early returns on unsupported states so callers never download unknown-size media.

rareba · 2026-03-12T14:19:43Z

Note: this PR needs a manual rebase due to structural conflicts — master has since rewritten the WhatsApp Web event handler with retry/reconnect architecture. The voice transcription additions need to be integrated with the new handler structure. Will rework and re-push.

rareba · 2026-03-12T14:31:12Z

Reworked from scratch — the previous version accidentally removed the retry/reconnect state machine. This new version starts from current master and adds voice transcription cleanly on top.

Changes (2 files, ~90 lines added):

src/channels/whatsapp_web.rs:
- TranscriptionConfig field + .with_transcription() builder
- Audio detection via msg.get_base_message().audio_message
- Duration limit enforcement before download
- Download via client.download(), MIME type mapping
- Transcription via existing transcription::transcribe_audio() subsystem
- Graceful error handling (skip on failure, log warnings)
- 2 new unit tests
- Full retry/reconnect/backoff/session-purge logic preserved
src/channels/mod.rs: wired .with_transcription() into WhatsApp Web channel creation

cargo fmt, cargo clippy -D warnings pass clean on Linux CI.

Ready for review — could a maintainer approve the CI workflow run and take a look? cc @rikitrader

Adds audio message detection and transcription to WhatsApp Web channel. Voice messages (PTT) are downloaded, transcribed via the existing transcription subsystem (Groq Whisper), and delivered as text content. - TranscriptionConfig field with builder pattern - Duration limit enforcement before download - MIME type mapping for audio formats - Graceful error handling (skip on failure) - Preserves full retry/reconnect state machine from master

rareba · 2026-03-15T17:40:16Z

Superseded: reopening from feat/whatsapp-web-media-support branch (corrected prefix per CONTRIBUTING.md).

rareba requested review from JordanTheJet, chumyin and theonlyhennygod as code owners March 6, 2026 12:17

willsarg changed the base branch from main to master March 7, 2026 18:29

rareba force-pushed the feature/whatsapp-web-media-support branch from c25d411 to b2a880b Compare March 9, 2026 20:55

coderabbitai bot reviewed Mar 9, 2026

View reviewed changes

This was referenced Mar 10, 2026

🦞 OpenClaw 生态日报 2026-03-10 duanyytop/agents-radar#119

Open

🦞 OpenClaw 生态日报 2026-03-10 gsscsd/big_model_radar#9

Open

rareba requested a review from SimianAstronaut7 as a code owner March 11, 2026 17:18

rareba force-pushed the feature/whatsapp-web-media-support branch from b25bbba to 505109b Compare March 12, 2026 14:30

rareba force-pushed the feature/whatsapp-web-media-support branch from 505109b to 81db398 Compare March 15, 2026 14:40

rareba closed this Mar 15, 2026

rareba mentioned this pull request Mar 15, 2026

feat(whatsapp-web): add voice message transcription support #3617

Merged

Conversation

rareba commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Files changed

Label Snapshot (required)

Change Metadata

Linked Issue

Supersede Attribution (required when Supersedes # is used)

Validation Evidence (required)

Security Impact (required)

Privacy and Data Hygiene (required)

Compatibility / Migration

i18n Follow-Through (required when docs or user-facing wording changes)

Human Verification (required)

Side Effects / Blast Radius (required)

Agent Collaboration Notes (recommended)

Rollback Plan (required)

Risks and Mitigations

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

.coderabbit.yaml has unrecognized properties

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Uh oh!

willsarg commented Mar 7, 2026

Uh oh!

rikitrader commented Mar 9, 2026

Uh oh!

rareba commented Mar 9, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

rareba commented Mar 12, 2026

Uh oh!

rareba commented Mar 12, 2026

Uh oh!

rareba commented Mar 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rareba commented Mar 6, 2026 •

edited

Loading

Supersede Attribution (required when `Supersedes #` is used)

coderabbitai bot commented Mar 6, 2026 •

edited

Loading

`.coderabbit.yaml` has unrecognized properties