feat(telegram): add local voice transcription via whisper.cpp #1350

Closed
codefather-labs wants to merge 2 commits into anthropics:main from codefather-labs:main

@codefather-labs

Summary

Adds automatic local speech-to-text transcription for Telegram voice messages using whisper.cpp. When a voice
message arrives, it is transcribed locally before being forwarded to Claude. Transcription itself makes no external API calls, and the audio is not uploaded anywhere.

Motivation

Voice messages are one of the most natural ways to communicate on Telegram, but Claude currently receives them as opaque (voice message) placeholders with
no content. This forces users to either type everything out or manually transcribe their own voice notes.

With this change, Claude receives the full text of every voice message, making voice a first-class input method for the Telegram channel.

How it works

  1. Gate check — unauthenticated messages are dropped before any processing
  2. Download — OGA audio is fetched from Telegram's Bot API
  3. Convert — ffmpeg converts OGA → WAV (16 kHz mono PCM, as required by whisper)
  4. Transcribe — whisper-cli runs with the medium model, auto-detecting language
  5. Deliver — transcribed text is forwarded as [voice transcription] <text>
  6. Cleanup — temporary OGA and WAV files are deleted
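
The convert, transcribe, and deliver steps can be sketched as three small pure helpers. This is an illustration only, not the code in server.ts: the function names are hypothetical, and the whisper-cli flags (-m, -f, -l auto, -nt) reflect my understanding of the whisper.cpp CLI and should be checked against `whisper-cli --help` for the installed version.

```typescript
// Hypothetical helpers; names and structure are NOT taken from the PR's server.ts.

/** ffmpeg arguments: convert the downloaded OGA file to 16 kHz mono 16-bit PCM WAV. */
function ffmpegArgs(ogaPath: string, wavPath: string): string[] {
  return ["-y", "-i", ogaPath, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wavPath];
}

/** whisper-cli arguments: given model, language auto-detection, no timestamps (assumed flags). */
function whisperArgs(modelPath: string, wavPath: string): string[] {
  return ["-m", modelPath, "-f", wavPath, "-l", "auto", "-nt"];
}

/** Collapse whitespace in the raw transcript and add the prefix described in step 5. */
function formatTranscription(raw: string): string {
  const text = raw.replace(/\s+/g, " ").trim();
  return `[voice transcription] ${text}`;
}
```

The argument arrays are meant to be passed to a process spawner without a shell, which avoids quoting problems with temporary file paths.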

A typing indicator is shown in Telegram while transcription is in progress.
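
One way to keep the indicator visible for the whole transcription is to resend it periodically, since Telegram clears a chat action after about 5 seconds. The wrapper below is a sketch under that assumption; `withTypingIndicator`, the `sendTyping` callback, and the 4-second interval are all hypothetical and not taken from the PR.

```typescript
// Hypothetical wrapper; the PR's actual mechanism is not shown in the description.
async function withTypingIndicator<T>(
  sendTyping: () => Promise<void>, // e.g. a call to the Bot API's sendChatAction with "typing"
  work: () => Promise<T>,
  intervalMs: number = 4000,
): Promise<T> {
  // Errors from the indicator are ignored: it is cosmetic and must not fail the work.
  const ping = (): void => {
    sendTyping().catch(() => {});
  };
  ping(); // show the indicator immediately
  const timer = setInterval(ping, intervalMs); // refresh before it expires
  try {
    return await work();
  } finally {
    clearInterval(timer); // stop refreshing whether work succeeded or threw
  }
}
```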

Auto-install

Dependencies are automatically installed on first voice message via the detected package manager:

| Platform | Package managers | Installed |
| --- | --- | --- |
| macOS | brew | whisper-cpp, ffmpeg |
| Linux | apt-get, dnf, pacman | whisper-cpp, ffmpeg |
| Windows | winget, choco, scoop | whisper-cpp, ffmpeg |

The whisper medium model (ggml-medium.bin, ~1.5 GB) is downloaded from HuggingFace automatically.
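
The package-manager selection in the table could be sketched as follows. This is not the PR's ensureWhisper(): `planInstall`, the lookup tables, and the `isOnPath` callback are hypothetical, the platform keys follow Node's `process.platform` values, and the package names are passed in by the caller because the exact package identifier may differ per manager.

```typescript
// Hypothetical sketch of manager detection; not the PR's ensureWhisper().
type Manager = "brew" | "apt-get" | "dnf" | "pacman" | "winget" | "choco" | "scoop";

// Candidate managers per platform, in the order listed in the PR description.
const MANAGERS_BY_PLATFORM: Record<string, Manager[]> = {
  darwin: ["brew"],
  linux: ["apt-get", "dnf", "pacman"],
  win32: ["winget", "choco", "scoop"],
};

// Non-interactive install invocation for each manager.
const INSTALL_PREFIX: Record<Manager, string[]> = {
  brew: ["brew", "install"],
  "apt-get": ["apt-get", "install", "-y"],
  dnf: ["dnf", "install", "-y"],
  pacman: ["pacman", "-S", "--noconfirm"],
  winget: ["winget", "install"],
  choco: ["choco", "install", "-y"],
  scoop: ["scoop", "install"],
};

/** Return the install command for the first available manager, or null if none is found. */
function planInstall(
  platform: string,
  isOnPath: (cmd: string) => boolean,
  packages: string[],
): string[] | null {
  const candidates = MANAGERS_BY_PLATFORM[platform] ?? [];
  const manager = candidates.find((m) => isOnPath(m));
  return manager ? [...INSTALL_PREFIX[manager], ...packages] : null;
}
```

Returning null rather than throwing lets the caller fall back to the placeholder behavior when no supported manager is present.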

Graceful degradation

If whisper-cli, ffmpeg, or the model are unavailable, the plugin falls back to the existing (voice message) behavior. Zero breakage for users who don't
need or want voice transcription.
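
A minimal sketch of the fallback logic, assuming transcription is exposed as a function that rejects when whisper-cli, ffmpeg, or the model is missing. `voiceMessageText` and the `transcribe` callback are hypothetical names; treating an empty transcript as a failure is my assumption, while the caption rule follows the test plan.

```typescript
// Hypothetical sketch; not the PR's voice message handler.
const VOICE_PLACEHOLDER = "(voice message)";

async function voiceMessageText(
  caption: string | undefined,
  transcribe: () => Promise<string>,
): Promise<string> {
  // A caption is used as-is and transcription is never started.
  if (caption !== undefined && caption.trim() !== "") {
    return caption;
  }
  try {
    const text = (await transcribe()).trim();
    // Assumption: an empty transcript is treated like a failure.
    return text !== "" ? `[voice transcription] ${text}` : VOICE_PLACEHOLDER;
  } catch (_err) {
    // Missing dependencies or a failed run: keep the pre-existing behavior.
    return VOICE_PLACEHOLDER;
  }
}
```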

Configuration

All paths are configurable via environment variables in ~/.claude/channels/telegram/.env:

| Variable | Default |
| --- | --- |
| WHISPER_CLI_PATH | auto-detected |
| FFMPEG_PATH | auto-detected |
| WHISPER_MODEL_PATH | ~/.local/share/whisper-cpp/models/ggml-medium.bin |
| WHISPER_MODEL_NAME | ggml-medium.bin |
| WHISPER_MODEL_URL | HuggingFace CDN |
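
Resolving these variables against their defaults might look like the sketch below. The `WhisperConfig` shape and `resolveWhisperConfig` are hypothetical, deriving the default model path from the model name is an assumption, and the default download URL is deliberately left unset because the description only says "HuggingFace CDN".

```typescript
// Hypothetical sketch; field names are illustrative, not taken from server.ts.
interface WhisperConfig {
  whisperCliPath?: string; // undefined means "auto-detect"
  ffmpegPath?: string; // undefined means "auto-detect"
  modelName: string;
  modelPath: string;
  modelUrl?: string; // undefined means "use the plugin's built-in default URL"
}

function resolveWhisperConfig(
  env: Record<string, string | undefined>,
  homeDir: string,
): WhisperConfig {
  const modelName = env.WHISPER_MODEL_NAME || "ggml-medium.bin";
  return {
    whisperCliPath: env.WHISPER_CLI_PATH || undefined,
    ffmpegPath: env.FFMPEG_PATH || undefined,
    modelName,
    // Assumption: the default path follows the configured model name.
    modelPath: env.WHISPER_MODEL_PATH || `${homeDir}/.local/share/whisper-cpp/models/${modelName}`,
    modelUrl: env.WHISPER_MODEL_URL || undefined,
  };
}
```

Passing the environment and home directory as parameters keeps the function easy to test without touching the real process environment.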

Changes

  • external_plugins/telegram/server.ts — added transcribeVoice(), cross-platform ensureWhisper() with auto-install, modified voice message handler
    (+308 lines, -3 lines)

Test plan

  • Voice message with whisper installed → transcribed text delivered to Claude
  • Tested on macOS with Apple Silicon (M1) — whisper medium model, ~4s for a 3s message
  • Voice message without whisper installed → graceful fallback to (voice message)
  • Voice message from non-allowlisted user → dropped before transcription (no CPU wasted)
  • Voice message with caption → caption used as-is, no transcription triggered
  • Temp files (OGA/WAV) cleaned up after transcription
  • Linux (apt-get) auto-install
  • Windows (winget) auto-install

🤖 Generated with Claude Code

codefather-labs and others added 2 commits April 11, 2026 00:00
Voice messages are now automatically transcribed using whisper.cpp
before being forwarded to Claude. The transcribed text is prefixed
with [voice transcription] so Claude knows it's machine-generated.

Key features:
- Cross-platform: auto-detects OS and package manager (brew, apt,
  dnf, pacman, winget, choco, scoop)
- Auto-installs whisper-cpp and ffmpeg if missing
- Downloads ggml-medium model (~1.5 GB) from HuggingFace on first use
- Graceful degradation: falls back to "(voice message)" if
  dependencies are unavailable
- Configurable via env vars: WHISPER_CLI_PATH, FFMPEG_PATH,
  WHISPER_MODEL_PATH, WHISPER_MODEL_NAME, WHISPER_MODEL_URL
- Gate check before transcription to avoid wasting CPU on
  unauthenticated messages
- Typing indicator shown during transcription
- Temp files cleaned up after each transcription

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

Thanks for your interest! This repo only accepts contributions from Anthropic team members. If you'd like to submit a plugin to the marketplace, please submit your plugin here.

@github-actions github-actions bot closed this Apr 10, 2026