feat(telegram): add local voice transcription via whisper.cpp by codefather-labs · Pull Request #1350 · anthropics/claude-plugins-official

codefather-labs · 2026-04-10T21:16:22Z

Summary

Adds automatic local speech-to-text transcription for Telegram voice messages using whisper.cpp. When a voice message arrives, it is transcribed locally before being forwarded to Claude: no external transcription API is called, and the audio is not uploaded to any third-party service.

Motivation

Voice messages are one of the most natural ways to communicate on Telegram, but Claude currently receives them as opaque (voice message) placeholders with
no content. This forces users to either type everything out or manually transcribe their own voice notes.

With this change, Claude receives the full text of every voice message, making voice a first-class input method for the Telegram channel.

How it works

Gate check — unauthenticated messages are dropped before any processing
Download — OGA audio is fetched from Telegram's Bot API
Convert — ffmpeg converts OGA → WAV (16 kHz mono PCM, as required by whisper)
Transcribe — whisper-cli runs with the medium model, auto-detecting language
Deliver — transcribed text is forwarded as [voice transcription] <text>
Cleanup — temporary OGA and WAV files are deleted

A typing indicator is shown in Telegram while transcription is in progress.
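
The Convert and Transcribe steps can be sketched as two argument builders. This is a minimal illustration, not the code in server.ts: the function names are hypothetical, and the whisper-cli flags (`-m`, `-f`, `-l auto`, `-nt`) are assumed from the whisper.cpp command-line tool rather than taken from the PR.

```typescript
// Hypothetical helpers for illustration; not the PR's actual server.ts code.

/** Derive the WAV path that sits next to the downloaded OGA file. */
function wavPathFor(ogaPath: string): string {
  return ogaPath.replace(/\.(oga|ogg)$/i, "") + ".wav";
}

/** ffmpeg arguments: OGA in, 16 kHz mono 16-bit PCM WAV out. */
function buildFfmpegArgs(ogaPath: string, wavPath: string): string[] {
  return [
    "-y",                // overwrite the output file if it already exists
    "-i", ogaPath,       // input file
    "-ar", "16000",      // resample to 16 kHz
    "-ac", "1",          // downmix to mono
    "-c:a", "pcm_s16le", // 16-bit little-endian PCM
    wavPath,
  ];
}

/** whisper-cli arguments (assumed flags): model, input, language auto-detect. */
function buildWhisperArgs(modelPath: string, wavPath: string): string[] {
  return [
    "-m", modelPath, // path to the ggml model file
    "-f", wavPath,   // audio file to transcribe
    "-l", "auto",    // auto-detect the spoken language
    "-nt",           // omit timestamps from the output
  ];
}
```

Returning argument arrays rather than a shell string means they can be handed to something like `execFile` from `node:child_process` without shell quoting concerns.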

Auto-install

Dependencies are automatically installed on first voice message via the detected package manager:

| Platform | Package managers | Installed |
| --- | --- | --- |
| macOS | brew | `whisper-cpp`, `ffmpeg` |
| Linux | apt-get, dnf, pacman | `whisper-cpp`, `ffmpeg` |
| Windows | winget, choco, scoop | `whisper-cpp`, `ffmpeg` |

The whisper medium model (ggml-medium.bin, ~1.5 GB) is downloaded from HuggingFace automatically.
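
Package-manager detection could look roughly like the sketch below. It assumes the candidates are tried in the order listed in the table and that the platform is identified by Node's `process.platform` values (`darwin`, `linux`, `win32`). `pickPackageManager` and the `isOnPath` callback are hypothetical names, not the PR's `ensureWhisper()` implementation.

```typescript
// Hypothetical sketch; the real detection logic lives in ensureWhisper().

type PackageManager =
  | "brew" | "apt-get" | "dnf" | "pacman" | "winget" | "choco" | "scoop";

// Candidates per platform, in the order given in the table above (assumed priority).
const CANDIDATES: Record<string, PackageManager[]> = {
  darwin: ["brew"],
  linux: ["apt-get", "dnf", "pacman"],
  win32: ["winget", "choco", "scoop"],
};

/**
 * Return the first package manager for this platform that is available,
 * or null when the platform is unknown or none of them is installed.
 */
function pickPackageManager(
  platform: string,
  isOnPath: (command: string) => boolean,
): PackageManager | null {
  const candidates = CANDIDATES[platform] ?? [];
  return candidates.find((pm) => isOnPath(pm)) ?? null;
}
```

A `null` result would be the natural trigger for the fallback to (voice message), since nothing can be installed automatically in that case.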

Graceful degradation

If whisper-cli, ffmpeg, or the model are unavailable, the plugin falls back to the existing (voice message) behavior. Zero breakage for users who don't
need or want voice transcription.
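
The fallback rule can be expressed as a small wrapper. This is a sketch under stated assumptions: `VoiceDeps` and `voiceContent` are hypothetical names, the transcription step is shown as synchronous for brevity (the real one would be asynchronous), and treating an empty transcript as a fallback case is an assumption rather than documented behavior.

```typescript
// Hypothetical sketch of the fallback rule; not the PR's actual code.

const FALLBACK = "(voice message)";

interface VoiceDeps {
  whisperCli: string | null; // resolved path, or null if unavailable
  ffmpeg: string | null;
  modelPath: string | null;
}

/**
 * Return the text to forward for a voice message. Any missing dependency,
 * thrown error, or empty transcript yields the pre-existing placeholder.
 */
function voiceContent(deps: VoiceDeps, transcribe: () => string): string {
  if (!deps.whisperCli || !deps.ffmpeg || !deps.modelPath) {
    return FALLBACK;
  }
  try {
    const text = transcribe().trim();
    return text.length > 0 ? `[voice transcription] ${text}` : FALLBACK;
  } catch {
    // A failed conversion or transcription must not break message delivery.
    return FALLBACK;
  }
}
```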

Configuration

All paths are configurable via environment variables in ~/.claude/channels/telegram/.env:

| Variable | Default |
| --- | --- |
| `WHISPER_CLI_PATH` | auto-detected |
| `FFMPEG_PATH` | auto-detected |
| `WHISPER_MODEL_PATH` | `~/.local/share/whisper-cpp/models/ggml-medium.bin` |
| `WHISPER_MODEL_NAME` | `ggml-medium.bin` |
| `WHISPER_MODEL_URL` | HuggingFace CDN |
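
One way the variables might be resolved against their defaults is sketched below. `resolveWhisperConfig` and `WhisperConfig` are hypothetical names. Two points are assumptions: that an empty value counts as unset, and that the default model path is built from `WHISPER_MODEL_NAME`. The default download URL is not spelled out in this description, so the sketch leaves it as `null`.

```typescript
// Hypothetical sketch; how server.ts actually combines these variables may differ.

interface WhisperConfig {
  whisperCliPath: string | null; // null means "auto-detect"
  ffmpegPath: string | null;     // null means "auto-detect"
  modelName: string;
  modelPath: string;
  modelUrl: string | null;       // null means "use the built-in default URL"
}

type Env = Record<string, string | undefined>;

function resolveWhisperConfig(env: Env, homeDir: string): WhisperConfig {
  // Treat missing and blank values the same way (assumption).
  const get = (key: string): string | null => {
    const value = env[key]?.trim();
    return value ? value : null;
  };

  const modelName = get("WHISPER_MODEL_NAME") ?? "ggml-medium.bin";
  return {
    whisperCliPath: get("WHISPER_CLI_PATH"),
    ffmpegPath: get("FFMPEG_PATH"),
    modelName,
    modelPath:
      get("WHISPER_MODEL_PATH") ??
      `${homeDir}/.local/share/whisper-cpp/models/${modelName}`,
    modelUrl: get("WHISPER_MODEL_URL"),
  };
}
```

In a Node process this would typically be called as `resolveWhisperConfig(process.env, os.homedir())` after the .env file has been loaded.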

Changes

external_plugins/telegram/server.ts — added transcribeVoice(), cross-platform ensureWhisper() with auto-install, modified voice message handler
(+308 lines, -3 lines)

Test plan

Voice message with whisper installed → transcribed text delivered to Claude
Tested on macOS with Apple Silicon (M1) — whisper medium model, ~4s for a 3s message
Voice message without whisper installed → graceful fallback to (voice message)
Voice message from non-allowlisted user → dropped before transcription (no CPU wasted)
Voice message with caption → caption used as-is, no transcription triggered
Temp files (OGA/WAV) cleaned up after transcription
Linux (apt-get) auto-install
Windows (winget) auto-install
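
The ordering exercised by the allowlist and caption cases above can be captured in one decision function. This is an illustrative sketch only: `planVoiceHandling`, `IncomingVoice`, and the set-based allowlist are hypothetical and do not reflect how the plugin's gate is actually implemented.

```typescript
// Hypothetical sketch of the decision order; not the plugin's real gate.

type VoiceAction = "drop" | "use_caption" | "transcribe";

interface IncomingVoice {
  senderId: string;
  caption?: string;
}

/**
 * Decide what to do with a voice message before any audio is downloaded:
 * 1. senders outside the allowlist are dropped (no CPU spent),
 * 2. a non-empty caption is used as-is,
 * 3. otherwise the audio is transcribed.
 */
function planVoiceHandling(
  msg: IncomingVoice,
  allowlist: ReadonlySet<string>,
): VoiceAction {
  if (!allowlist.has(msg.senderId)) return "drop";
  if (msg.caption !== undefined && msg.caption.trim().length > 0) {
    return "use_caption";
  }
  return "transcribe";
}
```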

🤖 Generated with Claude Code

Voice messages are now automatically transcribed using whisper.cpp before being forwarded to Claude. The transcribed text is prefixed with [voice transcription] so Claude knows it's machine-generated.

Key features:
- Cross-platform: auto-detects OS and package manager (brew, apt, dnf, pacman, winget, choco, scoop)
- Auto-installs whisper-cpp and ffmpeg if missing
- Downloads ggml-medium model (~1.5 GB) from HuggingFace on first use
- Graceful degradation: falls back to "(voice message)" if dependencies are unavailable
- Configurable via env vars: WHISPER_CLI_PATH, FFMPEG_PATH, WHISPER_MODEL_PATH, WHISPER_MODEL_NAME, WHISPER_MODEL_URL
- Gate check before transcription to avoid wasting CPU on unauthenticated messages
- Typing indicator shown during transcription
- Temp files cleaned up after each transcription

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-10T21:16:32Z

Thanks for your interest! This repo only accepts contributions from Anthropic team members. If you'd like to submit a plugin to the marketplace, please submit your plugin here.

codefather-labs and others added 2 commits April 11, 2026 00:00

chore(telegram): bump version to 0.1.0 for whisper voice transcription

f6d352f

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot closed this Apr 10, 2026

codefather-labs wants to merge 2 commits into anthropics:main from codefather-labs:main
