Scribble is a fast, lightweight transcription engine written in Rust, with a built-in Whisper backend and a backend trait for custom implementations.
Scribble will demux/decode audio or video containers (MP4, MP3, WAV, FLAC, OGG, WebM, MKV, etc.), downmix to mono, and resample to 16 kHz — no preprocessing required.
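That normalization step can be pictured with a minimal sketch: downmix to mono by averaging channels, then resample to 16 kHz. This is an illustration of the idea only, not Scribble's decoder (a real pipeline demuxes the container and uses a proper windowed resampler, not nearest-neighbor picking):

```rust
/// Downmix interleaved multi-channel samples to mono by averaging each
/// frame's channels, then resample to 16 kHz by nearest-neighbor picking.
fn normalize(interleaved: &[f32], channels: usize, src_rate: u32) -> Vec<f32> {
    // Average each frame (one sample per channel) into a single mono sample.
    let mono: Vec<f32> = interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect();

    // Naive resample from src_rate to Whisper's required 16 kHz.
    let dst_rate = 16_000u32;
    let out_len = (mono.len() as u64 * dst_rate as u64 / src_rate as u64) as usize;
    (0..out_len)
        .map(|i| mono[(i as u64 * src_rate as u64 / dst_rate as u64) as usize])
        .collect()
}
```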
Goals:

- Provide a clean, idiomatic Rust API for audio transcription
- Support multiple output formats (JSON, VTT, plain text, etc.)
- Work equally well as a CLI tool or embedded library
- Be streaming-first: designed to support incremental, chunk-based transcription pipelines (live audio, long-running streams, and low-latency workflows)
- Enable composable pipelines: VAD → transcription → encoding, with clear extension points for streaming and real-time use cases
- Keep the core simple, explicit, and easy to extend
Scribble is built with streaming and real-time transcription in mind, even when operating on static files today.
Scribble targets Rust stable (tracked via rust-toolchain.toml).
Clone the repository and build the binaries:
```shell
./scripts/build-all.sh
```

Or build a single binary to a target directory:
```shell
./scripts/build.sh scribble-cli ./dist
```

This will produce the following binaries:
- `scribble-cli` — transcribe audio/video (decodes + normalizes to mono 16 kHz)
- `scribble-server` — HTTP server for transcription
- `model-downloader` — download Whisper and VAD models
Scribble exposes whisper-rs GPU backend features as Cargo features. Enable the backend you want in `Cargo.toml` or via `--features`:
```toml
[dependencies]
scribble = { version = "0.5", features = ["cuda"] }
```

```shell
cargo run --features "bin-scribble-cli,cuda" --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --input ./input.wav
```

Available GPU feature flags:
- `cuda` (NVIDIA CUDA)
- `metal` (Apple Metal)
- `hipblas` (AMD ROCm)
- `vulkan` (Vulkan)
- `coreml` (Apple CoreML)
These are passthrough features; you still need the corresponding system dependencies installed for your platform. See the whisper-rs documentation for backend setup details.
`model-downloader` is a small helper CLI for downloading known-good Whisper and Whisper-VAD models into a local directory.
```shell
cargo run --features bin-model-downloader --bin model-downloader -- --list
```

Example output:
```text
Whisper models:
- tiny
- base.en
- large-v3-turbo
- large-v3-turbo-q8_0
...
VAD models:
- silero-v5.1.2
- silero-v6.2.0
```
```shell
cargo run --features bin-model-downloader --bin model-downloader -- --name large-v3-turbo
```

By default, models are downloaded into `./models`.
```shell
cargo run --features bin-model-downloader --bin model-downloader -- \
  --name silero-v6.2.0 \
  --dir /opt/scribble/models
```

Downloads are performed safely:

- written to `*.part`
- fsynced
- atomically renamed into place
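The write-to-temp, fsync, rename pattern above can be sketched in a few lines of std-only Rust. This is an illustration of the technique, not Scribble's actual downloader:

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

/// Write `bytes` to `dest` atomically: stage the data in a `.part` file,
/// flush it to disk with fsync, then rename it into place so readers
/// never observe a partially written file.
fn atomic_write(dest: &Path, bytes: &[u8]) -> std::io::Result<()> {
    let part = dest.with_extension("part");
    let mut f = File::create(&part)?;
    f.write_all(bytes)?;
    f.sync_all()?; // fsync: ensure the data is durable before the rename
    drop(f);
    fs::rename(&part, dest)?; // atomic within the same directory on POSIX
    Ok(())
}
```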
`scribble-cli` is the main transcription CLI.
It accepts audio or video containers and normalizes them to Whisper’s required mono 16 kHz internally. Provide:
- an input media path (e.g. MP4, MP3, WAV, FLAC, OGG, WebM, MKV), or `-` to stream from stdin
- a Whisper model
- a Whisper-VAD model (used when `--enable-vad` is set)
```shell
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.mp4
```

Output is written to stdout in WebVTT format by default.
If you have a live audio stream URL (MP3/AAC/etc.), you can decode it to Whisper-friendly WAV and pipe it into scribble-cli via stdin:
```shell
ffmpeg -re -loglevel error -nostats \
  -i "https://stream.example.com/live.mp3?session-id=REDACTED" \
  -f wav -ac 1 -ar 16000 - \
  | scribble-cli \
      --model ./models/ggml-tiny.bin \
      --vad-model ./models/ggml-silero-v6.2.0.bin \
      --enable-vad \
      --input -
```

If you have streamlink installed, you can pull a Twitch stream to stdout and feed it through ffmpeg:
```shell
streamlink --stdout https://www.twitch.tv/dougdoug best \
  | ffmpeg -hide_banner -loglevel error -i pipe:0 -vn -ac 1 -ar 16000 -f wav pipe:1 \
  | scribble-cli \
      --model ./models/ggml-tiny.bin \
      --vad-model ./models/ggml-silero-v6.2.0.bin \
      --enable-vad \
      --input -
```

`scribble-server` is a long-running HTTP server that loads models once and accepts transcription requests over HTTP.
```shell
cargo run --features bin-scribble-server --bin scribble-server -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --host 127.0.0.1 \
  --port 8080
```

```shell
curl -sS --data-binary @./input.mp4 \
  "http://127.0.0.1:8080/transcribe?output=vtt" \
  > transcript.vtt
```

For JSON output:
```shell
curl -sS --data-binary @./input.wav \
  "http://127.0.0.1:8080/transcribe?output=json" \
  > transcript.json
```

Example using all query params:
```shell
curl -sS --data-binary @./input.mp4 \
  "http://127.0.0.1:8080/transcribe?output=json&output_type=json&model_key=ggml-large-v3-turbo.bin&enable_vad=true&translate_to_english=true&language=en" \
  > transcript.json
```

`scribble-server` exposes Prometheus metrics at `GET /metrics`.
```shell
curl -sS "http://127.0.0.1:8080/metrics"
```

Key metrics:

- `scribble_http_requests_total` (labels: `status`)
- `scribble_http_request_duration_seconds` (labels: `status`)
- `scribble_http_in_flight_requests`
All binaries emit structured JSON logs to stderr.
- Default level: `error`
- Override with `SCRIBBLE_LOG` (e.g. `SCRIBBLE_LOG=info`)
```shell
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --output-type json
```

```shell
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --enable-vad \
  --input ./input.wav
```

When VAD is enabled:
- non-speech regions are suppressed
- if no speech is detected, no output is produced
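Conceptually, VAD gating keeps only the sample ranges the detector marked as speech and drops the rest before transcription. A simplified sketch of that filtering step, using a hypothetical helper rather than Scribble's internal API:

```rust
/// Keep only the samples inside detected speech regions, given as
/// half-open (start, end) index ranges. With no regions, the result is
/// empty, which is why VAD-enabled runs produce no output on silence.
fn keep_speech(samples: &[f32], regions: &[(usize, usize)]) -> Vec<f32> {
    regions
        .iter()
        .flat_map(|&(start, end)| samples[start..end.min(samples.len())].iter().copied())
        .collect()
}
```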
```shell
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --language en
```

If `--language` is omitted, Whisper will auto-detect the language.
```shell
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --output-type vtt \
  > transcript.vtt
```

Scribble is also designed to be embedded as a library.
High-level usage looks like:
```rust
use scribble::{Opts, OutputType, Scribble};
use std::fs::File;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut scribble = Scribble::new(
        ["./models/ggml-large-v3-turbo.bin"],
        "./models/ggml-silero-v6.2.0.bin",
    )?;

    let mut input = File::open("audio.wav")?;
    let mut output = Vec::new();

    let opts = Opts {
        model_key: None,
        enable_translate_to_english: false,
        enable_voice_activity_detection: true,
        language: None,
        output_type: OutputType::Json,
        incremental_min_window_seconds: 1,
    };

    scribble.transcribe(&mut input, &mut output, &opts)?;

    let json = String::from_utf8(output)?;
    println!("{json}");
    Ok(())
}
```

Roadmap:

- Make VAD streaming-capable
- Support streaming and incremental transcription
- Select the primary audio track in multi-track video containers
- Implement a web server
- Add Prometheus metrics endpoint
- Add structured logs (tracing)
- Expand test coverage to 80%+
This project uses `cargo-llvm-cov` for coverage locally and in CI.
One-time setup:
```shell
rustup component add llvm-tools-preview
cargo install cargo-llvm-cov
```

Run coverage locally:

```shell
# Print a summary to stdout
cargo llvm-cov --features bin-scribble-cli,bin-model-downloader,bin-scribble-server --all-targets

# Generate an HTML report (writes to ./target/llvm-cov/html)
cargo llvm-cov --features bin-scribble-cli,bin-model-downloader,bin-scribble-server --all-targets --html
```

Scribble is under active development. The API is not yet stable, but the foundations are in place and evolving quickly.
Release notes live in CHANGELOG.md (and GitHub Releases).
See STYLEGUIDE.md for code style, verification conventions, and repo-level checklists.
MIT


