Scribble is a fast, lightweight transcription engine written in Rust, with a built-in Whisper backend and a backend trait for custom implementations.
Scribble will demux/decode audio or video containers (MP4, MP3, WAV, FLAC, OGG, WebM, MKV, etc.), downmix to mono, and resample to 16 kHz — no preprocessing required.
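That normalization step can be pictured with a minimal sketch: downmix to mono by averaging channels, then resample to 16 kHz. This is an illustration of the idea only, not Scribble's decoder (a real pipeline demuxes the container and uses a proper windowed resampler, not nearest-neighbor picking):

```rust
/// Downmix interleaved multi-channel samples to mono by averaging each
/// frame's channels, then resample to 16 kHz by nearest-neighbor picking.
fn normalize(interleaved: &[f32], channels: usize, src_rate: u32) -> Vec<f32> {
    // Average each frame (one sample per channel) into a single mono sample.
    let mono: Vec<f32> = interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect();

    // Naive resample from src_rate to Whisper's required 16 kHz.
    let dst_rate = 16_000u32;
    let out_len = (mono.len() as u64 * dst_rate as u64 / src_rate as u64) as usize;
    (0..out_len)
        .map(|i| mono[(i as u64 * src_rate as u64 / dst_rate as u64) as usize])
        .collect()
}
```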
Goals:

- Provide a clean, idiomatic Rust API for audio transcription
- Support multiple output formats (JSON, VTT, plain text, etc.)
- Work equally well as a CLI tool or embedded library
- Be streaming-first: designed to support incremental, chunk-based transcription pipelines (live audio, long-running streams, and low-latency workflows)
- Enable composable pipelines: VAD → transcription → encoding, with clear extension points for streaming and real-time use cases
- Keep the core simple, explicit, and easy to extend
Scribble is built with streaming and real-time transcription in mind, even when operating on static files today.
Scribble targets Rust stable (tracked via rust-toolchain.toml).
Clone the repository and build the binaries:
```shell
./scripts/build-all.sh
```

Or build a single binary to a target directory:
```shell
./scripts/build.sh scribble-cli ./dist
```

This will produce the following binaries:
- `scribble-cli` — transcribe audio/video (decodes + normalizes to mono 16 kHz)
- `scribble-server` — HTTP server for transcription
- `model-downloader` — download Whisper and VAD models
Scribble exposes whisper-rs GPU backend features as Cargo features. Enable the backend you want in `Cargo.toml` or via `--features`:
```toml
[dependencies]
scribble = { version = "0.5", features = ["cuda"] }
```

```shell
cargo run --features "bin-scribble-cli,cuda" --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --input ./input.wav
```

Available GPU feature flags:
- `cuda` (NVIDIA CUDA)
- `metal` (Apple Metal)
- `hipblas` (AMD ROCm)
- `vulkan` (Vulkan)
- `coreml` (Apple CoreML)
These are passthrough features; you still need the corresponding system dependencies installed for your platform. See the whisper-rs documentation for backend setup details.
`model-downloader` is a small helper CLI for downloading known-good Whisper and Whisper-VAD models into a local directory.
```shell
cargo run --features bin-model-downloader --bin model-downloader -- --list
```

Example output:
```text
Whisper models:
- tiny
- base.en
- large-v3-turbo
- large-v3-turbo-q8_0
...
VAD models:
- silero-v5.1.2
- silero-v6.2.0
```
```shell
cargo run --features bin-model-downloader --bin model-downloader -- --name large-v3-turbo
```

By default, models are downloaded into `./models`.
```shell
cargo run --features bin-model-downloader --bin model-downloader -- \
  --name silero-v6.2.0 \
  --dir /opt/scribble/models
```

Downloads are performed safely:

- written to `*.part`
- fsynced
- atomically renamed into place
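The write-to-temp, fsync, rename pattern above can be sketched in a few lines of std-only Rust. This is an illustration of the technique, not Scribble's actual downloader:

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

/// Write `bytes` to `dest` atomically: stage the data in a `.part` file,
/// flush it to disk with fsync, then rename it into place so readers
/// never observe a partially written file.
fn atomic_write(dest: &Path, bytes: &[u8]) -> std::io::Result<()> {
    let part = dest.with_extension("part");
    let mut f = File::create(&part)?;
    f.write_all(bytes)?;
    f.sync_all()?; // fsync: ensure the data is durable before the rename
    drop(f);
    fs::rename(&part, dest)?; // atomic within the same directory on POSIX
    Ok(())
}
```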
`scribble-cli` is the main transcription CLI.
It accepts audio or video containers and normalizes them to Whisper’s required mono 16 kHz internally. Provide:
- an input media path (e.g. MP4, MP3, WAV, FLAC, OGG, WebM, MKV), or `-` to stream from stdin
- a Whisper model
- a Whisper-VAD model (used when `--enable-vad` is set)
```shell
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.mp4
```

Output is written to stdout in WebVTT format by default.
If you have a live audio stream URL (MP3/AAC/etc.), you can decode it to Whisper-friendly WAV and pipe it into scribble-cli via stdin:
```shell
ffmpeg -re -loglevel error -nostats \
  -i "https://stream.example.com/live.mp3?session-id=REDACTED" \
  -f wav -ac 1 -ar 16000 - \
  | scribble-cli \
      --model ./models/ggml-tiny.bin \
      --vad-model ./models/ggml-silero-v6.2.0.bin \
      --enable-vad \
      --input -
```

If you have streamlink installed, you can pull a Twitch stream to stdout and feed it through ffmpeg:
```shell
streamlink --stdout https://www.twitch.tv/dougdoug best \
  | ffmpeg -hide_banner -loglevel error -i pipe:0 -vn -ac 1 -ar 16000 -f wav pipe:1 \
  | scribble-cli \
      --model ./models/ggml-tiny.bin \
      --vad-model ./models/ggml-silero-v6.2.0.bin \
      --enable-vad \
      --input -
```

`scribble-server` is a long-running HTTP server that loads models once and accepts transcription requests over HTTP.
```shell
cargo run --features bin-scribble-server --bin scribble-server -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --host 127.0.0.1 \
  --port 8080
```

```shell
curl -sS --data-binary @./input.mp4 \
  "http://127.0.0.1:8080/transcribe?output=vtt" \
  > transcript.vtt
```

For JSON output:
```shell
curl -sS --data-binary @./input.wav \
  "http://127.0.0.1:8080/transcribe?output=json" \
  > transcript.json
```

Example using all query params:
```shell
curl -sS --data-binary @./input.mp4 \
  "http://127.0.0.1:8080/transcribe?output=json&output_type=json&model_key=ggml-large-v3-turbo.bin&enable_vad=true&translate_to_english=true&language=en" \
  > transcript.json
```

`scribble-server` exposes Prometheus metrics at `GET /metrics`.
```shell
curl -sS "http://127.0.0.1:8080/metrics"
```

Key metrics:

- `scribble_http_requests_total` (labels: `status`)
- `scribble_http_request_duration_seconds` (labels: `status`)
- `scribble_http_in_flight_requests`
All binaries emit structured JSON logs to stderr.
- Default level: `error`
- Override with `SCRIBBLE_LOG` (e.g. `SCRIBBLE_LOG=info`)
```shell
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --output-type json
```

```shell
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --enable-vad \
  --input ./input.wav
```

When VAD is enabled:
- non-speech regions are suppressed
- if no speech is detected, no output is produced
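Conceptually, VAD gating keeps only the sample ranges the detector marked as speech and drops the rest before transcription. A simplified sketch of that filtering step, using a hypothetical helper rather than Scribble's internal API:

```rust
/// Keep only the samples inside detected speech regions, given as
/// half-open (start, end) index ranges. With no regions, the result is
/// empty, which is why VAD-enabled runs produce no output on silence.
fn keep_speech(samples: &[f32], regions: &[(usize, usize)]) -> Vec<f32> {
    regions
        .iter()
        .flat_map(|&(start, end)| samples[start..end.min(samples.len())].iter().copied())
        .collect()
}
```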
```shell
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --language en
```

If `--language` is omitted, Whisper will auto-detect the language.
```shell
cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.wav \
  --output-type vtt \
  > transcript.vtt
```

Scribble is also designed to be embedded as a library.
High-level usage looks like:
```rust
use scribble::{Opts, OutputType, Scribble};
use std::fs::File;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut scribble = Scribble::new(
        ["./models/ggml-large-v3-turbo.bin"],
        "./models/ggml-silero-v6.2.0.bin",
    )?;

    let mut input = File::open("audio.wav")?;
    let mut output = Vec::new();

    let opts = Opts {
        model_key: None,
        enable_translate_to_english: false,
        enable_voice_activity_detection: true,
        language: None,
        output_type: OutputType::Json,
        incremental_min_window_seconds: 1,
    };

    scribble.transcribe(&mut input, &mut output, &opts)?;

    let json = String::from_utf8(output)?;
    println!("{json}");
    Ok(())
}
```

Roadmap:

- Make VAD streaming-capable
- Support streaming and incremental transcription
- Select the primary audio track in multi-track video containers
- Implement a web server
- Add Prometheus metrics endpoint
- Add structured logs (tracing)
- Expand test coverage to 80%+
This project uses `cargo-llvm-cov` for coverage locally and in CI.
One-time setup:
```shell
rustup component add llvm-tools-preview
cargo install cargo-llvm-cov
```

Run coverage locally:

```shell
# Print a summary to stdout
cargo llvm-cov --features bin-scribble-cli,bin-model-downloader,bin-scribble-server --all-targets

# Generate an HTML report (writes to ./target/llvm-cov/html)
cargo llvm-cov --features bin-scribble-cli,bin-model-downloader,bin-scribble-server --all-targets --html
```

Scribble is under active development. The API is not yet stable, but the foundations are in place and evolving quickly.
Release notes live in CHANGELOG.md (and GitHub Releases).
See STYLEGUIDE.md for code style, verification conventions, and repo-level checklists.
MIT


