TextStream

Live speech-to-text streaming on Apple Silicon. One command. No API keys. No cloud.

TextStream turns your Mac's microphone into a live transcription endpoint. It runs Qwen3-ASR on-device through MLX, filters noise with Silero VAD, and streams text over SSE at localhost:7890/stream. Any app, script, or frontend can subscribe and receive words as they are spoken, at roughly 2% word error rate on LibriSpeech clean (see Benchmarks), with no API keys and no usage fees.

Install · Quick Start · How It Works · API · Benchmarks · Configuration · Contributing

Why This Exists

Cloud speech APIs charge per minute and add network latency. Whisper runs offline but transcribes audio in 30-second windows rather than as a live stream. There is no simple way to get a local, streaming transcription endpoint that any process on your machine can read from.

TextStream fills that gap. One pip install, one command, and every app on your machine has access to a live transcript stream — for free.

Build voice-controlled tools. Add live captions to your app. Record meeting notes that write themselves. Pipe speech into your IDE. Whatever needs ears, point it at the stream.

Install

pip install textstream-asr

Requirements: macOS on Apple Silicon (M1/M2/M3/M4), Python 3.10+.

Quick Start

textstream                            # start transcribing, opens browser UI
textstream --no-browser               # headless — SSE server only
textstream --engine qwen-1.7b         # larger model, lower word error rate
textstream --vad-threshold 0.5        # stricter voice detection (default 0.4)
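
To start TextStream from another Python program rather than from a shell, something like the following could work. It is a minimal sketch: `build_command`, `wait_for_port`, and `launch` are hypothetical helpers, not part of TextStream, and only the `--port` and `--no-browser` flags documented in this README are used.

```python
import socket
import subprocess
import time


def build_command(port=7890, headless=True):
    """Assemble the textstream command line from documented flags only."""
    cmd = ["textstream", "--port", str(port)]
    if headless:
        cmd.append("--no-browser")
    return cmd


def wait_for_port(port, host="localhost", timeout=60.0):
    """Poll until something accepts TCP connections on host:port."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False


def launch(port=7890):
    """Start a headless server and return the process once the port is open."""
    proc = subprocess.Popen(build_command(port))
    if not wait_for_port(port):
        proc.terminate()
        raise RuntimeError("TextStream did not start listening in time")
    return proc
```

Calling `launch()` returns a `subprocess.Popen` handle; call `terminate()` on it when the transcript is no longer needed. The first start may take longer than later ones because the model has to be downloaded.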

Connect from your app

Python:

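import json
import urllib.request

# Subscribe to the SSE endpoint; the response body is an open-ended stream of lines.
req = urllib.request.Request("http://localhost:7890/stream")
with urllib.request.urlopen(req) as resp:
    for line in resp:
        line = line.decode().strip()
        # SSE payload lines start with "data: "; blank separator lines are skipped.
        if line.startswith("data: "):
            event = json.loads(line[6:])
            if event["type"] == "stream":
                print(event["finalized"], event["draft"])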

JavaScript:

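const src = new EventSource("http://localhost:7890/stream");
src.onmessage = (e) => {
  const event = JSON.parse(e.data);
  // Only "stream" events carry transcript text, matching the Python example.
  if (event.type !== "stream") return;
  console.log(event.finalized, event.draft);
};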
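
For anything longer-lived than a quick script, it may help to separate parsing from networking so that malformed lines do not crash the consumer. The sketch below assumes only the event shape shown in the examples above; `parse_sse_line` and `iter_events` are hypothetical names, not part of TextStream.

```python
import json
import urllib.request


def parse_sse_line(raw):
    """Return the decoded JSON payload of an SSE data line, or None otherwise."""
    line = raw.decode("utf-8", errors="replace").strip()
    if not line.startswith("data:"):
        return None  # blank separators, comment lines, other SSE fields
    payload = line[len("data:"):].strip()
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None  # ignore malformed payloads instead of raising


def iter_events(url="http://localhost:7890/stream"):
    """Yield transcript events from a running TextStream server."""
    with urllib.request.urlopen(url) as resp:
        for raw in resp:
            event = parse_sse_line(raw)
            if isinstance(event, dict) and event.get("type") == "stream":
                yield event
```

A consumer can then write `for event in iter_events(): print(event["finalized"], event["draft"])`.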

How It Works

Every --interval seconds (default 2.5), TextStream drains the mic buffer and runs Silero VAD on the chunk. If speech is detected, the chunk goes to Qwen3-ASR's streaming decoder. The model returns stable (finalized) text and speculative (draft) text. Stable text gets persisted to disk and broadcast to all SSE subscribers.

If the model hallucinates on noise that slips past VAD, a pattern filter catches it and resets the stream. With VAD active, this almost never fires.
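
This README does not say which patterns the filter matches. Purely as an illustration, one common heuristic is to flag output in which a short phrase repeats several times in a row. The function `looks_repetitive` and its thresholds are hypothetical and should not be read as TextStream's implementation.

```python
def looks_repetitive(text, max_ngram=4, min_repeats=4):
    """Return True if a phrase of 1..max_ngram words repeats min_repeats times in a row."""
    words = text.lower().split()
    for n in range(1, max_ngram + 1):
        # Slide over every position where n * min_repeats words still fit.
        for start in range(len(words) - n * min_repeats + 1):
            phrase = words[start:start + n]
            if all(
                words[start + k * n:start + (k + 1) * n] == phrase
                for k in range(1, min_repeats)
            ):
                return True
    return False
```

With these defaults, "thank you thank you thank you thank you" is flagged, while an ordinary sentence is not.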

Microphone → Audio Buffer → Silero VAD → Qwen3-ASR (MLX) → SSE Stream
                              ↓ (no speech)
                            Skip chunk
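
The flow in the diagram can be sketched roughly as follows. This is not TextStream's source: `speech_probability` is a crude energy-based stand-in for Silero VAD, `decode` is a placeholder for the Qwen3-ASR streaming decoder, and `process_chunk` is a hypothetical name. Only the control flow (gate on VAD, decode, publish a finalized/draft event) follows the description.

```python
import json
import math


def speech_probability(samples):
    """Stand-in for Silero VAD: map the chunk's RMS energy into [0, 1]."""
    if not samples:
        return 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return min(1.0, rms * 10.0)


def decode(samples):
    """Placeholder for the streaming decoder: returns (finalized, draft)."""
    return "hello", "world"


def process_chunk(samples, subscribers, vad_threshold=0.4):
    """Handle one interval: skip silence, otherwise decode and broadcast."""
    if speech_probability(samples) < vad_threshold:
        return None  # no speech detected: skip the chunk
    finalized, draft = decode(samples)
    event = {"type": "stream", "finalized": finalized, "draft": draft}
    message = "data: " + json.dumps(event) + "\n\n"  # SSE framing
    for queue in subscribers:
        queue.append(message)  # each subscriber is modelled as a plain list
    return event
```

Gating before decoding is what keeps the model idle during silence: a chunk that fails the threshold never reaches the decoder.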

API

Endpoint	Description
`GET /stream`	SSE stream: `{"type":"stream","finalized":"...","draft":"..."}`
`GET /engine`	Current engine info
`GET /switch?engine=qwen-1.7b`	Hot-swap model without restart
`GET /pause`	Pause mic capture
`GET /resume`	Resume mic capture
`GET /stop`	Shutdown server
`GET /`	Built-in browser UI
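
Because the control endpoints are plain GET requests, they can be driven from a script. In this sketch `control_url` and `call` are hypothetical helpers; the response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.parse
import urllib.request


def control_url(action, port=7890, **params):
    """Build a URL such as http://localhost:7890/switch?engine=qwen-1.7b."""
    url = f"http://localhost:{port}/{action}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def call(action, port=7890, **params):
    """Send the GET request and return the response body as text."""
    with urllib.request.urlopen(control_url(action, port, **params), timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

With a server running, `call("pause")`, `call("switch", engine="qwen-1.7b")`, and `call("resume")` map onto the endpoints in the table.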

Benchmarks

Accuracy (Word Error Rate)

Model	LibriSpeech clean	LibriSpeech other	Params
Qwen3-ASR 0.6B (default)	2.11%	4.55%	600M
Qwen3-ASR 1.7B	1.63%	3.38%	1.7B
Whisper-large-v3	1.51%	3.97%	1.5B
GPT-4o-Transcribe	1.39%	3.75%	--

Source: Qwen3-ASR Technical Report
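
For readers unfamiliar with the metric: word error rate is the word-level edit distance (substitutions, deletions, insertions) between the transcript and the reference, divided by the number of reference words. A minimal sketch follows; the function name `wer` is ours, and published benchmarks also normalize text (case, punctuation) before scoring, which is omitted here.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring "the cat sit on mat" against the reference "the cat sat on the mat" gives one substitution and one deletion over six reference words, so a WER of 2/6.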

Speed (Apple Silicon via MLX)

Metric	Value
Real-time factor (RTF)	~0.06 (16x faster than real-time)
MLX vs PyTorch	~4x faster on Apple Silicon
VAD latency	<1ms per 32ms audio chunk
Time to first token	~92ms

Source: mlx-qwen3-asr benchmarks, Silero VAD metrics
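
RTF is processing time divided by audio duration, so the speedup over real time is its reciprocal. The quick check below assumes the ~0.06 figure also holds for a single 2.5-second chunk (the default --interval), which the benchmark does not state explicitly.

```python
def speedup(rtf):
    """Speedup over real time is the reciprocal of the real-time factor."""
    return 1.0 / rtf


def processing_seconds(audio_seconds, rtf):
    """Expected compute time for a chunk of the given duration."""
    return audio_seconds * rtf


print(round(speedup(0.06), 1))                  # 16.7
print(round(processing_seconds(2.5, 0.06), 2))  # 0.15
```

Under that assumption, decoding takes a small fraction of each interval, leaving most of the 2.5 seconds idle.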

Resource Usage

RAM: ~1.2 GB for 0.6B model, ~3 GB for 1.7B
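CPU/GPU: Inference runs on the GPU through the MLX Metal backend (MLX does not target the Neural Engine). Minimal CPU overhead.
Disk: Models cached by HuggingFace Hub (about 1.2 GB for the 0.6B model and 3.4 GB for the 1.7B model, downloaded on first use)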
Battery: Comparable to background music playback. MLX is designed for Apple Silicon power efficiency.

Features

Real-time streaming ASR via Server-Sent Events at localhost:7890/stream
Qwen3-ASR on MLX: about 2% WER on LibriSpeech clean with the default 0.6B model, roughly 16x faster than real time on Apple Silicon
Silero VAD filters silence and noise before transcription runs
Hot-swap models between 0.6B and 1.7B without restarting the server
Built-in browser UI for quick visual monitoring
Hallucination filter catches and resets repetitive model output
Auto-saves transcripts to ~/Documents/textstream/transcripts/
No cloud services at transcription time: after the initial model download, everything runs on-device

Configuration

Flag	Default	Description
`--port`	`7890`	HTTP server port
`--engine`	`qwen`	`qwen` (0.6B) or `qwen-1.7b`
`--interval`	`2.5`	Seconds between transcription updates
`--vad-threshold`	`0.4`	Silero VAD speech probability threshold
`--no-browser`	--	Do not open browser on start

Transcripts are saved to ~/Documents/textstream/transcripts/YYYY-MM-DD/.
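
To find a day's transcripts from a script, the dated directory can be built from the path above. `transcript_dir` and `list_transcripts` are hypothetical helpers; the names and format of the files inside the directory are not documented here, so the sketch only lists whatever is present.

```python
from datetime import date
from pathlib import Path


def transcript_dir(day=None, root=None):
    """Return <root>/YYYY-MM-DD, defaulting to today's folder under ~/Documents."""
    day = day or date.today()
    if root is None:
        root = Path.home() / "Documents" / "textstream" / "transcripts"
    return Path(root) / day.isoformat()


def list_transcripts(day=None, root=None):
    """List the files saved for the given day, or an empty list if none exist."""
    folder = transcript_dir(day, root)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.is_file())
```

Calling `list_transcripts()` with no arguments looks in today's folder.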

Dependencies

MLX — Apple's ML framework for Apple Silicon
mlx-qwen3-asr — Qwen3-ASR ported to MLX
silero-vad-lite — Voice activity detection (~2 MB, bundles ONNX runtime)
sounddevice — PortAudio bindings for mic capture
NumPy

Contributing

Contributions are welcome. See CONTRIBUTING.md for guidelines.

License

MIT

Built by Boris Djordjevic at 199 Biotechnologies | Paperfoot AI
