199-biotechnologies/textstream-asr

TextStream — Live Speech-to-Text on Apple Silicon

TextStream

Live speech-to-text streaming on Apple Silicon. One command. No API keys. No cloud.

Star this repo   Follow @longevityboris

PyPI version   MIT License   Python 3.10+   Apple Silicon   MLX Native


TextStream turns your Mac's microphone into a live transcription endpoint. It runs Qwen3-ASR on-device through MLX, filters noise with Silero VAD, and streams text over Server-Sent Events (SSE) at localhost:7890/stream. Any app, script, or frontend can subscribe and get words as they are spoken, with roughly 2% word error rate on clean speech (2.11% on LibriSpeech clean for the default 0.6B model), no API keys, and zero cost.

Install · Quick Start · How It Works · API · Benchmarks · Configuration · Contributing


Why This Exists

Cloud speech APIs charge per minute and add network latency. Whisper runs offline but is not built for real-time streaming. There is no simple way to get a local, streaming transcription endpoint that any process on your machine can read from.

TextStream fills that gap. One pip install, one command, and every app on your machine has access to a live transcript stream — for free.

Build voice-controlled tools. Add live captions to your app. Record meeting notes that write themselves. Pipe speech into your IDE. Whatever needs ears, point it at the stream.

Install

pip install textstream-asr

Requirements: macOS on Apple Silicon (M1/M2/M3/M4), Python 3.10+.

Quick Start

textstream                            # start transcribing, opens browser UI
textstream --no-browser               # headless — SSE server only
textstream --engine qwen-1.7b         # larger model, lower word error rate
textstream --vad-threshold 0.5        # stricter voice detection (default 0.4)

Connect from your app

Python:

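import json
import urllib.request

# Open the SSE endpoint; the response stays open and yields one line at a time.
req = urllib.request.Request("http://localhost:7890/stream")
with urllib.request.urlopen(req) as resp:
    for line in resp:
        line = line.decode("utf-8").strip()
        # SSE payload lines start with "data: "; blank separator lines are skipped.
        if line.startswith("data: "):
            event = json.loads(line[len("data: "):])
            if event.get("type") == "stream":
                print(event["finalized"], event["draft"])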

JavaScript:

const src = new EventSource("http://localhost:7890/stream");
src.onmessage = (e) => {
  const event = JSON.parse(e.data);
  // Match the Python client: only handle transcription events.
  if (event.type !== "stream") return;
  console.log(event.finalized, event.draft);
};

How It Works

Every --interval seconds (default 2.5), TextStream drains the mic buffer and runs Silero VAD on the chunk. If speech is detected, the chunk goes to Qwen3-ASR's streaming decoder. The model returns stable (finalized) text and speculative (draft) text. Stable text gets persisted to disk and broadcast to all SSE subscribers.

If the model hallucinates on noise that slips past VAD, a pattern filter catches it and resets the stream. With VAD active, this almost never fires.
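
The README does not say which patterns the filter looks for, only that it catches repetitive model output. As a rough illustration, and not TextStream's actual implementation, a repetition check could look like the sketch below. The function name `looks_repetitive` and both thresholds are assumptions made for this example.

```python
def looks_repetitive(text: str, max_ngram: int = 4, min_repeats: int = 4) -> bool:
    """Return True if the text ends with one short phrase repeated back to back.

    Hypothetical heuristic: checks phrases of 1..max_ngram words and flags the
    text when the last `min_repeats` phrases are identical.
    """
    words = text.lower().split()
    for n in range(1, max_ngram + 1):
        needed = n * min_repeats
        if len(words) < needed:
            continue
        tail = words[-needed:]
        phrase = tail[:n]
        # Compare every n-word slice of the tail against the first slice.
        if all(tail[i:i + n] == phrase for i in range(0, needed, n)):
            return True
    return False


print(looks_repetitive("thank you thank you thank you thank you"))
print(looks_repetitive("please open the pull request and run the tests"))
```

A caller would reset the decoder state whenever the check returns True, which mirrors the "catch and reset" behaviour described above.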

Microphone → Audio Buffer → Silero VAD → Qwen3-ASR (MLX) → SSE Stream
                              ↓ (no speech)
                            Skip chunk
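
The flow in the diagram can be sketched as a small loop. This is a simplified illustration under stated assumptions, not the project's code: `run_pipeline`, `toy_speech_prob`, and `toy_decode` are hypothetical names, the toy functions stand in for Silero VAD and Qwen3-ASR, and a chunk is assumed to count as speech when its probability is at or above the threshold. The real server drains the mic buffer every --interval seconds instead of iterating over a list.

```python
from typing import Callable, Iterable, List, Tuple

Chunk = List[float]


def run_pipeline(
    chunks: Iterable[Chunk],
    speech_prob: Callable[[Chunk], float],
    decode: Callable[[Chunk], Tuple[str, str]],
    broadcast: Callable[[dict], None],
    vad_threshold: float = 0.4,
) -> int:
    """Gate each chunk with VAD, decode speech, broadcast; return skipped count."""
    skipped = 0
    for chunk in chunks:
        # Silence or noise: skip the chunk so the decoder never sees it.
        if speech_prob(chunk) < vad_threshold:
            skipped += 1
            continue
        finalized, draft = decode(chunk)
        broadcast({"type": "stream", "finalized": finalized, "draft": draft})
    return skipped


# Toy stand-ins so the sketch runs without a microphone or a model.
def toy_speech_prob(chunk: Chunk) -> float:
    # NOT Silero VAD: mean absolute amplitude, clipped to [0, 1].
    return min(1.0, sum(abs(x) for x in chunk) / max(len(chunk), 1))


def toy_decode(chunk: Chunk) -> Tuple[str, str]:
    # NOT Qwen3-ASR: returns placeholder (finalized, draft) text.
    return (f"<{len(chunk)} samples>", "")


events: List[dict] = []
chunks = [[0.0] * 4, [0.9] * 4, [0.1] * 4]
skipped = run_pipeline(chunks, toy_speech_prob, toy_decode, events.append)
print(skipped, events)
```

Only the middle chunk clears the 0.4 threshold here, so one event is broadcast and two chunks are skipped.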

API

Endpoint                        Description
GET /stream                     SSE stream: {"type":"stream","finalized":"...","draft":"..."}
GET /engine                     Current engine info
GET /switch?engine=qwen-1.7b    Hot-swap model without restart
GET /pause                      Pause mic capture
GET /resume                     Resume mic capture
GET /stop                       Shut down the server
GET /                           Built-in browser UI
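
The control endpoints are plain GET requests, so they can be driven from the standard library. The helper names `control_url` and `call` below are made up for this sketch, and because the response format is not documented here, `call` simply returns the raw body as text.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:7890"  # default --port; change if you pass --port


def control_url(endpoint: str, **params: str) -> str:
    """Build the URL for a control endpoint, e.g. /switch?engine=qwen-1.7b."""
    url = f"{BASE_URL}/{endpoint.lstrip('/')}"
    if params:
        url += "?" + urlencode(params)
    return url


def call(endpoint: str, timeout: float = 30.0, **params: str) -> str:
    """Send the GET request and return the response body as text."""
    with urllib.request.urlopen(control_url(endpoint, **params), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


print(control_url("engine"))
print(control_url("switch", engine="qwen-1.7b"))
```

With the server running, `call("pause")` and `call("resume")` toggle mic capture, and `call("switch", engine="qwen-1.7b")` swaps models. A switch may need a longer timeout if the model has to be downloaded first.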

Benchmarks

Accuracy (Word Error Rate)

Model                       LibriSpeech clean   LibriSpeech other   Params
Qwen3-ASR 0.6B (default)    2.11%               4.55%               600M
Qwen3-ASR 1.7B              1.63%               3.38%               1.7B
Whisper-large-v3            1.51%               3.97%               1.5B
GPT-4o-Transcribe           1.39%               3.75%               --

Source: Qwen3-ASR Technical Report
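
For readers new to the metric: word error rate is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below computes it with a word-level edit distance. It only lowercases the text, whereas published benchmarks usually apply fuller text normalization, so it will not reproduce the figures above exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One missing word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```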

Speed (Apple Silicon via MLX)

Metric                   Value
Real-time factor (RTF)   ~0.06 (about 16x faster than real time)
MLX vs PyTorch           ~4x faster on Apple Silicon
VAD latency              <1 ms per 32 ms audio chunk
Time to first token      ~92 ms

Source: mlx-qwen3-asr benchmarks, Silero VAD metrics
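
To put the real-time factor in concrete terms: RTF is processing time divided by audio duration, so lower is better and 1/RTF is the speed-up over real time. The arithmetic below assumes the quoted ~0.06 applies uniformly to a default 2.5-second chunk, which is a simplification since real latency varies with hardware and load.

```python
def processing_time(audio_seconds: float, rtf: float) -> float:
    """Seconds of compute needed for a given amount of audio."""
    return audio_seconds * rtf


def speedup(rtf: float) -> float:
    """How many times faster than real time the model runs."""
    return 1.0 / rtf


rtf = 0.06     # quoted real-time factor
chunk = 2.5    # default --interval, in seconds
print(f"{processing_time(chunk, rtf):.2f} s to decode a {chunk} s chunk")
print(f"{speedup(rtf):.1f}x faster than real time")
```

Under those assumptions a 2.5 s chunk takes about 0.15 s to decode, and the speed-up works out to roughly 16.7x, which the table rounds to 16x.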

Resource Usage

  • RAM: ~1.2 GB for the 0.6B model, ~3 GB for the 1.7B model
  • CPU/GPU: Inference runs on the GPU through MLX's Metal backend, with minimal CPU overhead.
  • Disk: Models are downloaded on first use and cached by the Hugging Face Hub (~1.2 GB for 0.6B, ~3.4 GB for 1.7B)
  • Battery: Comparable to background music playback. MLX is designed for Apple Silicon power efficiency.

Features

  • Real-time streaming ASR via Server-Sent Events at localhost:7890/stream
  • Qwen3-ASR on MLX — 2% WER, 16x faster than real-time on Apple Silicon
  • Silero VAD filters silence and noise before transcription runs
  • Hot-swap models between 0.6B and 1.7B without restarting the server
  • Built-in browser UI for quick visual monitoring
  • Hallucination filter catches and resets repetitive model output
  • Auto-saves transcripts to ~/Documents/textstream/transcripts/
  • Zero dependencies on cloud services — runs entirely on-device

Configuration

Flag              Default   Description
--port            7890      HTTP server port
--engine          qwen      qwen (0.6B) or qwen-1.7b
--interval        2.5       Seconds between transcription updates
--vad-threshold   0.4       Silero VAD speech probability threshold
--no-browser      --        Do not open browser on start

Transcripts are saved to ~/Documents/textstream/transcripts/YYYY-MM-DD/.
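
If you launch TextStream from another program, the flags and the transcript location can be assembled in code. This is a small sketch: `build_command` and `transcript_dir` are hypothetical helper names, and the file names inside each dated folder are not specified here, so the sketch stops at the directory.

```python
import datetime
from pathlib import Path
from typing import List, Optional


def build_command(
    port: int = 7890,
    engine: str = "qwen",
    interval: float = 2.5,
    vad_threshold: float = 0.4,
    browser: bool = True,
) -> List[str]:
    """Return the argv list for the textstream CLI using the documented flags."""
    cmd = [
        "textstream",
        "--port", str(port),
        "--engine", engine,
        "--interval", str(interval),
        "--vad-threshold", str(vad_threshold),
    ]
    if not browser:
        cmd.append("--no-browser")
    return cmd


def transcript_dir(day: Optional[datetime.date] = None) -> Path:
    """Folder where transcripts for the given day (default: today) are saved."""
    day = day or datetime.date.today()
    return Path.home() / "Documents" / "textstream" / "transcripts" / day.isoformat()


print(build_command(engine="qwen-1.7b", browser=False))
print(transcript_dir(datetime.date(2025, 1, 31)))
```

Passing the list to `subprocess.Popen` starts the server in the background without shell quoting issues.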

Dependencies

Contributing

Contributions are welcome. See CONTRIBUTING.md for guidelines.

License

MIT


Built by Boris Djordjevic at 199 Biotechnologies | Paperfoot AI
