Commit cb4a89d

Add streaming input functionality
1 parent 9c067c6 commit cb4a89d

14 files changed: +1120 -105 lines changed

README.md

Lines changed: 29 additions & 3 deletions
@@ -25,7 +25,8 @@ Hit your <span style="color:#FF4500">**hotkey shortcut**</span> -> speak -> hotk
 | Feature | Notes |
 | -------------------------------- | ----------------------------------------------------------------------- |
 | **Whisper.cpp** backend | Local, offline, fast ASR. |
-| **Simulated typing** | instantly types straight into any currently focused input window. Even on Wayland! (*ydotool*). |
+| **Streaming transcription** | Real-time incremental typing as you speak. Text appears word-by-word, not after recording stops. |
+| **Simulated typing** | Instantly types straight into any currently focused input window. Even on Wayland! (*ydotool*). |
 | **Clipboard** | Auto-copies into clipboard - ready for pasting, if desired |
 | **Languages** | 99+ languages. Provides default language config and session language override |
 | **AIPP**, AI Post-Processing | AI-rewriting via local or cloud LLMs. GUI prompt editor. |
@@ -138,9 +139,25 @@ Leave VOXD running in the background -> go to any app where you want to voice-ty
 | Press hotkey … | VOXD does … |
 | ---------------- | ----------------------------------------------------------- |
 | **First press** | start recording |
-| **Second press** | stop ⇢ [transcribe ⇢ copy to clipboard] ⇢ types the output into any focused app |
+| **Second press** | stop ⇢ [finalize transcription ⇢ copy to clipboard] ⇢ types any remaining output into any focused app |

-Otherwise, if in --flux (beta), **just speak**.
+### 🎙️ Streaming Mode (Default)
+
+VOXD uses **streaming transcription** by default, which means:
+
+- **Real-time typing**: Text appears incrementally as you speak, not after you stop recording
+- **Chunk-based processing**: Audio is processed in overlapping chunks (default: 3 seconds) for continuous transcription
+- **Incremental updates**: Text is typed word-by-word or phrase-by-phrase as it's transcribed (typically every 2 seconds or 3 words)
+- **Seamless experience**: You see your words appear in real-time, making it feel like natural voice-typing
+
+**How it works:**
+1. Press hotkey to start → VOXD begins recording and transcribing
+2. As you speak → Text appears incrementally in your focused application
+3. Press hotkey again → Finalizes any remaining transcription and copies to clipboard
+
+This streaming behavior is enabled by default in CLI (`voxd`), GUI (`voxd --gui`), and Tray (`voxd --tray`) modes. The old "record-then-transcribe" behavior is no longer used.
+
+**Note:** If in `--flux` mode (beta), **just speak** - no hotkey needed, voice activity detection triggers recording automatically.

 ### Autostart
 For practical reasons (always ready to type & low system footprint), it is advised to enable voxd user daemon:
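
The README text added in this hunk says text is typed "typically every 2 seconds or 3 words". The following is a minimal sketch of one way such an emit policy could work; it is not taken from the commit. The class `EmitThrottle` and its methods are hypothetical names, and combining the two thresholds with OR is an assumption based on the README wording. `now` is assumed to be seconds since recording started.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EmitThrottle:
    """Hypothetical helper: decides when buffered words should be typed."""

    interval_seconds: float = 2.0  # mirrors streaming_emit_interval_seconds
    word_count: int = 3            # mirrors streaming_emit_word_count
    _pending: List[str] = field(default_factory=list)
    _last_emit: float = 0.0

    def push(self, text: str, now: float) -> str:
        """Buffer newly transcribed text; return the text to type now, or ""."""
        self._pending.extend(text.split())
        if not self._pending:
            return ""
        enough_words = len(self._pending) >= self.word_count
        enough_time = (now - self._last_emit) >= self.interval_seconds
        # Assumption: either threshold alone is enough to trigger typing.
        if enough_words or enough_time:
            out = " ".join(self._pending)
            self._pending.clear()
            self._last_emit = now
            return out
        return ""

    def flush(self) -> str:
        """Return whatever is still buffered (e.g. on the second hotkey press)."""
        out = " ".join(self._pending)
        self._pending.clear()
        return out
```

Passing `now` explicitly instead of reading a clock inside the class keeps the policy deterministic and easy to test; `flush()` corresponds to the "finalizes any remaining transcription" step described above.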
@@ -307,6 +324,15 @@ llamacpp_server_timeout: 30
 # Selected models per provider (automatically updated by VOXD)
 aipp_selected_models:
   llamacpp_server: "qwen2.5-3b-instruct-q4_k_m"
+
+# Streaming transcription settings (default: enabled)
+streaming_enabled: true # Enable/disable streaming mode
+streaming_chunk_seconds: 3.0 # Audio chunk size in seconds (default: 3.0)
+streaming_overlap_seconds: 0.5 # Overlap between chunks in seconds (default: 0.5)
+streaming_emit_interval_seconds: 2.0 # Minimum time between text updates (default: 2.0)
+streaming_emit_word_count: 3 # Minimum words before emitting text (default: 3)
+streaming_typing_delay: 0.01 # Delay between typed characters in streaming mode (default: 0.01)
+streaming_min_chars_to_type: 3 # Minimum characters before typing incremental text (default: 3)
 ```

 ---
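
To make the interaction of `streaming_chunk_seconds` and `streaming_overlap_seconds` concrete, here is a small sketch of how overlapping windows could be laid out over a recording. It is an illustration only, not the code from this commit: the function name `chunk_bounds` is hypothetical, and the 16 kHz sample rate is an assumption (it is the rate whisper.cpp expects for its input audio).

```python
from typing import Iterator, Tuple


def chunk_bounds(
    total_samples: int,
    sample_rate: int = 16000,      # assumed input rate
    chunk_seconds: float = 3.0,    # mirrors streaming_chunk_seconds
    overlap_seconds: float = 0.5,  # mirrors streaming_overlap_seconds
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices of overlapping chunks."""
    chunk = int(round(chunk_seconds * sample_rate))
    overlap = int(round(overlap_seconds * sample_rate))
    if chunk <= 0 or not 0 <= overlap < chunk:
        raise ValueError("need chunk > 0 and 0 <= overlap < chunk")
    step = chunk - overlap  # each window starts this far after the previous one
    start = 0
    while start < total_samples:
        end = min(start + chunk, total_samples)
        yield start, end
        if end == total_samples:
            break  # the last (possibly shorter) window reached the end
        start += step


if __name__ == "__main__":
    # 8 seconds of audio with the default settings
    for start, end in chunk_bounds(8 * 16000):
        print(f"{start / 16000:.1f}s .. {end / 16000:.1f}s")
```

With the defaults, consecutive windows start 2.5 seconds apart, so each chunk shares its first 0.5 seconds with the end of the previous one. How the overlapping text is de-duplicated after transcription is not shown in this part of the diff.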

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

 [project]
 name = "voxd"
-version = "mr.batman" # bump manually on releases
+version = "1.7.0"
 description = "Voice-typing helper powered by whisper.cpp"
 authors = [{ name = "Jakov", email = "jakov.iv@proton.me" }]
 requires-python = ">=3.9"

src/voxd/__main__.py

Lines changed: 13 additions & 0 deletions
@@ -352,6 +352,12 @@ def main():
         dest="lang",
         help="Transcription language (ISO 639-1, e.g. 'en', 'sv', or 'auto' for detection)"
     )
+    parser.add_argument(
+        "-v",
+        "--verbose",
+        action="store_true",
+        help="Enable verbose logging (shows detailed debug output)"
+    )
     args, unknown = parser.parse_known_args()

     if args.version:
@@ -419,6 +425,13 @@ def main():
         sys.exit(0)

     cfg = AppConfig()
+    # Session-only override for verbosity
+    if args.verbose:
+        cfg.data["verbosity"] = True
+        setattr(cfg, "verbosity", True)
+        import os
+        os.environ["VOXD_VERBOSE"] = "1"
+
     # Session-only override for language
     if args.lang:
         try:
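
The two hunks above add a `-v/--verbose` flag and apply it as a session-only override. A self-contained sketch of that pattern follows, under stated assumptions: a plain `dict` stands in for `AppConfig` (whose real interface is not shown in this diff), and the function name `apply_verbose_override` is hypothetical. Only the flag names, the `verbosity` key, and the `VOXD_VERBOSE` variable come from the diff.

```python
import argparse
from typing import List, MutableMapping


def apply_verbose_override(
    argv: List[str],
    config: MutableMapping[str, object],
    environ: MutableMapping[str, str],
) -> bool:
    """Parse -v/--verbose and apply it for this session only."""
    parser = argparse.ArgumentParser(prog="voxd")
    parser.add_argument(
        "-v",
        "--verbose",
        action="store_true",
        help="Enable verbose logging (shows detailed debug output)",
    )
    # parse_known_args tolerates flags this parser does not define
    args, _unknown = parser.parse_known_args(argv)
    if args.verbose:
        config["verbosity"] = True     # in-memory only, nothing is written to disk
        environ["VOXD_VERBOSE"] = "1"  # visible to code that reads the environment
    return args.verbose
```

In a real entry point one would pass `sys.argv[1:]` and `os.environ`; taking them as parameters here keeps the sketch free of global side effects.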
