Feature Request
Reasoning
Adding --stdin would allow the audio source and whisper_mic to occupy different systems, so it can be used over SSH. This is a significant advantage if your "good" hardware is not on the device you're using.
This sounds like a job better suited to plain whisper, but it doesn't support the continuous near-realtime interpretation that mic input needs.
Implementation
Use ffmpeg to continuously stream mic input in WAV format into whisper_mic, where it is readable as the file /dev/fd/0:
ffmpeg -f pulse -i default -f wav -ac 1 -ar 44100 - | whisper_mic --loop --model large-v3 --stdin
As an SSH command:
ffmpeg -f pulse -i default -f wav -ac 1 -ar 44100 - | ssh USER@ADDRESS -- whisper_mic --loop --model large-v3 --stdin
As a possible lead, speech_recognition has built-in support for reading from files, though I'm not sure if it'll cooperate with continuous reading and interpretation.
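To illustrate what continuous reading could look like, here is a rough sketch that cuts a WAV stream into fixed-length blocks using only Python's standard wave module. The function name iter_wav_blocks and the 5-second default are made up for this example and are not part of whisper_mic or speech_recognition. It assumes the header ffmpeg writes to a pipe is one that wave can parse, which I haven't verified against a live stream.

```python
import wave


def iter_wav_blocks(stream, seconds=5.0):
    """Yield (pcm_bytes, sample_rate, sample_width) blocks from a WAV byte stream.

    `stream` is any binary file object, e.g. sys.stdin.buffer when ffmpeg
    is piped in. Hypothetical helper, not an existing whisper_mic function.
    """
    wav = wave.open(stream, "rb")  # parses the WAV header up to the PCM data
    try:
        rate = wav.getframerate()
        width = wav.getsampwidth()  # bytes per sample, e.g. 2 for 16-bit audio
        frames_per_block = max(1, int(rate * seconds))
        while True:
            # Returns a full block, or a shorter one at the end of the stream.
            pcm = wav.readframes(frames_per_block)
            if not pcm:
                break
            yield pcm, rate, width
    finally:
        wav.close()


# Possible use on the receiving side (not executed here):
#   import sys
#   for pcm, rate, width in iter_wav_blocks(sys.stdin.buffer):
#       ...  # hand the block to the transcriber
```

Each block could presumably be wrapped as speech_recognition.AudioData(pcm, rate, width) before transcription. Fixed-length blocks can cut a word in half, so a real implementation would likely want silence detection instead.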
Extras
Example of using stdout for "keyboard" typing
The above commands can be piped into a read loop that types each line of output as it arrives:
# X11
... | while IFS= read -r; do xdotool type -- "${REPLY} "; done
# Wayland
... | while IFS= read -r; do wtype -- "${REPLY} "; done
For that to work well, in cli.py you'd need to add import sys so you can add sys.stdout.flush() under every print(result), then in utils.py have logging go to stderr:
from rich.console import Console
rich_handler = RichHandler(level=logging.INFO, rich_tracebacks=True, markup=True, console=Console(stderr=True))
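A minimal sketch of the same two ideas without the rich dependency, in case it helps to see them in isolation. The helper emit and the logger name are hypothetical and not taken from whisper_mic; print(..., flush=True) is used here as an alternative to calling sys.stdout.flush() after each print.

```python
import logging
import sys


def emit(result, out=None):
    """Write one transcription per line and flush at once.

    Flushing matters because stdout is block-buffered when it is a pipe,
    so a downstream `read` loop would otherwise see output late.
    """
    print(result, file=out if out is not None else sys.stdout, flush=True)


# Plain-logging stand-in for the RichHandler change: log records go to
# stderr so that stdout carries only transcriptions.
stderr_handler = logging.StreamHandler(sys.stderr)
logger = logging.getLogger("whisper_mic_sketch")  # hypothetical logger name
logger.addHandler(stderr_handler)
logger.setLevel(logging.INFO)
```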
Standalone example of the ffmpeg command for illustration purposes
ffmpeg -f pulse -i default -f wav -ac 1 -ar 44100 - > playable.wav