Feature Request: add --stdin as an audio source so whisper_mic can be used over SSH #91

@ulfnic

Feature Request

Reasoning

Adding --stdin would let the audio source and whisper_mic run on different systems, so whisper_mic could be used over SSH. This is a significant advantage if your "good" hardware is not on the device you're using.

This sounds like a job better suited to plain whisper, but whisper doesn't support the continuous, near-realtime interpretation that mic input needs.

Implementation

Use ffmpeg to continuously write mic input in WAV format to whisper_mic's stdin, which whisper_mic could read as the file /dev/fd/0:

ffmpeg -f pulse -i default -f wav -ac 1 -ar 44100 - | whisper_mic --loop --model large-v3 --stdin

As an SSH command:

ffmpeg -f pulse -i default -f wav -ac 1 -ar 44100 - | ssh USER@ADDRESS -- whisper_mic --loop --model large-v3 --stdin

As a possible lead, speech_recognition has built-in support for reading from files, though I'm not sure whether it will cooperate with continuous reading and interpretation.
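To make the continuous-reading question concrete, here is a minimal sketch using only Python's standard wave module, not speech_recognition or whisper_mic internals. The function name iter_wav_chunks and the seconds parameter are hypothetical. It assumes the WAV header that ffmpeg writes to a pipe is readable by wave, which I have not verified against a live ffmpeg stream.

```python
import wave
from typing import BinaryIO, Iterator


def iter_wav_chunks(stream: BinaryIO, seconds: float = 2.0) -> Iterator[bytes]:
    """Yield raw PCM byte chunks of roughly `seconds` each from a WAV stream.

    Hypothetical helper: a --stdin option could hand each chunk to the
    transcriber instead of recording from a local microphone.
    """
    with wave.open(stream, "rb") as wav:
        # Number of audio frames that make up one chunk of the requested length.
        frames_per_chunk = max(1, int(wav.getframerate() * seconds))
        while True:
            pcm = wav.readframes(frames_per_chunk)
            if not pcm:
                # Empty read means the upstream process closed the pipe.
                break
            yield pcm
```

For a real pipe this would be called as iter_wav_chunks(sys.stdin.buffer), since sys.stdin itself is a text stream. Fixed-length chunks can cut a word in half, so a real implementation would probably want silence detection instead.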

Extras

Example of using stdout for "keyboard" typing

The above commands can be piped into a read loop that types each line of output as it arrives:

# X11
... | while IFS= read -r; do xdotool type -- "${REPLY} "; done

# Wayland
... | while IFS= read -r; do wtype -- "${REPLY} "; done
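For anyone who would rather do this step in Python, the read loops above can be sketched roughly as follows. The function name type_lines and its parameters are hypothetical; the xdotool and wtype arguments are copied from the shell commands above. Unlike the shell loops, this sketch skips empty lines, which is a design choice rather than a requirement.

```python
import subprocess
from typing import Callable, Iterable, Sequence


def type_lines(
    lines: Iterable[str],
    typer: Sequence[str] = ("xdotool", "type", "--"),  # use ("wtype", "--") on Wayland
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Send each non-empty line to a typing tool, followed by a space."""
    for line in lines:
        text = line.rstrip("\n")
        if not text:
            # Skip blank lines instead of typing a lone space.
            continue
        # One process per line, mirroring the shell while-read loop.
        run([*typer, text + " "], check=False)
```

Reading from a pipe would look like type_lines(sys.stdin), which yields one line at a time as the upstream command prints it.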

For that to work well, cli.py would need import sys and a sys.stdout.flush() after every print(result), and utils.py would need to send logging to stderr:

import logging
from rich.console import Console
from rich.logging import RichHandler
rich_handler = RichHandler(level=logging.INFO, rich_tracebacks=True, markup=True, console=Console(stderr=True))
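The reason for both changes: when stdout is a pipe, Python buffers it in blocks, so results may not reach the read loop until the buffer fills, and any log lines on stdout would get typed along with the transcript. A minimal standard-library sketch of the same idea, without rich, is below. The names emit_result and make_stderr_logger are hypothetical and are not part of whisper_mic.

```python
import logging
import sys
from typing import Optional, TextIO


def make_stderr_logger(name: str = "whisper_mic_sketch") -> logging.Logger:
    """Return a logger whose output goes to stderr, keeping stdout clean."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        # Attach the stderr handler only once, even if called repeatedly.
        logger.addHandler(logging.StreamHandler(sys.stderr))
    logger.propagate = False
    return logger


def emit_result(result: str, out: Optional[TextIO] = None) -> None:
    """Print one transcription result and flush so a pipe sees it immediately."""
    # flush=True has the same effect as calling sys.stdout.flush() after print().
    print(result, file=out if out is not None else sys.stdout, flush=True)
```

Running Python with -u or setting PYTHONUNBUFFERED=1 would also unbuffer stdout without code changes, though that leaves the behaviour up to whoever launches the command.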

Standalone example of the ffmpeg command, for illustration purposes:

ffmpeg -f pulse -i default -f wav -ac 1 -ar 44100 - > playable.wav
