README.md: 66 additions & 16 deletions
@@ -11,7 +11,7 @@ VoxInput is meant to be used with [LocalAI](https://localai.io), but it will fun
 ## Features
 
 - **Speech-to-Text Daemon**: Runs as a background process to listen for signals to start or stop recording audio.
-- **Audio Capture and Playback**: Records audio from the microphone and plays it back for verification.
+- **Audio Capture**: Records audio from the microphone or any other device, including audio you are listening to.
 - **Transcription**: Converts recorded audio into text using a local or remote transcription service.
 - **Text Automation**: Simulates typing the transcribed text into an application using [`dotool`](https://git.sr.ht/~geb/dotool).
 - **Voice Activity Detection**: In realtime mode VoxInput uses VAD to detect speech segments and automatically transcribe them.
@@ -24,11 +24,15 @@ VoxInput is meant to be used with [LocalAI](https://localai.io), but it will fun
 - `dotool` (for simulating keyboard input)
 - `OPENAI_API_KEY` or `VOXINPUT_API_KEY`: Your OpenAI API key for Whisper transcription. If you have a local instance with no key, then just leave it unset.
 - `OPENAI_BASE_URL` or `VOXINPUT_BASE_URL`: The base URL of the OpenAI compatible API server: defaults to `http://localhost:8080/v1`
-- `OPENAI_WS_BASE_URL` or `VOXINPUT_WS_BASE_URL`: The base URL of the realtime websocket API: defaults to `ws://localhost:8080/v1/realtime`
-- OpenAI Realtime API support - VoxInput's realtime mode with VAD requires a [websocket endpoint that support's OpenAI's realtime API in transcription only mode](https://github.com/mudler/LocalAI/pull/5392). You can disable realtime mode with `--no-realtime`.
-
-Note that the VoxInput env vars take precedence over the OpenAI ones.
-
+- `XDG_RUNTIME_DIR`: Required for PID and state files in `$XDG_RUNTIME_DIR`.
+- `VOXINPUT_LANG`: Language code for transcription (defaults to empty).
+- `VOXINPUT_TRANSCRIPTION_MODEL`: Transcription model (default: `whisper-1`).
+- `VOXINPUT_SHOW_STATUS`: Show GUI notifications (`yes`/`no`, default: `yes`).
+- `VOXINPUT_CAPTURE_DEVICE`: Specific audio capture device name (run `voxinput devices` to list).
+- `VOXINPUT_OUTPUT_FILE`: Path to save the transcribed text to a file instead of typing it with dotool.
+
+**Note**: `VOXINPUT_` vars take precedence over `OPENAI_` vars.
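
The precedence rule in the note above can be illustrated with shell parameter expansion. This is a minimal sketch, not VoxInput's actual lookup code: `resolve_base_url` is a hypothetical helper, and treating an empty variable the same as an unset one is an assumption.

```bash
# Sketch only: resolve_base_url is a hypothetical helper, not part of VoxInput.
# Assumption: an empty variable is treated like an unset one.
resolve_base_url() {
  # Prefer VOXINPUT_BASE_URL, then OPENAI_BASE_URL, then the documented default.
  printf '%s\n' "${VOXINPUT_BASE_URL:-${OPENAI_BASE_URL:-http://localhost:8080/v1}}"
}
```

Calling `resolve_base_url` with both variables set prints the `VOXINPUT_BASE_URL` value; the same pattern would apply to the API key pair.
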
 Unless you don't mind running VoxInput as root, then you also need to ensure the following is setup for `dotool`
 
 - Your user is in the `input` user group
@@ -78,36 +82,50 @@ The pop-up window showing when recording has begun can be disabled by setting `V
 
 ### Commands
 
-- **`listen`**: Starts the speech-to-text daemon.
+- **`listen`**: Start speech to text daemon.
+- `--replay`: Play the audio just recorded for transcription (non-realtime mode only).
+- `--no-realtime`: Use the HTTP API instead of the realtime API; disables VAD.
+- `--no-show-status`: Don't show when recording has started or stopped.
+- `--output-file <path>`: Save transcript to file instead of typing.
 ```bash
 ./voxinput listen
 ```
 
-- **`record`**: Sends a signal to the daemon to start recording audio then exits. In realtime mode this will start transcription.
+- **`record`**: Tell existing listener to start recording audio. In realtime mode it also begins transcription.
 ```bash
 ./voxinput record
 ```
 
-- **`write`** or **`stop`**: Sends a signal to the daemon to stop recording. When not in realtime mode this triggers transcription.
+- **`write`** or **`stop`**: Tell existing listener to stop recording audio and begin transcription if not in realtime mode. `stop` alias makes more sense in realtime mode.
 ```bash
 ./voxinput write
 ```
 
-- **`toggle`**: Toggle recording on/off (start recording if idle, stop if recording). Only works in realtime mode.
+- **`toggle`**: Toggle recording on/off (start recording if idle, stop if recording).
 ```bash
 ./voxinput toggle
 ```
 
-- **`status`**: Show whether the server is listening and if it's currently recording. Only works in realtime mode.
+- **`status`**: Show whether the server is listening and if it's currently recording.
 ```bash
 ./voxinput status
 ```
 
-- **`help`**: Displays help information.
+- **`devices`**: List capture devices.
+```bash
+./voxinput devices
+```
+
+- **`help`**: Show help message.
 ```bash
 ./voxinput help
 ```
 
+- **`ver`**: Print version.
+```bash
+./voxinput ver
+```
+
 ### Example Realtime Workflow
 
 1. Start the daemon in a terminal window:
@@ -146,6 +164,42 @@ The pop-up window showing when recording has begun can be disabled by setting `V
 
 4. The transcribed text will be typed into the active application.
 
+### Example Workflow: Transcribing an Online Meeting or Video Stream
+
+To create a transcript of an online meeting or video stream by capturing system audio:
+
+1. List available capture devices:
+
+```bash
+./voxinput devices
+```
+
+Identify the monitor device, e.g., "Monitor of Built-in Audio Analog Stereo".
+
+2. Start the daemon specifying the device and output file:
+
+```bash
+VOXINPUT_CAPTURE_DEVICE="Monitor of Built-in Audio Analog Stereo" ./voxinput listen --output-file meeting_transcript.txt
+```
+
+Note: Add `--no-realtime` if you prefer the HTTP API.
+
+3. Start recording:
+
+```bash
+./voxinput record
+```
+
+4. Play your online meeting or video stream; the system audio will be captured.
+
+5. Stop recording:
+
+```bash
+./voxinput stop
+```
+
+6. The transcript is now in `meeting_transcript.txt`.
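
Steps 3 to 5 above can be wrapped in a small helper. This is a minimal sketch, assuming a listener was already started with `--output-file` as in step 2; `record_until_enter` and the `VOXINPUT_BIN` variable are hypothetical names, not part of VoxInput.

```bash
# Sketch only: record_until_enter and VOXINPUT_BIN are hypothetical names.
# Assumes a listener is already running (step 2) with --output-file set.
VOXINPUT_BIN="${VOXINPUT_BIN:-./voxinput}"

record_until_enter() {
  # Ask the running listener to start recording (step 3).
  "$VOXINPUT_BIN" record || return 1
  # Block until the user presses Enter, e.g. when the meeting ends (step 4).
  printf 'Recording; press Enter to stop... ' >&2
  read -r reply || true
  # Ask the listener to stop recording (step 5).
  "$VOXINPUT_BIN" stop
}
```

With the listener from step 2 running, calling `record_until_enter` and pressing Enter once the meeting is over should leave the transcript in the file passed to `--output-file`.
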
+
 ### Quick start with LocalAI
 
 1. Follow https://localai.io/installation/ to install LocalAI, the simplest way is using Docker:
@@ -183,10 +237,6 @@ The realtime mode has a UI to display various actions being taken by VoxInput. H
 - `SIGUSR2`: Stop recording and transcribe audio.
 - `SIGTERM`: Stop the daemon.
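
The signals listed above can be sent with the standard `kill` utility once the listener's PID is known. The sketch below is illustrative only: `signal_listener` is a hypothetical helper, and the PID file path is left to the caller because the README states only that PID files live in `$XDG_RUNTIME_DIR`, not what they are called.

```bash
# Sketch only: signal_listener is a hypothetical helper, not part of VoxInput.
# The PID file path is an assumption supplied by the caller.
signal_listener() {
  pid_file=$1   # path to the listener's PID file (name not given in the README)
  sig=$2        # signal name without the SIG prefix, e.g. USR2 or TERM
  # Refuse to continue if the PID file is missing or unreadable.
  [ -r "$pid_file" ] || { echo "no PID file at $pid_file" >&2; return 1; }
  pid=$(cat "$pid_file")
  # Deliver the named signal to the process recorded in the PID file.
  kill -s "$sig" "$pid"
}
```

Under these assumptions, passing `USR2` would ask the listener to stop recording and transcribe, and `TERM` would stop the daemon.
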
 
-## Limitations
-
-- Uses the default audio input, make sure you have the device you want to use set as the default on your system.
-
 ## License
 
 This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.