Demo video recorded in real time (no speedup) on a MacBook Pro 13, M2, 16 GB.
WhisperServer is a lightweight macOS menu bar app that runs in the background.
It exposes a local HTTP server compatible with the OpenAI Whisper API for audio transcription.
- Local HTTP server compatible with the OpenAI Whisper API
- Menu bar application (no Dock icon)
- Streaming via Server‑Sent Events (SSE) with automatic chunked fallback
- Automatic VAD-based chunking for Whisper models to prevent repeated text in long audio files — a common issue with standard whisper.cpp
- Automatically downloads models on first use
- Fast, high‑quality quantized models
- Parakeet model can transcribe ~1 hour of audio in about 1 minute
- macOS 14.6 or newer
- Apple Silicon (ARM64) only
| Project | Platform | Key features |
|---|---|---|
| VibeScribe | macOS | Automatic call summarization and transcription for meetings, interviews, and brainstorming; AI-powered summaries and easy export of notes. |
- Go to the Releases page.
- Download the latest `.dmg` file.
- Open the `.dmg` file.
- Drag WhisperServer to your Applications folder.
This app is not signed by Apple. To open it the first time:
- Control‑click (or right‑click) WhisperServer in Applications.
- Choose Open.
- In the warning dialog, click Open.
- Or go to System Settings → Privacy & Security and allow the app.
Example
[Demo video: 1017.3.mp4]
curl -X POST http://localhost:12017/v1/audio/transcriptions \
-F file=@/path/to/audio.mp3

| Parameter | Description | Values | Required |
|---|---|---|---|
| file | Audio file | wav, mp3, m4a | yes |
| model | Model to use | model ID | no |
| prompt | Guide style/tone (Whisper) | string | no |
| response_format | Output format | json, text, srt, vtt, verbose_json | no |
| language | Input language (ISO 639‑1) | 2‑letter code | no |
| diarize | Enable Fluid speaker diarization | true, false (default false) | no |
| stream | Enable streaming (SSE or chunked) | true, false | no |
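
For clients that cannot shell out to curl, the same request can be assembled by hand. The following is a minimal Python sketch using only the standard library. The helper names (`encode_multipart`, `build_request`, `transcribe`) are illustrative and not part of WhisperServer; the endpoint URL is copied from the curl example, and the `application/octet-stream` part type is an assumption that mirrors what curl typically sends for audio files.

```python
import json
import os
import urllib.request
import uuid

# Endpoint copied from the curl examples; adjust host/port if yours differ.
ENDPOINT = "http://localhost:12017/v1/audio/transcriptions"


def encode_multipart(fields, filename, file_bytes):
    """Return (body, content_type) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields such as model, language, response_format.
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    # The audio payload goes in the "file" field, as in the parameter table.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(path, **fields):
    """Build a POST request for the audio file at `path`."""
    with open(path, "rb") as handle:
        audio = handle.read()
    body, content_type = encode_multipart(fields, os.path.basename(path), audio)
    return urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": content_type}, method="POST"
    )


def transcribe(path, **fields):
    """Send the request and return the "text" field of the json response."""
    with urllib.request.urlopen(build_request(path, **fields)) as response:
        return json.loads(response.read())["text"]
```

Form values travel as strings, so a call would look like `transcribe("audio.mp3", model="tiny-q5_1", language="en")`; for boolean parameters pass `"true"` or `"false"` rather than a Python `bool`.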
| Model | Relative speed | Quality |
|---|---|---|
| `parakeet-tdt-0.6b-v3` | Fastest | Medium |
| `tiny-q5_1` | Fast | Good (English), Low (other languages) |
| `large-v3-turbo-q5_0` | Slow | Medium–Good |
| `medium-q5_0` | Slowest | Good |
The server supports multiple response formats:
curl -X POST http://localhost:12017/v1/audio/transcriptions \
-F file=@/path/to/audio.mp3 \
-F response_format=json

- json (default)

{
  "text": "Transcription text."
}

- verbose_json

{
  "task": "transcribe",
  "language": "en",
  "duration": 10.5,
  "text": "Full transcription text.",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 5.0,
      "text": "First segment.",
      "tokens": [50364, 13, 11, 263, 6116],
      "temperature": 0.0,
      "avg_logprob": -0.45,
      "compression_ratio": 1.275,
      "no_speech_prob": 0.1
    }
  ]
}

- text
And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
- srt
1
00:00:00,240 --> 00:00:07,839
And so, my fellow Americans, ask not what your country can do for you

2
00:00:07,839 --> 00:00:10,640
ask what you can do for your country.

- vtt

WEBVTT

00:00:00.240 --> 00:00:07.839
And so, my fellow Americans, ask not what your country can do for you

00:00:07.839 --> 00:00:10.640
ask what you can do for your country.
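
When a client needs the timing information from `srt` or `vtt` output, the cues can be read back into numbers. The sketch below is one possible way to do that in Python; `to_seconds` and `parse_cues` are hypothetical helper names, and the parser assumes timestamps always include the hours field, as in the samples above.

```python
import re

# Matches both the SRT form (comma) and the VTT form (dot) of a timestamp.
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp):
    """Convert 'HH:MM:SS,mmm' or 'HH:MM:SS.mmm' to seconds as a float."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_cues(text):
    """Return a list of (start, end, text) tuples from SRT or VTT content."""
    cues = []
    lines = text.splitlines()
    index = 0
    while index < len(lines):
        if "-->" not in lines[index]:
            index += 1  # skip cue numbers, the WEBVTT header, blank lines
            continue
        start_raw, end_raw = lines[index].split("-->", 1)
        # VTT may append cue settings after the end time; keep the first token.
        start, end = to_seconds(start_raw), to_seconds(end_raw.split()[0])
        index += 1
        body = []
        while index < len(lines) and lines[index].strip():
            body.append(lines[index].strip())
            index += 1
        cues.append((start, end, " ".join(body)))
    return cues
```

Because the timing line is detected by its `-->` marker rather than by position, the same function handles both formats.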
WhisperServer supports real‑time streaming with automatic protocol detection. Note: timestamped streaming (srt, vtt, verbose_json) requires the Whisper provider; the Fluid provider streams text/JSON only.
If the client sends the header Accept: text/event-stream, the server uses SSE:
curl -X POST http://localhost:12017/v1/audio/transcriptions \
-H "Accept: text/event-stream" \
-F file=@audio.wav \
-F stream=true \
--no-buffer

Response format:
data: First transcribed segment
data:
data: Second transcribed segment
data:
event: end
data:
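
On the client side, the stream shown above can be consumed line by line. This is a simplified Python sketch, not a complete Server-Sent Events parser: `iter_segments` is a hypothetical helper that yields each non-empty `data:` payload and stops at the `end` event. It assumes every segment arrives on a single `data:` line, as in the sample.

```python
def iter_segments(lines):
    """Yield transcribed segments from an iterable of decoded SSE lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith(":"):
            continue  # blank separators and SSE comments carry no text
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips one leading space after the colon
        if field == "event" and value == "end":
            return  # the server signals completion with "event: end"
        if field == "data" and value:
            yield value
```

With `urllib.request`, the response object can be iterated as bytes lines, so the input could be `(line.decode("utf-8") for line in response)`.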
If the client does not request SSE (no `Accept: text/event-stream` header), the server falls back to HTTP chunked transfer encoding:
curl -X POST http://localhost:12017/v1/audio/transcriptions \
-F file=@audio.wav \
-F stream=true \
--no-buffer

Add speaker labels (who is talking) when you use the FluidAudio provider. Diarization is off by default to stay compatible with the OpenAI Whisper API.
How to enable:
- Select the Fluid provider in the menu bar (or pass the Fluid model ID), and
- Add `diarize=true` to your request.
Example:
curl -X POST http://localhost:12017/v1/audio/transcriptions \
-F file=@meeting.wav \
-F model=parakeet-tdt-0.6b-v3 \
-F response_format=json \
-F diarize=true

What you get:
- For `response_format=json`, the server adds a `speaker_segments` array:

  {
    "text": "Good morning everyone...",
    "speaker_segments": [
      { "speaker": "Speaker_1", "start": 0.0, "end": 4.2, "text": "Good morning everyone" },
      { "speaker": "Speaker_2", "start": 4.2, "end": 7.8, "text": "Morning! Shall we begin?" }
    ]
  }

- For `response_format=verbose_json`, `speaker_segments` is added as well. The existing `segments` field stays unchanged.
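
A diarized response is often turned into a readable, speaker-labelled transcript. The Python sketch below shows one way to do that with the `speaker_segments` shape from the example above; `speaker_transcript` is an illustrative name, and merging consecutive segments from the same speaker is a presentation choice, not something the server does.

```python
def speaker_transcript(payload):
    """Return 'Speaker: text' lines, merging consecutive turns by one speaker."""
    lines = []
    previous = None
    # Responses without diarization have no speaker_segments key.
    for segment in payload.get("speaker_segments", []):
        text = segment["text"].strip()
        if segment["speaker"] == previous:
            lines[-1] += " " + text  # same speaker keeps talking
        else:
            lines.append(f'{segment["speaker"]}: {text}')
            previous = segment["speaker"]
    return lines
```

Applied to the sample response, this yields one line for `Speaker_1` and one for `Speaker_2`.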
Streaming:
- Streaming sends one JSON chunk with `speaker_segments` when diarization completes.
- Then the standard `end` event is sent.
Toggles available from the menu bar:
- Launch at Login: registers WhisperServer as a login item via `SMAppService`, so the server starts automatically when you sign in. You can also revoke this from System Settings → General → Login Items.
- Expose on Local Network: binds the HTTP server to `0.0.0.0:12017` instead of `localhost`, so other devices on your LAN (phone, another Mac) can reach the API. When enabled, the menu shows the full URL (e.g. `http://192.168.1.42:12017`) and offers a Copy Server URL action.
- Require API Key: appears under the LAN URL when LAN exposure is on. When enabled, LAN clients must present a bearer token; requests from this Mac (`127.0.0.1` / `::1`) always bypass the check so local tooling keeps working.
The first time Require API Key is turned on, WhisperServer generates a random key (`ws-` prefix + 64 hex chars) and stores it in the macOS Keychain. Clients must send it in the `Authorization` header:
curl -X POST http://192.168.1.42:12017/v1/audio/transcriptions \
-H "Authorization: Bearer ws-<your-key>" \
-F file=@audio.wav

The menu exposes two actions while the toggle is on:
- Copy API Key — copies the current key to the clipboard.
- Regenerate API Key… — replaces the key; any existing clients stop working until they receive the new value. The new key is copied to your clipboard automatically.
Turning Require API Key off (or turning LAN exposure off entirely) does not delete the key — it stays in the Keychain so the same key works when you re-enable the toggle.
On first connection, macOS may show a prompt asking you to allow incoming connections for WhisperServer — that's the macOS Application Firewall, unrelated to the app itself.
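
The key format and the loopback bypass described above can be illustrated with a short Python sketch. This is not WhisperServer's implementation (the app's own code is not shown here); `generate_key` and `is_authorized` are hypothetical names, and lowercase hex digits plus a constant-time comparison are assumptions made for the example.

```python
import hmac
import ipaddress
import re
import secrets

# "ws-" prefix followed by 64 hex characters (lowercase is an assumption).
KEY_PATTERN = re.compile(r"ws-[0-9a-f]{64}")


def generate_key():
    """Create a key in the documented shape: 32 random bytes as 64 hex chars."""
    return "ws-" + secrets.token_hex(32)


def is_authorized(client_ip, authorization, expected_key):
    """Mimic the documented rule: loopback always passes, LAN needs the key."""
    if ipaddress.ip_address(client_ip).is_loopback:
        return True  # 127.0.0.1 and ::1 bypass the check
    scheme, _, token = (authorization or "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking key contents through timing differences.
    return hmac.compare_digest(token.strip().encode(), expected_key.encode())
```

A client can use `KEY_PATTERN.fullmatch(key)` as a quick sanity check that a pasted key has the documented shape before sending requests.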
If you want to build WhisperServer yourself:
- Clone the repository:
  git clone https://github.com/pfrankov/whisper-server.git
  cd whisper-server
- Open the project in Xcode.
- Select your development team:
  - Click the project in Xcode
  - Select the WhisperServer target
  - Go to "Signing & Capabilities"
  - Choose your team
- Build and run:
  - Press `Cmd + R` to build and run
  - Or use the menu: Product → Run
- Run the app, then run the script: `test_api.sh` (complete API test suite)
- In the menu bar, open `Select Model` → `Import Whisper Model…`
- Choose a `.bin` model file (optionally add its `.mlmodelc` bundle in the same dialog)
- The model becomes selectable in the menu and is listed in `GET /v1/models`
MIT