WhisperClip simplifies your life by automatically transcribing audio recordings and saving the text directly to your clipboard. With just a click of a button, you can effortlessly convert spoken words into written text, ready to be pasted wherever you need it. Powered by OpenAI's Whisper model via faster-whisper, it provides fast, free, and fully local transcription — your audio never leaves your machine.
- Record audio with a simple click or global hotkey (Alt+Shift+R).
- Fast, local transcription using OpenAI's Whisper model with GPU acceleration (CUDA).
- Option to save transcriptions directly to the clipboard.
- Transcribe existing audio files via the file picker.
- Optional LLM context prefix — prepends a note explaining the text was generated via speech-to-text.
- Real-time audio visualizer showing recording and transcription states.
- Remote transcription API — transcribe audio from your phone using your PC's GPU via an iOS Shortcut or Android app.
- Python 3.10 or higher
- CUDA is highly recommended for better performance but not necessary. WhisperClip can also run on a CPU.
- Clone the repository:

      git clone https://github.com/gustavostz/whisper-clip.git
      cd whisper-clip

- Create and activate a virtual environment:

      python -m venv .venv
      .venv\Scripts\activate       # Windows
      source .venv/bin/activate    # Linux/macOS

- Install the required dependencies:

      pip install -r requirements.txt
The default model is turbo (large-v3-turbo), which offers the best balance of speed and accuracy at ~1.5 GB VRAM with int8 quantization. Available models:
| Size | Required VRAM (int8) | Relative speed |
|---|---|---|
| tiny | ~0.5 GB | fastest |
| base | ~0.5 GB | fast |
| small | ~1 GB | moderate |
| medium | ~2.5 GB | slower |
| large-v3 | ~3 GB | slowest |
| turbo | ~1.5 GB | fast + accurate (recommended) |
To change the model, modify model_name in config.json. You can also change compute_type (default: int8) — options include float16, int8_float16, int8.
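As a rough illustration of how the VRAM column can guide the choice of model_name, here is a minimal sketch that picks a model for a given amount of free GPU memory. The figures are copied from the table above and are approximate; `pick_model` and `PREFERENCE` are hypothetical names used only for this example and are not part of WhisperClip.

```python
from typing import Optional

# Approximate VRAM needs in GB with int8 quantization, copied from the table above.
MODEL_VRAM_GB = {
    "tiny": 0.5,
    "base": 0.5,
    "small": 1.0,
    "medium": 2.5,
    "large-v3": 3.0,
    "turbo": 1.5,
}

# Assumed preference order for this sketch: the recommended model first,
# then progressively smaller fallbacks. medium and large-v3 are left out
# because they need more VRAM than turbo, which the README recommends.
PREFERENCE = ["turbo", "small", "base", "tiny"]


def pick_model(free_vram_gb: float) -> Optional[str]:
    """Return the first preferred model that fits in free_vram_gb, or None."""
    for name in PREFERENCE:
        if MODEL_VRAM_GB[name] <= free_vram_gb:
            return name
    return None
```

The returned name is what you would write into model_name in config.json. If nothing fits, running on the CPU is still an option.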
Run the application:
python main.py
- Click the microphone button to start and stop recording.
- If "Save to Clipboard" is checked, the transcription will be copied to your clipboard automatically.
Copy config.example.json to config.json and edit as needed:
cp config.example.json config.json
| Setting | Default | Description |
|---|---|---|
| model_name | "turbo" | Whisper model to use (see table above) |
| compute_type | "int8" | Quantization type (int8, float16, int8_float16) |
| hotwords | "" | Space-separated words to bias transcription toward (e.g. "Claude" so "cloud code" becomes "Claude Code") |
| shortcut | "alt+shift+r" | Global hotkey for toggling recording |
| notify_clipboard_saving | true | Play a sound when transcription is copied to clipboard |
| llm_context_prefix | true | Prepend a note to transcriptions explaining they were generated via speech-to-text |
| server_enabled | false | Enable the remote transcription API server |
| server_port | 8787 | Port for the API server |
| server_api_key | "" | API key for authenticating remote requests (required if server is enabled) |
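To make the defaults concrete, the sketch below shows one way a config.json could be read and merged with the default values from the table. It is not WhisperClip's actual loader: `load_config` and `DEFAULTS` are hypothetical names, and the check on server_api_key simply mirrors the "required if server is enabled" note.

```python
import json
from pathlib import Path

# Default values as listed in the settings table above.
DEFAULTS = {
    "model_name": "turbo",
    "compute_type": "int8",
    "hotwords": "",
    "shortcut": "alt+shift+r",
    "notify_clipboard_saving": True,
    "llm_context_prefix": True,
    "server_enabled": False,
    "server_port": 8787,
    "server_api_key": "",
}


def load_config(path="config.json"):
    """Merge the user's config.json (if present) over the documented defaults."""
    config = dict(DEFAULTS)
    config_path = Path(path)
    if config_path.exists():
        config.update(json.loads(config_path.read_text(encoding="utf-8")))
    # Mirrors the table: an API key is required once the server is enabled.
    if config["server_enabled"] and not config["server_api_key"]:
        raise ValueError("server_api_key is required when server_enabled is true")
    return config
```

With this approach, a config.json containing only `{"model_name": "small"}` keeps every other setting at its default.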
WhisperClip includes a built-in API server that lets you transcribe audio from your phone using your PC's GPU. Record on your iPhone or Android device, send the audio to your PC over a VPN, and get the transcription back in seconds — copied straight to your phone's clipboard.
- Generate an API key. This can be any password or string you choose. For a random secure key, you can run:

      python -c "import secrets; print(secrets.token_urlsafe(32))"

- Install the server dependencies (if not already installed):

      pip install fastapi uvicorn[standard] python-multipart

- Edit your config.json:

      {
        "server_enabled": true,
        "server_port": 8787,
        "server_api_key": "YOUR_GENERATED_KEY_HERE"
      }

- Start WhisperClip normally. The API server starts automatically alongside the desktop app:

      python main.py

- Verify the server is running by opening this URL in your PC's browser:

      http://localhost:8787/api/v1/health

  You should see:

      {"status":"ok","model":"turbo","compute_type":"int8"}
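The same health check can be scripted instead of opened in a browser. The following is a minimal sketch using only the Python standard library; `check_health` is a hypothetical helper, and the `opener` parameter exists only so the function can be exercised without a running server.

```python
import json
import urllib.request


def check_health(base_url="http://localhost:8787", opener=urllib.request.urlopen, timeout=5):
    """Call the health endpoint and return (is_ok, parsed_json)."""
    url = base_url.rstrip("/") + "/api/v1/health"
    with opener(url, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # The README shows "status": "ok" in a healthy response.
    return payload.get("status") == "ok", payload
```

Calling `check_health()` on the PC running WhisperClip should return `True` together with the model and compute type reported by the server. If the server is not running, `urlopen` raises an error rather than returning `False`.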
Your phone needs to reach your PC over the network. Since they won't always be on the same Wi-Fi, a mesh VPN is the easiest solution. Below are two popular options, but any VPN or tunneling tool that gives your devices a stable IP will work.
Tailscale creates a peer-to-peer mesh network between your devices. No port forwarding needed.
- Install Tailscale on your PC and phone (available on Windows, iOS, Android).
- Sign in with the same account on both devices.
- Note your PC's Tailscale IP (shown in the Tailscale app, e.g. 100.x.y.z).
- Test from your phone's browser: http://<TAILSCALE_IP>:8787/api/v1/health
Note: If you use another VPN (e.g. NordVPN, ExpressVPN) alongside Tailscale, the two may conflict since both manage routing. Consider using only Tailscale, or use a mesh VPN solution from your existing VPN provider (see Option B).
If you already use NordVPN, you can use its Meshnet feature instead of Tailscale. Many VPN providers offer similar mesh/LAN features.
- Enable Meshnet in NordVPN on both your PC and phone.
- Link your devices in the Meshnet settings.
- Note your PC's Meshnet IP (e.g. 100.x.y.z), shown in the NordVPN Meshnet panel.
- Test from your phone's browser: http://<MESHNET_IP>:8787/api/v1/health
This avoids conflicts with your existing VPN connection since Meshnet runs alongside NordVPN.
Create an iOS Shortcut that records audio, sends it to your PC for transcription, and copies the result to your clipboard.
- Open the Shortcuts app on your iPhone.
- Tap + to create a new shortcut.
- Add the following actions in order:
Action 1 — Record Audio:
- Search for "Record Audio" and add it.
- Set audio quality to Normal (or Very High if you prefer).
- Disable "Begin recording immediately" if you want a countdown, or enable it for instant recording.
Action 2 — Get Contents of URL:
- Search for "Get Contents of URL" and add it.
- Set the URL to: http://<YOUR_PC_IP>:8787/api/v1/transcribe?llm_context_prefix=false
  Replace <YOUR_PC_IP> with your Tailscale/Meshnet IP.
- Set Method to POST.
- Under Headers, add:
  - Key: X-API-Key
  - Value: your API key from config.json
- Under Request Body, select Form.
- Add a field:
  - Key: file
  - Type: File
  - Value: select "Recorded Audio" (the output from Action 1)
Action 3 — Get Dictionary Value:
- Search for "Get Dictionary Value" and add it.
- Set it to get the value for key text from "Contents of URL".
Action 4 — Copy to Clipboard:
- Search for "Copy to Clipboard" and add it.
- It will automatically use the dictionary value from Action 3.
Action 5 (Optional) — Show Notification:
- Search for "Show Notification" and add it.
- Set the body to "Dictionary Value" to see the transcription result.
- Tap the shortcut name at the top to rename it (e.g. "Transcribe").
- Tap Done.
To test: run the shortcut, speak a few words, tap stop, and wait for the transcription. It should appear in your clipboard (and in the notification if you added Action 5).
Tip: Set llm_context_prefix=true in the URL if you want the transcription prefixed with a note that it was generated via speech-to-text. This is useful when pasting into LLM chats.
Android doesn't have a built-in equivalent to iOS Shortcuts, but you can achieve the same result with automation apps:
- Tasker — Create a task that records audio, sends an HTTP POST to the transcription endpoint, parses the JSON response, and copies the text to the clipboard. Tasker supports all the required actions (audio recording, HTTP requests, JSON parsing, clipboard).
- HTTP Shortcuts — A simpler app focused on making HTTP requests. You can set up a shortcut that sends an audio file to the API and displays the result.
- Automate — Flow-based automation similar to iOS Shortcuts. Build a flow: Record Audio → HTTP Request → Parse JSON → Copy to Clipboard.
The API endpoint and parameters are the same as the iOS setup:
- URL: http://<YOUR_PC_IP>:8787/api/v1/transcribe
- Method: POST
- Header: X-API-Key: <your_key>
- Body: multipart form with a file field containing the audio recording
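For testing the endpoint from a computer before setting up a phone, the request described above can be sketched in Python with only the standard library. This assumes the response is JSON with a text key (the key the iOS Shortcut reads). The helper names, the default filename recording.m4a, and the timeout are hypothetical choices for this example.

```python
import json
import os
import urllib.parse
import urllib.request
import uuid


def build_multipart(field_name, filename, data, content_type="application/octet-stream"):
    """Encode a single file as a multipart/form-data body; return (body, header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_request(base_url, api_key, audio_bytes, filename="recording.m4a",
                  llm_context_prefix=False):
    """Build the POST request: X-API-Key header plus a multipart 'file' field."""
    query = urllib.parse.urlencode({"llm_context_prefix": str(llm_context_prefix).lower()})
    url = f"{base_url.rstrip('/')}/api/v1/transcribe?{query}"
    body, content_type = build_multipart("file", filename, audio_bytes)
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": content_type},
    )


def transcribe(base_url, api_key, audio_path, llm_context_prefix=False, timeout=120):
    """Upload an audio file and return the 'text' value from the JSON response."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    request = build_request(base_url, api_key, audio,
                            filename=os.path.basename(audio_path),
                            llm_context_prefix=llm_context_prefix)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["text"]
```

A call such as `transcribe("http://<YOUR_PC_IP>:8787", "<your_key>", "note.m4a")` would then return the transcription as a string, provided the server is reachable and the key matches server_api_key.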
To make transcription faster to trigger from your phone:
- iPhone Action Button (iPhone 15 Pro+): Settings → Action Button → set to Shortcut → select your transcription shortcut. Single press to start transcribing.
- iPhone Back Tap: Settings → Accessibility → Touch → Back Tap → set Double Tap or Triple Tap to your shortcut.
- iPhone Lock Screen: Add the Shortcuts widget to your lock screen for one-tap access.
- iPhone Home Screen: In the Shortcuts app, long-press your shortcut → Add to Home Screen.
- Android: Most automation apps support home screen widgets and quick settings tiles.
If there's interest in a more user-friendly, executable version of WhisperClip, I'd be happy to consider creating one. Your feedback and suggestions are welcome! Just let me know through the GitHub issues.
This project uses faster-whisper (a CTranslate2-based reimplementation of OpenAI's Whisper) for audio transcription.
