Native GPU Whisper

LIMA includes optional native GPU-accelerated whisper servers for faster transcription during development.

Terminology: "Docker whisper" refers to Speaches (faster-whisper in a container); this is what runs when you run make up. "Native whisper" runs directly on your machine using GPU acceleration.

When to Use Native vs Docker (Speaches)

Scenario	Recommendation
First-time setup	Docker (Speaches) - just works, no extra config
Production/consistency	Docker - predictable cold starts, no warmup needed
Development iteration	Native GPU - roughly 3-5x faster than Docker after warmup
macOS Apple Silicon	Native MLX - fastest option, but needs warmup handling
Linux with NVIDIA GPU	Native CUDA - significant speedup over Docker

Bottom line: Start with Docker. Switch to native if you're processing many recordings during development and want faster iteration.

Quick Start

# Start native whisper in background (auto-detects platform)
make whisper-native

# Check if running
make whisper-native-status

# View logs
make whisper-native-logs

# Stop
make whisper-native-stop

The native server runs on port 9001 by default (configurable via NATIVE_WHISPER_PORT in .env).
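If make whisper-native-status is not convenient (for example, from a script), a plain TCP check against the configured port gives a rough equivalent. This is a minimal sketch: `is_port_open` is a hypothetical helper, not part of LIMA, and it only confirms that something is listening on the port, not that it is the whisper server.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 9001, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 9001 is the documented default; pass your NATIVE_WHISPER_PORT value if you changed it.
    state = "listening" if is_port_open() else "not reachable"
    print(f"native whisper port 9001: {state}")
```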

Platform Support

Platform	Technology	Requirements
macOS (Apple Silicon)	Lightning Whisper MLX	M1/M2/M3/M4 chip
Linux	faster-whisper CUDA	NVIDIA GPU + drivers (`nvidia-smi` to verify)
Windows	faster-whisper CUDA	NVIDIA GPU + drivers

Linux/Windows: Verify NVIDIA drivers

Run nvidia-smi in a terminal (Linux) or PowerShell/cmd (Windows). A working driver installation prints a table like this:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08             Driver Version: 580.105.08     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:09:00.0  On |                  Off |
|  0%   48C    P8             18W /  450W |    1775MiB /  24564MiB |     19%      Default |
+-----------------------------------------+------------------------+----------------------+

If this fails, install or update your NVIDIA drivers.

Windows note: Run the native whisper server in PowerShell or cmd, not WSL. LIMA's default setup expects CUDA to talk to the Windows GPU drivers directly. WSL2 can access GPUs, but only with additional configuration that is beyond LIMA's default setup.
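The driver check can also be scripted, which helps when a setup script needs to decide whether the native CUDA path is available. The sketch below is an illustration rather than anything LIMA ships: `list_nvidia_gpus` is a hypothetical helper that calls nvidia-smi with its `--query-gpu` and `--format` options and returns an empty list when the tool is missing or fails.

```python
import shutil
import subprocess


def list_nvidia_gpus() -> list[str]:
    """Return one 'name, driver_version' line per GPU, or [] if nvidia-smi is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    print(gpus if gpus else "No NVIDIA GPU detected; install or update your drivers.")
```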

Performance Comparison

Benchmark: 42-minute audio file

Platform	Speed	Notes
macOS M4 Pro (MLX)	166x realtime (~15s)	5.3x faster than Docker, slow cold start
Linux RTX 4090 (CUDA)	71x realtime (~36s)	4.3x faster than Docker
Windows RTX 4090 (CUDA)	39x realtime (~66s)	2.8x faster than Docker
Docker Speaches	14-33x realtime	Consistent, no warmup needed
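The "x realtime" figures are the audio duration divided by the processing time. The sketch below shows that arithmetic for the 42-minute benchmark. The function names are illustrative, and because the table's values are rounded (and the recording may not be exactly 42:00), the computed times match the table only approximately.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are transcribed per second of wall-clock time."""
    return audio_seconds / processing_seconds


def processing_time(audio_seconds: float, factor: float) -> float:
    """Expected wall-clock seconds for a clip at a given realtime factor."""
    return audio_seconds / factor


AUDIO_SECONDS = 42 * 60  # the 42-minute benchmark file, assumed to be exactly 2520 s

if __name__ == "__main__":
    # Speed factors taken from the table above.
    for label, factor in [("macOS MLX", 166), ("Linux CUDA", 71), ("Windows CUDA", 39)]:
        print(f"{label}: about {processing_time(AUDIO_SECONDS, factor):.1f} s")
```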

Cold Start Considerations

Docker Speaches: Fast cold start, consistent performance
Native MLX (macOS): First request is slow while the model loads; subsequent requests are very fast
Native CUDA: Moderate cold start, faster than Docker after warmup
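One way to handle the slow first request is to send a short silent clip right after the server starts, so the model is loaded before real recordings arrive. The sketch below is hedged accordingly: `make_silent_wav` and `warm_up` are hypothetical helpers, and the request shape (multipart form with `model` and `file` fields) assumes the native server accepts OpenAI-style transcription requests the way Speaches does. Check services/whisper-server/README.md for the actual endpoint.

```python
import io
import time
import urllib.request
import uuid
import wave


def make_silent_wav(seconds: float = 1.0, sample_rate: int = 16000) -> bytes:
    """Build a mono 16-bit PCM WAV of silence in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


def warm_up(url: str, model: str, timeout: float = 300.0) -> float:
    """POST a silent clip as multipart/form-data and return the elapsed seconds.

    Assumption: the server accepts 'model' and 'file' form fields at `url`.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'.encode(),
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
         'filename="warmup.wav"\r\nContent-Type: audio/wav\r\n\r\n').encode(),
        make_silent_wav(),
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start
```

A call such as `warm_up("http://localhost:9001/v1/audio/transcriptions", "Systran/faster-whisper-base")` would then absorb the model-loading delay once. The URL path here is an assumption, not a documented LIMA route.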

Configuration

Set these in .env before starting:

Variable	Default	Purpose
`NATIVE_WHISPER_HOST`	`0.0.0.0`	Bind address
`NATIVE_WHISPER_PORT`	`9001`	Server port
`WHISPER_MODEL`	`Systran/faster-whisper-base`	Model to use
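To make the defaults concrete, the sketch below resolves the three variables from .env-style text and falls back to the values in the table when a variable is absent. It is an illustration only: `parse_env` and `native_whisper_config` are hypothetical names, the parser handles just simple KEY=VALUE lines, and how LIMA itself reads .env is not specified here.

```python
# Defaults copied from the configuration table above.
DEFAULTS = {
    "NATIVE_WHISPER_HOST": "0.0.0.0",
    "NATIVE_WHISPER_PORT": "9001",
    "WHISPER_MODEL": "Systran/faster-whisper-base",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def native_whisper_config(env_text: str) -> dict[str, str]:
    """Merge recognized variables from env_text over the documented defaults."""
    overrides = {k: v for k, v in parse_env(env_text).items() if k in DEFAULTS}
    return {**DEFAULTS, **overrides}


if __name__ == "__main__":
    example = "# move the native server off the default port\nNATIVE_WHISPER_PORT=9100\n"
    print(native_whisper_config(example))
```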

Model Options

Model	Size	Speed	Accuracy
`tiny`	~75MB	Fastest	Lower
`base`	~150MB	Fast	Good
`small`	~500MB	Medium	Better
`medium`	~1.5GB	Slower	High
`large-v3`	~3GB	Slowest	Highest

Using with LIMA Workflows

The Voice Memo workflow can be configured to use native whisper instead of Docker Speaches. Set NATIVE_WHISPER_PORT in .env before running make seed to configure the workflow automatically.

Detailed Setup

For platform-specific installation instructions, troubleshooting, and advanced configuration, see:

services/whisper-server/README.md

Next Steps

Audio Processing Guide - Chunking strategies for long recordings
Troubleshooting - Whisper-specific issues
