A local, privacy-first meeting transcription tool for macOS. Captures system audio and microphone, provides live transcription with speaker identification, and generates polished post-meeting transcripts — all running entirely on your Mac with no cloud APIs.
- Live transcription — real-time speech-to-text displayed in your terminal as the meeting happens
- Speaker diarization — identifies up to 4 speakers with color-coded labels, stabilized across chunks
- System audio + mic capture — records both sides of the conversation (what you hear and what you say) using ScreenCaptureKit and AVAudioEngine
- Sentence-level output — each sentence appears on its own line with a precise timestamp
- Smart transcript cleanup — automatically merges fragmented lines into readable speaker-turn paragraphs (instant, no models)
- Auto-summary — generates meeting summary with action items using Claude Code
- Whisper prompt conditioning — feeds previous context to Whisper for better continuity and accuracy
- Hallucination filtering — detects and filters Whisper artifacts using pattern matching, repetition detection, and built-in thresholds
- Audio recording — saves the full meeting audio as a WAV file for reference
- Markdown transcripts — outputs clean, readable markdown files with timestamps and speaker labels
- 100% local — all processing happens on-device using Apple Silicon GPU acceleration. No data leaves your machine.
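To illustrate the hallucination filtering mentioned above: a small artifact pattern list plus a repetition check goes a long way. This is a hedged sketch, not the project's actual filter; the patterns and the 0.6 repetition ratio are invented for illustration:

```python
import re

# Common Whisper artifacts (illustrative list, not the project's actual patterns)
ARTIFACT_PATTERNS = [
    re.compile(r"^\s*(thanks for watching|subscribe to|you)\s*[.!]?\s*$", re.I),
]

def looks_hallucinated(text: str) -> bool:
    """Return True if a segment matches a known artifact pattern or
    repeats the same word suspiciously often."""
    stripped = text.strip()
    if not stripped:
        return True
    if any(p.match(stripped) for p in ARTIFACT_PATTERNS):
        return True
    # Repetition detection: a single token dominating the segment
    words = stripped.lower().split()
    if len(words) >= 4:
        most_common = max(words.count(w) for w in set(words))
        if most_common / len(words) > 0.6:  # e.g. "okay okay okay okay"
            return True
    return False
```

A real filter would also use Whisper's no-speech and log-probability scores, which is what the built-in thresholds refer to.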
```
┌──────────────────────────────────────────────┐
│           Swift CLI (AudioCapture)           │
│                                              │
│   ScreenCaptureKit (system audio)            │
│   + AVAudioEngine (mic)                      │
│   → Mixed PCM Float32 16kHz mono → stdout    │
└─────────────────────┬────────────────────────┘
                      │ raw audio pipe
                      ▼
┌──────────────────────────────────────────────┐
│        Python Pipeline (transcriber)         │
│                                              │
│  Audio Buffer → webrtcvad → mlx-whisper (ASR)│
│  (20s chunks)  → Sortformer (speakers)       │
│                → Prompt conditioning         │
│                → Speaker consistency         │
│                → Hallucination filtering     │
│                → Live terminal display       │
│                → Markdown transcript         │
│                → WAV file recording          │
└─────────────────────┬────────────────────────┘
                      │ on Ctrl+C
                      ▼
┌──────────────────────────────────────────────┐
│    Cleanup + Summary (text only, instant)    │
│                                              │
│  Merge speaker turns, filter hallucinations  │
│  → claude -p for meeting summary             │
└──────────────────────────────────────────────┘
```
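On the Python side, the raw audio pipe in the diagram is just a stream of little-endian float32 samples at 16 kHz. A minimal sketch of chunking it (the function name is illustrative and the 20 s chunk size mirrors the diagram; this is not the project's actual audio_reader.py):

```python
import numpy as np

SAMPLE_RATE = 16_000
CHUNK_SECONDS = 20  # matches the 20 s chunks in the diagram

def iter_chunks(stream, chunk_seconds: int = CHUNK_SECONDS):
    """Yield mono float32 numpy arrays of chunk_seconds of audio
    read from a binary stream (e.g. sys.stdin.buffer)."""
    chunk_bytes = SAMPLE_RATE * chunk_seconds * 4  # 4 bytes per float32 sample
    while True:
        data = stream.read(chunk_bytes)
        if not data:
            return
        yield np.frombuffer(data, dtype=np.float32)
```

Hooking it up would look like `for chunk in iter_chunks(sys.stdin.buffer): ...`, with VAD, ASR, and diarization applied per chunk.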
- Mac with Apple Silicon (M1, M2, M3, M4 — any variant)
- 16GB RAM recommended (8GB may work with the smaller model)
Install these before running:

- Xcode Command Line Tools (for Swift compilation)

  ```sh
  xcode-select --install
  ```

- Homebrew (if not already installed)

  ```sh
  /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  ```

- uv (Python package manager)

  ```sh
  brew install uv
  ```

- ffmpeg (required by mlx-whisper)

  ```sh
  brew install ffmpeg
  ```
On first run, macOS will prompt for these permissions. You can also grant them in advance:

- Screen Recording — required for system audio capture
  - System Settings → Privacy & Security → Screen Recording
  - Add your terminal app (Terminal, iTerm2, Warp, etc.)
  - Restart your terminal after granting

- Microphone — required for mic capture
  - System Settings → Privacy & Security → Microphone
  - You'll be prompted automatically on first run
```sh
# Clone the repo
git clone <repo-url>
cd meeting-assistant

# Build the Swift audio capture CLI (first time only)
cd AudioCapture && swift build -c release && cd ..

# Run a meeting transcription
./run.sh my-meeting
```

That's it. The first run downloads the AI models (~1.5 GB for Whisper, ~200 MB for Sortformer), which takes a few minutes. Subsequent runs start in seconds.
```sh
# Start transcribing (creates a timestamped name automatically)
./run.sh

# Start with a custom meeting name
./run.sh "weekly-standup"
./run.sh "client-review-2024"
```
- Live phase — transcription appears in your terminal in real time:

  ```
  [00:00:05] Speaker 1: Let's start with the status update.
  [00:00:12] Speaker 2: Sure, the API work is on track.
  [00:00:18] Speaker 2: We should have it done by Friday.
  [00:00:24] Speaker 1: Great. What about the frontend?
  ```

- Ctrl+C — stops recording, saves the live transcript

- Cleanup — automatically merges fragmented lines into readable speaker-turn paragraphs (instant, no models)

- Summary — generates a meeting summary using `claude -p`

- Done — all files saved to `~/Documents/Transcripts/<meeting-name>/`
After a meeting, you'll find these files:

```
~/Documents/Transcripts/my-meeting/
├── transcript.md         # Live transcript (raw, per-segment output)
├── transcript_clean.md   # Cleaned transcript (merged speaker turns, filtered)
├── audio.wav             # Full meeting audio (~115 MB/hour)
└── summary.md            # AI-generated meeting summary
```

Use `transcript_clean.md` as your primary reference — it has merged speaker turns and filtered hallucinations.
Run the transcriber directly (without run.sh):

```sh
# Pipe audio capture to transcriber
./AudioCapture/.build/release/AudioCapture --output stdout 2>/dev/null | \
  uv run --project transcriber python transcriber/transcriber.py \
    --output /path/to/transcript.md \
    --meeting-name "My Meeting"
```

Re-run transcript cleanup:

```sh
uv run --project transcriber python transcriber/cleanup_transcript.py \
  --input ~/Documents/Transcripts/my-meeting/transcript.md \
  --output ~/Documents/Transcripts/my-meeting/transcript_clean.md
```

Transcribe a pre-recorded WAV file (no live capture):

```sh
uv run --project transcriber python transcriber/transcriber.py \
  --input-file recording.wav \
  --output transcript.md
```

Disable speaker diarization:

```sh
./AudioCapture/.build/release/AudioCapture --output stdout 2>/dev/null | \
  uv run --project transcriber python transcriber/transcriber.py \
    --no-diarize --output transcript.md
```

Don't save audio (saves disk space):

```sh
./AudioCapture/.build/release/AudioCapture --output stdout 2>/dev/null | \
  uv run --project transcriber python transcriber/transcriber.py \
    --no-save-audio --output transcript.md
```

List available microphone devices:

```sh
./AudioCapture/.build/release/AudioCapture --list-devices
```

Select a specific microphone:

```sh
./AudioCapture/.build/release/AudioCapture --mic-device "MacBook Pro Microphone" --output stdout
```

| Flag | Default | Description |
|---|---|---|
| `--output <mode>` | `stdout` | Output mode: `stdout` or `pipe` |
| `--sample-rate <hz>` | `16000` | Audio sample rate |
| `--list-devices` | — | List available input devices |
| `--mic-device <name>` | system default | Select microphone by name |
| `--help` | — | Show help |
| Flag | Default | Description |
|---|---|---|
| `--output <path>` | none | Transcript output file (.md) |
| `--meeting-name <name>` | `"Meeting"` | Meeting name for header |
| `--input-file <path>` | stdin | Read WAV file instead of stdin |
| `--model <id>` | `mlx-community/distil-whisper-large-v3` | Whisper model |
| `--energy-threshold <f>` | `0.01` | VAD sensitivity (lower = more sensitive) |
| `--no-diarize` | — | Disable speaker identification |
| `--no-save-audio` | — | Don't save audio WAV file |
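The `--energy-threshold` flag above suggests an RMS energy gate in front of the VAD. A minimal sketch of that idea (illustrative only; the project's actual vad.py uses webrtcvad, and this helper is an assumption, not its real code):

```python
import numpy as np

def has_speech_energy(samples: np.ndarray, threshold: float = 0.01) -> bool:
    """Return True if the chunk's RMS energy exceeds the threshold.
    Lower thresholds admit quieter audio (more sensitive)."""
    if samples.size == 0:
        return False
    rms = float(np.sqrt(np.mean(samples.astype(np.float64) ** 2)))
    return rms > threshold
```

An energy gate like this is cheap, so it can discard silent chunks before the heavier per-frame VAD and ASR run at all.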
| Flag | Default | Description |
|---|---|---|
| `--input <path>` | required | Path to live transcript (.md) |
| `--output <path>` | required | Output cleaned transcript path (.md) |
| `--pause-threshold <f>` | `15.0` | Seconds of pause before starting a new paragraph |
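The `--pause-threshold` above controls where the cleanup step breaks paragraphs. Its merge logic can be sketched roughly like this (illustrative, assuming `[HH:MM:SS] Speaker N: text` lines; not the project's actual cleanup_transcript.py):

```python
import re

LINE_RE = re.compile(r"\[(\d+):(\d+):(\d+)\]\s+(Speaker \d+):\s+(.*)")

def merge_turns(lines, pause_threshold: float = 15.0):
    """Merge consecutive same-speaker lines into paragraphs,
    starting a new paragraph after a long pause."""
    turns = []  # each entry: [timestamp_str, speaker, [sentences]]
    last_speaker, last_secs = None, None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip non-transcript lines (headers, blanks)
        h, mnt, s, speaker, text = m.groups()
        secs = int(h) * 3600 + int(mnt) * 60 + int(s)
        new_turn = (
            speaker != last_speaker
            or last_secs is None
            or secs - last_secs > pause_threshold
        )
        if new_turn:
            turns.append([f"[{h}:{mnt}:{s}]", speaker, [text]])
        else:
            turns[-1][2].append(text)
        last_speaker, last_secs = speaker, secs
    return [f"{ts} {sp}: {' '.join(txt)}" for ts, sp, txt in turns]
```

The real cleanup also filters hallucinated segments before merging; this sketch only shows the turn-merging part.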
| Model | Purpose | Size | Source |
|---|---|---|---|
| distil-whisper-large-v3 | Speech-to-text | ~1.5 GB | MLX Community |
| Sortformer v2.1 | Speaker diarization | ~200 MB | MLX Community (NVIDIA) |
Both models run locally on the Apple Silicon GPU via MLX. Models are downloaded automatically on first run and cached in `~/.cache/huggingface/`.
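The Whisper prompt conditioning feature amounts to passing the tail of the transcript so far as the decoder's initial prompt on each chunk. A sketch of building that rolling context (`build_prompt` and the 200-character budget are illustrative assumptions, not the project's actual code; `initial_prompt` is the standard Whisper transcribe option):

```python
def build_prompt(previous_text: str, max_chars: int = 200) -> str:
    """Keep the last max_chars of prior transcript, trimmed to a word
    boundary, to condition Whisper's decoder on recent context."""
    tail = previous_text[-max_chars:]
    # Drop a partial leading word so the prompt starts cleanly
    if len(previous_text) > max_chars and " " in tail:
        tail = tail.split(" ", 1)[1]
    return tail.strip()

# The prompt would then be fed to the transcriber, e.g.:
# result = mlx_whisper.transcribe(chunk, path_or_hf_repo=MODEL,
#                                 initial_prompt=build_prompt(history))
```

Conditioning on recent text helps Whisper keep spelling, names, and punctuation consistent across chunk boundaries, which is the continuity benefit described in the feature list.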
```
meeting-assistant/
├── AudioCapture/                    # Swift CLI for audio capture
│   ├── Package.swift                # Swift package config
│   └── Sources/AudioCapture/
│       ├── main.swift               # Entry point, signal handling
│       ├── SystemAudioCapture.swift # ScreenCaptureKit system audio
│       ├── MicCapture.swift         # AVAudioEngine mic input
│       ├── AudioMixer.swift         # Dual-buffer mixing + output
│       └── CLIArguments.swift       # CLI argument parsing
├── transcriber/                     # Python transcription pipeline
│   ├── pyproject.toml               # Python dependencies (uv)
│   ├── transcriber.py               # Main live pipeline
│   ├── cleanup_transcript.py        # Lightweight text cleanup (no models)
│   ├── audio_reader.py              # Threaded audio buffer
│   ├── diarizer.py                  # Sortformer speaker diarization
│   ├── display.py                   # Terminal + markdown output
│   ├── vad.py                       # Voice activity detection (webrtcvad)
│   ├── config.py                    # Configuration constants
│   └── summary_prompt.md            # LLM summary prompt template
└── run.sh                           # One-command runner
```
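Because diarization runs per 20 s chunk, the speaker-consistency step in diarizer.py has to reconcile chunk-local speaker IDs with stable global labels. One common approach, greedy matching against running speaker-embedding centroids, can be sketched like this (an illustration of the general technique only, not the project's actual logic):

```python
import numpy as np

def stabilize_speakers(global_centroids, chunk_embeddings, sim_threshold=0.7):
    """Map each chunk-local speaker embedding to a stable global ID.
    Reuses the closest existing centroid when cosine similarity is high
    enough; otherwise registers a new global speaker. Mutates
    global_centroids (a list of unit vectors) and returns one global
    ID per chunk-local speaker."""
    mapping = []
    for emb in chunk_embeddings:
        emb = emb / np.linalg.norm(emb)
        best_id, best_sim = None, sim_threshold
        for gid, cen in enumerate(global_centroids):
            sim = float(np.dot(emb, cen))  # cosine similarity (unit vectors)
            if sim > best_sim:
                best_id, best_sim = gid, sim
        if best_id is None:
            global_centroids.append(emb)
            best_id = len(global_centroids) - 1
        mapping.append(best_id)
    return mapping
```

This keeps "Speaker 2" meaning the same person across chunks even when the per-chunk model numbers speakers in a different order.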
- Speaker limit — Sortformer supports up to 4 simultaneous speakers
- English only — configured for English transcription (change the `language` parameter for others)
- Apple Silicon required — MLX models only run on Apple Silicon Macs
- macOS 13+ — ScreenCaptureKit requires macOS Ventura or later
- Memory — uses ~4-6 GB RAM during live transcription (Whisper + Sortformer + audio buffers)
**"Screen Recording permission denied"**

- Grant permission in System Settings → Privacy & Security → Screen Recording
- Add your terminal app and restart it (not just the tab — quit and reopen)

**"No module named 'mlx_audio.vad'"**

- The diarization module requires the git version of mlx-audio. Run:

  ```sh
  cd transcriber && uv sync
  ```

**"[Errno 2] No such file or directory: 'ffmpeg'"**

- Install ffmpeg:

  ```sh
  brew install ffmpeg
  ```

**No audio captured / silent transcript**

- Check that your terminal has both Screen Recording and Microphone permissions
- Try `./AudioCapture/.build/release/AudioCapture --list-devices` to verify audio devices
- Test the capture pipeline with:

  ```sh
  ./AudioCapture/.build/release/AudioCapture --output stdout 2>/dev/null | head -c 64000 > test.pcm
  ```

**High memory usage**

- The transcriber monitors memory and warns above 12 GB
- Use `--no-diarize` to skip the diarization model (~1-2 GB savings)
- Use the smaller Whisper model: `--model mlx-community/distil-whisper-medium.en`
MIT