prasannasahaj/meeting-transcriber

Mac Meeting Transcriber

A local, privacy-first meeting transcription tool for macOS. Captures system audio and microphone, provides live transcription with speaker identification, and generates polished post-meeting transcripts — all running entirely on your Mac with no cloud APIs.

Features

  • Live transcription — real-time speech-to-text displayed in your terminal as the meeting happens
  • Speaker diarization — identifies up to 4 speakers with color-coded labels, stabilized across chunks
  • System audio + mic capture — records both sides of the conversation (what you hear and what you say) using ScreenCaptureKit and AVAudioEngine
  • Sentence-level output — each sentence appears on its own line with a precise timestamp
  • Smart transcript cleanup — automatically merges fragmented lines into readable speaker-turn paragraphs (instant, no models)
  • Auto-summary — generates meeting summary with action items using Claude Code
  • Whisper prompt conditioning — feeds previous context to Whisper for better continuity and accuracy
  • Hallucination filtering — detects and filters Whisper artifacts using pattern matching, repetition detection, and built-in thresholds
  • Audio recording — saves the full meeting audio as a WAV file for reference
  • Markdown transcripts — outputs clean, readable markdown files with timestamps and speaker labels
  • 100% local — all processing happens on-device using Apple Silicon GPU acceleration. No data leaves your machine.
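
The hallucination filtering described above (pattern matching plus repetition detection) can be sketched in a few lines. This is an illustrative sketch under assumed heuristics, not the project's actual filter; the artifact patterns and the `looks_hallucinated` helper are hypothetical:

```python
import re

# Hypothetical examples of well-known Whisper artifacts; the project's
# actual pattern list is not shown here.
COMMON_ARTIFACTS = re.compile(
    r"^(thanks for watching|subscribe|you)[.!]?$", re.IGNORECASE
)

def looks_hallucinated(text: str, max_repeat_ratio: float = 0.5) -> bool:
    """Flag segments that match known artifacts or loop on one word."""
    stripped = text.strip()
    if COMMON_ARTIFACTS.match(stripped):
        return True
    words = stripped.lower().split()
    if len(words) >= 4:
        # If a single word dominates the segment, it is likely a stuck loop.
        most_common = max(words.count(w) for w in set(words))
        if most_common / len(words) > max_repeat_ratio:
            return True
    return False
```

Real filters also use confidence thresholds from the ASR model; the repetition ratio alone catches the most common "stuck loop" failure mode.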

How It Works

┌──────────────────────────────────────────────┐
│           Swift CLI (AudioCapture)           │
│  ScreenCaptureKit (system audio)             │
│  + AVAudioEngine (mic)                       │
│  → Mixed PCM Float32 16kHz mono → stdout     │
└─────────────────────┬────────────────────────┘
                      │ raw audio pipe
                      ▼
┌──────────────────────────────────────────────┐
│        Python Pipeline (transcriber)         │
│                                              │
│  Audio Buffer → webrtcvad → mlx-whisper (ASR)│
│    (20s chunks)    → Sortformer (speakers)   │
│                    → Prompt conditioning     │
│                    → Speaker consistency     │
│                    → Hallucination filtering │
│                    → Live terminal display   │
│                    → Markdown transcript     │
│                    → WAV file recording      │
└─────────────────────┬────────────────────────┘
                      │ on Ctrl+C
                      ▼
┌──────────────────────────────────────────────┐
│   Cleanup + Summary (text only, instant)     │
│                                              │
│  Merge speaker turns, filter hallucinations  │
│  → claude -p for meeting summary             │
└──────────────────────────────────────────────┘
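
The audio handoff in the diagram reduces to reading fixed-size chunks of raw little-endian Float32 PCM from the pipe. A minimal stdlib-only sketch (the project's audio_reader.py uses a threaded buffer instead; this just shows the data format):

```python
import struct

SAMPLE_RATE = 16_000      # matches the Swift CLI's 16kHz mono output
CHUNK_SECONDS = 20        # the pipeline's chunk size
BYTES_PER_SAMPLE = 4      # Float32
CHUNK_BYTES = SAMPLE_RATE * CHUNK_SECONDS * BYTES_PER_SAMPLE

def read_chunks(stream):
    """Yield chunks of Float32 samples from a raw PCM byte stream."""
    while True:
        raw = stream.read(CHUNK_BYTES)
        if not raw:
            break
        # Unpack little-endian Float32 samples into a tuple of floats.
        samples = struct.unpack(f"<{len(raw) // BYTES_PER_SAMPLE}f", raw)
        yield samples
```

In the real pipeline the stream would be `sys.stdin.buffer`, fed by the Swift CLI on the other end of the pipe.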

Prerequisites

Hardware

  • Mac with Apple Silicon (M1, M2, M3, M4 — any variant)
  • 16GB RAM recommended (8GB may work with the smaller model)

Software

Install these before running:

  1. Xcode Command Line Tools (for Swift compilation)

    xcode-select --install
  2. Homebrew (if not already installed)

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  3. uv (Python package manager)

    brew install uv
  4. ffmpeg (required by mlx-whisper)

    brew install ffmpeg

macOS Permissions

On first run, macOS will prompt for these permissions. You can also grant them in advance:

  1. Screen Recording — required for system audio capture

    • System Settings → Privacy & Security → Screen Recording
    • Add your terminal app (Terminal, iTerm2, Warp, etc.)
    • Restart your terminal after granting
  2. Microphone — required for mic capture

    • System Settings → Privacy & Security → Microphone
    • macOS will prompt automatically on first run

Quick Start

# Clone the repo
git clone <repo-url>
cd meeting-assistant

# Build the Swift audio capture CLI (first time only)
cd AudioCapture && swift build -c release && cd ..

# Run a meeting transcription
./run.sh my-meeting

That's it. The first run will download the AI models (~1.5GB for Whisper, ~200MB for Sortformer) which takes a few minutes. Subsequent runs start in seconds.

Usage

Basic Usage

# Start transcribing (creates timestamped name automatically)
./run.sh

# Start with a custom meeting name
./run.sh "weekly-standup"

# Or give the meeting a dated name
./run.sh "client-review-2024"

What Happens

  1. Live phase — transcription appears in your terminal in real-time:

    [00:00:05] Speaker 1: Let's start with the status update.
    [00:00:12] Speaker 2: Sure, the API work is on track.
    [00:00:18] Speaker 2: We should have it done by Friday.
    [00:00:24] Speaker 1: Great. What about the frontend?
    
  2. Ctrl+C — stops recording, saves the live transcript

  3. Cleanup — automatically merges fragmented lines into readable speaker-turn paragraphs (instant, no models)

  4. Summary — generates a meeting summary using claude -p

  5. Done — all files saved to ~/Documents/Transcripts/<meeting-name>/

Output Files

After a meeting, you'll find these files:

~/Documents/Transcripts/my-meeting/
├── transcript.md         # Live transcript (raw, per-segment output)
├── transcript_clean.md   # Cleaned transcript (merged speaker turns, filtered)
├── audio.wav             # Full meeting audio (~115 MB/hour)
└── summary.md            # AI-generated meeting summary

Use transcript_clean.md as your primary reference — it has merged speaker turns and filtered hallucinations.
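
Because the live transcript uses one timestamped line per sentence, it is easy to post-process yourself. A small sketch, assuming the `[HH:MM:SS] Speaker N: text` line format shown in the live-phase example (the `parse_line` helper is hypothetical, not part of the project):

```python
import re

# Matches the per-line format from the live transcript:
# [HH:MM:SS] Speaker N: text
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s+(Speaker \d+):\s+(.*)$")

def parse_line(line: str):
    """Return (seconds, speaker, text) for one transcript line, or None."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    h, mnt, s = int(m.group(1)), int(m.group(2)), int(m.group(3))
    return h * 3600 + mnt * 60 + s, m.group(4), m.group(5)
```

Non-matching lines (headers, blanks) return None, so you can filter a whole file with a simple comprehension.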

Advanced Usage

Run the transcriber directly (without run.sh):

# Pipe audio capture to transcriber
./AudioCapture/.build/release/AudioCapture --output stdout 2>/dev/null | \
  uv run --project transcriber python transcriber/transcriber.py \
    --output /path/to/transcript.md \
    --meeting-name "My Meeting"

Re-run transcript cleanup:

uv run --project transcriber python transcriber/cleanup_transcript.py \
  --input ~/Documents/Transcripts/my-meeting/transcript.md \
  --output ~/Documents/Transcripts/my-meeting/transcript_clean.md

Transcribe a pre-recorded WAV file (no live capture):

uv run --project transcriber python transcriber/transcriber.py \
  --input-file recording.wav \
  --output transcript.md

Disable speaker diarization:

./AudioCapture/.build/release/AudioCapture --output stdout 2>/dev/null | \
  uv run --project transcriber python transcriber/transcriber.py \
    --no-diarize --output transcript.md

Don't save audio (saves disk space):

./AudioCapture/.build/release/AudioCapture --output stdout 2>/dev/null | \
  uv run --project transcriber python transcriber/transcriber.py \
    --no-save-audio --output transcript.md

List available microphone devices:

./AudioCapture/.build/release/AudioCapture --list-devices

Select a specific microphone:

./AudioCapture/.build/release/AudioCapture --mic-device "MacBook Pro Microphone" --output stdout

CLI Reference

AudioCapture (Swift)

Flag                   Default          Description
--output <mode>        stdout           Output mode: stdout or pipe
--sample-rate <hz>     16000            Audio sample rate
--list-devices                          List available input devices
--mic-device <name>    system default   Select microphone by name
--help                                  Show help

transcriber.py (Python)

Flag                     Default                                 Description
--output <path>          none                                    Transcript output file (.md)
--meeting-name <name>    "Meeting"                               Meeting name for header
--input-file <path>      stdin                                   Read WAV file instead of stdin
--model <id>             mlx-community/distil-whisper-large-v3   Whisper model
--energy-threshold <f>   0.01                                    VAD sensitivity (lower = more sensitive)
--no-diarize                                                     Disable speaker identification
--no-save-audio                                                  Don't save audio WAV file
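
The --energy-threshold flag suggests a simple RMS gate in front of the heavier webrtcvad pass. A minimal sketch of how such a gate behaves (illustrative only; this is not the project's vad.py):

```python
import math

def is_speech(samples, energy_threshold: float = 0.01) -> bool:
    """Gate a chunk of Float32 samples on RMS energy.

    Lower thresholds admit quieter audio (more sensitive), matching
    the flag's description above. Illustrative sketch only.
    """
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms >= energy_threshold
```

Chunks failing the gate can be skipped entirely, which saves ASR work on silence between speaker turns.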

cleanup_transcript.py (Python)

Flag                    Default    Description
--input <path>          required   Path to live transcript (.md)
--output <path>         required   Output cleaned transcript path (.md)
--pause-threshold <f>   15.0       Seconds of pause to start new paragraph
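
The cleanup pass can be pictured as a fold over (seconds, speaker, text) tuples: consecutive lines from the same speaker merge into one paragraph unless the pause exceeds the threshold. A hypothetical sketch of that logic (not cleanup_transcript.py itself):

```python
def merge_turns(lines, pause_threshold: float = 15.0):
    """Merge (seconds, speaker, text) tuples into speaker-turn paragraphs.

    A new paragraph starts when the speaker changes or the pause since
    the previous line exceeds pause_threshold seconds.
    """
    turns = []  # each turn: [start_ts, speaker, last_ts, text]
    for ts, speaker, text in lines:
        if turns and turns[-1][1] == speaker and ts - turns[-1][2] <= pause_threshold:
            turns[-1][2] = ts              # extend the current turn
            turns[-1][3] += " " + text
        else:
            turns.append([ts, speaker, ts, text])
    return [(start, speaker, text) for start, speaker, _, text in turns]
```

Because this is pure text manipulation, it runs instantly with no models loaded, which is why re-running cleanup on an old transcript is cheap.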

Models Used

Model                     Purpose               Size      Source
distil-whisper-large-v3   Speech-to-text        ~1.5 GB   MLX Community
Sortformer v2.1           Speaker diarization   ~200 MB   MLX Community (NVIDIA)

Both models run locally on Apple Silicon GPU via MLX. Models are downloaded automatically on first run and cached in ~/.cache/huggingface/.

Project Structure

meeting-assistant/
├── AudioCapture/                    # Swift CLI for audio capture
│   ├── Package.swift                # Swift package config
│   └── Sources/AudioCapture/
│       ├── main.swift               # Entry point, signal handling
│       ├── SystemAudioCapture.swift  # ScreenCaptureKit system audio
│       ├── MicCapture.swift          # AVAudioEngine mic input
│       ├── AudioMixer.swift          # Dual-buffer mixing + output
│       └── CLIArguments.swift        # CLI argument parsing
├── transcriber/                     # Python transcription pipeline
│   ├── pyproject.toml               # Python dependencies (uv)
│   ├── transcriber.py               # Main live pipeline
│   ├── cleanup_transcript.py        # Lightweight text cleanup (no models)
│   ├── audio_reader.py              # Threaded audio buffer
│   ├── diarizer.py                  # Sortformer speaker diarization
│   ├── display.py                   # Terminal + markdown output
│   ├── vad.py                       # Voice activity detection (webrtcvad)
│   ├── config.py                    # Configuration constants
│   └── summary_prompt.md            # LLM summary prompt template
└── run.sh                           # One-command runner

Limitations

  • Speaker limit — Sortformer supports up to 4 simultaneous speakers
  • English only — configured for English transcription (change language parameter for others)
  • Apple Silicon required — MLX models only run on Apple Silicon Macs
  • macOS 13+ — ScreenCaptureKit requires macOS Ventura or later
  • Memory — uses ~4-6 GB RAM during live transcription (Whisper + Sortformer + audio buffers)

Troubleshooting

"Screen Recording permission denied"

  • Grant permission in System Settings → Privacy & Security → Screen Recording
  • Add your terminal app and restart it (not just the tab — quit and reopen)

"No module named 'mlx_audio.vad'"

  • The diarization module requires the git version of mlx-audio. Run: cd transcriber && uv sync

"[Errno 2] No such file or directory: 'ffmpeg'"

  • Install ffmpeg: brew install ffmpeg

No audio captured / silent transcript

  • Check that your terminal has both Screen Recording and Microphone permissions
  • Try ./AudioCapture/.build/release/AudioCapture --list-devices to verify audio devices
  • Test with: ./AudioCapture/.build/release/AudioCapture --output stdout 2>/dev/null | head -c 64000 > test.pcm
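
To check that test.pcm actually contains signal rather than silence, you can compute its RMS level. A small stdlib-only sketch, assuming the raw little-endian Float32 16 kHz mono format documented above (the `pcm_rms` helper is not part of the project):

```python
import struct

def pcm_rms(path: str) -> float:
    """Return the RMS level of a raw little-endian Float32 PCM file."""
    with open(path, "rb") as f:
        raw = f.read()
    n = len(raw) // 4  # 4 bytes per Float32 sample
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}f", raw[: n * 4])
    return (sum(s * s for s in samples) / n) ** 0.5
```

A result near 0.0 means the capture is silent; the pipeline's default --energy-threshold is 0.01, so levels persistently far below that suggest a permissions or device problem.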

High memory usage

  • The transcriber monitors memory and warns above 12 GB
  • Use --no-diarize to skip the diarization model (~1-2 GB savings)
  • Use the smaller Whisper model: --model mlx-community/distil-whisper-medium.en

License

MIT
