CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Meet-scribe is a bot-free desktop meeting assistant built with Rust + Tauri (React + Vite frontend) and SQLite. It captures system-level audio/video streams from Teams, Zoom, and Google Meet without joining as a bot or requiring consent from other participants (since recording happens locally on the user's machine).

Key principle: No bots, no meeting disruption - use existing A/V streams from the desktop.

Core Architecture

Technology Stack

Backend: Rust + Tauri v2 (async with tokio)
Frontend: React + Vite + TypeScript
Database: SQLite via rusqlite with migrations
Audio Capture: Platform-specific (WASAPI for Windows, PulseAudio/PipeWire for Linux)
Transcription: AssemblyAI and Deepgram (both support diarization)
AI Processing: Pluggable LLM services (user-configurable in UI)
Secret Storage: OS keychain via keyring crate

Key Rust Crates

tauri v2.x - cross-platform desktop framework
tokio - async runtime
rusqlite - SQLite driver
serde + serde_json - serialization
thiserror / anyhow - error handling
keyring - OS keychain access (macOS Keychain, Windows Credential Manager, Linux Secret Service)
reqwest - HTTP client for external APIs
Platform-specific (Windows/Linux priority):
- windows - WASAPI loopback audio capture (Windows)
- libpulse-binding - PulseAudio monitor sources (Linux)
- screenshots or scrap - screen capture for participant detection
- hound - WAV file encoding for audio

Architectural Pattern: Ports & Adapters

Domain Layer (core business logic):

Models: Meeting, Transcript, Participant, Insight, ServiceConfig
Use cases: CaptureMeeting, TranscribeAudio, GenerateInsights, LinkParticipants

Ports (trait interfaces):

AudioCapturePort - capture system audio streams
ParticipantDetectorPort - extract participant information
TranscriptionServicePort - ASR interface
LLMServicePort - AI analysis interface
StoragePort - database operations

Adapters (implementations):

Audio: Windows (WASAPI) and Linux (PulseAudio) implementations
ASR Services: AssemblyAI and Deepgram adapters (both support diarization)
LLM Services: OpenAI, Anthropic, and other provider adapters
Storage: SQLite adapter with rusqlite

High-Level Flow

Stream Capture: Desktop app monitors and captures system audio/video streams when Teams/Zoom/Google Meet is active
Participant Detection: Extract meeting participant information via accessibility APIs or OCR
Audio Processing: Send audio to user-selected ASR service (AssemblyAI or Deepgram) for transcription
Diarization: Speaker diarization with participant linking (map "Speaker 1" to actual participant names)
- Both AssemblyAI and Deepgram provide built-in speaker diarization
- User configures API keys via Settings UI
AI Analysis: Send transcript to user-configured LLM for:
- Meeting summary
- Action items extraction
- Key insights generation
Storage: Save all data (transcript, participants, insights) to local SQLite database

Key Design Decisions

No Bot/Sidecar Approach:

Capture A/V streams at OS level, not through video platform APIs
Use loopback/monitor audio devices where possible to minimize recording indicators
No bot joining the meeting
⚠️ Note: Some OS permission indicators may still appear (screen recording on macOS, mic access on Windows)

User-Configured Services (Bring Your Own API Keys):

ASR Services: AssemblyAI and Deepgram (both with diarization support)
LLM Services: User selects provider (OpenAI, Anthropic, etc.) and enters API key in Settings UI
Users configure API keys via Settings page in the application
API keys stored securely in OS keychain (never in SQLite or config files)
Service selection and settings (model, language, etc.) stored in SQLite
UI allows switching between providers without losing previous configurations

Participant Linking:

Extract participant roster from meeting app UI via:
- Accessibility APIs (preferred on macOS/Windows)
- OCR on participant panel (fallback, fragile to UI changes)
Map diarized speakers to actual participant names
Allow manual correction in UI if auto-mapping fails
TODO: Voice fingerprinting for auto-matching across meetings

Function Design:

Keep functions small and focused
Add Rust doc comments (///) for all public functions
Write minimal happy-path tests where feasible
Add // TODO: comments for deferred work

Platform-Specific Integration

Platform Priority: Windows and Linux are prioritized. macOS support is deferred to later phases.

Windows (Priority 1)

Audio: WASAPI loopback (via windows crate) - captures desktop audio without mic indicator
Permissions: May still require microphone permission in some cases
Participant Detection: UI Automation API or screen capture + OCR
Build: MSVC toolchain required
Status: Primary development platform

Linux (Priority 1)

Audio: PulseAudio monitor sources (via libpulse-binding) or PipeWire
Permissions: User must grant access to audio devices
Participant Detection: X11/Wayland screen capture + OCR or AT-SPI
Build: AppImage or .deb package
Status: Primary development platform

macOS (Deferred)

Audio: Core Audio TAP (via coreaudio-rs) - captures application audio
Permissions: Requires screen recording permission (shows indicator in menu bar)
Participant Detection: Accessibility API (preferred) or Screen Capture Kit + OCR
Build: Xcode command-line tools, universal binary (x64 + ARM64)
Status: Deferred to Phase 6 (implementation complexity and crate maintenance issues)

Project Structure

meet-scribe/
├── apps/
│   └── desktop/
│       ├── src/                          # React + Vite frontend
│       │   ├── components/               # UI components
│       │   ├── pages/                    # Page-level components
│       │   │   ├── Dashboard.tsx
│       │   │   ├── ActiveMeeting.tsx
│       │   │   ├── MeetingHistory.tsx
│       │   │   └── Settings.tsx          # ASR/LLM API key configuration UI
│       │   ├── hooks/                    # Custom React hooks
│       │   ├── store/                    # State management (Zustand/Context)
│       │   ├── types/                    # TypeScript types
│       │   └── main.tsx
│       ├── src-tauri/                    # Rust backend
│       │   ├── src/
│       │   │   ├── domain/               # Core domain models & use cases
│       │   │   │   ├── models.rs
│       │   │   │   └── use_cases.rs
│       │   │   ├── ports/                # Trait definitions (interfaces)
│       │   │   │   ├── audio.rs
│       │   │   │   ├── detection.rs
│       │   │   │   ├── transcription.rs
│       │   │   │   ├── llm.rs
│       │   │   │   └── storage.rs
│       │   │   ├── adapters/             # Platform-specific implementations
│       │   │   │   ├── audio/
│       │   │   │   │   ├── mod.rs
│       │   │   │   │   ├── windows.rs    # WASAPI implementation (Priority 1)
│       │   │   │   │   └── linux.rs      # PulseAudio implementation (Priority 1)
│       │   │   │   ├── services/         # External API adapters
│       │   │   │   │   ├── asr/
│       │   │   │   │   │   ├── mod.rs
│       │   │   │   │   │   ├── assemblyai.rs  # Priority ASR service
│       │   │   │   │   │   └── deepgram.rs    # Priority ASR service
│       │   │   │   │   └── llm/
│       │   │   │   │       ├── mod.rs
│       │   │   │   │       ├── openai.rs
│       │   │   │   │       └── anthropic.rs
│       │   │   │   └── storage/
│       │   │   │       └── sqlite.rs
│       │   │   ├── commands/             # Tauri IPC commands
│       │   │   │   ├── meeting.rs
│       │   │   │   ├── config.rs
│       │   │   │   └── transcript.rs
│       │   │   ├── error.rs              # Error types (thiserror)
│       │   │   └── main.rs
│       │   ├── migrations/               # SQL migration files
│       │   │   ├── 001_initial.sql
│       │   │   └── ...
│       │   ├── Cargo.toml
│       │   └── tauri.conf.json
│       ├── package.json
│       ├── vite.config.ts
│       └── tsconfig.json
├── docs/                                 # Module documentation
│   ├── audio-capture.md
│   ├── participant-detection.md
│   ├── transcription-services.md
│   └── database-schema.md
├── CLAUDE.md
└── README.md

Development Commands

Initial Setup

# Install Node.js dependencies
npm install

# Install Tauri CLI
cargo install tauri-cli --version "^2.0"

# Run database migrations (after first build)
# Migrations run automatically on app startup

Development

# Run in development mode (hot-reload enabled)
npm run tauri dev

# Run frontend only (for UI development)
npm run dev

# Run Rust tests
cargo test

# Run frontend tests
npm test

# Format Rust code
cargo fmt

# Lint Rust code
cargo clippy -- -D warnings

# Format TypeScript/React code
npm run lint

Building

# Build for current platform
npm run tauri build

# Build with debug symbols
npm run tauri build -- --debug

# Create installer/package
npm run tauri build
# Windows: Creates .msi installer
# macOS: Creates .dmg and .app bundle
# Linux: Creates .deb and AppImage

Database

# Database file location (after first run):
# Windows: %APPDATA%/com.srprasanna.meet-scribe/meet-scribe.db
# macOS: ~/Library/Application Support/com.srprasanna.meet-scribe/meet-scribe.db
# Linux: ~/.local/share/com.srprasanna.meet-scribe/meet-scribe.db

# Migrations are in: apps/desktop/src-tauri/migrations/
# They run automatically on app startup via rusqlite_migration

Platform-Specific Requirements

Windows:

Install MSVC Build Tools (Visual Studio 2019+)
WebView2 runtime (usually pre-installed on Windows 10+)

macOS:

Install Xcode Command Line Tools: xcode-select --install
For universal binary: Set MACOSX_DEPLOYMENT_TARGET=10.13

Linux:

Install dependencies:

# Ubuntu/Debian
sudo apt install libwebkit2gtk-4.1-dev \
  build-essential curl wget libssl-dev \
  libgtk-3-dev libayatana-appindicator3-dev \
  librsvg2-dev libpulse-dev

Database Schema

See docs/database-schema.md for detailed schema documentation.

Core Tables:

meetings

CREATE TABLE meetings (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    platform TEXT NOT NULL,  -- 'teams', 'zoom', 'meet'
    title TEXT,
    start_time INTEGER NOT NULL,  -- Unix timestamp
    end_time INTEGER,
    participant_count INTEGER,
    created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))
);

participants

CREATE TABLE participants (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    meeting_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    email TEXT,
    speaker_label TEXT,  -- 'Speaker 1', 'Speaker 2', etc. from diarization
    FOREIGN KEY (meeting_id) REFERENCES meetings(id) ON DELETE CASCADE
);

transcripts

CREATE TABLE transcripts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    meeting_id INTEGER NOT NULL,
    participant_id INTEGER,  -- NULL if speaker not yet identified
    timestamp_ms INTEGER NOT NULL,  -- Milliseconds into meeting
    text TEXT NOT NULL,
    confidence REAL,  -- 0.0 to 1.0 from ASR
    created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    FOREIGN KEY (meeting_id) REFERENCES meetings(id) ON DELETE CASCADE,
    FOREIGN KEY (participant_id) REFERENCES participants(id) ON DELETE SET NULL
);

insights

CREATE TABLE insights (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    meeting_id INTEGER NOT NULL,
    type TEXT NOT NULL,  -- 'summary', 'action_item', 'key_point', 'decision'
    content TEXT NOT NULL,
    metadata TEXT,  -- JSON for additional data (assignee, due_date, etc.)
    created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    FOREIGN KEY (meeting_id) REFERENCES meetings(id) ON DELETE CASCADE
);

service_configs

CREATE TABLE service_configs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    service_type TEXT NOT NULL,  -- 'asr' or 'llm'
    provider TEXT NOT NULL,  -- 'openai', 'anthropic', 'deepgram', etc.
    is_active BOOLEAN NOT NULL DEFAULT 0,
    settings TEXT,  -- JSON for provider-specific settings (model, language, etc.)
    created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    updated_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    UNIQUE(service_type, provider)
);

Note: API keys are stored in OS keychain, not in the database. The keychain entry format is:

Service: com.srprasanna.meet-scribe
Account: {service_type}_{provider} (e.g., asr_deepgram, llm_anthropic)

Security & Privacy

API Key Storage:

API keys MUST be stored in OS keychain using the keyring crate
Never store API keys in SQLite, config files, or environment variables
Keychain service name: com.srprasanna.meet-scribe
Keychain account format: {service_type}_{provider} (e.g., asr_deepgram)

Data Storage:

All meeting data stays local in SQLite (no cloud sync by default)
Database location is in OS-appropriate app data directory
Audio buffers are temporary and discarded after transcription

Permissions:

Request microphone/screen recording permissions at appropriate times
Handle permission denials gracefully with user-friendly error messages
Document permission requirements clearly in UI

Audio Capture:

Audio stream capture is always user-initiated (manual start/stop)
Use loopback/monitor devices where possible to minimize OS recording indicators
Clear visual indicator in app when capture is active

Implementation Phases

Phase 1: Foundation ✅ (Start Here)

Initialize Tauri v2 project with React + Vite
Set up database schema and migrations (rusqlite + rusqlite_migration)
Implement keyring-based secret storage
Create domain models and port traits
Build basic React UI shell (pages, routing)

Phase 2: Audio Capture (Windows & Linux)

Implement AudioCapturePort trait
Windows: WASAPI loopback adapter (Priority 1)
Linux: PulseAudio monitor adapter (Priority 1)
Audio buffer management and WAV encoding (using hound crate)
Settings UI for selecting audio input device

Phase 3: ASR Integration (AssemblyAI & Deepgram)

Implement TranscriptionServicePort trait
AssemblyAI adapter with diarization support (Priority 1)
Deepgram adapter with diarization support (Priority 1)
Settings UI for ASR service selection and API key entry
Handle streaming transcription responses
Parse diarization output and store transcripts with speaker labels

Phase 4: LLM Integration for Insights

Implement LLMServicePort trait
OpenAI adapter (GPT-4, GPT-3.5-turbo)
Anthropic adapter (Claude Sonnet/Opus)
Settings UI for LLM provider selection and API key entry
Prompt engineering for summaries, action items, key points, decisions
Store insights in database with proper categorization

Phase 5: Participant Detection (Experimental)

Implement ParticipantDetectorPort trait
Screen capture utilities (Windows/Linux)
OCR integration (tesseract or cloud OCR)
Accessibility API integration (Windows UI Automation, Linux AT-SPI)
Manual participant mapping UI (allow users to correct speaker-to-participant assignments)

Phase 6: Polish & Enhancement

Real-time transcription during meetings
Export functionality (PDF, markdown, JSON)
Search across historical meetings
Multi-language support (configure in ASR settings)
Custom prompt templates for LLM analysis
macOS support (deferred from Phase 2)

Known Limitations & TODOs

ASR Services:

AssemblyAI and Deepgram both provide excellent diarization support
Users must provide their own API keys (configured in Settings UI)
Diarization accuracy depends on audio quality and speaker distinctness
TODO: Add support for adjusting diarization sensitivity in settings

Participant Detection:

No public APIs available from Teams/Zoom/Google Meet
OCR-based detection is fragile to UI changes (meeting app updates can break it)
Accessibility APIs are more robust but may have platform-specific limitations
Manual mapping UI allows users to correct auto-detected mappings
TODO: Research voice fingerprinting for speaker identification across meetings

Platform Support:

Windows & Linux: Full support (Priority 1)
macOS: Deferred to Phase 6 due to Core Audio complexity and crate maintenance concerns
TODO: Consider using screen recording + system audio as fallback on macOS when implementing

Performance:

Large meeting audio files may consume significant memory
TODO: Implement streaming/chunked upload to ASR services for long meetings
TODO: Add progress indicators for transcription and LLM analysis

UI/UX:

Settings page must clearly explain API key requirements
Provide links to AssemblyAI and Deepgram signup pages
Show estimated costs based on meeting duration (both services charge by usage)

Troubleshooting

Build Failures:

Ensure platform-specific dependencies are installed (see Development Commands)
Check Rust version: rustc --version (recommend 1.70+)
Check Node version: node --version (recommend 18+)

Audio Capture Issues:

Verify OS permissions are granted
Check that target application (Teams/Zoom/Meet) is running
Test with simple audio playback first

Database Migration Errors:

Delete database file and restart (dev only)
Check migration SQL syntax in src-tauri/migrations/

API Integration Errors:

Verify API keys are correctly stored in keychain
Check API rate limits and quota
Review service-specific documentation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLAUDE.md

Project Overview

Core Architecture

Technology Stack

Key Rust Crates

Architectural Pattern: Ports & Adapters

High-Level Flow

Key Design Decisions

Platform-Specific Integration

Windows (Priority 1)

Linux (Priority 1)

macOS (Deferred)

Project Structure

Development Commands

Initial Setup

Development

Building

Database

Platform-Specific Requirements

Database Schema

Core Tables:

Security & Privacy

Implementation Phases

Phase 1: Foundation ✅ (Start Here)

Phase 2: Audio Capture (Windows & Linux)

Phase 3: ASR Integration (AssemblyAI & Deepgram)

Phase 4: LLM Integration for Insights

Phase 5: Participant Detection (Experimental)

Phase 6: Polish & Enhancement

Known Limitations & TODOs

Troubleshooting

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

CLAUDE.md

Project Overview

Core Architecture

Technology Stack

Key Rust Crates

Architectural Pattern: Ports & Adapters

High-Level Flow

Key Design Decisions

Platform-Specific Integration

Windows (Priority 1)

Linux (Priority 1)

macOS (Deferred)

Project Structure

Development Commands

Initial Setup

Development

Building

Database

Platform-Specific Requirements

Database Schema

Core Tables:

Security & Privacy

Implementation Phases

Phase 1: Foundation ✅ (Start Here)

Phase 2: Audio Capture (Windows & Linux)

Phase 3: ASR Integration (AssemblyAI & Deepgram)

Phase 4: LLM Integration for Insights

Phase 5: Participant Detection (Experimental)

Phase 6: Polish & Enhancement

Known Limitations & TODOs

Troubleshooting