╔══════════════════════════════════════════════════════════════════════════╗
║ SPEAK-TO-AI AUDIO WORKFLOW ║
╚══════════════════════════════════════════════════════════════════════════╝
┌─────────────────────────────────────────────────────────────────────────┐
│ CONFIG: ~/.config/speak-to-ai/config.yaml │
│ audio: │
│ recording_method: "arecord" ← DEFAULT (can switch to "ffmpeg") │
│ device: "default" │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ AUDIO RECORDER FACTORY │
│ [audio/factory/factory.go] │
│ ├─ CreateRecorder() │
│ └─ CreateRecorderWithFallback() ← Tests and falls back if needed │
└─────────────────────────────────────────────────────────────────────────┘
│ │
┌───────────┴───────────┐ │
▼ ▼ │
┏━━━━━━━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━━━━━━━┓ │
┃ ARECORD ┃ ┃ FFMPEG ┃ │
┃ (ALSA Direct) ┃ ┃ (PulseAudio) ┃ │
┃ ✅ DEFAULT ┃ ┃ ✅ ALSO WORKS ┃ │
┗━━━━━━━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━━━━━━━┛ │
│ │ │
└───────────┬───────────┘ │
▼ │
┌────────────────────────┐ │
│ BaseRecorder │◄──────────────────┘
│ [audio/recorders/base_recorder.go] │
│ ExecuteRecordingCommand() │
└────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ SYSTEM AUDIO STACK │
│ │
│ PATH 1: arecord PATH 2: ffmpeg │
│ ═════════════════ ═══════════════ │
│ │
│ arecord cmd ffmpeg -f pulse │
│ ↓ ↓ │
│ ALSA kernel driver PulseAudio compat │
│ ↓ ↓ │
│ Hardware (ALC294) PipeWire │
│ ↓ │
│ ALSA kernel │
│ ↓ │
│ Hardware (ALC294) │
│ │
│ Latency: LOW ✅ Latency: MEDIUM ⚠️ │
│ Complexity: SIMPLE ✅ Complexity: HIGHER ⚠️ │
└─────────────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────┐
│ TempFileManager │
│ [audio/processing/tempfile_manager.go]│
│ TempAudioPath/audio_*.wav │
└───────────────────────────────────────┘
│
▼
┌─────────────────────────────────┐
│ WhisperEngine │
│ [whisper/engine.go] │
│ Speech → Text │
└─────────────────────────────────┘
│
▼
┌─────────────────────────────────┐
│ OutputManager │
│ [output/outputters/] │
│ → Clipboard OR Active Window │
└─────────────────────────────────┘
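
The TempFileManager step above stores each recording as TempAudioPath/audio_*.wav.
A minimal Go sketch of creating such a path, assuming the standard library's
os.CreateTemp (which replaces the "*" in the pattern with a random string).
newTempAudioFile is a hypothetical helper for illustration, not the project's
actual API:

```go
package main

import (
	"fmt"
	"os"
)

// newTempAudioFile is a hypothetical helper: it reserves a unique
// audio_*.wav path inside dir (the system temp dir when dir is "")
// and returns a cleanup function that deletes the file.
func newTempAudioFile(dir string) (path string, cleanup func(), err error) {
	f, err := os.CreateTemp(dir, "audio_*.wav")
	if err != nil {
		return "", nil, err
	}
	path = f.Name()
	// Close immediately: the external recorder process writes the file.
	if err := f.Close(); err != nil {
		os.Remove(path)
		return "", nil, err
	}
	return path, func() { os.Remove(path) }, nil
}

func main() {
	path, cleanup, err := newTempAudioFile("")
	if err != nil {
		fmt.Fprintln(os.Stderr, "temp file:", err)
		os.Exit(1)
	}
	defer cleanup()
	fmt.Println("recorder would write to:", path)
}
```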
════════════════════════════════════════════════════════════════════════════
USE ARECORD (ALSA):
When to use:
├─ Need lowest latency (~5ms)
└─ Simple voice dictation workflow
Capabilities:
├─ Concurrent audio: With the 'default' device on modern Linux (PipeWire/PulseAudio)
│  Example: arecord -D default → routes through PipeWire
└─ Exclusive access: Only with raw hardware devices (hw:X,Y)
   Example: arecord -D hw:0,0 → blocks other apps
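
The two arecord examples above differ only in the -D argument. A minimal Go
sketch of building the command line, under stated assumptions: the 16 kHz,
mono, 16-bit format is an assumption (a common input format for Whisper), and
arecordArgs and isExclusiveDevice are hypothetical names, not functions from
the project:

```go
package main

import (
	"fmt"
	"strings"
)

// arecordArgs builds an arecord argument list for the given ALSA device.
// The sample format, rate, and channel count are assumptions for this sketch.
func arecordArgs(device, outPath string) []string {
	return []string{
		"-D", device, // ALSA device: "default", "hw:0,0", ...
		"-f", "S16_LE", // 16-bit little-endian samples
		"-r", "16000", // sample rate in Hz
		"-c", "1", // mono
		"-t", "wav", // WAV container
		outPath,
	}
}

// isExclusiveDevice reports whether device names raw hardware (hw:X,Y),
// which the notes above describe as blocking other applications.
func isExclusiveDevice(device string) bool {
	return strings.HasPrefix(device, "hw:")
}

func main() {
	for _, dev := range []string{"default", "hw:0,0"} {
		args := arecordArgs(dev, "/tmp/audio_example.wav")
		fmt.Printf("arecord %s (exclusive: %v)\n",
			strings.Join(args, " "), isExclusiveDevice(dev))
	}
}
```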
____________________________________________________________________________
USE FFMPEG (PipeWire/PulseAudio):
When to use:
├─ Recording during video calls (Google Meet, Zoom, Discord)
├─ Multiple apps need microphone simultaneously
└─ Need robust device resolution
Capabilities:
├─ Concurrent audio: Always (via PulseAudio API)
├─ Auto-resolves 'default' → actual source name
├─ Warm-up mechanism: Prevents clipped audio starts
└─ Adaptive flush: Handles short recordings reliably
Trade-offs:
└─ Higher latency (~20-30ms), requires PipeWire/PulseAudio running
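
The "auto-resolves 'default' → actual source name" capability above can be
sketched in Go as follows. This is an illustration under assumptions, not the
project's implementation: it assumes `pactl get-default-source` is available
(PulseAudio 15+ or pipewire-pulse), and resolveSource, pactlDefaultSource, and
ffmpegArgs are hypothetical names. The 16 kHz mono output format is also an
assumption:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// pactlDefaultSource asks the PulseAudio-compatible server for the name
// of its default source. It fails if pactl is missing or no server runs.
func pactlDefaultSource() (string, error) {
	out, err := exec.Command("pactl", "get-default-source").Output()
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(out)), nil
}

// resolveSource maps "default" to a concrete source name via lookup.
// Any other device name is returned unchanged; when lookup fails or
// returns nothing, "default" is kept so ffmpeg can still try it.
func resolveSource(device string, lookup func() (string, error)) string {
	if device != "default" {
		return device
	}
	name, err := lookup()
	if err != nil || name == "" {
		return "default"
	}
	return name
}

// ffmpegArgs builds an ffmpeg argument list that records from a
// PulseAudio source into a WAV file, overwriting any existing file.
func ffmpegArgs(source, outPath string) []string {
	return []string{
		"-f", "pulse", "-i", source,
		"-ar", "16000", "-ac", "1",
		"-y", outPath,
	}
}

func main() {
	src := resolveSource("default", pactlDefaultSource)
	fmt.Println("ffmpeg", strings.Join(ffmpegArgs(src, "/tmp/audio_example.wav"), " "))
}
```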
____________________________________________________________________________
CURRENT SETUP: Best of both worlds
- Default: arecord (fast, simple, concurrent on modern systems)
- Manual switch: via system tray menu or config
- Auto-fallback: switches to the other method if the current one fails (device busy, permission denied, etc.)
════════════════════════════════════════════════════════════════════════════
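
The auto-fallback behaviour described above (try the configured method, switch
if it fails) can be sketched in Go like this. It only illustrates the idea: the
Recorder interface, stubRecorder, and pickRecorder are hypothetical and do not
reflect the real signatures in audio/factory/factory.go:

```go
package main

import (
	"errors"
	"fmt"
)

// Recorder is a hypothetical minimal interface for this sketch.
type Recorder interface {
	Name() string
	Probe() error // returns nil when the method can record right now
}

// stubRecorder stands in for a real arecord- or ffmpeg-backed recorder.
type stubRecorder struct {
	name     string
	probeErr error
}

func (s stubRecorder) Name() string { return s.name }
func (s stubRecorder) Probe() error { return s.probeErr }

var errDeviceBusy = errors.New("device busy")

// pickRecorder returns the first candidate whose probe succeeds.
// Candidates are tried in order, so the configured default goes first.
func pickRecorder(candidates []Recorder) (Recorder, error) {
	lastErr := errors.New("no candidates")
	for _, r := range candidates {
		err := r.Probe()
		if err == nil {
			return r, nil
		}
		lastErr = fmt.Errorf("%s: %w", r.Name(), err)
	}
	return nil, fmt.Errorf("no working recorder: %w", lastErr)
}

func main() {
	r, err := pickRecorder([]Recorder{
		stubRecorder{name: "arecord", probeErr: errDeviceBusy}, // default fails
		stubRecorder{name: "ffmpeg"},                           // fallback works
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using:", r.Name())
}
```

Passing the candidates as an ordered slice keeps the preference (default first,
fallback second) explicit at the call site.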