| 1 | +# EchoLocate Technical Features |
| 2 | + |
| 3 | +This document describes EchoLocate from an implementation perspective: what each feature does, how it works under the hood, and why it matters for real-time accessibility. |
| 4 | + |
| 5 | +## 1. Runtime Architecture |
| 6 | + |
| 7 | +EchoLocate is a browser-only application deployed as static files. |
| 8 | + |
| 9 | +Core runtime files: |
| 10 | + |
| 11 | +1. `index.html` — semantic UI structure, controls, accessibility landmarks |
| 12 | +2. `style.css` — adaptive layout, dark/light theming, responsive behavior, confidence visuals |
| 13 | +3. `app.js` — speech recognition, audio analysis, speaker lane logic, persistence, export |
| 14 | +4. `sw.js` — local fragment rendering API for HTMX (`/api/add-card`, `/api/add-chat-msg`) |
| 15 | + |
| 16 | +Design principle: |
| 17 | + |
| 18 | +- No backend is required for standard operation. |
| 19 | +- Transcript and analysis remain in-browser. |
| 20 | + |
| 21 | +## 2. Caption Pipeline |
| 22 | + |
| 23 | +Speech recognition path: |
| 24 | + |
| 25 | +1. Browser mic capture via `getUserMedia` |
| 26 | +2. `SpeechRecognition` / `webkitSpeechRecognition` receives audio stream |
| 27 | +3. Interim and final transcript events are produced |
| 28 | +4. Final transcript is combined with speaker profile metadata |
| 29 | +5. Card payload is posted to local route intercepted by Service Worker |
| 30 | +6. Returned HTML fragment is inserted into lane/chat containers |
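Steps 3 to 5 above can be sketched as two small pure functions. The event fields (`resultIndex`, `results[i].isFinal`, `results[i][0].transcript`, `confidence`) are part of the Web Speech API; the function names and the payload field names (`text`, `confidence`, `lane`, `ts`) are assumptions made for illustration, not the actual `app.js` code.

```javascript
// Split a SpeechRecognition "result" event into interim text and final results.
function splitResults(event) {
  let interim = "";
  const finals = [];
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const best = result[0]; // top-ranked alternative
    if (result.isFinal) {
      finals.push({ text: best.transcript.trim(), confidence: best.confidence });
    } else {
      interim += best.transcript;
    }
  }
  return { interim, finals };
}

// Hypothetical payload shape: combine a final result with speaker metadata
// into a form-encoded body suitable for POSTing to a local route.
function buildCardPayload(finalResult, speaker, now = Date.now()) {
  return new URLSearchParams({
    text: finalResult.text,
    confidence: finalResult.confidence.toFixed(2),
    lane: String(speaker.laneId),
    ts: String(now),
  });
}
```

In a browser, `splitResults` would typically be called from the recognizer's `onresult` handler, and the resulting body posted to `/api/add-card` or `/api/add-chat-msg`.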
| 31 | + |
| 32 | +Reliability behavior: |
| 33 | + |
| 34 | +- A watchdog timer restarts recognition when no results are received for a configured interval. |
| 35 | +- Warm restart logic restarts recognition when `onend` fires unexpectedly while the app is still running. |
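A minimal sketch of the watchdog idea, assuming a single timer that is reset on every result. The name `createWatchdog`, its options, and the injectable timer functions are hypothetical; the real interval and restart logic in `app.js` may differ.

```javascript
// Restart-on-silence watchdog. Timer functions are injectable so the logic
// can be exercised without real delays.
function createWatchdog({ timeoutMs, onTimeout, setTimer = setTimeout, clearTimer = clearTimeout }) {
  let handle = null;

  function stop() {
    if (handle !== null) {
      clearTimer(handle);
      handle = null;
    }
  }

  // Call on every recognition result to push the deadline back.
  function kick() {
    stop();
    handle = setTimer(() => {
      handle = null;
      onTimeout();
    }, timeoutMs);
  }

  return { kick, stop };
}

// Possible wiring in the browser (assumed, not taken from app.js):
//   const watchdog = createWatchdog({ timeoutMs: 8000, onTimeout: () => recognition.stop() });
//   recognition.onresult = () => watchdog.kick();
//   recognition.onend = () => { if (appIsRunning) recognition.start(); }; // warm restart
```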
| 36 | + |
| 37 | +## 3. Audio Feature Extraction |
| 38 | + |
| 39 | +The audio analysis path uses the Web Audio API and Meyda. |
| 40 | + |
| 41 | +Per-frame extracted features include: |
| 42 | + |
| 43 | +1. MFCC (13 coefficients) |
| 44 | +2. Spectral flatness |
| 45 | +3. Spectral slope |
| 46 | +4. Spectral centroid |
| 47 | +5. Spectral rolloff |
| 48 | +6. Zero crossing rate (ZCR) |
| 49 | +7. RMS energy |
| 50 | + |
| 51 | +Feature vectors are collected during utterance windows and aggregated to represent voice texture rather than a single pitch scalar. |
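One way to aggregate per-frame features into a single utterance vector is a plain mean, sketched below. The property names match Meyda's feature names (`mfcc`, `spectralFlatness`, `spectralSlope`, `spectralCentroid`, `spectralRolloff`, `zcr`, `rms`); the choice of a mean and the function name `aggregateFrames` are assumptions, since the document does not specify the aggregation.

```javascript
// Scalar Meyda features appended after the MFCC coefficients.
const SCALAR_FEATURES = [
  "spectralFlatness", "spectralSlope", "spectralCentroid",
  "spectralRolloff", "zcr", "rms",
];

// Average a list of per-frame feature objects into one flat vector:
// [mfcc_0 .. mfcc_n, flatness, slope, centroid, rolloff, zcr, rms].
function aggregateFrames(frames) {
  if (frames.length === 0) return null;
  const mfccCount = frames[0].mfcc.length;
  const sum = new Array(mfccCount + SCALAR_FEATURES.length).fill(0);
  for (const frame of frames) {
    frame.mfcc.forEach((c, i) => { sum[i] += c; });
    SCALAR_FEATURES.forEach((name, j) => { sum[mfccCount + j] += frame[name]; });
  }
  return sum.map((v) => v / frames.length);
}
```

Because these features live on very different numeric scales, a real pipeline would likely normalize each dimension before comparing vectors.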
| 52 | + |
| 53 | +## 4. Voice Fingerprint + Lane Matching |
| 54 | + |
| 55 | +### 4.1 Vector fingerprint |
| 56 | + |
| 57 | +For each utterance, EchoLocate constructs a voice fingerprint vector from timbral and spectral features. |
| 58 | + |
| 59 | +### 4.2 Similarity scoring |
| 60 | + |
| 61 | +Each active lane profile is compared with the incoming fingerprint using cosine similarity: |
| 62 | + |
| 63 | +$$ |
| 64 | +\text{similarity} = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\|\|\mathbf{B}\|} |
| 65 | +$$ |
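As a worked example with hypothetical three-dimensional vectors (real fingerprints have many more components), take $\mathbf{A} = (1, 2, 2)$ and $\mathbf{B} = (2, 1, 2)$:

$$
\text{similarity} = \frac{1 \cdot 2 + 2 \cdot 1 + 2 \cdot 2}{\sqrt{1^2 + 2^2 + 2^2}\,\sqrt{2^2 + 1^2 + 2^2}} = \frac{8}{3 \cdot 3} \approx 0.889
$$

Vectors pointing in the same direction score 1 regardless of magnitude, which is why the measure is insensitive to overall loudness-like scaling of the fingerprint.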
| 66 | + |
| 67 | +### 4.3 Assignment behavior |
| 68 | + |
| 69 | +1. If the highest similarity meets the threshold, assign the utterance to the best-matching lane |
| 70 | +2. Otherwise, create a new guest lane (up to the maximum speaker limit) |
| 71 | +3. Lane profiles update incrementally with each new utterance |
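The assignment rules above might look like the following sketch. The threshold (`0.85`), the speaker limit (`4`), the running-mean update, and the fallback to the best lane once the limit is reached are all assumptions for illustration; the document does not state the actual values or the behavior at the limit.

```javascript
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// lanes: [{ id, centroid, count }]. Mutates and returns the chosen lane.
function assignLane(lanes, fingerprint, { threshold = 0.85, maxSpeakers = 4 } = {}) {
  let best = null;
  let bestScore = -Infinity;
  for (const lane of lanes) {
    const score = cosine(lane.centroid, fingerprint);
    if (score > bestScore) { best = lane; bestScore = score; }
  }

  // Match found, or no room for a new lane (assumed fallback: reuse best lane).
  if (best && (bestScore >= threshold || lanes.length >= maxSpeakers)) {
    best.count += 1;
    // Incremental running mean of the lane profile.
    best.centroid = best.centroid.map((c, i) => c + (fingerprint[i] - c) / best.count);
    return best;
  }

  const lane = { id: lanes.length + 1, centroid: fingerprint.slice(), count: 1 };
  lanes.push(lane);
  return lane;
}
```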
| 72 | + |
| 73 | +Why this matters: |
| 74 | + |
| 75 | +- Timbral vectors are more robust than pitch-only matching when a speaker changes intonation. |
| 76 | + |
| 77 | +## 5. Anti-Flicker Stability |
| 78 | + |
| 79 | +To reduce lane hopping: |
| 80 | + |
| 81 | +1. Hysteresis lock: keep the current lane assignment for a minimum duration unless a meaningfully better candidate appears |
| 82 | +2. Temporal smoothing: evaluate recent match history (median/majority over last N decisions) |
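A compact sketch combining both mechanisms. The lock duration, score margin, history length, and the name `createLaneStabilizer` are illustrative assumptions; the lock here is measured from the last lane switch, and smoothing uses a majority vote over recent candidates.

```javascript
function createLaneStabilizer({ lockMs = 1500, margin = 0.1, historySize = 5 } = {}) {
  let current = null; // lane id currently shown as active
  let lockedAt = 0;   // time of the last switch
  const history = []; // recent raw candidate ids

  // candidateScore: similarity of the proposed lane.
  // currentScore: similarity of the currently locked lane for the same utterance.
  return function decide(candidateId, candidateScore, currentScore, now) {
    history.push(candidateId);
    if (history.length > historySize) history.shift();

    if (current === null) {
      current = candidateId;
      lockedAt = now;
      return current;
    }
    if (candidateId === current) return current;

    const locked = now - lockedAt < lockMs;
    const clearlyBetter = candidateScore - currentScore >= margin;

    // Hysteresis: inside the lock window only a clearly better lane may win.
    if (locked && !clearlyBetter) return current;

    // Temporal smoothing: otherwise require a majority of recent decisions.
    const votes = history.filter((id) => id === candidateId).length;
    if (clearlyBetter || votes * 2 > history.length) {
      current = candidateId;
      lockedAt = now;
    }
    return current;
  };
}
```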
| 83 | + |
| 84 | +Result: |
| 85 | + |
| 86 | +- Better sentence continuity and fewer rapid lane switches. |
| 87 | + |
| 88 | +## 6. Language Handling |
| 89 | + |
| 90 | +Language features include: |
| 91 | + |
| 92 | +1. Selectable recognition language list, including `None (Auto)` mode |
| 93 | +2. Optional text-based language detection fallback using `franc-min` |
| 94 | +3. Visual feedback when detected text language and selected recognition language diverge |
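The mismatch check in item 3 could be sketched as follows. `franc` returns ISO 639-3 codes (and `und` when undetermined), while recognition languages are BCP 47 tags such as `en-US`, so some mapping is needed. The mapping table below is deliberately partial and the function name is hypothetical.

```javascript
// Partial, illustrative mapping from ISO 639-3 codes to BCP 47 primary subtags.
const ISO3_TO_PRIMARY = {
  eng: "en", spa: "es", fra: "fr", deu: "de",
  por: "pt", ita: "it", jpn: "ja", cmn: "zh",
};

// selectedTag: recognition language (e.g. "en-US"), or "" for auto mode.
// detectedIso3: text-based detection result (e.g. "spa").
function languagesDiverge(selectedTag, detectedIso3) {
  if (!selectedTag || detectedIso3 === "und") return false; // auto mode or no detection
  const detected = ISO3_TO_PRIMARY[detectedIso3];
  if (!detected) return false; // unmapped language: do not raise a warning
  const primary = selectedTag.toLowerCase().split("-")[0];
  return primary !== detected;
}
```

Short utterances give unreliable text-based detection, so a caller would probably only run this check on transcripts above some minimum length.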
| 95 | + |
| 96 | +Purpose: |
| 97 | + |
| 98 | +- Make multilingual conversation behavior visible and debuggable for users. |
| 99 | + |
| 100 | +## 7. Accessibility-Centered UI Features |
| 101 | + |
| 102 | +### 7.1 Lane and chat views |
| 103 | + |
| 104 | +- Lanes view: parallel speaker columns for at-a-glance differentiation |
| 105 | +- Chat view: single stream useful on mobile and constrained screens |
| 106 | + |
| 107 | +### 7.2 Active lane indicator |
| 108 | + |
| 109 | +- The current lane receives an energy ring / active state styling to show which lane the incoming speech is being assigned to. |
| 110 | + |
| 111 | +### 7.3 Confidence visibility |
| 112 | + |
| 113 | +- Confidence meter (0-100%) is attached to transcript cards/messages |
| 114 | +- Low-confidence text is visually marked |
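A small sketch of how a recognizer confidence value (0 to 1) might map to the meter and the low-confidence marker. The `0.6` cutoff and the function name are assumptions, not values taken from the application.

```javascript
// Convert a 0..1 confidence into a display percentage and a low-confidence flag.
function confidenceView(confidence, lowThreshold = 0.6) {
  const clamped = Math.min(1, Math.max(0, Number(confidence) || 0));
  return {
    percent: Math.round(clamped * 100),
    low: clamped < lowThreshold,
  };
}
```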
| 115 | + |
| 116 | +### 7.4 Human-in-the-loop correction |
| 117 | + |
| 118 | +- Merge controls allow users to merge two speaker lanes when automatic grouping splits one person into multiple lanes. |
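Merging two lanes can be sketched as combining their profiles with a count-weighted mean and interleaving their cards by time. The lane shape (`id`, `count`, `centroid`, `cards` with a `ts` field) is hypothetical.

```javascript
// Merge `source` into `target`, returning a new lane object that keeps the target id.
function mergeLanes(target, source) {
  const total = target.count + source.count;
  return {
    id: target.id,
    count: total,
    // Weighted mean so the lane with more utterances dominates the profile.
    centroid: target.centroid.map(
      (c, i) => (c * target.count + source.centroid[i] * source.count) / total
    ),
    cards: [...target.cards, ...source.cards].sort((a, b) => a.ts - b.ts),
  };
}
```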
| 119 | + |
| 120 | +## 8. Service Worker Local Rendering API |
| 121 | + |
| 122 | +`sw.js` behaves as a local fragment server: |
| 123 | + |
| 124 | +1. `/api/add-card` (POST): returns lane card fragment |
| 125 | +2. `/api/add-chat-msg` (POST): returns chat message fragment |
| 126 | +3. `/api/clear` (POST): local clear acknowledgment |
| 127 | + |
| 128 | +Security behavior: |
| 129 | + |
| 130 | +- Inputs are escaped/sanitized before HTML output |
| 131 | +- Attribute-safe escaping is applied for user-provided content |
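The escaping behavior can be illustrated with a minimal sketch. The card markup, class name, and `data-lane` attribute are invented for this example and are not the actual fragments produced by `sw.js`.

```javascript
const HTML_ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };

// Escapes the characters that matter in both text and quoted-attribute contexts.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical card fragment; every user-provided value passes through escapeHtml.
function renderCard({ text, lane, confidence }) {
  const pct = Math.round(Number(confidence) * 100) || 0;
  return (
    `<article class="card" data-lane="${escapeHtml(lane)}" title="${pct}% confidence">` +
    `<p>${escapeHtml(text)}</p></article>`
  );
}
```

Inside a Service Worker, a `fetch` listener could match `POST /api/add-card`, read the fields with `event.request.formData()`, and answer via `event.respondWith(new Response(html, { headers: { "Content-Type": "text/html" } }))`.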
| 132 | + |
| 133 | +Operational advantages: |
| 134 | + |
| 135 | +- No remote templating required |
| 136 | +- HTMX interactions stay local |
| 137 | +- Offline resilience with cache fallback for same-origin assets |
| 138 | + |
| 139 | +## 9. Persistence Model |
| 140 | + |
| 141 | +Storage: |
| 142 | + |
| 143 | +- Session cards are stored in `localStorage` (`echolocate_v1`) |
| 144 | +- Startup restore rebuilds lanes and chat view from stored cards |
| 145 | +- Clear operation wipes stored conversation state |
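A hedged sketch of the storage round trip. Only the key `echolocate_v1` comes from this document; the function names, the stored card shape, and the error-handling policy are assumptions. The `storage` parameter is any object with the `Storage` interface, such as `window.localStorage`.

```javascript
const STORAGE_KEY = "echolocate_v1";

// Returns false instead of throwing if the write fails (e.g. quota exceeded).
function saveCards(storage, cards) {
  try {
    storage.setItem(STORAGE_KEY, JSON.stringify(cards));
    return true;
  } catch (err) {
    return false;
  }
}

// Returns [] when nothing is stored or the stored JSON is unusable.
function loadCards(storage) {
  try {
    const parsed = JSON.parse(storage.getItem(STORAGE_KEY) ?? "[]");
    return Array.isArray(parsed) ? parsed : [];
  } catch (err) {
    return [];
  }
}

function clearCards(storage) {
  storage.removeItem(STORAGE_KEY);
}
```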
| 146 | + |
| 147 | +Tradeoff: |
| 148 | + |
| 149 | +- Fast and private, but scoped to browser/device profile. |
| 150 | + |
| 151 | +## 10. Export Model (VTT) |
| 152 | + |
| 153 | +Exported transcript format: |
| 154 | + |
| 155 | +- WebVTT with speaker metadata tags, e.g. `<v Speaker 1>...</v>` |
| 156 | +- Time windows are normalized relative to first utterance |
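The export format can be sketched as below. The `WEBVTT` header, the `HH:MM:SS.mmm` timestamps, and `<v Name>` voice tags follow the WebVTT format; the card shape (`speaker`, `text`, `startMs`, `endMs` as absolute millisecond times) and function names are hypothetical.

```javascript
function formatTimestamp(ms) {
  const total = Math.max(0, Math.round(ms));
  const h = Math.floor(total / 3600000);
  const m = Math.floor((total % 3600000) / 60000);
  const s = Math.floor((total % 60000) / 1000);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(total % 1000, 3)}`;
}

// Cue text must not contain raw "&", "<" or ">".
const escapeCue = (text) =>
  text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

// Times are shifted so the earliest utterance starts at 00:00:00.000.
function toVtt(cards) {
  if (cards.length === 0) return "WEBVTT\n";
  const origin = Math.min(...cards.map((c) => c.startMs));
  const cues = cards.map(
    (c) =>
      `${formatTimestamp(c.startMs - origin)} --> ${formatTimestamp(c.endMs - origin)}\n` +
      `<v ${escapeCue(c.speaker)}>${escapeCue(c.text)}</v>`
  );
  return `WEBVTT\n\n${cues.join("\n\n")}\n`;
}
```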
| 157 | + |
| 158 | +Benefit: |
| 159 | + |
| 160 | +- Better interoperability with subtitle tools and downstream review workflows. |
| 161 | + |
| 162 | +## 11. Privacy and Offline Operation |
| 163 | + |
| 164 | +Privacy posture: |
| 165 | + |
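| 166 | +1. Audio analysis (Web Audio API + Meyda) stays local in the browser; speech recognition depends on the browser's Web Speech API implementation, which in some browsers (for example Chrome) uses a server-based recognition engine |
| 167 | +2. EchoLocate itself does not send transcript content to external cloud APIs by default |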
| 168 | +3. Processing and rendering happen on device |
| 169 | + |
| 170 | +Offline posture: |
| 171 | + |
| 172 | +1. Key dependencies are vendored in `vendor/` |
| 173 | +2. Local server (`server.py`) supports localhost operation |
| 174 | +3. Service Worker provides local route handling and cache fallback |
| 175 | + |
| 176 | +## 12. Dependency and Platform Constraints |
| 177 | + |
| 178 | +Required browser capabilities: |
| 179 | + |
| 180 | +1. Web Speech API (best support: Chromium-based desktop browsers) |
| 181 | +2. Web Audio API |
| 182 | +3. Service Worker support |
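These requirements can be checked at startup with simple feature detection, sketched here. The function name and return shape are hypothetical; the global names (`SpeechRecognition`, `webkitSpeechRecognition`, `AudioContext`, `webkitAudioContext`, `navigator.serviceWorker`) are the standard or prefixed browser identifiers.

```javascript
// `scope` defaults to the global object; passing a plain object makes this testable.
function detectCapabilities(scope = globalThis) {
  const nav = scope.navigator;
  return {
    speech: Boolean(scope.SpeechRecognition || scope.webkitSpeechRecognition),
    audio: Boolean(scope.AudioContext || scope.webkitAudioContext),
    serviceWorker: Boolean(nav && "serviceWorker" in nav),
  };
}
```

A page could use the result to disable the start button and show an explanatory message when `speech` is `false`.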
| 183 | + |
| 184 | +Known constraint: |
| 185 | + |
| 186 | +- Browsers without Web Speech API support cannot provide live transcription in this architecture. |
| 187 | + |
| 188 | +## 13. Why This Stack Is Useful for Accessibility |
| 189 | + |
| 190 | +EchoLocate is optimized for practical meeting use where reliability and transparency matter: |
| 191 | + |
| 192 | +1. Voice texture matching improves speaker grouping stability |
| 193 | +2. Watchdog recovery reduces silent transcript dropouts |
| 194 | +3. Confidence and mismatch indicators expose uncertainty instead of hiding it |
| 195 | +4. Merge controls let users correct AI mistakes quickly |
| 196 | +5. Fully local execution supports privacy-sensitive environments |
| 197 | + |
| 198 | +--- |
| 199 | + |
| 200 | +For implementation details, see: |
| 201 | + |
| 202 | +- [README.md](README.md) |
| 203 | +- [INSTALL.txt](INSTALL.txt) |
| 204 | +- [AGENTS.md](AGENTS.md) |