
Roadmap note: convert vague bullets without acceptance criteria into checkbox tasks. Format: - [ ] <Task> (Target: <Q/Year>).

Voice Module Roadmap

Current Status

v1.1.0 – Production-ready voice assistant system. The VoiceAssistant orchestrator is implemented, together with Whisper-based STT, llama.cpp TTS/LLM integration, session management, phone call transcription, meeting protocol generation, real-time browser WebSocket streaming, voice biometric authentication, and a telephony bridge (SIP/WebRTC).

Completed ✅

  • VoiceAssistant – central coordinator for all voice interaction
  • STT processing via Whisper AI (speaker diarization, timestamps)
  • LLM integration via EmbeddedLLM / LlamaWrapper (intent recognition, query generation, response generation)
  • TTS synthesis with audio format output
  • Voice command processing pipeline (audio → STT → LLM → TTS → audio)
  • Session state and conversation history management
  • Context-aware conversational AI
  • Phone call recording and transcription
  • Meeting protocol generation
  • Voice-based database query interface
  • Storage and retrieval of voice session data
  • Key point and summary extraction
  • Real-time streaming STT (word-by-word transcription as audio arrives) (Issue: #2496)
  • Wake-word detection for hands-free activation (Issue: #2365)
  • Voice biometric authentication (speaker verification) (Issue: #2494)
  • Multi-speaker diarization improvements (Issue: #2497)
  • WebSocket audio streaming endpoint for browser clients (Issue: #2350)
  • Integration with telephony systems — SIP call sessions, WebRTC peer connections, IVR engine, TelephonyBridge coordinator (Issue: #2495)

In Progress 🚧

  • (none)

Planned Features 📋

Long-term (6-12 months)

  • Federated learning for on-device voice model personalisation (Target: Q3 2026)
  • GPU-accelerated noise suppression and codec processing (Target: Q4 2026)

Implementation Phases

Phase 1: Voice Pipeline & Session Management (Status: Completed ✅)

  • VoiceAssistant – central coordinator for all voice interaction
  • STT processing via Whisper AI (speaker diarization, timestamps)
  • LLM integration via EmbeddedLLM / LlamaWrapper (intent recognition, query generation, response generation)
  • TTS synthesis with audio format output
  • Voice command processing pipeline (audio → STT → LLM → TTS → audio)
  • Session state and conversation history management
  • Context-aware conversational AI
  • Phone call recording and transcription
  • Meeting protocol generation
  • Voice-based database query interface
  • Storage and retrieval of voice session data
  • Key point and summary extraction

Phase 2: Streaming STT & Wake-Word Detection (Status: Completed ✅)

  • Real-time streaming STT (word-by-word transcription as audio arrives)
  • Wake-word detection for hands-free activation
  • Multi-speaker diarization improvements

Phase 3: Voice Macros & Browser Streaming (Status: Completed ✅)

  • Voice command macros (user-defined shortcuts to AQL queries)
  • Language detection and automatic locale switching
  • Noise suppression preprocessing (RNNoise integration)
  • WebSocket audio streaming endpoint for browser clients (Issue: #2350)
  • Voice session playback and search in stored transcripts

Phase 4: Multi-Language TTS & Biometric Authentication (Status: Completed ✅)

  • Multi-language TTS (German, French, Spanish voices)
  • Emotion / sentiment detection from voice tone
  • Voice biometric authentication (speaker verification)
  • Real-time meeting transcription with action-item extraction (Target: Q1 2026)
  • Integration with telephony systems (SIP / WebRTC) (Issue: #2495)

Production Readiness Checklist

  • Unit tests coverage > 80% (Issue: #2355) — test_voice_assistant.cpp, test_voice_coverage.cpp, test_voice_production.cpp (496+ tests); focused targets: VoiceProductionFocusedTests, VoiceCoverageFocusedTests
  • Integration tests (full pipeline: audio in → transcription → AQL → audio out) (Issue: #2356) — VoiceProductionFocusedTests
  • Performance benchmarks (STT latency, TTS generation speed) (Issue: #2357) — benchmarks/bench_voice_assistant.cpp
  • [I] Security audit (audio data storage, transcription PII handling) (Issue: #2358)
  • [I] Documentation complete (Issue: #2359)
  • API stability guaranteed (Issue: #2360) — VoiceAssistant session API stable from v1.x; new v1.1.0 APIs (telephony, biometric, browser streaming) marked stable
  • Standalone focused test targets registered in tests/CMakeLists.txt: VoiceProductionFocusedTests, VoiceCoverageFocusedTests, VoiceAssistantFocusedTests (LLM-gated), VoiceBrowserStreamingFocusedTests, VoiceTelephonyFocusedTests
  • CI workflow registered — .github/workflows/voice-module-ci.yml (VoiceProductionFocusedTests, VoiceCoverageFocusedTests, VoiceBrowserStreamingFocusedTests, VoiceTelephonyFocusedTests)

Known Issues & Limitations

  • Streaming STT operates in sliding-window mode (3 s window, 1 s step); true sample-by-sample streaming requires a Whisper.cpp build with the THEMIS_ENABLE_WHISPER flag.
  • Wake-word detection uses energy-based VAD gating and acoustic feature scoring (density, spectral centroid, crest factor). A neural wake-word model backend (e.g. Porcupine, openWakeWord) can be plugged in via WakeWordDetector::scorePhrase() without API changes.
  • Multi-speaker diarization uses k-means++ clustering on sub-band acoustic features (RMS + ZCR). Accuracy degrades with more than 4 simultaneous speakers; a neural embedding backend (e.g., pyannote-style x-vector) can be substituted via diarizeSegments() without API changes.
  • TTS voice quality depends on the llama.cpp model in use.
  • Voice biometric authentication uses acoustic sub-band features (no external model required). A neural i-vector/x-vector backend can be plugged in via VoiceBiometricAuthenticator's internal extractFeatures() without changing the public API. Liveness detection is heuristic-based (crest factor, spectral flatness, ZCR variability); a neural anti-spoofing model is recommended for production.

Breaking Changes

  • VoiceAssistant session API is stable from v1.x.
  • Audio format configuration (sample rate, encoding) may gain new options in v1.5.0; backward-compatible.

Latent Symbols (Unused-Functions Audit)

As of 2026-04-20 – Source: src/UNUSED_FUNCTIONS_REPORT.md

🧪 NUR_TESTS (implemented, no production caller)

  • NoiseSuppressor – RNNoise-based noise suppression; exercised only in the voice production test

    Action: add a ROADMAP ticket for production integration, or mark as CANDIDATE_FOR_REMOVAL.

🟡 UNGENUTZT (no test, no external caller)

  • processRNNoiseFrames – processes audio frames through the RNNoise model
  • applyRNNoiseSuppression – applies RNNoise to an entire audio buffer

    Action: decide per symbol: (1) wire it up, (2) test it, or (3) schedule it as CANDIDATE_FOR_REMOVAL.