Roadmap note: convert vague bullets without acceptance criteria into checkbox tasks. Format:
`- [ ] <Task> (Target: <Q/Year>)`.
v2.1.0 — Thread-safe. MP3/OGG input via FFmpeg adapter. Benchmarks wired.
- `IAudioBackend` interface + `THEMIS_AUDIO_PLUGIN()` export macro
- `WavAudioChunkReader`: RIFF/WAV parser (16-bit PCM, IEEE float32)
- `FfmpegAudioChunkReader`: MP3/OGG/FLAC/M4A via `ffmpeg` subprocess
- `CompositeAudioChunkReader`: chains multiple readers by extension
- `IWhisperTranscriber` strategy interface
- `WhisperCppTranscriber` (production, optional compile)
- `WhisperStubTranscriber` (CI / no model file)
- `InMemoryWhisperTranscriber` test double
- `WhisperPlugin`: provenance stamps, error counting, DL entry points
- `WhisperConfig::fromJson` / `toJson` with validation and clamping
- 44 unit tests (`WhisperPluginFocusedTests`, groups A–N)
- Plugin manifest (`plugins/whisper/plugin.json.in`)
- CMake registration (plugin + tests)
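As an illustration of the RIFF/WAV parsing item above, here is a minimal sketch of a header check for the canonical 36-byte RIFF/WAVE prefix. It is not the actual `WavAudioChunkReader` code: the `WavFormat` struct and the `parseWavFormat` function are hypothetical names, and the sketch assumes the `fmt ` chunk comes first, which real files do not guarantee.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical summary of the "fmt " chunk (not the plugin's real type).
struct WavFormat {
    std::uint16_t audio_format;    // 1 = integer PCM, 3 = IEEE float
    std::uint16_t channels;
    std::uint32_t sample_rate;
    std::uint16_t bits_per_sample;
};

// Little-endian reads, done byte by byte so they work on any host.
std::uint16_t readU16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
std::uint32_t readU32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Returns the format when the buffer starts with a RIFF/WAVE header whose
// first chunk is "fmt " and describes 16-bit PCM or float32; nullopt otherwise.
std::optional<WavFormat> parseWavFormat(const std::vector<std::uint8_t>& data) {
    if (data.size() < 36) return std::nullopt;                       // truncated
    const std::uint8_t* p = data.data();
    if (std::memcmp(p, "RIFF", 4) != 0) return std::nullopt;         // magic
    if (std::memcmp(p + 8, "WAVE", 4) != 0) return std::nullopt;
    if (std::memcmp(p + 12, "fmt ", 4) != 0) return std::nullopt;
    if (readU32(p + 16) < 16) return std::nullopt;                   // chunk size

    WavFormat f{readU16(p + 20), readU16(p + 22), readU32(p + 24), readU16(p + 34)};
    const bool pcm16 = f.audio_format == 1 && f.bits_per_sample == 16;
    const bool float32 = f.audio_format == 3 && f.bits_per_sample == 32;
    if (!pcm16 && !float32) return std::nullopt;
    if (f.channels == 0 || f.sample_rate == 0) return std::nullopt;
    return f;
}
```

A real parser would also walk the chunk list to find `data` and apply upper bounds on the sample rate; those limits are not stated here, so they are left out.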
- `WhisperConfig.language_confidence_threshold`: filters low-confidence `detectLanguage()` results
- Thread-safety: `transcribe_mutex_` now also guards `detectLanguage()` + threshold filter
- `WhisperPluginAdapter` + `WhisperPluginRegistrar`: `IThemisPlugin` adapter wrapping `WhisperPlugin`; `createPlugin`, `createAdapter`, `defaultReloadCallback`, `enableHotPlug`, `disableHotPlug`; 12 unit tests (`WhisperPluginRegistrarTests`, groups A–D) (2026-04-16)
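The combination of a confidence threshold and a single mutex guarding both detection and filtering could look roughly like the sketch below. Everything except the names `transcribe_mutex_` and `detectLanguage` is an assumption: `LanguageGuess`, `GuardedDetector`, and the choice to return an empty optional for low-confidence results are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical detection result: language code plus confidence in [0, 1].
struct LanguageGuess {
    std::string code;
    float confidence;
};

class GuardedDetector {
public:
    using DetectFn = std::function<LanguageGuess(const std::string& audio_path)>;

    // The threshold is clamped to [0, 1], mirroring the "validation and
    // clamping" behaviour described for the config (assumed range).
    GuardedDetector(DetectFn detect, float threshold)
        : detect_(std::move(detect)),
          threshold_(std::clamp(threshold, 0.0f, 1.0f)) {}

    // Detection and the threshold filter run under the same lock, so a
    // concurrent caller never observes an unfiltered intermediate result.
    std::optional<std::string> detectLanguage(const std::string& audio_path) {
        std::lock_guard<std::mutex> lock(transcribe_mutex_);
        const LanguageGuess guess = detect_(audio_path);
        if (guess.confidence < threshold_) return std::nullopt;
        return guess.code;
    }

private:
    DetectFn detect_;
    float threshold_;
    std::mutex transcribe_mutex_;
};
```

With a threshold of 0.5, a guess of `{"de", 0.4}` would be dropped and `{"de", 0.9}` would be returned as `"de"`.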
(none)
- Streaming token output during transcription (Target: Q3 2026)
- VAD pre-filter to skip silent segments (Target: Q3 2026)
- Speaker diarisation — multi-speaker attribution (Target: Q4 2026)
- Language-detection confidence threshold config (Target: Q3 2026)
- `IAudioBackend`, `TranscriptionResult`, `WhisperConfig` defined
- Strategy interface (`IWhisperTranscriber`) separating backend from lifecycle
- `WavAudioChunkReader`: PCM parsing without libsndfile dependency
- `FfmpegAudioChunkReader`: MP3/OGG/FLAC decoder via subprocess
- `CompositeAudioChunkReader`: extension-based reader dispatch
- `WhisperPlugin` wiring config → reader → transcriber → result
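The extension-based dispatch in the list above can be sketched as follows. The interface name `IAudioChunkReader`, the `read` signature, and the `ExtensionReader` stand-in are assumptions for illustration; only `canRead` and `addReader` are names that appear in this document.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Samples = std::vector<float>;

// Hypothetical reader interface.
struct IAudioChunkReader {
    virtual ~IAudioChunkReader() = default;
    virtual bool canRead(const std::string& path) const = 0;
    virtual Samples read(const std::string& path) = 0;
};

// Lower-cased extension including the dot, or empty if the file has none.
std::string extensionOf(const std::string& path) {
    const auto dot = path.find_last_of('.');
    const auto slash = path.find_last_of("/\\");
    if (dot == std::string::npos) return {};
    if (slash != std::string::npos && dot < slash) return {};
    std::string ext = path.substr(dot);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext;
}

// Stand-in for a concrete reader: claims a set of extensions and returns a
// single marker sample instead of decoding anything.
class ExtensionReader : public IAudioChunkReader {
public:
    ExtensionReader(std::vector<std::string> exts, float marker)
        : exts_(std::move(exts)), marker_(marker) {}
    bool canRead(const std::string& path) const override {
        return std::find(exts_.begin(), exts_.end(), extensionOf(path)) != exts_.end();
    }
    Samples read(const std::string&) override { return {marker_}; }

private:
    std::vector<std::string> exts_;
    float marker_;
};

// Tries readers in registration order; the first one that accepts the path wins.
class CompositeReader : public IAudioChunkReader {
public:
    void addReader(std::unique_ptr<IAudioChunkReader> reader) {
        readers_.push_back(std::move(reader));
    }
    bool canRead(const std::string& path) const override {
        return std::any_of(readers_.begin(), readers_.end(),
                           [&](const auto& r) { return r->canRead(path); });
    }
    Samples read(const std::string& path) override {
        for (auto& r : readers_) {
            if (r->canRead(path)) return r->read(path);
        }
        throw std::runtime_error("no reader for " + path);
    }

private:
    std::vector<std::unique_ptr<IAudioChunkReader>> readers_;
};
```

Registering the WAV stand-in before the FFmpeg stand-in means `.wav` files never reach the subprocess path, which is why registration order matters in this design.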
- WAV format validation (magic, chunk size, sample rate bounds)
- File-not-found, empty file, truncated data → `success=false` + `error_message`
- Transcriber exception catching in `WhisperPlugin::transcribe()`
- ffmpeg not available → `runtime_error("ffmpeg not available")`
- Shell-escaped path in ffmpeg subprocess (NUL-byte guard, single-quote wrapping)
- Max-output guard (500 MB) in `FfmpegAudioChunkReader`
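The shell-escaping item above (NUL-byte guard plus single-quote wrapping) corresponds to the standard POSIX `sh` quoting idiom. A minimal sketch, assuming a hypothetical helper named `shellQuote` and an `std::invalid_argument` on rejection (the plugin's actual function name and error type are not given here):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Quotes a path for use inside a POSIX sh command line.
// Inside single quotes nothing is special except the single quote itself,
// so each embedded ' is written as '\'' (close quote, escaped quote, reopen).
std::string shellQuote(const std::string& path) {
    // A NUL byte would silently truncate the argument once it reaches C APIs.
    if (path.find('\0') != std::string::npos) {
        throw std::invalid_argument("path contains NUL byte");
    }
    std::string out = "'";
    for (char c : path) {
        if (c == '\'') {
            out += "'\\''";
        } else {
            out += c;
        }
    }
    out += '\'';
    return out;
}
```

For example, `it's.mp3` becomes `'it'\''s.mp3'`, and a path such as `a; rm -rf ~` stays a single inert argument. Passing arguments as an array to `execvp` or `posix_spawn` avoids the shell entirely and is often the more robust choice where it is available.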
- 44 unit tests across groups A–N
- Group K: thread-safety (concurrent transcribe, atomic error/success counters, detectLanguage)
- Group L: FfmpegAudioChunkReader canRead, graceful degradation, composite routing
- Group O: streaming transcription — single-token fallback, multi-token, callback exception, uninit guard, provenance (WST-01..05)
- Group P: EnergyThresholdVad — all-silence, all-speech, mixed (VAD-01..03)
- Group Q: WhisperPlugin VAD integration — silent skip, speech pass-through, null VAD no-op (VAD-04..06)
- Thread-safety audit of `WhisperPlugin` for concurrent `transcribe()` calls
- Benchmark wired (`bench_whisper_transcription.cpp`, 9 scenarios)
- `transcribeStream()` with incremental token callback; callback-exception safety (Q3 2026)
- `EnergyThresholdVad` + `IVoiceActivityDetector` strategy; `WhisperPlugin::setVoiceActivityDetector()` (Q3 2026)
- Benchmark against whisper.cpp CLI on real model (Target: Q3 2026)
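Callback-exception safety for an incremental token callback, as listed above for `transcribeStream()`, might be handled along these lines. This is a sketch under assumptions: `StreamResult`, `TokenCallback`, and `streamTokens` are hypothetical names, and the policy shown (stop at the first throwing callback and report `success=false`) is one plausible choice, not necessarily what the plugin does.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type with the success/error_message pair used elsewhere.
struct StreamResult {
    bool success;
    std::string text;
    std::string error_message;
};

using TokenCallback = std::function<void(const std::string& token)>;

// Feeds tokens to the callback one by one. An exception thrown by user code
// is caught here and converted into an error result, so it never unwinds
// through the transcription backend.
StreamResult streamTokens(const std::vector<std::string>& tokens,
                          const TokenCallback& on_token) {
    StreamResult result{true, "", ""};
    for (const auto& token : tokens) {
        result.text += token;
        if (!on_token) continue;  // an empty callback is simply skipped
        try {
            on_token(token);
        } catch (const std::exception& e) {
            result.success = false;
            result.error_message = std::string("token callback threw: ") + e.what();
            return result;
        } catch (...) {
            result.success = false;
            result.error_message = "token callback threw: unknown exception";
            return result;
        }
    }
    return result;
}
```

In this sketch the partial text accumulated up to the failing token is kept in the result, so a caller can still inspect what was produced before the callback failed.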
- README, CHANGELOG, ROADMAP, ARCHITECTURE, FUTURE_ENHANCEMENTS, AUDIT, SECURITY
- Unit tests present (44 tests)
- Stub mode for CI without model file
- Injection constructor for test doubles
- Provenance stamps on every result
- Thread-safety verified for concurrent access
- Performance benchmarks wired (stub path exercised in CI)
- PluginManager hot-plug integration (`WhisperPluginAdapter` / `WhisperPluginRegistrar`)
- `transcribeStream()`: incremental token callback with exception safety (v2.2.0)
- `EnergyThresholdVad` + `IVoiceActivityDetector` strategy injected via `setVoiceActivityDetector()` (v2.2.0)
- 55 unit tests (groups A–Q, including WST-01..05 + VAD-01..06)
- Real whisper.cpp integration validated end-to-end (requires model file)
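An energy-threshold voice activity detector like the `EnergyThresholdVad` strategy in the checklist above typically compares per-frame mean squared amplitude against a threshold. The sketch below shows that general technique. The class name `EnergyVadSketch`, the `hasSpeech` method, and both constructor parameters are assumptions, since the real interface is not shown in this document.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical strategy interface: true if the chunk contains any speech.
struct IVoiceActivityDetector {
    virtual ~IVoiceActivityDetector() = default;
    virtual bool hasSpeech(const std::vector<float>& samples) const = 0;
};

class EnergyVadSketch : public IVoiceActivityDetector {
public:
    // energy_threshold: mean squared amplitude above which a frame counts as speech.
    // frame_size: number of samples per analysis frame.
    EnergyVadSketch(double energy_threshold, std::size_t frame_size)
        : threshold_(energy_threshold), frame_size_(frame_size) {}

    bool hasSpeech(const std::vector<float>& samples) const override {
        if (frame_size_ == 0) return false;
        for (std::size_t start = 0; start < samples.size(); start += frame_size_) {
            const std::size_t end = std::min(samples.size(), start + frame_size_);
            double sum = 0.0;
            for (std::size_t i = start; i < end; ++i) {
                sum += static_cast<double>(samples[i]) * samples[i];
            }
            // One loud frame is enough to keep the whole chunk.
            if (sum / static_cast<double>(end - start) > threshold_) return true;
        }
        return false;
    }

private:
    double threshold_;
    std::size_t frame_size_;
};
```

A plugin that holds such a detector could skip the transcriber entirely when `hasSpeech` returns false and treat a null detector as "always transcribe", which matches the silent-skip, pass-through, and no-op cases named in the VAD test groups.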
- `WhisperPluginAdapter : IThemisPlugin` wraps `WhisperPlugin` and implements `initialize(config_json)`, `shutdown()`, `getType()`, `getCapabilities()`, `getInstance()`; `PluginType::AUDIO_PROCESSING`
- `WhisperPluginRegistrar`: `createPlugin()`, `createAdapter()`, `defaultReloadCallback()`, `enableHotPlug()`, `disableHotPlug()`
- 12 unit tests (`WhisperPluginRegistrarTests`, groups A–D) in `src/whisper/tests/test_whisper_plugin_registrar.cpp`
- `WhisperCppTranscriber` is compiled but not exercised in CI without a model file.
- Speaker diarisation is not implemented.
- `FfmpegAudioChunkReader` requires `ffmpeg` on PATH; degrades gracefully when absent.
- v2.1.0: `WhisperPlugin` default constructor now installs a `CompositeAudioChunkReader` (WAV first, then FFmpeg) instead of a bare `WavAudioChunkReader`. Injection-constructor callers are unaffected.
As of 2026-04-20. Source: src/UNUSED_FUNCTIONS_REPORT.md

- `canRead`: checks whether the Whisper plugin can read an audio chunk
- `addReader`: registers an audio reader for the Whisper plugin stack
- `WhisperPlugin`: Whisper ASR plugin implementation; tests + bench present

Action: add a ROADMAP ticket for production integration or mark as CANDIDATE_FOR_REMOVAL.

- `parseWav`: parses the WAV header and extracts the raw audio data

Action: for each symbol, decide whether to (1) wire it up, (2) test it, or (3) schedule it as CANDIDATE_FOR_REMOVAL.