Roadmap note: convert vague bullets that lack acceptance criteria into checkbox tasks. Format: - [ ] <Task> (Target: <Q/Year>).

Whisper Plugin Roadmap

Current Status

v2.1.0 — Thread-safe. MP3/OGG input via FFmpeg adapter. Benchmarks wired.

Completed ✅

  • IAudioBackend interface + THEMIS_AUDIO_PLUGIN() export macro
  • WavAudioChunkReader — RIFF/WAV parser (16-bit PCM, IEEE float32)
  • FfmpegAudioChunkReader — MP3/OGG/FLAC/M4A via ffmpeg subprocess
  • CompositeAudioChunkReader — chains multiple readers by extension
  • IWhisperTranscriber strategy interface
  • WhisperCppTranscriber (production, optional compile)
  • WhisperStubTranscriber (CI / no model file)
  • InMemoryWhisperTranscriber test double
  • WhisperPlugin — provenance stamps, error counting, DL entry points
  • WhisperConfig::fromJson / toJson with validation and clamping
  • 44 unit tests (WhisperPluginFocusedTests, groups A–N)
  • Plugin manifest (plugins/whisper/plugin.json.in)
  • CMake registration (plugin + tests)
  • WhisperConfig.language_confidence_threshold — filters low-confidence detectLanguage() results
  • Thread-safety: transcribe_mutex_ now also guards detectLanguage() + threshold filter
  • WhisperPluginAdapter + WhisperPluginRegistrar: IThemisPlugin adapter wrapping WhisperPlugin; createPlugin, createAdapter, defaultReloadCallback, enableHotPlug, disableHotPlug; 12 unit tests (WhisperPluginRegistrarTests, groups A–D) (2026-04-16)
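A minimal sketch of how the language-confidence filter and its locking could fit together. LanguageGuess, ThresholdedDetector and the injected detect callable are hypothetical names, not the plugin's API; the sketch only assumes that detection yields a language code with a confidence in [0, 1], and that results below the threshold are dropped inside the same critical section that guards transcription.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical raw output of a language-detection pass.
struct LanguageGuess {
    std::string language;  // e.g. "en", "de"
    float confidence;      // assumed to lie in [0, 1]
};

// Hypothetical wrapper; the real WhisperPlugin/WhisperConfig API may differ.
class ThresholdedDetector {
public:
    ThresholdedDetector(std::function<LanguageGuess()> detect, float threshold)
        : detect_(std::move(detect)), threshold_(threshold) {}

    // Runs detection and the threshold check under one lock, so the filter
    // shares the critical section that transcription would also use.
    std::optional<LanguageGuess> detectLanguage() {
        std::lock_guard<std::mutex> lock(transcribe_mutex_);
        LanguageGuess guess = detect_();
        if (guess.confidence < threshold_) {
            return std::nullopt;  // low-confidence result is filtered out
        }
        return guess;
    }

private:
    std::function<LanguageGuess()> detect_;
    float threshold_;
    std::mutex transcribe_mutex_;  // name borrowed from the roadmap for readability
};
```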

In Progress

(none)

Planned Features

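  • Streaming token output during transcription (Target: Q3 2026; transcribeStream() implemented for v2.2.0, see Phase 5)
  • VAD pre-filter to skip silent segments (Target: Q3 2026; EnergyThresholdVad implemented for v2.2.0, see Phase 5)
  • Speaker diarisation: multi-speaker attribution (Target: Q4 2026)
  • Language-detection confidence threshold config (Target: Q3 2026; implemented as WhisperConfig.language_confidence_threshold, see Completed)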

Implementation Phases

Phase 1 — Design / API Contract ✅

  • IAudioBackend, TranscriptionResult, WhisperConfig defined
  • Strategy interface (IWhisperTranscriber) separating backend from lifecycle
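The Phase 1 split between lifecycle and backend can be sketched as a plain strategy pattern. ITranscriber, StubTranscriber, MiniPlugin and Result are hypothetical simplifications of IWhisperTranscriber, WhisperStubTranscriber, WhisperPlugin and TranscriptionResult; only the shape (injected strategy, backend exceptions turned into success=false, a provenance field) follows the roadmap.

```cpp
#include <cassert>
#include <exception>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, trimmed-down result; provenance reduced to one field.
struct Result {
    bool success = false;
    std::string text;
    std::string error_message;
    std::string backend;  // which strategy produced the result
};

// Strategy interface: a backend only turns samples into text.
class ITranscriber {
public:
    virtual ~ITranscriber() = default;
    virtual std::string name() const = 0;
    virtual std::string transcribe(const std::vector<float>& samples) = 0;
};

// Stub strategy usable in CI without a model file.
class StubTranscriber : public ITranscriber {
public:
    std::string name() const override { return "stub"; }
    std::string transcribe(const std::vector<float>& samples) override {
        return "[stub: " + std::to_string(samples.size()) + " samples]";
    }
};

// The plugin owns lifecycle concerns (error capture, provenance) and
// delegates recognition to whichever strategy was injected.
class MiniPlugin {
public:
    explicit MiniPlugin(std::unique_ptr<ITranscriber> t) : transcriber_(std::move(t)) {}

    Result transcribe(const std::vector<float>& samples) {
        Result r;
        r.backend = transcriber_->name();
        try {
            r.text = transcriber_->transcribe(samples);
            r.success = true;
        } catch (const std::exception& e) {
            r.error_message = e.what();  // backend failures never escape
        }
        return r;
    }

private:
    std::unique_ptr<ITranscriber> transcriber_;
};
```

Swapping StubTranscriber for a model-backed strategy changes nothing in MiniPlugin, which is the point of keeping the backend behind an interface.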

Phase 2 — Core Implementation ✅

  • WavAudioChunkReader — PCM parsing without libsndfile dependency
  • FfmpegAudioChunkReader — MP3/OGG/FLAC decoder via subprocess
  • CompositeAudioChunkReader — extension-based reader dispatch
  • WhisperPlugin wiring config → reader → transcriber → result
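A dependency-free RIFF/WAV check in the spirit of WavAudioChunkReader and the Phase 3 validation rules might look like the sketch below. validateWav and WavInfo are hypothetical, and the 8 kHz to 192 kHz sample-rate bounds are an assumed example because the roadmap does not state the actual limits. The chunk walk follows the RIFF layout: a 4-byte ID, a little-endian 32-bit size, and a body padded to an even length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// Hypothetical summary of a validated WAV file.
struct WavInfo {
    std::uint16_t format = 0;           // 1 = PCM, 3 = IEEE float
    std::uint16_t channels = 0;
    std::uint32_t sample_rate = 0;
    std::uint16_t bits_per_sample = 0;
    std::size_t data_offset = 0;        // byte offset of the samples
    std::size_t data_size = 0;          // byte length of the samples
};

// Little-endian field readers; callers guarantee the bytes are in range.
static std::uint16_t le16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
static std::uint32_t le32(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Hypothetical validator: magic, chunk sizes, sample format, rate bounds.
std::optional<WavInfo> validateWav(const std::vector<std::uint8_t>& b, std::string& error) {
    if (b.size() < 12 || std::memcmp(b.data(), "RIFF", 4) != 0 ||
        std::memcmp(b.data() + 8, "WAVE", 4) != 0) {
        error = "not a RIFF/WAVE file";
        return std::nullopt;
    }
    WavInfo info;
    bool have_fmt = false;
    std::size_t pos = 12;
    while (pos + 8 <= b.size()) {
        const std::uint8_t* hdr = b.data() + pos;
        const std::uint32_t size = le32(hdr + 4);
        if (size > b.size() - pos - 8) { error = "truncated chunk"; return std::nullopt; }
        const std::uint8_t* body = hdr + 8;
        if (std::memcmp(hdr, "fmt ", 4) == 0) {
            if (size < 16) { error = "fmt chunk too small"; return std::nullopt; }
            info.format = le16(body);
            info.channels = le16(body + 2);
            info.sample_rate = le32(body + 4);
            info.bits_per_sample = le16(body + 14);
            have_fmt = true;
        } else if (std::memcmp(hdr, "data", 4) == 0) {
            if (!have_fmt) { error = "data chunk before fmt chunk"; return std::nullopt; }
            info.data_offset = pos + 8;
            info.data_size = size;
            break;
        }
        pos += 8 + static_cast<std::size_t>(size) + (size & 1u);  // RIFF pads chunks to even length
    }
    if (!have_fmt || info.data_offset == 0) { error = "missing fmt or data chunk"; return std::nullopt; }
    const bool pcm16 = info.format == 1 && info.bits_per_sample == 16;
    const bool float32 = info.format == 3 && info.bits_per_sample == 32;
    if (!pcm16 && !float32) { error = "unsupported sample format"; return std::nullopt; }
    if (info.channels == 0) { error = "zero channels"; return std::nullopt; }
    if (info.sample_rate < 8000 || info.sample_rate > 192000) {  // assumed bounds
        error = "sample rate out of bounds";
        return std::nullopt;
    }
    return info;
}
```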

Phase 3 — Error Handling & Edge Cases ✅

  • WAV format validation (magic, chunk size, sample rate bounds)
  • File-not-found, empty file, truncated data → success=false + error_message
  • Transcriber exception catching in WhisperPlugin::transcribe()
  • ffmpeg not available → runtime_error("ffmpeg not available")
  • Shell-escaped path in ffmpeg subprocess (NUL-byte guard, single-quote wrapping)
  • Max-output guard (500 MB) in FfmpegAudioChunkReader
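The single-quote wrapping with a NUL-byte guard used for the ffmpeg subprocess path can be sketched as follows, assuming a POSIX shell. shellQuote is a hypothetical helper name; inside single quotes a POSIX shell treats every character literally except the quote itself, so each embedded quote has to be closed, escaped and reopened.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: quotes `path` for a POSIX shell command line.
// Each embedded ' becomes '\'' (close quote, escaped quote, reopen quote).
// A NUL byte would silently truncate the C-string command, so it is rejected.
std::string shellQuote(const std::string& path) {
    if (path.find('\0') != std::string::npos) {
        throw std::invalid_argument("path contains NUL byte");
    }
    std::string out = "'";
    for (char c : path) {
        if (c == '\'') {
            out += "'\\''";
        } else {
            out += c;
        }
    }
    out += '\'';
    return out;
}
```

The quoted string would then be spliced into the command line handed to popen(); rejecting NUL matters because the shell would otherwise see a shorter path than the caller passed in.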

Phase 4 — Tests ✅

  • 44 unit tests across groups A–N (v2.1.0); groups O–Q below add 11 more (55 in total)
  • Group K: thread-safety (concurrent transcribe, atomic error/success counters, detectLanguage)
  • Group L: FfmpegAudioChunkReader canRead, graceful degradation, composite routing
  • Group O: streaming transcription — single-token fallback, multi-token, callback exception, uninit guard, provenance (WST-01..05)
  • Group P: EnergyThresholdVad — all-silence, all-speech, mixed (VAD-01..03)
  • Group Q: WhisperPlugin VAD integration — silent skip, speech pass-through, null VAD no-op (VAD-04..06)
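An energy-threshold VAD of the kind Groups P and Q exercise can be sketched as a per-frame RMS check. energyVad and containsSpeech are hypothetical; the roadmap does not specify EnergyThresholdVad's frame size, threshold or energy measure, so RMS over fixed-length frames is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical energy-threshold VAD: one flag per frame, true = speech.
// The final frame may be shorter than frame_len.
std::vector<bool> energyVad(const std::vector<float>& samples,
                            std::size_t frame_len, float rms_threshold) {
    std::vector<bool> speech;
    if (frame_len == 0) return speech;  // avoid an infinite loop
    for (std::size_t start = 0; start < samples.size(); start += frame_len) {
        const std::size_t end = std::min(start + frame_len, samples.size());
        double sum_sq = 0.0;
        for (std::size_t i = start; i < end; ++i) {
            sum_sq += static_cast<double>(samples[i]) * samples[i];
        }
        const double rms = std::sqrt(sum_sq / static_cast<double>(end - start));
        speech.push_back(rms >= rms_threshold);
    }
    return speech;
}

// A caller could skip transcription entirely when no frame is speech.
bool containsSpeech(const std::vector<bool>& frames) {
    for (bool f : frames) {
        if (f) return true;
    }
    return false;
}
```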

Phase 5 — Performance / Hardening ✅

  • Thread-safety audit of WhisperPlugin for concurrent transcribe() calls
  • Benchmark wired (bench_whisper_transcription.cpp, 9 scenarios)
  • transcribeStream() with incremental token callback; callback-exception safety (Q3 2026)
  • EnergyThresholdVad + IVoiceActivityDetector strategy; WhisperPlugin::setVoiceActivityDetector() (Q3 2026)
  • Benchmark against whisper.cpp CLI on a real model (still open; Target: Q3 2026)
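The callback-exception safety listed for transcribeStream() can be illustrated with a small loop that forwards tokens and converts a throwing callback into an error result. streamTokens and StreamResult are hypothetical; whether the real implementation stops at the first failure, or keeps the failing token in the accumulated text, is not stated, so both choices here are assumptions.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical outcome of a streaming run.
struct StreamResult {
    bool success = false;
    std::string text;           // concatenation of the tokens emitted so far
    std::string error_message;
};

// Forwards tokens one at a time. A throwing callback stops the stream and is
// reported in the result instead of propagating into the caller.
StreamResult streamTokens(const std::vector<std::string>& tokens,
                          const std::function<void(const std::string&)>& on_token) {
    StreamResult r;
    for (const auto& tok : tokens) {
        r.text += tok;
        try {
            if (on_token) on_token(tok);
        } catch (const std::exception& e) {
            r.error_message = std::string("token callback threw: ") + e.what();
            return r;
        } catch (...) {
            r.error_message = "token callback threw a non-standard exception";
            return r;
        }
    }
    r.success = true;
    return r;
}
```

A backend without incremental output could pass its full transcript as a single token, which would correspond to the single-token fallback case in Group O.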

Phase 6 — Documentation & Acceptance ✅

  • README, CHANGELOG, ROADMAP, ARCHITECTURE, FUTURE_ENHANCEMENTS, AUDIT, SECURITY

Production Readiness Checklist

  • Unit tests present (44 tests in groups A–N as of v2.1.0)
  • Stub mode for CI without model file
  • Injection constructor for test doubles
  • Provenance stamps on every result
  • Thread-safety verified for concurrent access
  • Performance benchmarks wired (stub path exercised in CI)
  • PluginManager hot-plug integration (WhisperPluginAdapter / WhisperPluginRegistrar)
  • transcribeStream() — incremental token callback with exception safety (v2.2.0)
  • EnergyThresholdVad + IVoiceActivityDetector strategy injected via setVoiceActivityDetector() (v2.2.0)
  • 55 unit tests (groups A–Q, including WST-01..05 + VAD-01..06)
  • Real whisper.cpp integration validated end-to-end (requires model file)

Phase 7 — PluginManager Hot-Plug Integration ✅ (v2.1.0)

  • WhisperPluginAdapter : IThemisPlugin — wraps WhisperPlugin, implements initialize(config_json), shutdown(), getType(), getCapabilities(), getInstance(); PluginType::AUDIO_PROCESSING
  • WhisperPluginRegistrar: createPlugin(), createAdapter(), defaultReloadCallback(), enableHotPlug(), disableHotPlug()
  • 12 unit tests (WhisperPluginRegistrarTests, groups A–D) in src/whisper/tests/test_whisper_plugin_registrar.cpp

Known Issues & Limitations

  • WhisperCppTranscriber is compiled but not exercised in CI without a model file.
  • Speaker diarisation is not implemented.
  • FfmpegAudioChunkReader requires ffmpeg on PATH; degrades gracefully when absent.

Breaking Changes

  • v2.1.0: WhisperPlugin default constructor now installs a CompositeAudioChunkReader (WAV first, then FFmpeg) instead of a bare WavAudioChunkReader. Injection-constructor callers are unaffected.
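The v2.1.0 default-reader change (WAV first, then FFmpeg) amounts to a first-match composite. In this sketch IReader, ExtReader, Composite and makeDefaultReader are hypothetical; the FFmpeg stand-in is assumed to accept .wav as well, purely to show why registration order matters. Passing a different reader in would correspond to the injection-constructor path, which the change leaves untouched.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader interface: reports which files it can decode.
class IReader {
public:
    virtual ~IReader() = default;
    virtual std::string name() const = 0;
    virtual bool canRead(const std::string& path) const = 0;
};

// Lower-cased extension including the dot, or "" if the file name has none.
static std::string extensionOf(const std::string& path) {
    const auto dot = path.find_last_of('.');
    const auto slash = path.find_last_of('/');
    if (dot == std::string::npos || (slash != std::string::npos && slash > dot)) return "";
    std::string ext = path.substr(dot);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext;
}

// Hypothetical reader that accepts a fixed list of extensions.
class ExtReader : public IReader {
public:
    ExtReader(std::string name, std::vector<std::string> exts)
        : name_(std::move(name)), exts_(std::move(exts)) {}
    std::string name() const override { return name_; }
    bool canRead(const std::string& path) const override {
        return std::find(exts_.begin(), exts_.end(), extensionOf(path)) != exts_.end();
    }
private:
    std::string name_;
    std::vector<std::string> exts_;
};

// Asks each child in registration order; the first match wins.
class Composite {
public:
    void addReader(std::unique_ptr<IReader> r) { readers_.push_back(std::move(r)); }
    const IReader* select(const std::string& path) const {
        for (const auto& r : readers_) {
            if (r->canRead(path)) return r.get();
        }
        return nullptr;
    }
private:
    std::vector<std::unique_ptr<IReader>> readers_;
};

// Mirrors the described default order: WAV reader first, FFmpeg reader second.
Composite makeDefaultReader() {
    Composite c;
    c.addReader(std::make_unique<ExtReader>("wav", std::vector<std::string>{".wav"}));
    c.addReader(std::make_unique<ExtReader>(
        "ffmpeg", std::vector<std::string>{".mp3", ".ogg", ".flac", ".m4a", ".wav"}));
    return c;
}
```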

Latent Symbols (Unused-Functions Audit)

As of 2026-04-20. Source: src/UNUSED_FUNCTIONS_REPORT.md

🧪 NUR_TESTS (implemented, no production caller)

  • canRead: checks whether the Whisper plugin can read an audio chunk
  • addReader: registers an audio reader with the Whisper plugin stack
  • WhisperPlugin: Whisper ASR plugin implementation; tests and benchmark exist

    Action: add a ROADMAP ticket for production integration, or mark as CANDIDATE_FOR_REMOVAL.

🟡 UNGENUTZT (no test, no external caller)

  • parseWav: parses the WAV header and extracts the raw audio data

    Action: for each symbol, decide whether to (1) wire it up, (2) test it, or (3) schedule it as CANDIDATE_FOR_REMOVAL.