Roadmap note: convert vague bullets without acceptance criteria into checkbox tasks. Format:
- [ ] <Task> (Target: <Q/Year>).
- `IngestionToolbox`: system-wide injectable service with `WorkflowEngine`, `StepRegistry`, `ITextGenerationBackend`; `createDefault()` factory; `extractEntities()` + `extractEntitySet()` + `getMetricsText()` convenience API
- `ToolboxBuilder`: fluent builder (`withWorkflowProfile`, `withTextBackend`, `withGraphWriter`, `withFormatExtractor`, `withFormatExtractorFactory`, `build()`)
- `ContentToolboxBridge`: unified ingest entry point (`ingest()` + `enrichExisting()`); `BridgeResult` struct; `vectors` populated from `BaseEntitySet::chunks`
- `ToolboxRegistry`: process-global registry + free functions (`initializeToolbox`, `globalToolbox`, `extractEntities`, `extractEntitySet`, `getMetricsText`); persists in the `themis::toolbox` namespace, accessible to all modules
- `IngestionToolbox` core API (`ingestion_toolbox.h`/`.cpp`) (v0.1.0)
- `ToolboxBuilder` fluent API (`toolbox_builder.h`/`.cpp`) (v0.1.0)
- `ContentToolboxBridge` with `BridgeResult` (`content_toolbox_bridge.h`/`.cpp`) (v0.1.0)
- pimpl pattern: all classes use `Impl` / `class Impl` for ABI stability
- `ToolboxRegistry` + free functions: global persistence in the `themis::toolbox` namespace (v0.2.0)
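The pimpl item above can be illustrated with a minimal single-file sketch. `ToolboxSketch` and its members are hypothetical names chosen for illustration; the real headers are not shown in this roadmap, so this only demonstrates the general technique of hiding state behind an opaque `Impl`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// --- what would live in the public header --------------------------------
class ToolboxSketch {  // hypothetical class, not the real IngestionToolbox
public:
    ToolboxSketch();
    ~ToolboxSketch();  // defined below, where Impl is a complete type
    ToolboxSketch(ToolboxSketch&&) noexcept;
    ToolboxSketch& operator=(ToolboxSketch&&) noexcept;

    void addStep(const std::string& name);
    std::size_t stepCount() const;

private:
    class Impl;                    // opaque to callers
    std::unique_ptr<Impl> impl_;   // object layout stays one pointer wide
};

// --- what would live in the .cpp file ------------------------------------
class ToolboxSketch::Impl {
public:
    std::vector<std::string> steps;  // private state can change without an ABI break
};

ToolboxSketch::ToolboxSketch() : impl_(std::make_unique<Impl>()) {}
ToolboxSketch::~ToolboxSketch() = default;
ToolboxSketch::ToolboxSketch(ToolboxSketch&&) noexcept = default;
ToolboxSketch& ToolboxSketch::operator=(ToolboxSketch&&) noexcept = default;

void ToolboxSketch::addStep(const std::string& name) {
    assert(impl_ && "ToolboxSketch used after being moved from");
    impl_->steps.push_back(name);
}

std::size_t ToolboxSketch::stepCount() const {
    assert(impl_ && "ToolboxSketch used after being moved from");
    return impl_->steps.size();
}
```

Because callers only ever see a single pointer member, fields can be added to `Impl` without changing the size or layout of the public class.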
- `PrometheusIngestionToolboxMetrics`: concrete metrics backend (Target: Q3 2026)
- `BridgeResult::vectors` population from `ContentManager` (Target: Q3 2026)
- `ToolboxRegistry`: process-global registry + free functions for all ThemisDB modules (Target: Q2 2026)
- `ToolboxBuilder::buildWithBridges()`: returns `BuiltToolbox` with auto-wired AQL/RAG bridges (v1.9.0)
- `extractEntitiesStream()`: chunked streaming enrichment API (Target: Q4 2026)
- `ToolboxComposite` + `ToolboxCompositeBuilder`: MIME-routing composite toolbox for multi-format pipelines (Target: Q4 2026)
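As a rough illustration of the MIME-routing composite, the sketch below dispatches a payload to a handler keyed by its MIME type. All names (`CompositeToolbox`, `CompositeBuilder`, `route`, `fallback`, `Handler`) are assumptions made for this example, and exact string matching on the MIME type is a simplification; the real `ToolboxComposite` API is not specified here.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical handler type: takes a payload, returns a processing summary.
using Handler = std::function<std::string(const std::string& payload)>;

class CompositeToolbox {
public:
    // Routes by exact MIME type, then falls back, then fails loudly.
    std::string ingest(const std::string& mime, const std::string& payload) const {
        auto it = routes_.find(mime);
        if (it != routes_.end()) return it->second(payload);
        if (fallback_) return fallback_(payload);
        throw std::invalid_argument("no toolbox registered for MIME type: " + mime);
    }

private:
    friend class CompositeBuilder;
    std::map<std::string, Handler> routes_;
    Handler fallback_;
};

class CompositeBuilder {
public:
    CompositeBuilder& route(std::string mime, Handler handler) {
        composite_.routes_[std::move(mime)] = std::move(handler);
        return *this;
    }
    CompositeBuilder& fallback(Handler handler) {
        composite_.fallback_ = std::move(handler);
        return *this;
    }
    CompositeToolbox build() { return std::move(composite_); }

private:
    CompositeToolbox composite_;
};

}  // namespace sketch
```

A caller would register one handler per format, for example `route("text/plain", ...)` and `route("application/pdf", ...)`, and let `ingest()` pick the right one per document.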
- `TextChunker`: token-based chunking façade over `rag::DocumentSplitter`; free function `chunkText()`
- `TextNormalizer`: umlaut/Unicode normalisation façade over `utils::Normalizer`; free function `normalizeText()`
- `ContentFingerprinter` + `ContentFingerprint` struct: SHA-256 dedup contract; free function `fingerprint()`
- `TextQualityScorer` + `TextQualityScore` struct: quality gate (`token_count`, `char_count`, `language`, `is_empty`, `has_boilerplate`); free function `scoreText()`
- `LanguageDetector` interface + `DefaultLanguageDetector`: stopword-heuristic ISO 639-1 detection; free function `detectLanguage()`
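The stopword heuristic behind the language detector can be sketched as follows. This is an assumption-laden toy: the interface name `LanguageDetectorSketch`, the two tiny stopword lists, the ASCII-only lower-casing, and the empty-string result for "unknown" are all invented for this example and are not taken from the real `DefaultLanguageDetector`.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>

namespace sketch {

// Hypothetical interface; the real LanguageDetector signature is not shown here.
class LanguageDetectorSketch {
public:
    virtual ~LanguageDetectorSketch() = default;
    // Returns an ISO 639-1 code, or an empty string when no stopword matches.
    virtual std::string detect(const std::string& text) const = 0;
};

class StopwordDetector : public LanguageDetectorSketch {
public:
    std::string detect(const std::string& text) const override {
        // Deliberately tiny lists; a real detector would use far larger ones.
        static const std::map<std::string, std::set<std::string>> stopwords = {
            {"de", {"der", "die", "das", "und", "ist", "nicht"}},
            {"en", {"the", "and", "is", "of", "to"}},
        };
        std::map<std::string, int> hits;
        std::istringstream in(text);
        std::string word;
        while (in >> word) {
            // Lower-case ASCII letters and strip trailing punctuation.
            std::transform(word.begin(), word.end(), word.begin(),
                           [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
            while (!word.empty() && std::ispunct(static_cast<unsigned char>(word.back())))
                word.pop_back();
            for (const auto& entry : stopwords)
                if (entry.second.count(word)) ++hits[entry.first];
        }
        // Pick the language with the most stopword hits.
        std::string best;
        int bestHits = 0;
        for (const auto& entry : hits)
            if (entry.second > bestHits) { best = entry.first; bestHits = entry.second; }
        return best;
    }
};

// Free-function convenience wrapper, mirroring the detectLanguage() idea.
inline std::string detectLanguage(const std::string& text) {
    return StopwordDetector().detect(text);
}

}  // namespace sketch
```

Keeping the detector behind an interface lets a stronger implementation replace the heuristic later without touching callers of the free function.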
- Define `IngestionToolbox`, `ToolboxBuilder`, `ContentToolboxBridge` public APIs
- Design `ToolboxRegistry`: controlled global with `initialize()`/`instance()`/`reset()` + free functions
- `IngestionToolbox::extractEntities()` via `WorkflowEngine::execute()`
- `ToolboxBuilder::build()` with profile loading, backend injection
- `ContentToolboxBridge::ingest()` + `enrichExisting()`
- `ToolboxRegistry::initialize()`/`instance()`/`isInitialized()`/`reset()` + free functions (`toolbox_registry.cpp`)
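A controlled global with `initialize()`/`instance()`/`isInitialized()`/`reset()` might look roughly like the sketch below. The `Toolbox` struct is a stand-in so the example compiles, and two behaviours are assumptions rather than documented facts: re-initialising replaces the previous instance, and `instance()` throws `std::logic_error` when nothing has been installed.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {

// Stand-in for the real IngestionToolbox; only here so the example compiles.
struct Toolbox {
    std::string profile;
};

class Registry {
public:
    // Installs the process-wide toolbox (assumption: replaces any previous one).
    static void initialize(std::shared_ptr<Toolbox> toolbox) {
        if (!toolbox) throw std::invalid_argument("toolbox must not be null");
        std::lock_guard<std::mutex> lock(guard());
        slot() = std::move(toolbox);
    }
    static bool isInitialized() {
        std::lock_guard<std::mutex> lock(guard());
        return slot() != nullptr;
    }
    // Assumption: throws instead of returning null so misuse fails loudly.
    static std::shared_ptr<Toolbox> instance() {
        std::lock_guard<std::mutex> lock(guard());
        if (!slot()) throw std::logic_error("registry not initialized");
        return slot();
    }
    // Drops the global instance, e.g. so each test starts from a clean state.
    static void reset() {
        std::lock_guard<std::mutex> lock(guard());
        slot().reset();
    }

private:
    // Function-local statics avoid static-initialization-order problems.
    static std::mutex& guard() { static std::mutex m; return m; }
    static std::shared_ptr<Toolbox>& slot() { static std::shared_ptr<Toolbox> s; return s; }
};

// Free-function wrappers so modules need not name the registry class.
inline void initializeToolbox(std::shared_ptr<Toolbox> t) { Registry::initialize(std::move(t)); }
inline std::shared_ptr<Toolbox> globalToolbox() { return Registry::instance(); }

}  // namespace sketch
```

Handing out `std::shared_ptr` keeps a toolbox alive for callers that still hold it even if another thread calls `reset()` in the meantime.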
- Null-backend guard (reinstates `NullTextGenerationBackend`)
- `build()` throws `std::logic_error` on double-call
- `ingest()` propagates `ContentManager` errors via `BridgeResult::error`
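The first two guards above can be sketched with a small fluent builder. The types `TextBackend`, `NullBackend`, `Toolbox`, and `Builder` are simplified stand-ins invented for this example; only the behaviours (fall back to a null backend, throw `std::logic_error` on a second `build()`) come from the list.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical backend interface standing in for ITextGenerationBackend.
struct TextBackend {
    virtual ~TextBackend() = default;
    virtual std::string generate(const std::string& prompt) = 0;
};

// No-op fallback used when the caller never supplies a backend.
struct NullBackend : TextBackend {
    std::string generate(const std::string&) override { return ""; }
};

struct Toolbox {
    std::string profile;
    std::shared_ptr<TextBackend> backend;
};

class Builder {
public:
    Builder& withWorkflowProfile(std::string profile) {
        profile_ = std::move(profile);
        return *this;  // returning *this enables call chaining
    }
    Builder& withTextBackend(std::shared_ptr<TextBackend> backend) {
        backend_ = std::move(backend);
        return *this;
    }
    Toolbox build() {
        // Double-call guard: the builder's state is consumed by the first build().
        if (built_) throw std::logic_error("build() called twice");
        built_ = true;
        // Null-backend guard: never hand out a toolbox with a null backend.
        if (!backend_) backend_ = std::make_shared<NullBackend>();
        return Toolbox{std::move(profile_), std::move(backend_)};
    }

private:
    std::string profile_ = "default";
    std::shared_ptr<TextBackend> backend_;
    bool built_ = false;
};

}  // namespace sketch
```

Throwing on the second call makes the consumed-state problem visible immediately, rather than silently returning a toolbox with moved-from members.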
- Unit tests for `IngestionToolbox::extractEntities()` (Target: Q3 2026): IT-09/IT-10 in `tests/test_toolbox_ingestion.cpp`
- Integration tests for `ContentToolboxBridge::ingest()` (Target: Q3 2026): CTB-01..CTB-05 in `tests/test_content_toolbox_bridge.cpp` (FE-01..03, TB-01..12, CTB-01..05, FM-01..08)
- Add `PrometheusIngestionToolboxMetrics` for production observability (Target: Q3 2026) → `IngestionToolbox::recordExtraction()` + `getMetricsText()` (4 families: calls/errors/entities/latency, `std::atomic`); auto-recorded inside `extractEntities()`/`extractEntitySet()`; tests ITM-01..06 in `tests/test_toolbox_phase5.cpp`
- Populate `BridgeResult::vectors` from `ContentManager::getVectorRecords()` (Target: Q3 2026) → `IngestionToolbox::extractEntitySet()` returns full `BaseEntitySet` including `chunks`; `ContentToolboxBridge::ingest()` + `enrichExisting()` now populate `BridgeResult::vectors` from `entity_set.chunks`; tests VEC-01..03 in `tests/test_toolbox_phase5.cpp`
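The four metric families backed by `std::atomic` could be sketched along these lines. The class name, the metric names, and the choice to expose latency as a running sum in microseconds are assumptions for illustration; the real `PrometheusIngestionToolboxMetrics` may use different names or a histogram.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

namespace sketch {

class ExtractionMetrics {
public:
    // Lock-free: each counter is bumped independently with relaxed ordering.
    void recordExtraction(bool ok, std::uint64_t entities, std::uint64_t latencyMicros) {
        calls_.fetch_add(1, std::memory_order_relaxed);
        if (!ok) errors_.fetch_add(1, std::memory_order_relaxed);
        entities_.fetch_add(entities, std::memory_order_relaxed);
        latencyMicros_.fetch_add(latencyMicros, std::memory_order_relaxed);
    }

    // Renders all four families in the Prometheus text exposition format.
    std::string getMetricsText() const {
        std::ostringstream out;
        emit(out, "toolbox_extract_calls_total", "Extraction calls", calls_);
        emit(out, "toolbox_extract_errors_total", "Failed extraction calls", errors_);
        emit(out, "toolbox_extract_entities_total", "Entities extracted", entities_);
        emit(out, "toolbox_extract_latency_micros_total", "Summed extraction latency",
             latencyMicros_);
        return out.str();
    }

private:
    using Counter = std::atomic<std::uint64_t>;

    static void emit(std::ostream& out, const char* name, const char* help,
                     const Counter& value) {
        out << "# HELP " << name << ' ' << help << '\n'
            << "# TYPE " << name << " counter\n"
            << name << ' ' << value.load(std::memory_order_relaxed) << '\n';
    }

    Counter calls_{0};
    Counter errors_{0};
    Counter entities_{0};
    Counter latencyMicros_{0};
};

}  // namespace sketch
```

Relaxed ordering is enough here because each counter is independent; a scrape may see counters from slightly different moments, which is normally acceptable for monitoring.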
- Update include-level docs once `buildWithBridges()` is implemented (v1.9.0)
- Add ROADMAP entries + test coverage for all v0.3.0 primitives (Target: Q2 2026)
- `IngestionToolbox`, `ToolboxBuilder`, `ContentToolboxBridge` implemented and headers documented
- `ToolboxRegistry` + free functions: global persistence in `themis::toolbox`; dual access (global + injected) documented
- Unit and integration test coverage confirmed: `test_toolbox_ingestion.cpp` (IT-01..LH-03) + `test_content_toolbox_bridge.cpp` (FE-01..FM-08) + `test_toolbox_phase5.cpp` (ITM-01..06, VEC-01..03)
- Prometheus metrics for production observability: `getMetricsText()` on `IngestionToolbox` + via free function
- `BridgeResult::vectors` fully populated via `extractEntitySet()` returning `BaseEntitySet::chunks`
- `TextChunker` + `TextNormalizer`: text processing primitives, free functions, tests TXC-01..06 + TXN-01..04
- `ContentFingerprinter`: SHA-256 dedup contract, free function, tests CFP-01..08
- `TextQualityScorer`: quality gate before NER, free function, tests TQS-01..08
- `LanguageDetector`: ISO 639-1 detection, interface + default impl, free function, tests LDT-01..06
- `ToolboxComposite` + `ToolboxCompositeBuilder`: MIME routing, tests CMP-01..06
- `extractEntitiesStream()`: callback-based streaming extraction, tests TCS-01..04
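Callback-based streaming extraction, as named in the last item, can be illustrated with the toy below. Everything except the function name is assumed: the signature, the `Entity` struct, chunking by word count, and the "capitalised word is an entity" rule are placeholders standing in for the real workflow.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

struct Entity {
    std::string text;
    std::size_t chunkIndex;  // which chunk the entity came from
};

// Called once per chunk; return false to stop the stream early.
using ChunkCallback = std::function<bool(const std::vector<Entity>&)>;

// Splits `text` into chunks of `wordsPerChunk` words and reports the entities
// of each chunk as soon as that chunk is processed. Returns the number of
// chunks delivered to the callback.
inline std::size_t extractEntitiesStream(const std::string& text,
                                         std::size_t wordsPerChunk,
                                         const ChunkCallback& onChunk) {
    if (wordsPerChunk == 0) throw std::invalid_argument("wordsPerChunk must be > 0");

    std::istringstream in(text);
    std::vector<std::string> chunk;
    std::string word;
    std::size_t emitted = 0;

    auto flush = [&]() -> bool {
        std::vector<Entity> entities;
        for (const auto& w : chunk) {
            // Toy rule standing in for real NER: capitalised words are entities.
            if (std::isupper(static_cast<unsigned char>(w.front())))
                entities.push_back({w, emitted});
        }
        chunk.clear();
        ++emitted;
        return onChunk(entities);
    };

    while (in >> word) {
        chunk.push_back(word);
        if (chunk.size() == wordsPerChunk && !flush()) return emitted;
    }
    if (!chunk.empty()) flush();  // deliver the final partial chunk
    return emitted;
}

}  // namespace sketch
```

The caller never holds more than one chunk's worth of results at a time, and the boolean return value gives it a cheap way to cancel a long document midway.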
`ContentToolboxBridge::BridgeResult::vectors` is populated from `BaseEntitySet::chunks` (the embedding pipeline); chunks are only non-empty when a real `IEmbeddingBackend` is wired in via `builtin.chunk_embed`.
- No known breaking changes documented.
As of: 2026-04-20. Source: src/UNUSED_FUNCTIONS_REPORT.md
- `IngestionToolbox`: main toolbox for ingestion pipelines; used in the RAG and AQL bridges
- `enrichExisting`: enriches existing entities with additional extractions
- `contentManager`: returns the `ContentManager` from the `ContentToolboxBridge`
Action: for each symbol, decide whether to (1) wire it up, (2) test it, or (3) schedule it as CANDIDATE_FOR_REMOVAL.