Roadmap note: Convert vague bullets without acceptance criteria into checkbox tasks. Format: - [ ] <Task> (Target: <Q/Year>).

Ethics AI Module Roadmap

Current Status

v0.3.0 — PhilosophyLoader::reloadProfiles() atomic hot-reload with mutex. EthicalDiscourseEngine::continueDebate(debate_id, round) multi-round debates (max 3 rounds; REBUTTAL/SYNTHESIS argument types; cross-round counter-argument links). ArgumentStore::storeDebateRound() + getDebateTranscript(). DebateRound struct in ethics_ai_types.h. EthicsEvaluator::recordDecision() + getMetricsText() Prometheus text v0.0.4 (5 metric families, std::atomic backed). 12 tests EAM-01..12 in tests/test_ethics_ai_v030.cpp. LLM argument generation and real embeddings remain planned for v0.1.0/v0.4.0.
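The atomic hot-reload mentioned above can be pictured as "load into a temporary container, then swap under the mutex". The following is a minimal sketch of that idea only; `ProfileCache`, the stand-in `Status` and `PhilosophyProfile` structs, and the rule that an empty reload is rejected are assumptions for illustration, not the module's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <variant>

// Hypothetical stand-ins for the module's real types.
struct Status { std::string message; };
struct PhilosophyProfile { std::string school; };

class ProfileCache {
public:
    using ProfileMap = std::map<std::string, PhilosophyProfile>;

    // Replace the live profile set in one step. `fresh` is assumed to have been
    // fully parsed and validated by a temporary loader before this call, so the
    // lock is held only for the swap itself.
    std::variant<std::size_t, Status> reload(ProfileMap fresh) {
        if (fresh.empty()) {
            // Assumption: an empty result is treated as a failed reload.
            return Status{"reload produced no profiles; keeping current set"};
        }
        std::lock_guard<std::mutex> lock(mutex_);
        profiles_.swap(fresh);  // readers see either the old map or the new one
        return profiles_.size();
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return profiles_.size();
    }

private:
    mutable std::mutex mutex_;
    ProfileMap profiles_;
};
```

Because the old map is moved into the local `fresh` and destroyed after the lock is released, readers are never blocked by parsing or by teardown of the previous profile set.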


Completed ✅

  • EthicsEvaluator — 5-dimension decision scoring API
  • EthicalDiscourseEngine::initializeDebate() and makeDecision()
  • RAGContextEngine — 7 AQL retrieval pattern methods
  • ArgumentStore — BaseEntity-backed persistence with standalone fallback
  • PhilosophyLoader — YAML profile loading with caching and validation
  • EthicsAiPlugin — IThemisPlugin wiring and lifecycle
  • Shared domain types (EthicalArgument, EthicalDecision, PhilosophyProfile, etc.)
  • AQL query constants for all 7 retrieval patterns
  • PhilosophyLoader::addProfile() — programmatic profile injection for unit tests (philosophy_loader.h)
  • Unit tests for all six components — 70 tests across four focused targets: DiscourseEngineFocusedTests (11), ArgumentStoreStandaloneTests (18), EthicsAiPluginTests (28), RAGContextEngineTests (13); all registered in tests/CMakeLists.txt under THEMIS_PLUGIN_ETHICS_AI guard (2026-04-08)
  • BaseEntity adapter for ethics types
  • std::variant<T, Status> error handling throughout all public APIs
  • Standalone in-memory mode for ArgumentStore (testing without RocksDB)
  • PhilosophyLoader rich YAML — complex thesis objects, point-keyed strengths/weaknesses, nested decision_framework (Issue: #4596) (2026-04-12)
  • EthicsEvaluator::Config — configurable dimension weights normalised in ctor (Issue: #4596) (2026-04-12)
  • ChainVisualizer::exportDot(), exportMermaid(), chainToDot(), chainToMermaid() (Issue: #4596) (2026-04-12)
  • 8 tests CV-01…CV-08 in tests/test_ethics_ai_chain_visualizer.cpp (Issue: #4596) (2026-04-12)
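As an illustration of the "dimension weights normalised in ctor" item in the list above, here is a small sketch. The class name `EvaluatorConfig`, the clamping of negative weights, and the choice of equal weights as the default are assumptions; the real `EthicsEvaluator::Config` may differ.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>

// Hypothetical sketch: five dimension weights, normalised so they sum to 1.
class EvaluatorConfig {
public:
    static constexpr std::size_t kDimensions = 5;

    // Assumption: the default (legacy) behaviour is equal weighting.
    EvaluatorConfig() { weights_.fill(1.0 / kDimensions); }

    explicit EvaluatorConfig(std::array<double, kDimensions> raw) : weights_(raw) {
        double sum = 0.0;
        for (double& w : weights_) {
            if (w < 0.0) w = 0.0;  // assumption: negative weights are clamped
            sum += w;
        }
        if (sum <= 0.0) {          // degenerate input falls back to equal weights
            weights_.fill(1.0 / kDimensions);
            return;
        }
        for (double& w : weights_) w /= sum;
    }

    // Weighted sum of per-dimension scores.
    double score(const std::array<double, kDimensions>& dims) const {
        return std::inner_product(dims.begin(), dims.end(), weights_.begin(), 0.0);
    }

    const std::array<double, kDimensions>& weights() const { return weights_; }

private:
    std::array<double, kDimensions> weights_{};
};
```

With normalised weights, a decision that scores 1.0 on every dimension always aggregates to 1.0, whatever raw weights the operator supplied.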

In Progress [~]

  • Focused unit test suites implemented and passing: test_argument_store_standalone (18), test_rag_context_engine_focused (18), test_ethics_ai_plugin_focused (28), test_discourse_engine_focused (11), test_philosophy_loader_focused (7 passed, 1 skipped env-dependent)
  • Integration test suite implemented and passing: test_ethics_ai_integration (21) — FullPipeline, ArgumentStoreRAG, RAGContextBuild

Planned Features

v0.1.0 — LLM Argument Generation (Target: Q3 2026)

  • Integrate LLM backend for argument content generation (Target: Q3 2026)
    • Inputs: PhilosophyProfile, dilemma text, ArgumentType
    • Outputs: EthicalArgument.content with chain-of-thought rationale
    • Constraints: max 500 tokens per argument; latency ≤ 3 s per argument
    • Errors: LLM timeout → fallback to template; context window exceeded → truncate
    • Tests: unit (mock LLM) + integration (live LLM) + golden-output comparison
  • Dynamic confidence score computed from argument strength distribution (Target: Q3 2026)
    • Implemented in EthicsEvaluator::computeConfidence(): WEAK=0.25, MODERATE=0.50, STRONG=0.75, DECISIVE=1.00 weighted average
  • Dynamic consensus_level score from inter-philosophy agreement analysis (Target: Q3 2026)
    • Implemented in EthicsEvaluator::computeConsensus(): per-school PRO/CONTRA tally; fraction of agreeing schools
  • Richer argument content from generateArgument() using all profile theses and decision framework (Target: Q3 2026)
    • Strength derived from total thesis count; all main_theses and secondary_theses included; dilemma text referenced
  • Real embedding generation for vectorSemanticSearch (sentence-transformers or ONNX) (Target: Q3 2026)
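The confidence and consensus items above can be sketched as follows. The strength scores are the ones listed in the roadmap; everything else is an assumption made for illustration: the struct `ScoredArgument`, reading "weighted average" as the mean strength score, and defining consensus as the share of schools on the majority side (ties count as CONTRA).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

enum class ArgumentStrength { WEAK, MODERATE, STRONG, DECISIVE };
enum class Stance { PRO, CONTRA };

// Hypothetical flattened view of an argument, just enough for scoring.
struct ScoredArgument {
    std::string school;
    ArgumentStrength strength;
    Stance stance;
};

double strengthToScore(ArgumentStrength s) {
    switch (s) {
        case ArgumentStrength::WEAK:     return 0.25;
        case ArgumentStrength::MODERATE: return 0.50;
        case ArgumentStrength::STRONG:   return 0.75;
        case ArgumentStrength::DECISIVE: return 1.00;
    }
    return 0.0;
}

// Mean strength score over all arguments; 0 when there are none.
double computeConfidence(const std::vector<ScoredArgument>& args) {
    if (args.empty()) return 0.0;
    double sum = 0.0;
    for (const auto& a : args) sum += strengthToScore(a.strength);
    return sum / static_cast<double>(args.size());
}

// Per-school PRO/CONTRA tally, then the fraction of schools on the majority side.
double computeConsensus(const std::vector<ScoredArgument>& args) {
    std::map<std::string, int> net;  // > 0 means the school leans PRO
    for (const auto& a : args) net[a.school] += (a.stance == Stance::PRO) ? 1 : -1;
    if (net.empty()) return 0.0;
    std::size_t pro = 0;
    for (const auto& entry : net) {
        if (entry.second > 0) ++pro;
    }
    const std::size_t contra = net.size() - pro;
    return static_cast<double>(std::max(pro, contra)) / static_cast<double>(net.size());
}
```

For four schools with strengths STRONG, WEAK, DECISIVE, MODERATE, of which three argue PRO, this sketch yields a confidence of 0.625 and a consensus of 0.75.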

v0.2.0 — Advanced RAG and Evaluation (Target: Q4 2026)

  • Philosophy profile hot-reload without server restart (Target: Q4 2026)
    • PhilosophyLoader::reloadProfiles(directory): atomically re-scans directory using a temp loader, then swaps profiles_ under mutex_; thread-safe; returns new profile count or Status::Error
  • Multi-round debates: continueDebate() with counter-argument generation (Target: Q4 2026)
    • EthicalDiscourseEngine::continueDebate(debate_id, round_number) → DebateRound (round capped at 3; REBUTTAL/SYNTHESIS argument types in rounds 2/3; counter-argument IDs linked)
    • ArgumentStore::storeDebateRound(round) + getDebateTranscript(debate_id) returning rounds ordered by round_number
    • DebateRound struct added to ethics_ai_types.h
  • Configurable aggregation weights for EthicsEvaluator dimensions (Target: Q4 2026)
    • EthicsEvaluator::Config struct; weights normalised in constructor; default ctor preserves legacy behaviour
  • Prometheus metrics: decisions/sec, avg confidence, RAG hit rate (Target: Q4 2026)
    • EthicsEvaluator::recordDecision(confidence, rag_hit, latency_ms) + setArgumentStoreSize(count) + getMetricsText() emitting Prometheus text v0.0.4
    • Metrics: ethics_decisions_total, ethics_decision_latency_ms_total, ethics_rag_context_hits_total, ethics_argument_confidence_avg, ethics_argument_store_size
    • Backed by std::atomic counters (lock-free, thread-safe)
  • Performance benchmark: full decision pipeline ≤ 200 ms (excl. LLM) at p99 (Target: Q4 2026)
    • Implemented: tests/test_ethics_ai_benchmark.cpp (PB-01..PB-06); CI threshold 500 ms
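To make the Prometheus item above concrete, the sketch below keeps lock-free counters in `std::atomic` and renders them in the Prometheus text exposition format. The five metric names are taken from the list above; the class name `DecisionMetrics`, the counter/gauge typing, and the fixed-point accumulation of confidence are assumptions, and confidence is assumed to lie in [0, 1].

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical metrics holder; only the metric names come from the roadmap.
class DecisionMetrics {
public:
    void recordDecision(double confidence, bool rag_hit, std::uint64_t latency_ms) {
        decisions_.fetch_add(1, std::memory_order_relaxed);
        latency_ms_total_.fetch_add(latency_ms, std::memory_order_relaxed);
        if (rag_hit) rag_hits_.fetch_add(1, std::memory_order_relaxed);
        // Confidence is accumulated in millionths so an integer atomic suffices.
        confidence_micro_.fetch_add(
            static_cast<std::uint64_t>(std::llround(confidence * 1e6)),
            std::memory_order_relaxed);
    }

    void setArgumentStoreSize(std::uint64_t count) {
        store_size_.store(count, std::memory_order_relaxed);
    }

    std::string getMetricsText() const {
        const std::uint64_t n = decisions_.load(std::memory_order_relaxed);
        const double avg = (n == 0) ? 0.0
            : static_cast<double>(confidence_micro_.load(std::memory_order_relaxed))
                  / 1e6 / static_cast<double>(n);
        std::ostringstream out;
        emit(out, "ethics_decisions_total", "counter", n);
        emit(out, "ethics_decision_latency_ms_total", "counter", latency_ms_total_.load());
        emit(out, "ethics_rag_context_hits_total", "counter", rag_hits_.load());
        emit(out, "ethics_argument_confidence_avg", "gauge", avg);
        emit(out, "ethics_argument_store_size", "gauge", store_size_.load());
        return out.str();
    }

private:
    // One "# TYPE" line plus one sample line per metric family.
    template <typename T>
    static void emit(std::ostringstream& out, const char* name, const char* type, T value) {
        out << "# TYPE " << name << ' ' << type << '\n' << name << ' ' << value << '\n';
    }

    std::atomic<std::uint64_t> decisions_{0};
    std::atomic<std::uint64_t> latency_ms_total_{0};
    std::atomic<std::uint64_t> rag_hits_{0};
    std::atomic<std::uint64_t> confidence_micro_{0};
    std::atomic<std::uint64_t> store_size_{0};
};
```

The individual counters are updated independently, so a scrape taken mid-update may be off by one decision across families; for monitoring purposes that is usually an acceptable trade for staying lock-free.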

v0.3.0 — Philosophy Library (Target: Q1 2027)

  • Ship built-in YAML profiles: utilitarianism, Kantian, virtue ethics, care ethics, contractualism, rationalism, others (Target: Q1 2027)
    • Profiles already in plugins/ethics_ai/philosophies/; PhilosophyLoader now handles rich YAML schema (complex thesis objects, point-keyed strengths/weaknesses, nested decision_framework)
  • Compliance ethics profiles: GDPR, ISO 42001, IEEE 7000 (Target: Q1 2027)
  • Argument chain visualisation (DOT/Mermaid export) (Target: Q1 2027)
    • ChainVisualizer::exportDot() / exportMermaid() / chainToDot() / chainToMermaid() in chain_visualizer.h/cpp; 8 tests CV-01..CV-08
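A DOT/Mermaid export of an argument chain amounts to printing one edge per line in the respective syntax. The sketch below shows this with free functions; the `ChainEdge` struct and the function signatures are assumptions and need not match `chain_visualizer.h`. Node IDs are assumed to be plain alphanumeric strings, so no escaping is attempted.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical edge record: "from" relates to "to" with a named relation.
struct ChainEdge {
    std::string from;
    std::string to;
    std::string relation;
};

// Graphviz DOT: a directed graph with one labelled edge per link.
std::string chainToDot(const std::vector<ChainEdge>& edges) {
    std::ostringstream out;
    out << "digraph argument_chain {\n";
    for (const auto& e : edges) {
        out << "  \"" << e.from << "\" -> \"" << e.to
            << "\" [label=\"" << e.relation << "\"];\n";
    }
    out << "}\n";
    return out.str();
}

// Mermaid flowchart, top-down, using the `A -->|label| B` edge form.
std::string chainToMermaid(const std::vector<ChainEdge>& edges) {
    std::ostringstream out;
    out << "graph TD\n";
    for (const auto& e : edges) {
        out << "  " << e.from << " -->|" << e.relation << "| " << e.to << "\n";
    }
    return out.str();
}
```

Real argument IDs or relation names that contain quotes, pipes, or spaces would need escaping before being written into either format.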

Implementation Phases

Phase 1: Design / API Contract ✅

  • Define EthicalArgument, EthicalDecision, PhilosophyProfile types
  • Define ArgumentStore persistence API
  • Define EthicalDiscourseEngine orchestration API
  • Define RAGContextEngine query-pattern API

Phase 2: Core Implementation ✅

  • PhilosophyLoader YAML parsing
  • ArgumentStore BaseEntity integration + standalone mode
  • EthicalDiscourseEngine::makeDecision template argument generation
  • RAGContextEngine 7 AQL method stubs with real AQL constants

Phase 3: Error Handling & Edge Cases ✅

  • Unknown philosophy school → Status::Error
  • Empty schools list → Status::Error
  • YAML parse failure → Status::Error with file path
  • AQL/RocksDB failure propagation
  • Standalone mode activation when RocksDBWrapper is null
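The error cases above all follow the same `std::variant<T, Status>` pattern: the function returns either a value or a `Status`, and the caller inspects which alternative is held. A minimal sketch for the "empty schools list" and "unknown school" cases follows; the `Status` layout, the `Result` alias, `DebateHandle`, and the free function are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical Status type; the real one may carry codes or more context.
struct Status {
    bool ok;
    std::string message;
    static Status Error(std::string msg) { return Status{false, std::move(msg)}; }
};

template <typename T>
using Result = std::variant<T, Status>;

struct DebateHandle {
    std::string debate_id;
    std::vector<std::string> schools;
};

// Validates the requested schools before any debate state is created.
Result<DebateHandle> initializeDebate(const std::vector<std::string>& schools,
                                      const std::set<std::string>& known_schools) {
    if (schools.empty()) {
        return Status::Error("schools list is empty");
    }
    for (const auto& s : schools) {
        if (known_schools.count(s) == 0) {
            return Status::Error("unknown philosophy school: " + s);
        }
    }
    return DebateHandle{"debate-1", schools};  // placeholder ID for the sketch
}
```

A caller would typically branch with `std::holds_alternative<Status>(result)` or `std::get_if<DebateHandle>(&result)` rather than catching exceptions.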

Phase 4: Tests [~]

  • Unit tests for PhilosophyLoader (directory, file, invalid YAML; rich YAML with complex thesis objects and nested decision_framework — Issue: #4596)
  • Unit tests for ArgumentStore standalone mode
  • Unit tests for EthicalDiscourseEngine decision flow
  • Unit tests for RAGContextEngine focused query patterns
  • Unit tests for EthicsAIPlugin lifecycle and metrics API
  • 8 tests CV-01…CV-08 for ChainVisualizer (exportDot/exportMermaid/chainToDot/chainToMermaid) — tests/test_ethics_ai_chain_visualizer.cpp (Issue: #4596, 2026-04-12)
  • Integration test: full decision pipeline end-to-end (Target: Q3 2026)
    • Scope: EthicsAIPlugin::initialize() → initializeDebate() → makeDecision() → EthicsEvaluator::evaluate()
    • Subsystems: ethics_ai_plugin.cpp, discourse_engine.cpp, argument_store.cpp, ethics_evaluator.cpp
    • Inputs: 2 YAML philosophy profiles on disk, a MoralDilemma struct with 3 options
    • Outputs: EthicalDecision with chosen_option, confidence ∈ [0,1], consensus_level ∈ [0,1], non-empty supporting_arguments
    • Constraints: pipeline completes in ≤ 500 ms; no external LLM call required
    • Errors: missing YAML → plugin returns Status::Error; empty dilemma options → Status::Error
    • Tests: tests/test_ethics_ai_integration.cpp — GTest, direct-source compilation pattern
    • File: tests/test_ethics_ai_integration.cpp (new), added to tests/CMakeLists.txt
  • Integration test: ArgumentStore with real RocksDB (Target: Q3 2026)
    • Scope: ArgumentStore in RocksDB mode – store, load, scanPrefix, storeChain, getChain
    • Subsystems: argument_store.cpp, storage/rocksdb_wrapper.h, ethics_base_entity_adapter.h
    • Inputs: 10+ EthicalArgument entities written to a temp RocksDB directory (std::filesystem::temp_directory_path())
    • Outputs: round-trip identity (serialize → store → load → deserialize equals original); chain map reconstructed correctly
    • Constraints: temp directory cleaned up via RAII; test repeatable without leftover state
    • Errors: RocksDB open failure → Status::Error; corrupt blob → Status::Error (not crash)
    • Tests: 10 tests ASRDB-01..10 in tests/test_ethics_ai_argument_store_rocksdb.cpp (test_ethics_ai_argument_store_rocksdb_focused CMake target) — fixture using SetUpTestSuite/TearDownTestSuite for temp dir management; no data loss across shutdown/reopen cycle
  • Integration test: RAGContextEngine with live ArgumentStore data (Target: Q3 2026)
    • Scope: RAGContextEngine query methods reading from a pre-populated ArgumentStore
    • Subsystems: rag_context_engine.cpp, argument_store.cpp, AQL constants in ethics_aql_queries.h
    • Inputs: 20 seeded EthicalArgument records spanning 3 philosophy schools and 2 argument types
    • Outputs: getArgumentsByPhilosophy() returns correct subset; traverseArgumentChain() BFS produces correct ordering; getSupportingArguments() returns only SUPPORT relation type
    • Constraints: in-memory mode (no RocksDB required for this test); all assertions deterministic
    • Errors: unknown school → empty result (not crash); cycle in chain graph → terminates within max_depth hops
    • Tests: single TEST_F fixture that seeds store in SetUp; 8+ test cases covering each query-pattern method
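The requirement that a cycle in the chain graph "terminates within max_depth hops" can be met by a breadth-first traversal that tracks visited nodes and stops expanding at the depth limit. The sketch below illustrates this on a plain adjacency map; the function name `traverseChain` and the data layout are assumptions, not the signature of `traverseArgumentChain()`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>

using LinkMap = std::map<std::string, std::vector<std::string>>;

// Breadth-first walk from `root`, visiting each argument at most once and
// never expanding nodes that are already `max_depth` hops away.
std::vector<std::string> traverseChain(const LinkMap& links,
                                       const std::string& root,
                                       std::size_t max_depth) {
    std::vector<std::string> order;
    std::set<std::string> visited{root};
    std::queue<std::pair<std::string, std::size_t>> frontier;
    frontier.push({root, 0});

    while (!frontier.empty()) {
        auto [id, depth] = frontier.front();  // copy, then pop
        frontier.pop();
        order.push_back(id);
        if (depth == max_depth) continue;     // depth limit reached

        auto it = links.find(id);
        if (it == links.end()) continue;      // leaf node
        for (const auto& next : it->second) {
            if (visited.insert(next).second) { // skips nodes seen before (cycles)
                frontier.push({next, depth + 1});
            }
        }
    }
    return order;
}
```

The visited set alone already guarantees termination on cyclic graphs; the depth limit additionally bounds how much of a long acyclic chain is returned.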

Phase 5: Performance / Hardening [~]

  • Embedding generation integration (Target: Q3 2026)
  • LLM argument content generation (Target: Q3 2026)
  • Benchmark: decision pipeline ≤ 200 ms at p99 (excl. LLM) (Target: Q4 2026)
    • tests/test_ethics_ai_benchmark.cpp PB-01..PB-06 registered as EthicsAIBenchmarkTests
  • EthicsProfileRegistry — lazy-loading metadata index + LRU cache (Target: Q3 2026)
    • include/plugins/ethics_ai/ethics_profile_registry.h, src/ethics_ai/ethics_profile_registry.cpp
    • Scales to 1 000+ profiles; RAM: ~500 B/profile index; LRU cap: 20 warm profiles
    • Tests: EPR-01..12 in tests/test_ethics_profile_registry.cpp
  • EthicsSelectionRouter — 3-stage funnel for >100 schools (Target: Q3 2026)
    • include/plugins/ethics_ai/ethics_selection_router.h, src/ethics_ai/ethics_selection_router.cpp
    • Stage-1 tag/taxonomy (≤2 ms), Stage-2 semantic overlap (≤20 ms), Stage-3 precedent DC (≤50 ms)
    • Tests: ESR-01..10 in tests/test_ethics_selection_router.cpp
    • STUB: Stage-2 uses term-overlap proxy (real embedding model planned Q3 2026)
    • STUB: Stage-3 uses in-memory precedent map (KG graph integration planned Q4 2026)
  • Ethics Taxonomy Configuration config/ethics_ai/ethics_taxonomy.yaml (12 classes, 40+ schools)
  • New YAML profiles: behoerden_ethik.yaml, universitaere_ethik.yaml, islamische_ethik.yaml
  • New YAML profiles (canonical format): buddhistische_ethik.yaml, juedische_bioethik.yaml, konfuzianismus.yaml (Target: Q3 2026)
    • All 6 new profiles follow canonical YAML format (founders, historical_context, application_areas, famous_quotes, key_literature, section headers)
    • All 6 include routing metadata (taxonomy_class, tags, applicable_domains, convergence_compatible, regulatory_constraints, domain_overrides)
  • §9.1 Per-thesis token_budget + activation_rounds + selectThesesForRound() (Target: Q3 2026)
    • PhilosophyThesis struct in ethics_ai_types.h (thesis_id, name, description, token_budget, activation_rounds, round_role_weights)
    • PhilosophyProfile.typed_theses additive field — backward compatible
    • philosophy_loader.cpp parses typed thesis objects → typed_theses
    • ContextWindowBudgetManager::selectThesesForRound() in context_window_manager.h/.cpp
    • Tests TBM-01..10 in tests/test_thesis_budget_management.cpp
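One way to read the per-thesis `token_budget` and `activation_rounds` fields is as a filter plus a greedy budget check per round. The sketch below illustrates that reading; the reduced `PhilosophyThesis` struct, the free function, the greedy in-order selection, and the omission of `round_role_weights` are all assumptions and may differ from `ContextWindowBudgetManager::selectThesesForRound()`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Reduced, hypothetical version of the thesis struct (only the fields used here).
struct PhilosophyThesis {
    std::string thesis_id;
    int token_budget;
    std::vector<int> activation_rounds;
};

// Keeps theses that are active in `round`, in declaration order, as long as
// their combined token_budget stays within `round_budget`.
std::vector<PhilosophyThesis> selectThesesForRound(
        const std::vector<PhilosophyThesis>& theses, int round, int round_budget) {
    std::vector<PhilosophyThesis> selected;
    int used = 0;
    for (const auto& t : theses) {
        const bool active = std::find(t.activation_rounds.begin(),
                                      t.activation_rounds.end(),
                                      round) != t.activation_rounds.end();
        if (!active) continue;
        if (used + t.token_budget > round_budget) continue;  // would overflow
        used += t.token_budget;
        selected.push_back(t);
    }
    return selected;
}
```

With theses T1 (200 tokens, rounds 1 and 2), T2 (300, rounds 2 and 3), and T3 (150, round 2), a round-2 budget of 400 tokens selects T1 and T3 and skips T2.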

§12 Context-Window Budget Strategy: Compression + Architectural Decomposition (Target: Q3–Q4 2026)

Both tracks are to be implemented with equal priority; neither compression nor decomposition alone is sufficient for running 4+ schools on 7B models (rationale: FUTURE_ENHANCEMENTS.md §12).

Compression track:

  • §12.1.1 Monocle budget reduction via activation_rounds + token_budget: ✅ implemented (§9.1)
    • Monocle size for R3–R5: compressed from ~800 tokens to ~400–500 tokens
    • All new school profiles MUST declare activation_rounds, token_budget, round_role_weights
  • §12.1.2 PriorRoundCompressor: 3 compression modes (Target: Q3 2026)
    • New files: include/ethics_ai/prior_round_compressor.h + src/ethics_ai/prior_round_compressor.cpp
    • Inputs: std::vector<EthicalArgument> per round, CompressionConfig, current_round
    • Outputs: compressed string; mode principle_citations_only (−75 % tokens, ΔDC ≤ −0.05) is the mandatory default from 4 schools upward
    • Tests: PRC-01..06 (→ §9.3)
  • §12.1.3 Selective opponent injection via CrossSchoolTensionResolver (Target: Q3 2026)
    • New files: include/ethics_ai/cross_school_tension_resolver.h + src/ethics_ai/cross_school_tension_resolver.cpp
    • Selection via rebuttal_cite_weight ≥ 0.6; secondary opponents → headline tokens (−66 % R2 context)
    • Tests: CST-01..06 (→ §9.2)
  • §12.1.4 Convergence matrix via ConvergenceMarkerEngine::buildConvergencePreamble() (Target: Q3 2026)
    • New files: include/ethics_ai/convergence_marker_engine.h + src/ethics_ai/convergence_marker_engine.cpp
    • R4 input: ~250-token compact matrix instead of ~3 600 tokens of full school arguments
    • Tests: CME-01..06 (→ §9.5)
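The selection rule in §12.1.3 (full text for opponents with rebuttal_cite_weight ≥ 0.6, headline only for the rest) can be sketched as a simple per-opponent choice. `OpponentPosition` and `buildOpponentContext` are hypothetical names; how CrossSchoolTensionResolver actually computes the weight is not shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical view of one opposing school's contribution to the prompt.
struct OpponentPosition {
    std::string school;
    double rebuttal_cite_weight;  // assumed to be precomputed elsewhere
    std::string full_argument;
    std::string headline;
};

// Primary opponents (weight >= threshold) are injected in full; secondary
// opponents contribute only their headline, which shrinks the context.
std::string buildOpponentContext(const std::vector<OpponentPosition>& opponents,
                                 double threshold = 0.6) {
    std::string context;
    for (const auto& o : opponents) {
        const bool primary = o.rebuttal_cite_weight >= threshold;
        context += o.school + ": " + (primary ? o.full_argument : o.headline) + "\n";
    }
    return context;
}
```

The token saving then depends on how many opponents fall below the threshold and how short their headlines are relative to the full arguments.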

Architectural decomposition track:

  • §12.2.1 ILlmCascadeRouter: model routing per discourse round (Target: Q3 2026)
    • New files: include/ethics_ai/llm_cascade_router.h + src/ethics_ai/llm_cascade_router.cpp
    • Inputs: round_role, estimated_prompt_tokens; Outputs: std::shared_ptr<ILLMProvider>, ModelTokenBudget
    • Configuration via discourse_config.yaml::llm_cascade
    • Tests: CWB-11, CWB-12
  • §12.2.2 Sequential Tournament Mode for R3 SURREBUTTAL (Target: Q3 2026)
    • Extension of DiscoursePromptCoordinator::buildArgumentPrompt() for SURREBUTTAL
    • Primary opponent (as determined by CrossSchoolTensionResolver): full text; secondary opponents: headline
    • Token savings: −65 % R3 opponent context with 4 schools
    • Configuration: opponent_injection_mode: "tournament" in discourse_config.yaml
    • Tests: CWB-05
  • §12.2.3 Position-Abstract schema (Target: Q3 2026)
    • position_abstract field added to DiscourseRoundOutput in ethics_ai_types.h
    • EpisodicMemoryEntry struct added to ethics_ai_types.h ✅ (§12.2.4)
    • Full coordinator integration pending; see FUTURE_ENHANCEMENTS.md §12.2.3
    • Tests: CWB-06, CWB-07
  • §12.2.4 Multi-agent memory externalisation via ReflectionTuner::REFLEXION (Target: Q3 2026)
    • Integration into DiscoursePromptCoordinator: write EpisodicMemoryEntry after R2
    • R3 injection: 3 episodes × ≤ 50 tokens = ≤ 150 tokens instead of ~1 600 tokens of full text
    • ReflectionTuner infrastructure already implemented; only the discourse bridge is missing
    • Tests: CWB-08, CWB-09
  • §12.2.5 SynthesisMatrixBuilder: position matrix for R4 (Target: Q3 2026)
    • New files: include/ethics_ai/synthesis_matrix_builder.h + src/ethics_ai/synthesis_matrix_builder.cpp
    • Inputs: SchoolPositionSummary[] + ConvergenceMarker[]; Output: compact matrix of ≤ 300 tokens
    • Peak tokens in R4 with matrix: ~1 600 tokens (fits a 4K window) instead of ~3 800 tokens
    • Tests: CWB-10

Budget profiles + end-to-end tests:

  • config/ethics_ai/model_budget_profiles.yaml: 4 profiles (micro/standard/extended/frontier) (Target: Q3 2026)
    • micro (3B/4K): Monocle reduction + headline + position matrix + REFLEXION
    • standard (7B/8K): principle_citations_only + Tournament + Position-Abstract + REFLEXION
    • extended (13B/32K): structured_summary + Cascade R4→large
    • frontier (70B+/128K): only §12.1.1, optional
  • tests/test_context_window_budget_strategy.cpp: CWB-01..15 (Target: Q3 2026)
    • CMake target: test_context_window_budget_strategy_focused
    • CWB-13 (micro end-to-end, 4 schools, peak ≤ 4 000 tokens)
    • CWB-14 (standard end-to-end, 4 schools, peak ≤ 8 000 tokens, ΔDC ≤ 0.10)
    • CWB-15 (backward compatibility: existing TBM/DRE/PRC tests remain green)
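The four budget profiles above are keyed to the model's context window, so profile selection could be as simple as a threshold lookup. The sketch below assumes thresholds of 8 000, 32 000, and 128 000 tokens (rounded from the 8K/32K/128K labels) and hypothetical names `BudgetProfile` and `selectBudgetProfile`; the real mapping lives in model_budget_profiles.yaml and may use other criteria such as model size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class BudgetProfile { Micro, Standard, Extended, Frontier };

// Picks the largest profile whose assumed context window fits the model.
BudgetProfile selectBudgetProfile(std::size_t context_window_tokens) {
    if (context_window_tokens >= 128000) return BudgetProfile::Frontier;
    if (context_window_tokens >= 32000)  return BudgetProfile::Extended;
    if (context_window_tokens >= 8000)   return BudgetProfile::Standard;
    return BudgetProfile::Micro;  // 4K-class models and anything smaller
}

std::string toString(BudgetProfile p) {
    switch (p) {
        case BudgetProfile::Micro:    return "micro";
        case BudgetProfile::Standard: return "standard";
        case BudgetProfile::Extended: return "extended";
        case BudgetProfile::Frontier: return "frontier";
    }
    return "unknown";
}

// Mirrors the shape of the CWB-13/CWB-14 peak checks: the largest prompt of a
// debate must fit the window the profile was chosen for.
bool withinPeakBudget(std::size_t peak_prompt_tokens, std::size_t context_window_tokens) {
    return peak_prompt_tokens <= context_window_tokens;
}
```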

Note §12.2.2/§12.2.3/§12.2.4: Full DiscoursePromptCoordinator integration for Tournament Mode, Position-Abstract schema enforcement, and REFLEXION memory bridge is specified in FUTURE_ENHANCEMENTS.md §12. Implementation Target: Q3 2026.

Phase 6: Documentation & Acceptance [ ]

  • README, ARCHITECTURE, AUDIT, CHANGELOG, ROADMAP, SECURITY, FUTURE_ENHANCEMENTS
  • §12 Context-window budget strategy (compression + architectural decomposition) documented (2026-04-29)
    • src/ethics_ai/FUTURE_ENHANCEMENTS.md §12 with complete interface specification, test strategy, and budget profiles
    • ROADMAP Phase 5/6 updated with CWB checkboxes
  • Philosophy profile authoring guide, including the mandatory activation_rounds/token_budget fields (Target: Q3 2026)
  • Budget profile selection guide for operators (micro/standard/extended) (Target: Q3 2026)
  • Operator guide for production deployment (Target: Q4 2026)

Production Readiness Checklist

| Area | Status | Notes |
|---|---|---|
| Core API | | All public methods return std::variant<T, Status> |
| Error handling | | All failure paths covered; no unhandled exceptions |
| Thread safety | | ArgumentStore mutex-protected; engine is stateless |
| Persistence | | BaseEntity + RocksDB; standalone mode for testing |
| Argument content | ⚠️ | All profile theses + decision framework used; LLM generation planned Q3 2026 |
| Confidence scoring | | EthicsEvaluator::computeConfidence(): strength-weighted average |
| Consensus scoring | | EthicsEvaluator::computeConsensus(): inter-school PRO/CONTRA tally |
| Configurable weights | | EthicsEvaluator::Config; normalised; default preserves legacy behaviour |
| YAML profile loading | | Handles complex thesis objects, point-keyed strengths/weaknesses, nested frameworks |
| Argument chain visualisation | | ChainVisualizer DOT + Mermaid export |
| Embedding search | ⚠️ | BOC-TF 768-dim fallback; real ONNX model planned Q3 2026 |
| Unit test coverage | | 5 focused unit suites + 1 integration suite + 1 benchmark suite + 1 visualizer suite |
| Performance benchmarks | | PB-01..PB-06 in tests/test_ethics_ai_benchmark.cpp |
| Multi-round debates | | continueDebate() max 3 rounds; REBUTTAL/SYNTHESIS types; cross-round links |
| Debate transcript | | storeDebateRound() + getDebateTranscript() ordered by round_number |
| Profile hot-reload | | reloadProfiles() atomic mutex-protected swap |
| Prometheus metrics | | recordDecision() + getMetricsText(); 5 families, std::atomic backed |
| Profile registry (>100 schools) | | EthicsProfileRegistry lazy-loading + LRU cache; EPR-01..12 |
| School selection routing | | EthicsSelectionRouter 3-stage funnel; ESR-01..10 |

Known Issues & Limitations

  • Argument content is generated from all available profile theses and the decision framework; semantic quality depends on YAML profile authorship. LLM-based generation is planned for v0.1.0 (Q3 2026).
  • confidence and consensus_level are now computed from argument strength distribution and inter-school agreement; see EthicsEvaluator::computeConfidence/computeConsensus.
  • generateEmbedding() in RAGContextEngine uses a bag-of-characters TF model (768-dim, L2-normalised); ANN search results are lexically meaningful but not semantically rich. A real ONNX embedding model is planned for v0.1.0 (Q3 2026).
  • No built-in philosophy YAML profiles are shipped; operators must provide them.

Latent Symbols (Unused-Functions Audit)

As of: 2026-04-20. Source: src/UNUSED_FUNCTIONS_REPORT.md

✅ Active (implemented + external caller confirmed)

  • EthicsAIPlugin: plugin entry point for Ethics AI (registered as IThemisPlugin); used in plugins/ethics_ai/

🟡 UNUSED (no test, no external caller)

  • strengthToScore: converts the ArgumentStrength enum to a numeric score [0.0–1.0]

    Action: For each symbol, decide whether to (1) wire it up, (2) test it, or (3) schedule it as CANDIDATE_FOR_REMOVAL.