Build Unified Distilled Knowledge Graph with Human/AI Source Fusion #18

@durapensa

Overview

Build a knowledge fusion system that produces a deduplicated, weighted graph blending human and AI insights, with transparent provenance tracking.

Architecture

Storage Layers

Raw Layer (append-only):
- events/hot.jsonl         # Human thoughts (weight: 1.0)
- derived/approved.jsonl   # AI findings (weight: varies)
- derived/rejected.jsonl   # Learning data

Fusion Layer (periodic rebuild):
- distilled/concepts.jsonl       # Unified concepts with weights
- distilled/relations.jsonl      # Connections between concepts
- distilled/provenance.jsonl     # Source tracking
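The raw layer's append-only behaviour could be sketched as follows. This is a minimal illustration, not the project's actual code: the helper names `append_record` and `read_records` are hypothetical, and it assumes one JSON object per line.

```python
import json
from pathlib import Path


def append_record(path, record):
    """Append one JSON object as a single line; existing lines are never rewritten."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path):
    """Read every record back in write order; a missing file yields an empty list."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because the fusion layer is rebuilt periodically, it can be regenerated from these files at any time, which is why only the raw layer needs the append-only guarantee.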

Example Unified Concept

{
  "id": "evolutionary-ethics",
  "canonical_form": "Evolutionary Ethics",
  "description": "Morality emerging from evolutionary dynamics",
  "weight": 0.85,
  "sources": {
    "human": {
      "count": 5,
      "events": ["2025-06-10T17:03:28Z", "2025-06-10T17:01:36Z"],
      "weight_contribution": 0.6
    },
    "ai": {
      "count": 2,
      "findings": ["2025-06-10T18:02:26Z-pattern-analysis"],
      "weight_contribution": 0.4
    }
  },
  "confidence": 0.92,
  "last_updated": "2025-06-10T18:30:00Z"
}

Key Features

1. Dual-Write System

  • Summary in hot.jsonl for discoverability
  • Full finding in derived/approved.jsonl with complete metadata
  • Bidirectional linking via derived_ref
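A dual-write along these lines might look like the sketch below. Only `derived_ref` and the two file paths come from this issue; the function name `dual_write`, the back-link field `hot_ref`, and the `approved_at`, `source`, and `summary` fields are assumptions made for illustration.

```python
import json
from pathlib import Path


def _append(path, record):
    # Append a single JSON line, creating parent directories on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def dual_write(root, finding, summary):
    """Store the full finding in derived/approved.jsonl and a summary in events/hot.jsonl."""
    root = Path(root)
    event = {
        "timestamp": finding["approved_at"],  # assumed field on the finding
        "source": "ai",
        "summary": summary,
        "derived_ref": finding["id"],         # hot -> derived link
    }
    # hot_ref is a hypothetical name for the derived -> hot back-link.
    full = dict(finding, hot_ref=event["timestamp"])
    # Write the full finding first so a summary never points at a missing record.
    _append(root / "derived" / "approved.jsonl", full)
    _append(root / "events" / "hot.jsonl", event)
    return event
```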

2. Relative Contribution Tracking

  • Human contributions weighted as ground truth (1.0)
  • AI contributions weighted by approval confidence and recency
  • Transparent source attribution showing percentage from each source
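One way the contribution split could be computed is sketched below. The issue does not specify a formula, so the exponential recency decay, the 30-day half-life, and the function names are all assumptions.

```python
from datetime import datetime, timedelta, timezone

HALF_LIFE_DAYS = 30.0  # assumption: an AI finding loses half its weight every 30 days


def ai_weight(confidence, approved_at, now):
    """Approval confidence scaled by an exponential recency decay."""
    age_days = max((now - approved_at).total_seconds() / 86400.0, 0.0)
    return confidence * 0.5 ** (age_days / HALF_LIFE_DAYS)


def contribution_shares(human_count, ai_findings, now):
    """Return the fraction of a concept's weight coming from each source.

    human_count: number of human events (each weighted 1.0).
    ai_findings: iterable of (confidence, approved_at) pairs.
    """
    human_total = 1.0 * human_count
    ai_total = sum(ai_weight(c, t, now) for c, t in ai_findings)
    total = human_total + ai_total
    if total == 0:
        return {"human": 0.0, "ai": 0.0}
    return {"human": human_total / total, "ai": ai_total / total}
```

For instance, three human events plus one fresh AI finding approved at confidence 1.0 would split 75% human and 25% AI under these assumptions.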

3. Deduplication Strategy

  • Concept clustering using embeddings/similarity
  • Canonical form selection (prefer human-originated terms)
  • Alias tracking for all variations
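The deduplication strategy could be prototyped as below. Note the substitution: this sketch uses string similarity from Python's standard `difflib` as a stand-in for embeddings, with a greedy single-pass clustering. The 0.85 threshold and all function names are assumptions.

```python
from difflib import SequenceMatcher


def normalize(term):
    # Lowercase, treat hyphens as spaces, collapse whitespace.
    return " ".join(term.lower().replace("-", " ").split())


def similar(a, b, threshold=0.85):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def cluster_terms(mentions, threshold=0.85):
    """Greedily group (term, source) pairs; source is 'human' or 'ai'."""
    clusters = []
    for term, source in mentions:
        for cluster in clusters:
            # Compare against the first member of each existing cluster.
            if similar(term, cluster[0][0], threshold):
                cluster.append((term, source))
                break
        else:
            clusters.append([(term, source)])
    return clusters


def canonicalize(cluster):
    """Prefer the first human-originated term; keep every other variant as an alias."""
    human_terms = [t for t, s in cluster if s == "human"]
    canonical = human_terms[0] if human_terms else cluster[0][0]
    aliases = sorted({t for t, _ in cluster} - {canonical})
    return {"canonical_form": canonical, "aliases": aliases}
```

Swapping `similar` for a cosine-similarity check over embeddings would leave the clustering and canonical-form selection unchanged.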

4. Rejection Learning

  • Track rejected findings in derived/rejected.jsonl
  • Include rejection reasons and improvement suggestions
  • Use rejected findings as ML training data and to improve future analysis
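A rejected-finding record might be shaped as in this sketch. The field names (`rejection_reason`, `improvement_suggestion`, `rejected_at`) and both helpers are hypothetical; the issue only states that reasons and suggestions are stored.

```python
from collections import Counter


def make_rejection(finding, reason, suggestion=None, rejected_at=None):
    """Build a record for derived/rejected.jsonl; a reason is mandatory."""
    if not reason or not reason.strip():
        raise ValueError("a rejection needs a reason")
    return {
        "finding_id": finding["id"],
        "finding": finding,  # keep the full finding for later training use
        "rejection_reason": reason.strip(),
        "improvement_suggestion": suggestion,
        "rejected_at": rejected_at,
    }


def reason_counts(rejections):
    """Tally rejection reasons to show where analysis most often goes wrong."""
    return Counter(r["rejection_reason"] for r in rejections)
```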

Implementation Steps

Phase 1: Foundation (Immediate)

  • Update review-findings to implement dual-write
  • Add enriched metadata tracking (source events, analysis parameters)
  • Implement rejected findings storage

Phase 2: Distillation Process

  • Build concept extraction and clustering
  • Implement weight calculation algorithm
  • Create periodic distillation background job

Phase 3: Query Integration

  • Update query tools to search unified graph
  • Add provenance display options
  • Implement confidence-based filtering
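Querying the unified graph with a confidence filter and optional provenance could look like the sketch below. It assumes concepts are loaded as dicts shaped like the example unified concept above; `search_concepts` and its parameters are hypothetical, and the plain substring match is only a placeholder for whatever search the query tools already use.

```python
def search_concepts(concepts, text, min_confidence=0.0, show_provenance=False):
    """Return matching concepts at or above min_confidence, heaviest first."""
    needle = text.lower()
    results = []
    for c in concepts:
        haystack = (c.get("canonical_form", "") + " " + c.get("description", "")).lower()
        if needle not in haystack or c.get("confidence", 0.0) < min_confidence:
            continue
        row = {
            "id": c["id"],
            "canonical_form": c.get("canonical_form", ""),
            "weight": c.get("weight", 0.0),
            "confidence": c.get("confidence", 0.0),
        }
        if show_provenance:
            # Expose each source's share of the concept's weight.
            row["sources"] = {
                name: src.get("weight_contribution")
                for name, src in c.get("sources", {}).items()
            }
        results.append(row)
    return sorted(results, key=lambda r: r["weight"], reverse=True)
```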

Phase 4: Evolution Tracking

  • Track concept strength over time
  • Visualize knowledge graph growth
  • Identify emerging themes
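One simple reading of "emerging themes" is concepts whose weight has grown most across rebuilds. The sketch below assumes a weight snapshot is recorded per concept at each distillation run; the `history` structure, the `min_gain` threshold, and the function name are assumptions.

```python
def emerging_themes(history, min_gain=0.1):
    """Rank concept ids by weight gained between their first and last snapshot.

    history: {concept_id: [(timestamp, weight), ...]} with points in time order.
    """
    gains = {}
    for concept_id, points in history.items():
        if len(points) < 2:
            continue  # a single snapshot shows no trend yet
        gain = points[-1][1] - points[0][1]
        if gain >= min_gain:
            gains[concept_id] = gain
    return sorted(gains, key=gains.get, reverse=True)
```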

Benefits

  1. Trust Indicators - Clear visibility of human vs AI contributions
  2. Feedback Loop - Rejected findings improve future analysis
  3. Clean Queries - Single deduplicated concept instead of fragments
  4. Knowledge Evolution - Watch insights strengthen and connect over time
  5. Transparent Provenance - Always know where knowledge originated

Labels: priority: high (critical for project progress); type: feature (new functionality or enhancement)
