Labels: priority: high (critical for project progress), type: feature (new functionality or enhancement)
Description
Overview
Build a knowledge fusion system that produces a deduplicated, weighted graph combining human and AI insights, with transparent provenance tracking for every concept.
Architecture
Storage Layers
Raw Layer (append-only):
- events/hot.jsonl # Human thoughts (weight: 1.0)
- derived/approved.jsonl # AI findings (weight: varies)
- derived/rejected.jsonl # Learning data
Fusion Layer (periodic rebuild):
- distilled/concepts.jsonl # Unified concepts with weights
- distilled/relations.jsonl # Connections between concepts
- distilled/provenance.jsonl # Source tracking
Example Unified Concept
{
  "id": "evolutionary-ethics",
  "canonical_form": "Evolutionary Ethics",
  "description": "Morality emerging from evolutionary dynamics",
  "weight": 0.85,
  "sources": {
    "human": {
      "count": 5,
      "events": ["2025-06-10T17:03:28Z", "2025-06-10T17:01:36Z"],
      "weight_contribution": 0.6
    },
    "ai": {
      "count": 2,
      "findings": ["2025-06-10T18:02:26Z-pattern-analysis"],
      "weight_contribution": 0.4
    }
  },
  "confidence": 0.92,
  "last_updated": "2025-06-10T18:30:00Z"
}
Key Features
1. Dual-Write System
- Summary in hot.jsonl for discoverability
- Full finding in derived/approved.jsonl with complete metadata
- Bidirectional linking via derived_ref
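The dual-write could be sketched roughly as follows. This is only an illustration: the helper names `append_jsonl` and `dual_write`, the `summary` and `source` fields, and the fallback ID scheme are assumptions, not part of the existing tooling. Only the file paths and the `derived_ref` field come from this issue. The sketch shows the link from the summary to the full finding; the reverse direction is a lookup by the same reference.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_jsonl(path: Path, record: dict) -> None:
    """Append one JSON object as a single line; existing lines are never rewritten."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def dual_write(root: Path, finding: dict, summary: str) -> str:
    """Write the full finding and a linked summary; return the shared reference."""
    # Assumption: reuse the finding's own id, else fall back to a UTC timestamp.
    ref = finding.get("id") or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Full finding with complete metadata goes to the derived layer.
    append_jsonl(root / "derived" / "approved.jsonl", {**finding, "id": ref})
    # Short summary goes to hot.jsonl and points back at the finding.
    append_jsonl(
        root / "events" / "hot.jsonl",
        {"summary": summary, "source": "ai", "derived_ref": ref},
    )
    return ref
```

Writing the full finding first means a summary never points at a finding that does not exist yet, even if the second write fails.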
2. Relative Contribution Tracking
- Human contributions weighted as ground truth (1.0)
- AI contributions weighted by approval confidence and recency
- Transparent source attribution showing percentage from each source
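One possible way to compute these contributions is sketched below. The issue does not fix a formula, so the exponential recency decay, the 30-day half-life, and the function names are all assumptions chosen for illustration; only the human weight of 1.0 comes from the text above.

```python
from datetime import datetime

HUMAN_WEIGHT = 1.0      # from the issue: human contributions are ground truth
HALF_LIFE_DAYS = 30.0   # assumption: an AI finding's weight halves every 30 days


def ai_weight(confidence: float, approved_at: datetime, now: datetime) -> float:
    """Approval confidence scaled by an exponential recency decay (assumed form)."""
    age_days = max((now - approved_at).total_seconds() / 86400.0, 0.0)
    return confidence * 0.5 ** (age_days / HALF_LIFE_DAYS)


def contributions(human_count: int, ai_weights: list[float]) -> dict[str, float]:
    """Return each source's share of the total weight; shares sum to 1.0."""
    human_total = HUMAN_WEIGHT * human_count
    ai_total = sum(ai_weights)
    total = human_total + ai_total
    if total == 0:
        return {"human": 0.0, "ai": 0.0}
    return {"human": human_total / total, "ai": ai_total / total}
```

The shares returned by `contributions` would map onto the `weight_contribution` fields in the example concept, under the assumption that those fields are normalized fractions.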
3. Deduplication Strategy
- Concept clustering using embeddings/similarity
- Canonical form selection (prefer human-originated terms)
- Alias tracking for all variations
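A minimal sketch of the deduplication flow follows. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for embedding similarity, so it only catches near-identical spellings; a real implementation would presumably swap in an embedding model. The function names, the greedy clustering, and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a placeholder for embedding cosine similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_terms(terms: list[tuple[str, str]], threshold: float = 0.8) -> list[dict]:
    """Greedily cluster (text, origin) pairs, where origin is 'human' or 'ai'."""
    clusters: list[list[tuple[str, str]]] = []
    for text, origin in terms:
        for cluster in clusters:
            # Join the first cluster containing a sufficiently similar member.
            if any(similarity(text, member) >= threshold for member, _ in cluster):
                cluster.append((text, origin))
                break
        else:
            clusters.append([(text, origin)])

    concepts = []
    for cluster in clusters:
        # Prefer a human-originated term as the canonical form.
        human_terms = [t for t, o in cluster if o == "human"]
        canonical = (human_terms or [t for t, _ in cluster])[0]
        # Every other variation is kept as an alias.
        aliases = sorted({t for t, _ in cluster} - {canonical})
        concepts.append({"canonical_form": canonical, "aliases": aliases})
    return concepts
```

Greedy clustering depends on input order, which is acceptable for a sketch but may need a more stable method once real embeddings are used.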
4. Rejection Learning
- Track rejected findings in derived/rejected.jsonl
- Include rejection reasons and improvement suggestions
- Use for ML training and analysis improvement
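As a rough illustration of what a rejection record and a first analysis pass might look like, the sketch below serializes one record per line and tallies rejection reasons. The field names `finding_id`, `reason`, and `suggestions` are hypothetical; the issue only states that reasons and improvement suggestions should be stored.

```python
import json
from collections import Counter


def make_rejection(finding_id: str, reason: str, suggestions: list[str]) -> str:
    """Serialize one rejected finding as a JSONL line (hypothetical field names)."""
    record = {"finding_id": finding_id, "reason": reason, "suggestions": suggestions}
    return json.dumps(record, ensure_ascii=False)


def reason_counts(jsonl_text: str) -> Counter:
    """Count rejection reasons across all non-empty lines of a JSONL document."""
    return Counter(
        json.loads(line)["reason"]
        for line in jsonl_text.splitlines()
        if line.strip()
    )
```

A tally like this could indicate which kinds of findings are rejected most often before any ML training is attempted.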
Implementation Steps
Phase 1: Foundation (Immediate)
- Update review-findings to implement dual-write
- Add enriched metadata tracking (source events, analysis parameters)
- Implement rejected findings storage
Phase 2: Distillation Process
- Build concept extraction and clustering
- Implement weight calculation algorithm
- Create periodic distillation background job
Phase 3: Query Integration
- Update query tools to search unified graph
- Add provenance display options
- Implement confidence-based filtering
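The query side could look something like the sketch below, which reads records shaped like the example unified concept, filters them by `confidence`, and formats a one-line provenance display. The function names and the output format are assumptions; the field names are taken from the example concept in this issue.

```python
import json


def load_concepts(jsonl_text: str) -> list[dict]:
    """Parse one concept per non-empty line of distilled/concepts.jsonl content."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def filter_by_confidence(concepts: list[dict], min_confidence: float) -> list[dict]:
    """Keep concepts whose confidence meets the threshold; missing values count as 0."""
    return [c for c in concepts if c.get("confidence", 0.0) >= min_confidence]


def provenance_line(concept: dict) -> str:
    """Format the human and AI weight contributions as percentages."""
    sources = concept.get("sources", {})
    human = sources.get("human", {}).get("weight_contribution", 0.0)
    ai = sources.get("ai", {}).get("weight_contribution", 0.0)
    return f"{concept['canonical_form']}: {human:.0%} human, {ai:.0%} AI"
```

Applied to the example concept, `provenance_line` would render the 0.6 and 0.4 contributions as 60% human and 40% AI.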
Phase 4: Evolution Tracking
- Track concept strength over time
- Visualize knowledge graph growth
- Identify emerging themes
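One simple way to identify emerging themes is to compare concept weights between two distillation runs, as sketched below. The snapshot format (a mapping from concept id to weight), the function name, and the `min_gain` threshold of 0.1 are assumptions made for illustration.

```python
def emerging_themes(
    previous: dict[str, float],
    current: dict[str, float],
    min_gain: float = 0.1,
) -> list[str]:
    """Return concept ids whose weight grew by at least min_gain, largest gain first."""
    # Concepts absent from the previous snapshot are treated as starting at 0.0.
    gains = {cid: weight - previous.get(cid, 0.0) for cid, weight in current.items()}
    rising = [cid for cid, gain in gains.items() if gain >= min_gain]
    return sorted(rising, key=lambda cid: -gains[cid])
```

Storing one such snapshot per distillation run would also provide the raw series needed to plot concept strength over time.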
Benefits
- Trust Indicators - Clear visibility of human vs AI contributions
- Feedback Loop - Rejected findings improve future analysis
- Clean Queries - Single deduplicated concept instead of fragments
- Knowledge Evolution - Watch insights strengthen and connect over time
- Transparent Provenance - Always know where knowledge originated
Related Issues
- Depends on completion of #17, "Fix review-findings jq errors and reorganize workflow tools" (review-findings fixes)
- Enhances #10, "Build derived knowledge persistence pipeline" (derived knowledge pipeline)
- Supports long-term knowledge curation goals