Skip to content

Latest commit

 

History

History
309 lines (242 loc) · 10.2 KB

File metadata and controls

309 lines (242 loc) · 10.2 KB

Per-Chunk RAG Enrichment - Intelligent Knowledge Integration

Concept

Instead of using the same broad RAG context for all chunks, each continuation chunk gets its own targeted FlockParser query based on its specific focus area.

How It Works

Before (Naive RAG)

User: "Explain string theory"
   ↓
FlockParser: Query "string theory" → 1.9K tokens of general context
   ↓
Phase 1: Generate chunk with general context ✓
Phase 2: Generate chunk with SAME general context (wasteful!)
Phase 3: Generate chunk with SAME general context (wasteful!)
Phase 4: Generate chunk with SAME general context (wasteful!)
Phase 5: Generate chunk with SAME general context (wasteful!)

Problem: All chunks get identical context, wasting tokens and missing specialized knowledge.

After (Smart RAG)

User: "Explain string theory"
   ↓
Phase 1: FlockParser query "string theory" (broad)
   → General overview context (1.9K tokens)
   → Generate foundational concepts ✓

Phase 2: FlockParser query "string theory mathematical formalism"
   → Context about equations, theoretical frameworks (1.0K tokens)
   → Generate math-focused chunk ✓

Phase 3: FlockParser query "string theory experimental evidence"
   → Context about experiments, empirical studies (1.0K tokens)
   → Generate evidence-focused chunk ✓

Phase 4: FlockParser query "string theory applications"
   → Context about real-world uses, technology (1.0K tokens)
   → Generate application-focused chunk ✓

Phase 5: FlockParser query "string theory unsolved problems"
   → Context about frontiers, controversies (1.0K tokens)
   → Generate future research chunk ✓

Result: Each chunk gets specialized context relevant to its topic!

Implementation

Focus Areas by Content Type

The system automatically assigns focus areas based on content type:

RESEARCH (ContentType.RESEARCH)

  1. Chunk 1: Fundamental concepts, basic definitions, foundations
  2. Chunk 2: Mathematical formalism, equations, theoretical frameworks
  3. Chunk 3: Experimental evidence, empirical studies, research findings
  4. Chunk 4: Real-world applications, practical implementations, use cases
  5. Chunk 5: Current frontiers, unsolved problems, future research

ANALYSIS (ContentType.ANALYSIS)

  1. Overview and initial assessment
  2. Strengths, advantages, positive aspects
  3. Weaknesses, limitations, challenges
  4. Comparative analysis and alternatives
  5. Implications and conclusions

EXPLANATION (ContentType.EXPLANATION)

  1. Basic overview and introduction
  2. Step-by-step process and methodology
  3. Common pitfalls and troubleshooting
  4. Advanced techniques and best practices
  5. Practical examples and use cases

Code Flow

File: /home/joker/SynapticLlamas/distributed_orchestrator.py

Parallel Mode (lines 1236-1296)

for i in range(2, chunks_needed + 1):
    focus = focus_areas.get(i, "additional aspects")

    # Get focused RAG context for this chunk
    chunk_context = ""
    if self.use_flockparser and content_type == ContentType.RESEARCH:
        # Build focused query
        focused_query = f"{query_for_continuation} {focus}"

        # Query FlockParser with topic-specific context
        enhanced_chunk_query, chunk_docs = self.flockparser_adapter.enhance_research_query(
            focused_query,
            top_k=10,              # Fewer docs per chunk
            max_context_tokens=1000 # Smaller context per chunk
        )

        # Extract RAG context
        chunk_context = enhanced_chunk_query.replace(focused_query, "").strip()

    # Build prompt with chunk-specific context
    if chunk_context:
        continuation_prompt = f"""Research topic: {query_for_continuation}

Part {i} of {chunks_needed}. Focus SPECIFICALLY on: {focus}

📚 Relevant Knowledge Base Context:
{chunk_context}

CRITICAL REQUIREMENTS:
- Use the knowledge base context above to add specific technical details
- Include equations, data, specific examples from the context
- Be technical and specific, not vague or repetitive
"""

Sequential Mode (lines 1478-1542)

Same logic but executed sequentially instead of in parallel.

Benefits

1. Targeted Knowledge Retrieval

Each chunk gets documents relevant to its specific topic:

  • Chunk 2 (math): Retrieves papers about string theory equations
  • Chunk 3 (experiments): Retrieves papers about LHC tests, particle physics
  • Chunk 4 (applications): Retrieves papers about quantum computing, materials

2. Token Efficiency

  • Before: 1.9K tokens × 5 chunks = 9.5K tokens wasted
  • After: 1.0K tokens × 4 chunks = 4.0K tokens (47% reduction)

3. Higher Quality Output

Chunks contain:

  • ✅ Specific citations relevant to the topic
  • ✅ Technical details from specialized documents
  • ✅ Data and equations from domain-specific sources
  • ✅ No repetitive generic context

4. Better Document Coverage

  • Broad initial query finds overview documents
  • Focused queries find specialized technical documents
  • Final report covers MORE documents with BETTER relevance

Token Budget (8K Context)

Phase 1 (Broad Context)

RAG context (broad):        1.9K tokens
System prompt:              0.3K tokens
User query + instructions:  0.5K tokens
─────────────────────────────────────
Total input:                2.7K tokens
Available for output:       5.3K tokens ✅

Phase 2-5 (Focused Context)

RAG context (focused):      1.0K tokens
System prompt:              0.3K tokens
User query + instructions:  0.5K tokens
Previous chunk summary:     0.2K tokens
─────────────────────────────────────
Total input:                2.0K tokens
Available for output:       6.0K tokens ✅✅

Savings: ~0.9K tokens per chunk = 3.6K total across 4 chunks!

Example: String Theory Report

Without Per-Chunk RAG

All chunks query: "string theory"
Retrieved docs:
  • Greene, B. "The Elegant Universe" (general overview)
  • Polchinski, J. "String Theory Vol. 1" (general overview)

All chunks get SAME CONTEXT (repeated 5 times)

With Per-Chunk RAG

Chunk 1: "string theory" (broad)
  → Greene "Elegant Universe" (overview)

Chunk 2: "string theory mathematical formalism"
  → Polchinski "String Theory Vol. 1" (equations)
  → Witten "M-theory" (technical framework)

Chunk 3: "string theory experimental evidence"
  → CERN "LHC Results" (particle physics)
  → Hawking "Black Hole Thermodynamics" (observations)

Chunk 4: "string theory applications"
  → Kaku "Hyperspace" (cosmology applications)
  → Susskind "Holographic Principle" (quantum computing)

Chunk 5: "string theory unsolved problems"
  → Smolin "Trouble with Physics" (criticisms)
  → Penrose "Road to Reality" (open questions)

Each chunk cites different, specialized sources!

Configuration

Enable Feature

Feature is automatically enabled when:

  • --use-flockparser flag is set
  • FlockParser knowledge base has documents
  • Content type is RESEARCH

No additional configuration needed!

Disable Feature

To use the old behavior (broad context for all chunks):

  • Don't use --use-flockparser flag
  • OR delete FlockParser knowledge base

Logging Output

When running with per-chunk RAG, you'll see:

2025-10-14 19:30:25,272 - INFO - ⚡ PARALLEL MULTI-TURN MODE: 3 nodes available
2025-10-14 19:30:25,273 - INFO - 📝 Phase 1: Initial Content Generation
2025-10-14 19:30:25,272 - INFO - 📖 RAG Enhancement: Using 1 source document(s)
2025-10-14 19:30:25,272 - INFO -    • majorana_fermions.pdf

2025-10-14 19:32:52,263 - INFO - ⚡ Phase 2: Parallel Chunk Generation (4 chunks)
2025-10-14 19:32:52,264 - INFO -    📖 Chunk 2: Querying FlockParser for 'mathematical formalism'
2025-10-14 19:32:52,265 - INFO -       ✅ Found 3 source(s) for chunk 2
2025-10-14 19:32:52,266 - INFO -    📖 Chunk 3: Querying FlockParser for 'experimental evidence'
2025-10-14 19:32:52,267 - INFO -       ✅ Found 2 source(s) for chunk 3
2025-10-14 19:32:52,268 - INFO -    📖 Chunk 4: Querying FlockParser for 'applications'
2025-10-14 19:32:52,269 - INFO -       ✅ Found 4 source(s) for chunk 4
2025-10-14 19:32:52,270 - INFO -    📖 Chunk 5: Querying FlockParser for 'unsolved problems'
2025-10-14 19:32:52,271 - INFO -       ✅ Found 2 source(s) for chunk 5

Performance Impact

Additional Latency

  • Per-chunk FlockParser query: ~0.5-1.0 seconds
  • 4 chunks in parallel: No additional latency (queries run concurrently)
  • 4 chunks sequential: +2-4 seconds total

Benefits Outweigh Cost

  • Better quality output
  • More specialized knowledge
  • More documents cited
  • Token savings enable longer outputs

Future Enhancements

1. Citation Tracking

Track which documents were used per chunk:

result['metadata']['chunk_citations'] = {
    1: ['Greene - Elegant Universe'],
    2: ['Polchinski - String Theory', 'Witten - M-theory'],
    3: ['CERN - LHC Results'],
    ...
}

2. Adaptive Context Sizing

Adjust max_context_tokens based on chunk complexity:

if chunk_num == 2:  # Math chunk needs more equations
    max_context_tokens = 1500
else:
    max_context_tokens = 1000

3. Cross-Chunk Deduplication

Prevent retrieving the same document for multiple chunks:

used_docs = set()
for chunk in chunks:
    new_docs = [d for d in retrieved_docs if d not in used_docs]
    used_docs.update(new_docs)

4. Verification Loop

Query FlockParser after chunk generation to verify claims:

chunk_text = researcher.process(prompt)
verification = flockparser.verify_claims(chunk_text)
if verification.confidence < 0.8:
    # Retrieve evidence and regenerate

Summary

Before: One broad RAG query → repeated for all chunks (wasteful)

After: One broad query (Phase 1) + focused queries per chunk (Phase 2+)

Result:

  • ✅ 47% token savings (3.6K tokens across 4 chunks)
  • ✅ Higher quality, specialized context per chunk
  • ✅ More documents cited overall
  • ✅ Better alignment with chunk focus areas
  • ✅ Smarter use of FlockParser knowledge base

The system now intelligently leverages both SynapticLlamas and FlockParser to produce comprehensive, well-researched reports! 🚀