Skip to content

Latest commit

 

History

History
293 lines (226 loc) · 8.09 KB

File metadata and controls

293 lines (226 loc) · 8.09 KB

Opportunistic Background Indexing

Automatically fills file annotations and knowledge graph during idle time

Overview

IdlerGear now includes an opportunistic indexing system that automatically:

  • Annotates files with descriptions, tags, and components
  • Populates the knowledge graph with commits, symbols, and relationships
  • Runs in small batches (5 items) after each MCP tool completes
  • Never blocks or runs in parallel with active work

How It Works

User calls tool → Tool executes → Tool returns → Index 5 items → Return to idle

Key Features:

  • Automatic: No manual triggering needed
  • Opportunistic: Only runs when system is idle (after tool completion)
  • Small Batches: Processes 5 items at a time (fast, non-blocking)
  • Smart Priority: Files > Commits > Symbols (most useful first)
  • Progress Tracking: Always knows what's left to index

Current Status

Check what needs indexing:

from idlergear.indexing import get_indexing_status

status = get_indexing_status()
print(f"{status['file_annotations']['unannotated']} files need annotation")
print(f"{status['knowledge_graph']['commits']['remaining']} commits need indexing")
print(f"{status['summary']['estimated_batches_remaining']} batches remaining")

Example Output:

{
  "file_annotations": {
    "total_files": 142,
    "annotated": 4,
    "unannotated": 138,
    "percent_complete": 2.8
  },
  "knowledge_graph": {
    "commits": {
      "indexed": 0,
      "estimated_total": 100,
      "remaining": 100,
      "percent_complete": 0.0
    },
    "symbols": {
      "indexed": 0,
      "estimated_total": 2840,
      "remaining": 2840,
      "percent_complete": 0.0
    }
  },
  "summary": {
    "total_work_remaining": 238,
    "estimated_batches_remaining": 47
  }
}

MCP Tools

Check Status

idlergear_indexing_status()
# Returns: Full indexing status with progress percentages

Manual Batch

idlergear_index_batch(batch_size=5, target="auto")
# Processes 5 items (files, commits, or symbols)
# target: "auto", "files", "commits", "symbols"

Pause/Resume

idlergear_pause_indexing()   # Pause during performance-critical operations
idlergear_resume_indexing()  # Resume automatic indexing

Indexing Priority

When target="auto", the system prioritizes:

  1. File Annotations (fastest, 93% token savings)

    • Extracts docstrings, classes, functions from Python files
    • Auto-generates descriptions and tags
    • Enables semantic file search
  2. Git Commits (historical context)

    • Indexes commit messages, authors, timestamps
    • Links commits to files changed
    • Enables "what changed when" queries
  3. Code Symbols (deep code understanding)

    • Indexes functions, classes, methods
    • Links symbols to files
    • Enables "find function by name" queries

File Annotation Generation

Auto-generated from file contents:

# For src/idlergear/graph/queries.py:
{
  "description": "Common query patterns for IdlerGear knowledge graph",
  "tags": ["graph", "query"],
  "components": ["query_task_context", "query_file_context", "query_symbols_by_name"],
  "related_files": []
}

Extraction Logic:

  • Description: From module docstring or filename
  • Tags: From file path (test, api, graph, mcp, etc.)
  • Components: Public functions and classes (excludes _private)

Performance

Batch Processing:

  • File annotation: ~50ms per file (parse + extract + save)
  • Commit indexing: ~100ms per commit (parse + relationships)
  • Symbol indexing: ~200ms per file (AST parsing + graph insert)

Total for 5 items: ~500ms (half a second, imperceptible)

Non-Blocking:

  • Runs in finally block after tool completion
  • Uses global flag to prevent recursive indexing
  • Failures don't break normal tool operations

State Tracking

State saved to .idlergear/indexing_state.json:

{
  "paused": false,
  "last_file_index": 25,
  "last_commit_index": 15,
  "last_symbol_index": 10,
  "last_run": "2026-02-26T14:30:00",
  "total_files_indexed": 25,
  "total_commits_indexed": 15,
  "total_symbols_indexed": 150
}

Incremental Progress:

  • Tracks last processed index for files, commits, symbols
  • Skips already-indexed items (idempotent)
  • Resumes from where it left off across sessions

Configuration

Default: Indexing is enabled and runs automatically

To disable:

idlergear_pause_indexing()

To manually control:

# Check if work available
from idlergear.indexing import should_run_indexing
if should_run_indexing():
    # Process batch
    result = idlergear_index_batch(batch_size=10)  # Larger batch
    print(f"Indexed {result['files_annotated']} files")

Architecture

Module: src/idlergear/indexing/background.py

Key Functions:

  • get_indexing_status() - Check what needs indexing
  • index_next_batch(batch_size, target) - Process items
  • should_run_indexing() - Check if work available
  • pause_indexing() / resume_indexing() - Control

MCP Server Hook: src/idlergear/mcp_server.py

  • _run_opportunistic_indexing() - Called in finally block of call_tool()
  • Global flag _indexing_in_progress prevents recursion

State File: .idlergear/indexing_state.json

  • Tracks progress across sessions
  • Survives MCP server restarts

Benefits

For Users

  • No manual work: Annotations and graph fill in automatically
  • Always up to date: Runs after every tool call
  • Non-blocking: Never slows down active work
  • Progress visibility: Always know what's left

For AI Assistants

  • 93% token savings: File annotations enable semantic search
  • 95-98% token savings: Knowledge graph enables context queries
  • Faster responses: Pre-indexed data = instant lookups
  • Better context: Complete file/commit/symbol data always available

Examples

Automatic Usage (Default)

Just use IdlerGear normally - indexing happens automatically:

# User calls any tool
idlergear_task_list()

# → Tool executes
# → Returns results
# → Indexes 5 items in background
# → Returns to idle

Manual Batch Processing

Process multiple batches explicitly:

# Index 50 files (10 batches of 5)
for i in range(10):
    result = idlergear_index_batch(batch_size=5, target="files")
    print(f"Batch {i+1}: {result['files_annotated']} files annotated")

Check Progress

Monitor indexing progress:

status = idlergear_indexing_status()

print(f"Files: {status['file_annotations']['percent_complete']}% complete")
print(f"Commits: {status['knowledge_graph']['commits']['percent_complete']}% complete")
print(f"Estimated batches remaining: {status['summary']['estimated_batches_remaining']}")

Future Enhancements

Potential improvements:

  1. Adaptive batch sizing: Larger batches when system very idle, smaller when busy
  2. Priority hints: User can mark files/paths as high priority
  3. Time budgets: Maximum time per batch (e.g., 500ms)
  4. Completion callbacks: Notify when all indexing complete
  5. Metrics tracking: Histogram of indexing times, success rates

Troubleshooting

Indexing not running?

  • Check if paused: idlergear_indexing_status()"paused": true
  • Resume: idlergear_resume_indexing()

Indexing too slow?

  • Check batch size (default: 5 items)
  • Manually trigger larger batch: idlergear_index_batch(batch_size=10)

Want to force immediate indexing?

# Process all remaining work (blocking)
while idlergear_indexing_status()['summary']['total_work_remaining'] > 0:
    idlergear_index_batch(batch_size=10)

Files not being annotated?

  • Check if already annotated: idlergear_indexing_status()"annotated": N
  • Check for errors: Run index_batch() and inspect "errors" field

Summary

The opportunistic indexing system ensures that IdlerGear's knowledge base is always:

  • ✅ Up to date (runs after every tool)
  • ✅ Complete (eventually processes all files/commits/symbols)
  • ✅ Non-blocking (small batches, idle-time only)
  • ✅ Transparent (automatic, visible progress)

No more manual indexing commands - just use IdlerGear normally and it fills in the gaps automatically!